MFU calculation is hardcoded to H100 peak FLOPS #5

Description

@NJX-njx

Problem

After PR #2 added support for non-Hopper GPUs, the MFU (Model FLOPS Utilization) metric is now misleading on any GPU that isn't an H100.

The peak FLOPS value is hardcoded at line 491:
```python
H100_BF16_PEAK_FLOPS = 989.5e12
```

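MFU is achieved FLOPS divided by peak FLOPS, so dividing by the H100 peak on a slower GPU understates utilization. On an A100 (312 TFLOPS BF16), the reported MFU would be ~3.2x too low. On an RTX 4090 (165 TFLOPS), it would be ~6x too low. This makes the MFU metric useless for comparing runs across different hardware.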
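For reference, the size of the discrepancy is just the ratio of the hardcoded peak to the GPU's own peak. A minimal sketch using only the figures quoted in this issue; `mfu_discrepancy_factor` is a hypothetical helper name, not something in the repository:

```python
# Peak BF16 FLOPS figures as quoted in this issue.
H100_BF16_PEAK_FLOPS = 989.5e12
A100_BF16_PEAK_FLOPS = 312e12
RTX_4090_BF16_PEAK_FLOPS = 165e12


def mfu_discrepancy_factor(actual_peak, assumed_peak=H100_BF16_PEAK_FLOPS):
    """Factor by which reported and true MFU differ.

    MFU = achieved_flops / peak_flops, so using `assumed_peak` in place of
    `actual_peak` scales the result by actual_peak / assumed_peak. This
    returns the inverse of that scale, i.e. assumed_peak / actual_peak.
    """
    return assumed_peak / actual_peak


if __name__ == "__main__":
    for name, peak in [("A100", A100_BF16_PEAK_FLOPS),
                       ("RTX 4090", RTX_4090_BF16_PEAK_FLOPS)]:
        print(f"{name}: {mfu_discrepancy_factor(peak):.2f}x")
```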

Proposed Fix

Auto-detect the GPU via `torch.cuda.get_device_capability()` and look up the correct theoretical peak FLOPS from a table of known GPU architectures. Fall back to the H100 value for unknown GPUs.
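A rough sketch of what that lookup might look like, not the final PR. The table and function names are hypothetical, and the table holds only the three peak values quoted in this issue. One caveat: several cards can share a compute capability, so a table keyed this way is approximate.

```python
# Peak BF16 FLOPS keyed by CUDA compute capability (major, minor).
# Values are the ones quoted in this issue; the table is illustrative only.
PEAK_BF16_FLOPS_BY_CAPABILITY = {
    (9, 0): 989.5e12,  # H100
    (8, 0): 312e12,    # A100
    (8, 9): 165e12,    # RTX 4090
}

# Fallback for GPUs that are not in the table, as proposed above.
H100_BF16_PEAK_FLOPS = 989.5e12


def peak_flops_for_capability(capability):
    """Return the peak FLOPS for a (major, minor) capability tuple."""
    return PEAK_BF16_FLOPS_BY_CAPABILITY.get(tuple(capability),
                                             H100_BF16_PEAK_FLOPS)


def detect_peak_flops(device=None):
    """Look up the peak FLOPS for the given (or current) CUDA device.

    Falls back to the H100 value when PyTorch or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return H100_BF16_PEAK_FLOPS
    if not torch.cuda.is_available():
        return H100_BF16_PEAK_FLOPS
    return peak_flops_for_capability(torch.cuda.get_device_capability(device))
```

The MFU computation would then divide by `detect_peak_flops()` instead of the hardcoded constant.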

This is a minimal change (~10 lines) that makes the logging output accurate on all supported hardware.

I'll submit a PR for this.
