MFU calculation is hardcoded to H100 peak FLOPS #5

## Problem

After PR #2 added support for non-Hopper GPUs, the MFU (Model FLOPS Utilization) metric is now misleading on any GPU that isn't an H100.

The peak FLOPS value is hardcoded at line 491:
```python
H100_BF16_PEAK_FLOPS = 989.5e12
```

MFU divides achieved FLOPS by the peak, so a peak that is too large makes the result too small. On an A100 (312 TFLOPS BF16), the reported MFU is therefore ~3.2x too low. On an RTX 4090 (165 TFLOPS), it is ~6x too low. This makes the MFU metric useless for comparing runs across different hardware.
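To make the size of the error concrete, here is a small worked example using only the peak figures quoted above. Because MFU is achieved FLOPS divided by peak FLOPS, dividing by the H100 peak on a slower GPU shrinks the reported value by the ratio of the two peaks. The `mfu` helper and the 156 TFLOP/s achieved throughput are hypothetical, chosen only for illustration.

```python
# Peak dense BF16 throughput (FLOP/s), as quoted in this issue.
H100_BF16_PEAK_FLOPS = 989.5e12
A100_BF16_PEAK_FLOPS = 312e12
RTX4090_BF16_PEAK_FLOPS = 165e12


def mfu(achieved_flops_per_sec: float, peak_flops_per_sec: float) -> float:
    """Model FLOPS Utilization: fraction of the hardware peak actually achieved."""
    return achieved_flops_per_sec / peak_flops_per_sec


# Hypothetical A100 run sustaining 156 TFLOP/s, i.e. half of its own peak.
achieved = 156e12
true_mfu = mfu(achieved, A100_BF16_PEAK_FLOPS)      # 0.5
reported_mfu = mfu(achieved, H100_BF16_PEAK_FLOPS)  # what the hardcoded constant yields

print(f"true MFU:     {true_mfu:.1%}")
print(f"reported MFU: {reported_mfu:.1%}")

# The understatement factor is just the ratio of the two peaks.
for name, peak in [("A100", A100_BF16_PEAK_FLOPS), ("RTX 4090", RTX4090_BF16_PEAK_FLOPS)]:
    print(f"{name}: MFU understated by {H100_BF16_PEAK_FLOPS / peak:.2f}x")
```

The achieved throughput cancels out of the ratio, so the error depends only on which GPU the run happens to use, not on how well the run performs.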

## Proposed Fix

Auto-detect the GPU via `torch.cuda.get_device_capability()` and look up the correct theoretical peak FLOPS from a table of known GPU architectures. Fall back to the H100 value for unknown GPUs.
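A minimal sketch of the lookup-with-fallback idea, assuming PyTorch is available at runtime. The table holds only the three figures quoted in this issue, and the names `KNOWN_PEAK_BF16_FLOPS`, `lookup_peak_flops`, and `detect_peak_flops` are hypothetical, not existing code in the repo. Note that this sketch swaps in matching on `torch.cuda.get_device_name()` instead of keying only on `torch.cuda.get_device_capability()`, because one compute capability can cover GPUs with very different peaks (8.9, for example, is shared by the RTX 4090 and the L4).

```python
import warnings

# Dense BF16 peak throughput in FLOP/s. Illustrative entries only: these are the
# figures quoted in this issue, not an exhaustive or authoritative table.
KNOWN_PEAK_BF16_FLOPS = {
    "H100": 989.5e12,
    "A100": 312e12,
    "RTX 4090": 165e12,
}
FALLBACK_PEAK_FLOPS = KNOWN_PEAK_BF16_FLOPS["H100"]


def lookup_peak_flops(device_name: str) -> float:
    """Return the peak BF16 FLOP/s for a GPU name, falling back to the H100 value."""
    for key, peak in KNOWN_PEAK_BF16_FLOPS.items():
        # Substring match against names like "NVIDIA A100-SXM4-80GB".
        if key in device_name:
            return peak
    warnings.warn(
        f"Unknown GPU {device_name!r}; using H100 peak FLOPS, so MFU may be inaccurate."
    )
    return FALLBACK_PEAK_FLOPS


def detect_peak_flops(device: int = 0) -> float:
    """Query the given CUDA device and look up its peak FLOPS."""
    import torch  # imported lazily so lookup_peak_flops works without a GPU

    if not torch.cuda.is_available():
        return FALLBACK_PEAK_FLOPS
    return lookup_peak_flops(torch.cuda.get_device_name(device))
```

For example, `lookup_peak_flops("NVIDIA A100-SXM4-80GB")` returns `312e12`. Substring keys are coarse: `"H100"` also matches the PCIe H100, which has a lower peak than the SXM part, so a real table would need finer-grained entries. Warning on the fallback path keeps an unknown GPU from silently producing a misleading MFU.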

This is a minimal change (~10 lines) that makes the logged MFU accurate on every GPU in the lookup table.

I'll submit a PR for this.
