feat(train): GPU-aware MFU, warmup MFU fix, experiment integrity log by eli-labz · Pull Request #623 · karpathy/autoresearch

eli-labz · 2026-06-27T16:05:53Z

Three targeted improvements to train.py:

fix(train): GPU-aware peak FLOPS for accurate MFU on non-H100 GPUs (#547)

- Add GPU_PEAK_FLOPS dict mapping (compute_cap) -> peak BF16 FLOPS
- Covers V100, A100, RTX 3090/A10G, L4/L40S/RTX 4090, H100, B200, RTX 5090
- Falls back to H100 value (989.5e12) for unknown GPUs
- H100_BF16_PEAK_FLOPS remains as the resolved scalar for backward compat
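
For reviewers, a minimal sketch of the lookup-with-fallback idea, not the code in this PR. Only the H100 figure (989.5e12) is taken from the description above; the A100 entry is NVIDIA's published dense BF16 figure, the other GPUs in the list are left out rather than guessed, and `resolve_peak_flops` and `mfu` are hypothetical names. In train.py the key would presumably come from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.

```python
# Illustrative sketch only; names other than GPU_PEAK_FLOPS and
# H100_BF16_PEAK_FLOPS are hypothetical.
H100_FALLBACK_FLOPS = 989.5e12  # fallback value quoted in the PR description

# Keyed by CUDA compute capability (major, minor). Deliberately partial:
# only entries whose values are not in doubt are listed here.
GPU_PEAK_FLOPS = {
    (8, 0): 312e12,    # A100, dense BF16 (NVIDIA datasheet)
    (9, 0): 989.5e12,  # H100, value from the PR description
}


def resolve_peak_flops(compute_cap):
    """Return peak BF16 FLOPS for a compute capability, else the H100 value."""
    return GPU_PEAK_FLOPS.get(tuple(compute_cap), H100_FALLBACK_FLOPS)


def mfu(achieved_flops_per_sec, compute_cap):
    """Model FLOPS utilization as a fraction of the resolved peak."""
    return achieved_flops_per_sec / resolve_peak_flops(compute_cap)


# Kept as a resolved scalar so existing call sites keep working.
# (9, 0) stands in for the capability detected at startup.
H100_BF16_PEAK_FLOPS = resolve_peak_flops((9, 0))
```

One thing to keep in mind when reading the table: a single compute capability can cover several cards (the description groups RTX 3090 with A10G, and L4 with L40S and RTX 4090), so a key of this shape cannot tell those cards apart.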

fix(train): warmup off-by-one in steady_state_mfu accumulation (#556)

- Training skips timing for steps 0..10 inclusive (11 steps, not 10)
- Denominator corrected from (step - 10) to (step - 11)
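
The count can be checked with a small simulation. This is a sketch under one assumption the description does not spell out: that `step` is incremented at the end of each iteration, so after the loop it equals the number of completed steps. `count_timed_steps` and `WARMUP_LAST_STEP` are hypothetical names.

```python
WARMUP_LAST_STEP = 10  # steps 0..10 are excluded from timing, per the PR


def count_timed_steps(total_steps):
    """Run a dummy loop and count how many steps would be timed."""
    timed = 0
    step = 0
    while step < total_steps:
        if step > WARMUP_LAST_STEP:  # only steps 11, 12, ... are timed
            timed += 1
        step += 1  # assumed: increment at the end of the iteration
    return step, timed


step, timed = count_timed_steps(100)
assert timed == step - 11  # 89 timed steps out of 100
assert timed != step - 10  # the old expression counts one step too many
```

Under that assumption the old expression credits the run with one extra step of work over the same measured time, so the reported steady-state MFU is inflated by a factor of (step - 10) / (step - 11). The effect is largest on short runs.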

feat(train): experiment integrity log to detect metric gaming (#599)

- Add log_integrity() function using stdlib only (hashlib, datetime)
- Writes one line per run to integrity.log binding val_bpb to:
  - sha256[:16] of train.py and prepare.py source files
  - sha256[:16] of first 1 MB of final model weights
  - real optimizer step count from optimizer internal state (not loop var)
  - wall-clock training seconds
- Detection only, never raises, wrapped in try/except so it can never crash a run; integrity.log should be gitignored like results.tsv
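
A rough sketch of what such a logger could look like, to make the shape of the record concrete. It is not the function in this PR. The assumptions: the weights are read from a file on disk (`weights_path`), 1 MB means 2**20 bytes, the optimizer step count is passed in by the caller rather than read from optimizer state, and the `key=value` line format and the boolean return value are invented for illustration.

```python
import hashlib
from datetime import datetime, timezone


def _sha16(path, limit=None):
    """First 16 hex chars of the sha256 of a file (or of its first `limit` bytes)."""
    with open(path, "rb") as f:
        data = f.read() if limit is None else f.read(limit)
    return hashlib.sha256(data).hexdigest()[:16]


def log_integrity(val_bpb, weights_path, opt_steps, train_seconds,
                  sources=("train.py", "prepare.py"),
                  log_path="integrity.log"):
    """Append one line binding val_bpb to code, weights, steps and time.

    Detection only: any failure is swallowed so a run can never crash here.
    Returns True if a line was written, False otherwise (hypothetical).
    """
    try:
        fields = [
            f"time={datetime.now(timezone.utc).isoformat()}",
            f"val_bpb={val_bpb:.6f}",
        ]
        for src in sources:
            fields.append(f"{src}={_sha16(src)}")
        # Hash only the first 1 MB of the weights (assumed 2**20 bytes).
        fields.append(f"weights={_sha16(weights_path, limit=1 << 20)}")
        fields.append(f"opt_steps={opt_steps}")
        fields.append(f"train_seconds={train_seconds:.1f}")
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(" ".join(fields) + "\n")
        return True
    except Exception:
        return False
```

Because all hashing happens before the file is opened for append, a failed read (for example a missing weights file) leaves integrity.log untouched rather than writing a partial line.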


eli-labz wants to merge 1 commit into karpathy:master from eli-labz:feat/train-improvements

eli-labz commented Jun 27, 2026
