feat(train): GPU-aware MFU, warmup MFU fix, experiment integrity log #623

Open
eli-labz wants to merge 1 commit into karpathy:master from eli-labz:feat/train-improvements

@eli-labz

Three targeted improvements to train.py:

fix(train): GPU-aware peak FLOPS for accurate MFU on non-H100 GPUs (#547)

  • Add GPU_PEAK_FLOPS dict mapping (compute_cap) -> peak BF16 FLOPS
  • Covers V100, A100, RTX 3090/A10G, L4/L40S/RTX 4090, H100, B200, RTX 5090
  • Falls back to H100 value (989.5e12) for unknown GPUs
  • H100_BF16_PEAK_FLOPS remains as the resolved scalar for backward compat
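The lookup-with-fallback described above could look roughly like the sketch below. It is not the code from this PR: the helper name `resolve_peak_flops` is hypothetical, the table is deliberately partial, and only the H100 value (989.5e12) comes from the description. The A100 entry uses NVIDIA's published 312 TFLOPS dense BF16 figure as an illustration.

```python
# Hypothetical sketch of a compute-capability -> peak BF16 FLOPS lookup.
# Only the H100 value is taken from the PR description; the A100 entry is
# an illustrative assumption based on NVIDIA's published dense BF16 figure.
FALLBACK_PEAK_FLOPS = 989.5e12  # H100 value, used for unknown GPUs

GPU_PEAK_FLOPS = {
    (8, 0): 312e12,    # A100 (assumed entry)
    (9, 0): 989.5e12,  # H100 (from the PR description)
}


def resolve_peak_flops(compute_cap, table=GPU_PEAK_FLOPS):
    """Return peak BF16 FLOPS for a (major, minor) compute capability.

    Falls back to the H100 value when the capability is not in the table.
    """
    return table.get(tuple(compute_cap), FALLBACK_PEAK_FLOPS)


# Keep the old scalar name so existing MFU code keeps working.
# In train.py the capability would presumably come from
# torch.cuda.get_device_capability(), which returns a (major, minor) tuple.
H100_BF16_PEAK_FLOPS = resolve_peak_flops((9, 0))
```

Keeping the resolved value under the existing scalar name means the MFU formula itself does not need to change, only the constant it divides by.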

fix(train): warmup off-by-one in steady_state_mfu accumulation (#556)

  • Training skips timing for steps 0..10 inclusive (11 steps, not 10)
  • Denominator corrected from (step - 10) to (step - 11)
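To see why the denominator is `step - 11`, here is a minimal simulation of the counting. It assumes the loop guards timing with `step > 10` and increments `step` at the end of each iteration; that loop shape is an assumption for illustration, not copied from train.py, and `count_timed_steps` is a hypothetical helper.

```python
WARMUP_LAST_STEP = 10  # steps 0..10 inclusive are excluded from timing


def count_timed_steps(total_steps):
    """Simulate the training loop and count the steps that get timed."""
    timed = 0
    step = 0
    while step < total_steps:
        if step > WARMUP_LAST_STEP:  # first timed step is step 11
            timed += 1
        step += 1  # after the loop, step == number of steps executed
    return timed, step


timed, step = count_timed_steps(50)
# 50 steps executed, 11 of them skipped (0..10), so 39 are timed.
assert timed == step - 11
assert timed != step - 10  # the old denominator overcounts by one step
```

With the old `step - 10` denominator, the token count in the MFU numerator includes one step whose time was never accumulated, which inflates the reported steady-state MFU slightly.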

feat(train): experiment integrity log to detect metric gaming (#599)

  • Add log_integrity() function using stdlib only (hashlib, datetime)
  • Writes one line per run to integrity.log binding val_bpb to:
    • sha256[:16] of train.py and prepare.py source files
    • sha256[:16] of first 1 MB of final model weights
    • real optimizer step count from optimizer internal state (not loop var)
    • wall-clock training seconds
  • Detection only: the body is wrapped in try/except, so it never raises and cannot crash a run
  • integrity.log should be gitignored, like results.tsv
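A minimal sketch of what such a logger might look like, using only `hashlib` and `datetime`. The signature, the tab-separated `key=value` line format, and the boolean return value are assumptions made for illustration; the weights are passed in as raw bytes and the optimizer step count as a plain integer, since the description does not say how the PR extracts either.

```python
import hashlib
from datetime import datetime, timezone


def short_sha256(data: bytes) -> str:
    """First 16 hex characters of the SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()[:16]


def log_integrity(val_bpb, weight_bytes, optimizer_steps, train_seconds,
                  source_paths=("train.py", "prepare.py"),
                  log_path="integrity.log"):
    """Append one line binding val_bpb to code, weights, and run stats.

    Hypothetical signature. Returns True on success and False on any
    failure; it never raises, so it cannot crash a training run.
    """
    try:
        fields = [datetime.now(timezone.utc).isoformat(),
                  f"val_bpb={val_bpb:.6f}"]
        for path in source_paths:
            with open(path, "rb") as f:
                fields.append(f"{path}={short_sha256(f.read())}")
        # Hash only the first 1 MB of the serialized weights.
        fields.append(f"weights={short_sha256(weight_bytes[:1 << 20])}")
        fields.append(f"opt_steps={optimizer_steps}")
        fields.append(f"train_s={train_seconds:.1f}")
        with open(log_path, "a") as f:
            f.write("\t".join(fields) + "\n")
        return True
    except Exception:
        return False
```

Hashing the source files alongside the weights means a reported val_bpb can later be checked against the exact train.py and prepare.py that produced it.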
