fix(train,prepare): resolve open issues (#556, #547, #549, #552, #542) by aniruddhaadak80 · Pull Request #604 · karpathy/autoresearch

aniruddhaadak80 · 2026-06-13T02:25:28Z

This PR fixes multiple open issues in the repository:

Issue fix(train): warmup off-by-one in steady_state_mfu accumulation #556: Fixes the warmup off-by-one error in steady state MFU calculations by using step - 11 (since timing starts at step 11).
Issue fix: GPU-aware MFU with Blackwell (sm_100/sm_120) coverage #547: Implements dynamic GPU-aware peak FLOPS mapping including Blackwell architectures (sm_100 and sm_120) instead of statically defaulting to H100 SXM.
Issue fix(prepare): detect truncated shard downloads via Content-Length #549: Adds verification of Content-Length vs file size on disk in prepare.py to prevent storing truncated/corrupted shards.
Issue fix(prepare): use raw bytes for BPB to avoid U+FFFD inflation (#384) #552: Uses raw token byte sizes via enc.decode_single_token_bytes() to compute BPB without U+FFFD string encoding inflation.
Issue Minor cleanups: README typo and redundant .tmp check in prepare.py #542: Removes the redundant .tmp check in list_parquet_files() in prepare.py.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds multi-GPU DistributedDataParallel (DDP) support to the training script and introduces an automated “Critic Sandbox” workflow to run/score experiments and manage keep/discard decisions, alongside small robustness/formatting improvements in data preparation.

Changes:

Add DDP initialization, rank-aware seeding, gradient-sync control for grad accumulation, and MFU FLOPs estimation updates in train.py.
Introduce sandbox.py to compile-check, run, parse metrics, log results, and auto-commit/rollback experiments.
Improve prepare.py download integrity checking and tokenizer/token-bytes handling; update experiment loop docs in program.md.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File	Description
train.py	Adds DDP setup, world-size-aware batching/accumulation, and architecture-based FLOPs for MFU.
sandbox.py	New automation script to run experiments, log metrics, and manage git keep/discard.
program.md	Updates the experiment loop to use the new sandbox workflow.
prepare.py	Adds truncated-download detection, safer tensor loading, and refactors formatting/token-bytes computation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+# Peak BF16/FP16 Tensor Core FLOPS for different GPU architectures
+gpu_peak_flops = {
+    (8, 0): 312e12,   # A100 SXM/PCIe
+    (8, 6): 142e12,   # RTX 3090 / A10G
+    (8, 9): 330e12,   # L4 / L40 / RTX 4090
+    (9, 0): 989.5e12, # H100 SXM
+    (10, 0): 2250e12, # Blackwell sm_100 (B200 SXM)
+    (10, 2): 660e12,  # Blackwell sm_120 (RTX 5090 / Workstation)
+}
+peak_flops = gpu_peak_flops.get(cap, 989.5e12)


+def ddp_dataloader_wrapper(loader, rank, world_size):
+    for _ in range(rank):
+        next(loader)
+    while True:
+        yield next(loader)
+        for _ in range(world_size - 1):
+            next(loader)
+
+train_loader = ddp_dataloader_wrapper(train_loader, ddp_rank, ddp_world_size)


+        if ddp:
+            model._orig_mod.require_backward_grad_sync = (micro_step == grad_accum_steps - 1)


+tokens_per_fwdbwd = DEVICE_BATCH_SIZE * MAX_SEQ_LEN * ddp_world_size
+grad_accum_steps = max(1, TOTAL_BATCH_SIZE // tokens_per_fwdbwd)
+TOTAL_BATCH_SIZE = grad_accum_steps * tokens_per_fwdbwd


@@ -0,0 +1,105 @@
+#!/usr/bin/env python3


+
+The idea is that you are a completely autonomous researcher trying things out. The Sandbox handles the Git and execution discipline. If things work, the Sandbox keeps them. If they don't, the Sandbox discards them. If you feel like you're getting stuck in some way, review the code and brainstorm radically new directions.
+
+**Timeout**: The Sandbox handles the timeout. If a run crashes (OOM or bug) and the Sandbox rolled back, read `run.log` to diagnose, adapt your idea, and quickly try again. If it is fundamentally broken, just skip it—the Sandbox already logged "crash"—and move on to a new idea.


    with open(path, "rb") as f:
-        return torch.load(f, map_location=device)
+        return torch.load(f, map_location=device, weights_only=True)


This change mitigates the risk of insecure deserialization by restricting the classes that can be loaded via torch.load to a safe subset. This prevents potential arbitrary code execution from malicious files. Co-authored-by: aniruddhaadak80 <127435065+aniruddhaadak80@users.noreply.github.com>

…arpathy#549, karpathy#552, karpathy#542)

…and sharding

Copilot AI review requested due to automatic review settings June 13, 2026 02:25

Copilot AI reviewed Jun 13, 2026

View reviewed changes

aniruddhaadak80 and others added 5 commits June 13, 2026 09:37

feat: implement multi-gpu distributed data parallel (DDP)

d5cff00

feat: implement Actor-Critic sandbox loop orchestration

bbbd37d

fix(train,prepare): resolve open issues (karpathy#556, karpathy#547, k…

b367b16

…arpathy#549, karpathy#552, karpathy#542)

fix(train,prepare): address copilot reviews on DDP, Blackwell FLOPS, …

2cc9630

…and sharding

aniruddhaadak80 force-pushed the fix/resolve-open-issues branch from 102d953 to 2cc9630 Compare June 13, 2026 04:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(train,prepare): resolve open issues (#556, #547, #549, #552, #542)#604

fix(train,prepare): resolve open issues (#556, #547, #549, #552, #542)#604
aniruddhaadak80 wants to merge 5 commits into
karpathy:masterfrom
aniruddhaadak80:fix/resolve-open-issues

aniruddhaadak80 commented Jun 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if ddp:
		model._orig_mod.require_backward_grad_sync = (micro_step == grad_accum_steps - 1)


		The idea is that you are a completely autonomous researcher trying things out. The Sandbox handles the Git and execution discipline. If things work, the Sandbox keeps them. If they don't, the Sandbox discards them. If you feel like you're getting stuck in some way, review the code and brainstorm radically new directions.

		Timeout: The Sandbox handles the timeout. If a run crashes (OOM or bug) and the Sandbox rolled back, read `run.log` to diagnose, adapt your idea, and quickly try again. If it is fundamentally broken, just skip it—the Sandbox already logged "crash"—and move on to a new idea.

Conversation

aniruddhaadak80 commented Jun 13, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants