7.9% improvement in 2 iterations on M4 Mac Mini (16GB) vs 2.8% in 126 iterations on H100 from Karpathy autoresearch #392

ediestel · 2026-03-23T07:34:48Z

I applied an alternative optimization methodology to the autoresearch benchmark. Instead of iterative search, the method analyzes
the training script and hardware constraints to recommend changes directly.

Results

Metric	This method	autoresearch (Karpathy)
Baseline val_bpb	1.855	0.998
Final val_bpb	1.708	0.970
Improvement	7.9%	2.8%
Iterations	2	126
Hardware	M4 Mac Mini 16GB	H100 80GB
MFU	22% → 51%	—
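
The Improvement row follows from the two val_bpb rows as a relative reduction from baseline. Here is a minimal sketch of that arithmetic. The function name and dictionary labels are illustrative and not taken from the paper. Each percentage is relative to its own baseline, and the two baselines differ.

```python
def relative_improvement(baseline: float, final: float) -> float:
    """Relative reduction in val_bpb as a percentage (lower val_bpb is better)."""
    return (baseline - final) / baseline * 100.0


# (baseline val_bpb, final val_bpb) copied from the table above.
runs = {
    "this method (M4 Mac Mini 16GB)": (1.855, 1.708),
    "autoresearch (H100 80GB)": (0.998, 0.970),
}

for name, (baseline, final) in runs.items():
    print(f"{name}: {relative_improvement(baseline, final):.1f}%")
# this method (M4 Mac Mini 16GB): 7.9%
# autoresearch (H100 80GB): 2.8%
```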

The key finding: infrastructure and architecture optimizations (batch size, torch.compile on MPS, ReLU² → SwiGLU) required no
iterative search; the method identified them from the code and constraints in a single pass.
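
For readers unfamiliar with the activation swap, here is a scalar sketch of the two functions using only the standard library. It shows the textbook definitions, not the code from the training script or the paper, and the function names are illustrative. In an actual MLP block the `gate` and `value` inputs would come from two separate linear projections of the same hidden state, so a SwiGLU block carries one more projection than a ReLU² block of the same width.

```python
import math


def relu_squared(x: float) -> float:
    """ReLU²: max(0, x), then squared."""
    return max(0.0, x) ** 2


def silu(x: float) -> float:
    """SiLU (Swish): x * sigmoid(x). Not hardened for very large negative x."""
    return x / (1.0 + math.exp(-x))


def swiglu(gate: float, value: float) -> float:
    """SwiGLU for one hidden unit: SiLU(gate) multiplied by value."""
    return silu(gate) * value


# Compare the two activations on a few sample pre-activations.
for x in (-2.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  relu2={relu_squared(x):.4f}  swiglu(x, 1.0)={swiglu(x, 1.0):.4f}")
```

ReLU² outputs exactly zero for every negative input, while the SiLU gate lets a small negative signal through and scales it by the `value` branch.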

Full paper available as PDF at https://drive.google.com/file/d/1ULb5Fx5Be-HKBbvnDtu0xad0-_3hceBY/view?usp=sharing

References: Discussion #43, Discussion #32.

For questions or discussion: diestel.research@gmail.com
