feat: add Apple Intel CPU + MPS support by amul3 · Pull Request #576 · karpathy/autoresearch

amul3 · 2026-05-04T02:12:55Z

Adds backend auto-detection (cuda > mps > cpu) so the training script runs on Apple Intel Macs without a discrete GPU and on Apple Silicon, in addition to the original NVIDIA CUDA path.
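For readers skimming the description, the cuda > mps > cpu priority could be sketched roughly as follows. This is an illustrative sketch, not the code in the PR: `pick_backend` and `detect_backend` are hypothetical names, and the fallback to "cpu" when torch is not importable is an assumption made only so the snippet runs anywhere.

```python
def pick_backend(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred backend name using the cuda > mps > cpu priority."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_backend() -> str:
    """Ask PyTorch which backends are usable and apply the priority above.

    Hypothetical helper: returning "cpu" when torch is missing is an
    assumption of this sketch, not behaviour described in the PR.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    return pick_backend(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the priority rule in a pure function makes it easy to check without any particular hardware present.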

Key changes:

- train.py: lazy/optional Flash-Attention 3 import (CUDA only); SDPA fallback for MPS/CPU; conditional torch.compile (CUDA only); conditional cuda.synchronize / cuda.manual_seed / max_memory_allocated; fp32 embeddings + rotary on non-CUDA backends to avoid slow/unsupported bf16 matmul; per-backend default DEPTH / DEVICE_BATCH_SIZE / TOTAL_BATCH_SIZE / WINDOW_PATTERN so the 5-minute budget actually completes on CPU.
- prepare.py: dataloader buffers allocate on detected device; pin_memory only on CUDA; evaluate_bpb pulls device from the model rather than hardcoding cuda.
- pyproject.toml: drop hard kernels dep + CUDA-only torch index; move kernels to an optional [cuda] extra; loosen torch to >=2.4 so CPU/MPS wheels resolve.
- README: document the new platforms, install matrix, and per-backend defaults.
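The CUDA-only switches listed above could be gathered into one per-backend lookup, roughly as sketched here. `BackendSettings` and `settings_for` are hypothetical names that do not appear in the PR. The use of bf16 embeddings on CUDA is an assumption inferred from the "fp32 on non-CUDA" wording. The actual DEPTH / DEVICE_BATCH_SIZE / TOTAL_BATCH_SIZE / WINDOW_PATTERN defaults are deliberately left out because the description does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendSettings:
    """Hypothetical bundle of the per-backend switches named in the PR text."""
    attention: str     # "fa3" (Flash-Attention 3) or "sdpa" fallback
    use_compile: bool  # torch.compile is described as CUDA only
    pin_memory: bool   # pin_memory is described as CUDA only
    embed_dtype: str   # "float32" on non-CUDA; "bfloat16" on CUDA is assumed


def settings_for(backend: str) -> BackendSettings:
    """Map a backend name ("cuda", "mps", "cpu") to its switches."""
    if backend not in ("cuda", "mps", "cpu"):
        raise ValueError(f"unknown backend: {backend!r}")
    on_cuda = backend == "cuda"
    return BackendSettings(
        attention="fa3" if on_cuda else "sdpa",
        use_compile=on_cuda,
        pin_memory=on_cuda,
        embed_dtype="bfloat16" if on_cuda else "float32",
    )


if __name__ == "__main__":
    for name in ("cuda", "mps", "cpu"):
        print(name, settings_for(name))
```

Deriving every switch from a single `on_cuda` flag mirrors how the description groups MPS and CPU together as "non-CUDA" backends.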

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


amul3 wants to merge 1 commit into karpathy:master from amul3:feat/apple-intel-cpu-support
