feat(train): add optional PyTorch Profiler wrapper outputting LLM-readable summary by aniruddhaadak80 · Pull Request #618 · karpathy/autoresearch

aniruddhaadak80 · 2026-06-21T14:33:15Z

Resolves #118. When optimizing PyTorch models, the AI agent needs insight into actual CUDA kernel execution times and into where the bottlenecks are (e.g. attention layers vs. feedforward blocks vs. communication ops). This change adds a fail-safe PyTorch Profiler integration that is activated only when the environment variable AUTORESEARCH_PROFILE=1 is set. It profiles steps 15-20, computes average and total CUDA execution times, and writes the 15 most expensive kernels/operations as a Markdown table to profiler_summary.md, so the agent can identify and optimize hardware bottlenecks.
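For readers who want a concrete picture of the flow described above, here is a minimal sketch. It is not the code from this PR. The environment variable, step window, top-15 cutoff, and output filename come from the description; the function names (`profiling_enabled`, `format_summary`, `make_profiler`, `write_summary`), the table columns, and the choice of one warmup step are assumptions made for illustration. The sketch also assumes that profiler time totals are in microseconds and that the averaged events expose `device_time_total` (newer PyTorch) or `cuda_time_total` (older PyTorch).

```python
import os

PROFILE_ENV = "AUTORESEARCH_PROFILE"      # from the PR description
SUMMARY_PATH = "profiler_summary.md"      # from the PR description
FIRST_STEP, LAST_STEP, TOP_K = 15, 20, 15  # from the PR description


def profiling_enabled(env=None):
    """True only when AUTORESEARCH_PROFILE=1 is set."""
    env = os.environ if env is None else env
    return env.get(PROFILE_ENV) == "1"


def format_summary(rows, top_k=TOP_K):
    """Render (name, total_us, calls) rows as a Markdown table, most expensive first."""
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)[:top_k]
    lines = [
        "| Op / kernel | Total CUDA (ms) | Avg CUDA (ms) | Calls |",
        "| --- | ---: | ---: | ---: |",
    ]
    for name, total_us, calls in ranked:
        avg_us = total_us / max(calls, 1)
        lines.append(f"| {name} | {total_us / 1e3:.3f} | {avg_us / 1e3:.3f} | {calls} |")
    return "\n".join(lines) + "\n"


def make_profiler():
    """Return a scheduled torch profiler, or None if disabled or unavailable."""
    if not profiling_enabled():
        return None
    try:
        import torch
        from torch.profiler import ProfilerActivity, profile, schedule

        activities = [ProfilerActivity.CPU]
        if torch.cuda.is_available():
            activities.append(ProfilerActivity.CUDA)
        # Skip steps 0-13, warm up on step 14 (assumption), record steps 15-20.
        return profile(
            activities=activities,
            schedule=schedule(
                wait=FIRST_STEP - 1,
                warmup=1,
                active=LAST_STEP - FIRST_STEP + 1,
                repeat=1,
            ),
        )
    except Exception as exc:  # fail-safe: never break training
        print(f"profiler disabled: {exc}")
        return None


def write_summary(prof, path=SUMMARY_PATH):
    """Write the top-K table; swallow errors so training is unaffected."""
    try:
        rows = []
        for evt in prof.key_averages():
            # Attribute name differs across PyTorch versions (assumption, see lead-in).
            total_us = getattr(evt, "device_time_total", None)
            if total_us is None:
                total_us = getattr(evt, "cuda_time_total", 0)
            rows.append((evt.key, total_us, evt.count))
        with open(path, "w") as f:
            f.write(format_summary(rows))
    except Exception as exc:
        print(f"profiler summary skipped: {exc}")
```

In a training loop, this sketch would be used by calling `prof = make_profiler()` once, `prof.start()` before the first step, `prof.step()` at the end of every iteration, and `prof.stop()` followed by `write_summary(prof)` after the loop, with each call guarded by `if prof is not None`. Keeping the table formatting separate from the torch calls means the summary logic can be checked without a GPU.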

Commit 70c1687: feat(train): add optional fail-safe PyTorch Profiler generating LLM-readable summary


aniruddhaadak80 wants to merge 1 commit into karpathy:master from aniruddhaadak80:feat/issue-118-pytorch-profiler
