
feat(train): add optional PyTorch Profiler wrapper outputting LLM-readable summary #618

Open
aniruddhaadak80 wants to merge 1 commit into karpathy:master from aniruddhaadak80:feat/issue-118-pytorch-profiler

Conversation

@aniruddhaadak80


Resolves #118. When optimizing PyTorch models, the AI agent needs insight into actual CUDA kernel execution times and bottleneck locations (e.g. attention layers vs. feedforward blocks vs. communication ops). This change adds a fail-safe PyTorch Profiler integration that is activated when the environment variable AUTORESEARCH_PROFILE=1 is set. It profiles steps 15-20, computes average and total CUDA execution times, and formats the top 15 most expensive kernels/operations as a Markdown table written to profiler_summary.md, enabling the agent to easily identify and optimize hardware bottlenecks.
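For readers wondering what such a wrapper might look like, here is a minimal sketch, assuming the training loop can be driven one step at a time through a callback. It is not the code in this PR. `run_with_profiler`, `format_summary`, and the `train_step` callback are hypothetical names. The sketch also assumes step indices start at 0 and that "steps 15-20" is inclusive. Only `AUTORESEARCH_PROFILE`, the step window, the top-15 cutoff, and `profiler_summary.md` come from the description above. Timings come from `torch.profiler`'s `key_averages()`, which reports microseconds.

```python
import os
from typing import Callable, Iterable, Tuple

# Hypothetical row shape: (op/kernel name, total CUDA time in microseconds, call count)
Row = Tuple[str, float, int]


def format_summary(rows: Iterable[Row], top_k: int = 15) -> str:
    """Render the top_k ops, sorted by total CUDA time, as a Markdown table."""
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)[:top_k]
    lines = [
        "| Rank | Operation | Total CUDA time (ms) | Avg CUDA time (ms) | Calls |",
        "|---:|---|---:|---:|---:|",
    ]
    for rank, (name, total_us, count) in enumerate(ranked, start=1):
        avg_us = total_us / count if count else 0.0
        safe_name = name.replace("|", "\\|")  # escape pipes so the table stays well-formed
        lines.append(
            f"| {rank} | {safe_name} | {total_us / 1000:.3f} | {avg_us / 1000:.3f} | {count} |"
        )
    return "\n".join(lines) + "\n"


def run_with_profiler(train_step: Callable[[int], None], num_steps: int,
                      out_path: str = "profiler_summary.md",
                      first_step: int = 15, last_step: int = 20, top_k: int = 15) -> None:
    """Run train_step(step) num_steps times, profiling [first_step, last_step] if enabled."""
    if os.environ.get("AUTORESEARCH_PROFILE") != "1":
        for step in range(num_steps):  # profiling disabled: plain loop, no torch import
            train_step(step)
        return

    assert 1 <= first_step <= last_step, "need at least one step before the window for warmup"
    import torch
    from torch.profiler import ProfilerActivity, profile, schedule

    def on_ready(prof) -> None:
        try:
            rows = []
            for evt in prof.key_averages():
                # Newer PyTorch exposes device_time_total; older releases use cuda_time_total.
                total = getattr(evt, "device_time_total", None)
                if total is None:
                    total = evt.cuda_time_total
                if total > 0:  # skip CPU-only ops with no device time
                    rows.append((evt.key, float(total), evt.count))
            with open(out_path, "w") as f:
                f.write(format_summary(rows, top_k))
        except Exception as exc:  # fail-safe: a profiling error must not kill training
            print(f"profiler summary failed: {exc}")

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
    # Steps 0..first_step-2 are skipped, first_step-1 warms up, first_step..last_step are recorded.
    sched = schedule(wait=first_step - 1, warmup=1,
                     active=last_step - first_step + 1, repeat=1)
    with profile(activities=activities, schedule=sched, on_trace_ready=on_ready) as prof:
        for step in range(num_steps):
            train_step(step)
            prof.step()  # advance the profiler's step counter after each training step
```

Ranking by total device time (which includes time spent in child ops) surfaces expensive high-level ops next to raw kernels. Sorting by self time instead would avoid double counting nested ops, at the cost of hiding which high-level layer launched them.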



Development

Successfully merging this pull request may close these issues.

Feat: Expose PyTorch Profiler traces as LLM-readable summaries for MFU optimization
