[codex] Add SDK compiler perf attribution #686

Draft
drew-y wants to merge 1 commit into main from drew/v-373-complete-hot-path-specialization-for-generic-trait-and

drew-y commented Jun 22, 2026

Collaborator

Summary

  • Add compiler perf sessions with phase attribution for SDK and compiler compile paths.
  • Attribute SDK benchmark compile time across module loading, semantic analysis, optimization passes, codegen, Binaryen preparation/optimization, emit, and SDK finalization/WAT text emission.
  • Expand scripts/bench-v326.ts benchmark coverage with VTrace, web framework, VX, and scalar aggregate representative scenarios plus WAT counters for struct.new, array.new*, ref.cast, and call_ref.
  • Add an SDK regression test for [voyd:compiler:perf] summaries.

V-373 investigation outcome

V-410 blocked useful attribution for SDK-based benchmark runs, so this PR fixes that instrumentation first.

I investigated a prototype for local-handler scalar aggregate result specialization. The root cause of the missed optimization is ordering: scalar aggregate call/result specialization rejects effectful metadata before local handler specialization can produce residual-free handled clones, so local effectful paths currently lose scalar aggregate replacement and direct ABI opportunities.

The prototype was removed because it failed the V-373 acceptance bar: optimized VTrace main changed from the expected 3825271 to 3644126. After removing the prototype, optimized VTrace returned 3825271 again. No dormant optimizer infrastructure is kept.

Benchmarks

Baseline: /tmp/v373-baseline.csv.
Final instrumented run: /tmp/v373-final.csv and /tmp/v373-final-perf.log.

Running `node --conditions=development --import tsx scripts/bench-v326.ts compare /tmp/v373-baseline.csv /tmp/v373-final.csv` showed identical wasm hashes, wasm/gzip sizes, and WAT counters across all scenarios. Representative final rows:

  • VTrace: 251.365 ms median, 60979 wasm bytes, 19825 gzip bytes, struct.new=772, array.new=196, ref.cast=893, call_ref=258.
  • Web route probe: 0.292 ms median, 131818 wasm bytes, 33499 gzip bytes, struct.new=1145, array.new=772, ref.cast=1631, call_ref=766.
  • VX main: 0.246 ms median, 30001 wasm bytes, 9954 gzip bytes, struct.new=207, array.new=223, ref.cast=412, call_ref=235.
  • Scalar aggregate particle: 0.513 ms median, 17658 wasm bytes, 6486 gzip bytes, struct.new=119, array.new=109, ref.cast=293, call_ref=162.
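
The WAT counters in the rows above can be approximated with a simple token scan. The following is a minimal sketch, not the code in scripts/bench-v326.ts: `countWatInstructions` and the sample WAT fragment are hypothetical, and it assumes `array.new*` means any mnemonic starting with `array.new` while the other three counters match their mnemonic exactly.

```typescript
// Hypothetical helper, not taken from scripts/bench-v326.ts.
// Counts selected instruction mnemonics in WAT text by splitting on
// whitespace and parentheses. Comments and string literals are not
// stripped, so this is only an approximation on real compiler output.
function countWatInstructions(wat: string): Record<string, number> {
  const counts: Record<string, number> = {
    "struct.new": 0,
    "array.new*": 0,
    "ref.cast": 0,
    "call_ref": 0,
  };
  for (const token of wat.split(/[\s()]+/)) {
    if (token === "struct.new") counts["struct.new"] += 1;
    // Assumption: the star covers array.new, array.new_default, array.new_fixed, etc.
    else if (token.startsWith("array.new")) counts["array.new*"] += 1;
    else if (token === "ref.cast") counts["ref.cast"] += 1;
    // Exact match, so return_call_ref is not counted as call_ref.
    else if (token === "call_ref") counts["call_ref"] += 1;
  }
  return counts;
}

// Illustrative fragment only; type definitions are omitted.
const sampleWat = `
  (func $make (result (ref $Vec3))
    (struct.new $Vec3 (f64.const 1) (f64.const 2) (f64.const 3)))
  (func $buf (result (ref $Bytes))
    (array.new_default $Bytes (i32.const 8)))
  (func $call (param $f (ref $Fn)) (param $x anyref)
    (call_ref $Fn (ref.cast (ref $Vec3) (local.get $x)) (local.get $f)))
`;

const sampleCounts = countWatInstructions(sampleWat);
console.log(sampleCounts);
```

Matching whole tokens rather than substrings keeps identifiers such as `$array.new_helper` from inflating the counts, since those start with `$`.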

Perf summaries now include sdk.finalizeCompile; concurrent perf sessions are marked with overlapped: true because the underlying compiler counters/phases are global.
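
A minimal sketch of phase attribution and the overlap flag, assuming a process-wide registry of open sessions stands in for the compiler's global counters. `PerfSession`, `time`, `finish`, and the `semantics` phase name are hypothetical and do not reflect the SDK's actual API; only `sdk.finalizeCompile` and `overlapped` come from this PR.

```typescript
type PerfSummary = {
  phases: { phase: string; ms: number }[];
  totalMs: number;
  overlapped: boolean;
};

// Stand-in for global compiler state: every open session is visible here.
const openSessions = new Set<PerfSession>();

class PerfSession {
  private readonly phaseMs = new Map<string, number>();
  private readonly now: () => number;
  private overlapped = false;

  constructor(now: () => number = () => performance.now()) {
    this.now = now;
    if (openSessions.size > 0) {
      // Counters are shared, so neither side can claim exclusive attribution.
      this.overlapped = true;
      openSessions.forEach((other) => {
        other.overlapped = true;
      });
    }
    openSessions.add(this);
  }

  // Runs fn and adds its elapsed time to the named phase.
  time<T>(phase: string, fn: () => T): T {
    const start = this.now();
    try {
      return fn();
    } finally {
      const elapsed = this.now() - start;
      this.phaseMs.set(phase, (this.phaseMs.get(phase) ?? 0) + elapsed);
    }
  }

  // Closes the session and returns its per-phase totals.
  finish(): PerfSummary {
    openSessions.delete(this);
    const phases = Array.from(this.phaseMs, ([phase, ms]) => ({ phase, ms }));
    const totalMs = phases.reduce((sum, p) => sum + p.ms, 0);
    return { phases, totalMs, overlapped: this.overlapped };
  }
}

// Deterministic fake clock so the example does not depend on wall time.
let fakeNow = 0;
const fakeClock = () => (fakeNow += 5);

const first = new PerfSession(fakeClock);
first.time("semantics", () => undefined);
const second = new PerfSession(fakeClock); // opens while `first` is still open
second.time("sdk.finalizeCompile", () => undefined);
const firstSummary = first.finish();
const secondSummary = second.finish();
console.log(firstSummary.overlapped, secondSummary.overlapped);
```

Flagging both sessions, rather than only the later one, reflects that the earlier session's numbers are also polluted once a second compile starts sharing the same counters.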

Validation

  • npx vitest run --config vitest.config.ts --testTimeout 30000 packages/sdk/src/__tests__/sdk-perf.test.ts
  • npm test (rerun with local-listener permissions after the sandboxed run failed with listen EPERM)
  • npm run typecheck
  • Agentic review loop completed: implementation-gap reviewer found no code gaps; correctness reviewer findings were fixed; fresh correctness re-review found no findings.

linear Bot commented Jun 22, 2026


V-373

drew-y commented Jun 23, 2026

Collaborator Author

Additional V-373 optimizer investigation after the initial PR note:

I tried several narrower root-cause fixes and removed each prototype after benchmarking when it failed the acceptance bar.

  1. Safe local handled scalar-result composition: allowed residual-free handled clones to compose with scalar-result clones and only let local handler heap-object locals stay scalar when used as field lanes. VTrace correctness held (main=3825271), but representative optimized Wasm hashes, bytes, and WAT counters were identical to baseline. No real workload win.

  2. Handled call-argument producer path: allowed locally handled heap-object call producers to feed scalar aggregate call arguments with fallback materialization. Correctness held, but representative Wasm remained byte-identical. No real workload win.

  3. All direct scalar arguments, including non-receiver args: this finally changed VTrace shape, but in the wrong direction: VTrace wasm bytes +5.2%, gzip +1.8%, WAT text +9.9%, ref.cast +119, call_ref +63, compile +4.7%, and runtime +0.3%. Removed.

  4. Local handled receiver scalarization for exact direct method receivers: correctness held, but VTrace/web/VX representative Wasm stayed byte-identical while compile time rose. Removed.

  5. Constructor-following scalar-result freshness (Vec3(...) -> fresh aggregate producer): correctness held, but benchmarks regressed: VTrace runtime +4.8%, compile +8.3%, wasm/gzip slightly up; web route runtime +23.8%; VX runtime +11.2% despite tiny allocation-counter reductions. Removed.

Current conclusion: I pursued the goal well beyond the initial prototype's failure. The variants that change real optimized output either fail to improve runtime or regress size, compile time, or runtime, while the safe variants are optimized away to the same final Wasm shape. Since none showed representative benchmark value, no scalar optimizer infrastructure is kept.
