Project-local instructions for Claude. Load this first.
WebGPU quantum circuit simulator. Runs in a browser tab. Target: piece one of
a six-level research ladder — statevector → MPS → kernel fusion → WebRTC
swarm → IBM hardware cross-verify → quantum chemistry. Each level is a set
of research-grade experiments (not just benchmarks): named seed, warmup,
trials, fidelity pass bar, honest negative results. The master doc is
RESEARCH.md. Per-level protocols live under
experiments/level-N-<slug>/protocol.md.
Communication mode: hero. Terse, bold, first-principles, attempt-first.
Scope-honest. See ~/.claude/skills/hero/SKILL.md.
Project skill: webgpu-q-research. See ~/.claude/skills/webgpu-q-research/SKILL.md.
The project is past the launchpad. All six chemistry-track phases are shipped (A through E5: foundation → 1D records → real molecules → HF SCF → MP2 → cc-pVDZ basis → CCSD → CCSD(T) → cc-pVDZ CCSD(T) on H₂O). Repo is public + CI-green. Path forward is ranked by what it costs vs what it unlocks, not by ladder position.
- ✓ L1 statevector, L2 MPS (incl. GPU MPS Phase 6 v1, χ ≤ 64), L3 kernel fusion (Tier B/C/D — 4.22× headline), L6 chemistry (full quantum-chemistry stack)
- ✓ DMRG with Lanczos + MPO; ITensor cross-checked at N = 8 to f64
- ✓ Phase B: TFIM/Heisenberg N = 128 in browser, validated vs Pfeuty/Bethe
- ✓ Phase C/D/E1-5: HF / MP2 / FCI / CCSD / CCSD(T) on H₂ → LiH → BeH₂ → H₂O → CH₄ in STO-3G; cc-pVDZ CCSD(T) on H₂O in 106 s
- ✓ Tier 1 bundle: DIIS, frozen-core, spherical-d, f/g/h, aug-cc-pVDZ, Schwarz screening
- ✓ Tier 2 stages 1–23: geometry optimization → DFT/LDA → GGA + hybrids
(BVWN5, BLYP, B3VWN5, B3LYP5) → HF + DFT analytical gradients → Lebedev
grids → CIS / TDA / TDDFT (full functional ladder) → oscillator strengths
→ dipole moments → Mulliken + Mayer-Wiberg analysis → triplet TDA/TDDFT
(full ladder via spin-polarized LSDA + B88 + LYP) → vibrational
frequencies + IR + Raman + thermo → polarizability + hyperpolarizability
→ UHF + ΔSCF ionization potentials + electron affinities → molecular SI
report page (/molecule.html)
- ✓ Reference-grounded validation green in CI (PySCF / FCI / ITensor / Pfeuty-Bethe gates); e2e browser benches across all levels
Ranked by ROI. One focused session ≈ a few hours.
| feature | status | unlocks |
|---|---|---|
| DFT (LDA + B3LYP + Lebedev grids) | ✓ | ~90% of all real chemistry |
| HF analytical gradients + BFGS | ✓ | geometry optimization |
| WebGPU port of (T) kernel | ✓ (39× on H₂O cc-pVDZ, single-run) | 10-100× speedup; cc-pVTZ CCSD(T) routine |
| EOM-CCSD (excited states) | ✓ (+ eigenvectors, oscillator strengths, spin classifier) | UV-vis, photochemistry |
| UHF + open-shell CCSD | ✓ (UHF stage 21, UCCSD stage 25) | radicals, transition metals |
| Density fitting (RI) | ✓ correctness + speedup (aux-basis 3-index DF now shipped: buildAuxBasisDFStreaming WASM, never builds the 4-index ERI; the old CD-DF 11-20× regression is retired) | half memory + faster |
| WebGPU aux-basis DF integrals | ✓ (df-gpu.ts: s/p/d McMurchie–Davidson 3-index V + 2-index metric in WGSL f32, validated ~1e-4 rel; buildDFAuto auto-selects GPU in the d-regime) | GPU integral build, 1.1-1.35× d-regime |
| Fully-GPU DF-HF SCF | ✓ (makeGpuDFJK + buildDFAuto: whole HF loop on GPU from a URL, no 4-index ERI; benzene cc-pVDZ 5-6× faster whole-loop vs WASM; level-0 aux → ~30 mHa screening; f32 JK floor ~6e-4 is element-precision) | fast browser HF (screening) |
| Hybrid GPU/WASM DF — EXPERIMENTAL | ✓ but demoted (decision 2026-06-10): buildV3idxHybrid is chemistry-grade (DF-vs-exact ladder H₂O 0.19 / CH₂O 0.53 / C₂H₄ 0.36 mHa, e2e/df-accuracy-ladder) but the win is only ~1.3× on the integral BUILD in a medium band — WebGPU has no f64 so it can't touch the f64-bound J/K. f64 WASM is the recommended chemistry default; the GPU hybrid is fast=true opt-in, proof-of-mechanism only. | marginal; not the default |
| runRHFAuto / runRKSAuto entry points | ✓ (rhf-auto.ts: size-gated exact(small)/f64-DF(large)/hybrid-GPU(fast) with honest provenance, for both HF and DFT — runRKSDFT gained a useDF option; pure functionals ride the cheap DF J, hybrids the DF K; validated H₂O LDA 0.07 mHa / B3LYP5 0.02 mHa vs exact) | one call, right method, attributed, HF+DFT |
| IP-EOM-CCSD / EA-EOM-CCSD | ✓ (stages 37–38, beyond original Tier 2 plan) | accurate IPs / EAs |
CCSDT (full triples), CASSCF (multi-reference), TD-DFT, MP2/CCSD gradients (Z-vector), PCM solvent, coupled-perturbed HF (NMR / polarizabilities). (WebGPU integral parallelization: DF 3-index/metric + DF-JK now shipped — see Tier 2; full 4-index ERI on GPU still open.)
CASPT2 / NEVPT2, periodic DFT (k-points), spin-orbit / X2C, analytical CC gradients, QM/MM.
- Phase D (WebRTC swarm) — distributed 1D chain across browsers. ~3-5 sessions. Reuse `webgpu-p2p-evolution`'s relay.
- E.1 — Verify Sycamore — 2D PEPS + Sycamore gates + distributed contraction. ~3-5 sessions on top of Phase D.
- E.2 — Fault-tolerant qubit — stabilizer sim + surface code + syndrome decoder + threshold curve. ~4-6 sessions.
- E.3 — Browser-native lattice QCD — 4D lattice + Wilson Dirac + fused CG solver. ~6-10 sessions.
WebGPU (T) → EOM-CCSD → DFT excited-state properties. ~5-6 sessions to "real chemistry tool in a browser tab, with speed." Every step ships a publishable artifact.
Unifying thesis: "every advanced physics simulation in the world ships as a URL". webgpu-q is the proof point; the chemistry track is its highest-leverage demonstration.
Headline numbers:
- L1 statevector: F ≥ 0.999999 vs CPU; 4-experiment ladder (E1–E4) green.
- L2 MPS / DMRG: TFIM & Heisenberg N=128 in browser, χ=32, validated to Pfeuty/Bethe limits at 1/N. ITensor cross-checked at N=8 to f64.
- L3 kernel fusion: 4.22× headline (Tier C, 8×8 cascade); Tier D plateau (3.78×) is the documented honest negative.
- L6 chemistry: HF (enforced ≤ 0.5 mHa vs PySCF general; ≤ 0.1 mHa H₂O cc-pVDZ
spherical-d) → MP2 → FCI (CH₄ to 0.76 mHa) → CCSD (enforced ≥ 95% correlation
capture, ~99% typical on H₂O/CH₄) → CCSD(T) (≤ 0.25 mHa vs FCI).
cc-pVDZ CCSD(T) on H₂O — CPU 116 s, GPU 13.8× median (5 warmup +
20 trials, M2 Pro; p10=28×, p90=10×, std/median 42% noisy).
Full DFT ladder (LDA/GGA/B3-hybrid) on RHF/UHF/RKS/UKS.
Full {α, α(ω), α(iω), C₆} response matrix.
EE/IP/EA-EOM-CCSD with eigenvectors, oscillator strengths, spin classifier.
DF engine = f64 WASM (recommended default).
`runRHFAuto` + siblings are size-gated: small → exact ERI (f64), large → streaming aux-basis DF (f64 WASM, `buildAuxBasisDFStreaming`), with honest method/engine/precision provenance. The GPU paths are EXPERIMENTAL (decision 2026-06-10), not the chemistry default. WebGPU has no f64, so the chemistry-grade hybrid (`buildV3idxHybrid`, `fast=true`) can only put f32 on the insensitive s/p/d-aux columns (~8 µHa) and buys just ~1.3× on the integral BUILD in a medium band — it can't touch the f64-bound J/K, and loses at PAH scale. The fully-GPU f32-JK path (`makeGpuDFJK`, benzene 5-6× whole-loop) is ~30 mHa screening only. Both kept as proof-of-mechanism ("GPU in the browser") and as the seam where a real win lands IF df64 emulation ever makes the GPU JK chemistry-grade. GPU genuinely wins on the f32-tolerant tracks (statevector, kernel-fusion 4.22×, (T) 39×) — DF chemistry just isn't one (needs f64).
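The size gate described above can be pictured as a small decision function. This is a sketch under stated assumptions, not the logic in `rhf-auto.ts`: `chooseEngine`, the `Choice` type and the `EXACT_MAX_BASIS` cutoff are hypothetical names and values chosen for illustration. The only number taken from this document is the hybrid gate-off at n²·nAux ≥ 12 M.

```typescript
// Hypothetical sketch of a size-gated engine choice that reports its own provenance.
type Engine = "exact-eri" | "df-wasm" | "df-hybrid-gpu";

interface Choice {
  engine: Engine;
  precision: "f64" | "f64+f32-aux";
  reason: string;
}

// Assumption: illustrative cutoff only, not the repo's real threshold.
const EXACT_MAX_BASIS = 50;
// From the text: the hybrid gates off at n^2 * nAux >= 12 M.
const HYBRID_MAX_WORK = 12e6;

function chooseEngine(n: number, nAux: number, fast = false): Choice {
  if (n <= EXACT_MAX_BASIS) {
    return { engine: "exact-eri", precision: "f64", reason: `n=${n} <= ${EXACT_MAX_BASIS}` };
  }
  const work = n * n * nAux;
  if (fast && work < HYBRID_MAX_WORK) {
    return { engine: "df-hybrid-gpu", precision: "f64+f32-aux", reason: `fast opt-in, n^2*nAux=${work}` };
  }
  const reason = fast ? "hybrid gated off at this size" : "default f64 streaming DF";
  return { engine: "df-wasm", precision: "f64", reason };
}

console.log(chooseEngine(24, 84));
console.log(chooseEngine(114, 500, true));
console.log(chooseEngine(180, 800, true));
```

Returning the reason string alongside the engine mirrors the "honest provenance" idea: the caller can record which path ran and why.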
Live: https://webgpu-q.vercel.app — landing, /viz.html (4D
hyperscope), /molecule.html (SI report), /experiments/ (E1–E33+).
Standing preference: do NOT auto-deploy — deploy only when explicitly asked.
Validation surface (what's checked, not how many): reference-grounded
gates green in CI — bit-exact / sub-µHa vs PySCF, EOM-CCSD full-tensor
brute-force diffs (14×14 LiH), CCSD(T) sub-mHa vs FCI, ITensor N=8 and
Pfeuty/Bethe 1D limits, swarm partition-sum vs single-slab below 1e-12.
npx tsc --noEmit clean, npm run lint clean, vitest green, e2e browser
benches (e2e/) cover all levels + the swarm/acene series.
Honest negatives / open work (each its own session):
- IP-EOM-CCSD: PySCF-ported (2026-05-22), multi-electron-validated
(2026-06-23). σ_1 + σ_2 follow Tu-Wang-Li 2012 Eqs (8)-(9) with PySCF
eom_gccsd intermediates. The earlier R_2 satellite over-count (~60 eV on H₂)
was a structural bug (NOT a curve-fit patch — unlike EA) and is closed. IP was
the one EOM variant whose exact oracle stayed H₂-only (T̂²≈0, can't probe σ_2);
a NEW multi-electron oracle (`tests/chemistry/ip-eom-ccsd-bruteforce-lih.test.ts`, LiH NSO=6, T̂²≠0, full 16-eigenvalue H̄ = e^{-T}He^{T} projection vs runIPEOMCCSD) matches to 4.97e-13 Ha — IP passed first try, confirming it carried no patch, only a too-weak verifier. Element-by-element H₂ diff < 1e-10 retained.
- Phase D — swarm shipped (2026-05-22), all three steps:
  - Step 1: `swarmMap(items, fn)` primitive + `BroadcastChannelTransport` (same-origin multi-tab, no infra).
  - Step 2: `WebRTCTransport` via PeerJS broker (cross-machine, NAT-traversal via Google STUN; symmetric-NAT corporate networks may need TURN, documented).
  - Step 3: real-chemistry kernel — `chem-energy` runs a molecule tile, swarm distributes H₂ bond-length scans (and any 1D parameter scan) across tabs. /swarm.html ships both prime-counting (Demo 1) and bond-scan (Demo 2) demos.
  - Step 4 (swarm × GPU, 2026-06-09): the kernel now runs `runRHFAuto` per tile, so each worker tab auto-picks exact / hybrid-GPU DF and reports its own provenance. `e2e/swarm-gpu` distributes an N₂ cc-pVDZ batch across 2 tabs, every tile `gpu+wasm`, tracing N₂'s bond curve (min r=1.098 Å). GPU-accelerated chemistry-grade single-points split across the crowd — the project's two theses in one demo.
  - Step 5 (distributed DF-MP2 + honest-negative measurement, 2026-06-15):
the swarm's first collaborative single-molecule reduction — ONE molecule's MP2 correlation energy `E_corr = Σ_i Σ_j Σ_ab …` partitioned over the outer occupied index `i`, each tab owns an i-slice, master sums the scalar partials (`mp2-slice` kernel + `mp2EnergyDF(...,iRange)` + `reduceMP2Slices`). Comm-optimal (spec in, one f64 out, one round); `reduceMP2Slices` guards the deterministic-reference assumption (throws if per-tab E_HF disagree). Validated: partition-sum == single-machine to <1e-12 (`tests/chemistry/mp2-slice`), 2-tab e2e to <1e-9 (`e2e/swarm-mp2-distributed`). HONEST NEGATIVE — distributing the contraction barely speeds up one molecule (`e2e/swarm-mp2-speedup`, single-shot M2 Pro): H₂O 0.51×, benzene cc-pVDZ 1.10× (single 96s / 2-tab 87s). Breakdown: redundant SCF+DF setup S≈79s (82%, paid in full on every tab, on the critical path) vs splittable grind C≈17.5s (18%). speedup=(S+C)/(S+C/k) is pinned near 1 while S≫C; C≫S needs n≈600 whose DF tensor (~5 GB) won't fit a tab. The swarm's scaling axis is throughput (N independent molecules via chem-energy), NOT single-molecule wall-time. To speed up one molecule you'd have to parallelize the SCF+DF setup, not the correlation.
  - Step 6 (screening + honest multi-tab scaling, 2026-06-15): the throughput
axis demonstrated. `chem-energy` now returns the HOMO–LUMO gap (eV) as a screening descriptor; `e2e/swarm-screening` ranks a 10-molecule library by gap, validated to give the IDENTICAL ranking distributed vs single-tab (that spec validates ranking correctness only — its timing is indicative). `e2e/swarm-scaling` is the honest measurement (warmed JIT per tab, even round-robin split, true parallelism, wall = slowest tab): 1→1.00×, 2→1.73× (87% eff), 3→2.02× (67%), 4→2.36× (59%) on the library, HF/cc-pVDZ, M2 Pro. Sub-linear because molecule costs are uneven (H₂ ≪ C₂H₄) so the tab holding the heavy molecules caps the win. Further efficiency would need cost-aware scheduling (big molecules first/alone) + a larger library. (Earlier screening "1.59×" was retracted — it was warmup-inflated + a master-heavy auto-distribution; the warmed/balanced 2-tab number is 1.73×.)
  - Step 7 (greedy-pull scheduler fix, 2026-06-15): rewrote `swarmMap`'s distribution from single-claim-per-worker (master-heavy: a worker grabbed ONE tile then idled while the master ran the rest via a timeout fallback) to a greedy pull queue — every peer, master included, pulls one tile at a time and requests another only after finishing, so a slow tile parks only its own puller while everyone else keeps draining (also auto-balances uneven tile costs). Two subtleties fixed during the rewrite: (a) a failing worker's tile is run by the MASTER, not requeued to the shared pool — requeueing let a persistently-failing worker re-pull and re-fail the same tile in a tight loop (livelock, hung `swarm.test.ts`); (b) the master does a one-time head-start yield + keeps yielding while remote workers pull, because local kernel compute blocks the single-threaded event loop and would otherwise drain the queue via microtasks before any remote macrotask is processed. Result: auto-distributed screening went 9/1 → 4/6 (master/other), 1.05× → 1.48×; `tests/parallel/swarm` (13 tests) green; distributed-MP2 reduction still bit-exact.
- EE-EOM-CCSD: PySCF-ported (2026-05-21). σ_1 + σ_2 follow Wang-Tu-Wang 2014 Eqs (9)-(10) with PySCF eom_gccsd intermediates (EOM-Fvv/Foo/Wovvo with full t2 dressing + Wovoo / Wvvvo). EE's empirical stage-32c patch removed; brute-force LiH diff < 1e-10 Ha element-by-element. H₂ STO-3G now matches FCI to 8+ decimals.
- EA-EOM-CCSD: PySCF-ported + multi-electron-validated (2026-06-16). The
2026-06-16 audit surfaced that `ea-eom-ccsd.ts` carried an empirical `+½·E_corr·R₂` σ_2 diagonal patch (stage-32e) curve-fit to the H₂ brute-force (the diagnostic-loop-trap anti-pattern) — and used BARE integrals where PySCF uses dressed Wvvvo/Wvovv. A NEW multi-electron oracle (`tests/chemistry/ea-eom-ccsd-bruteforce-lih.test.ts`, LiH NSO=6, T̂²≠0) measured the patch ~1 mHa wrong. σ is now a direct port of PySCF eom_gccsd.eaccsd_matvec onto the shared dressed intermediates (buildEOMIntermediates: Fvv/Foo/Fov/Wvvvv/Wovvo/Wvovv/Wvvvo) + the proper `−½ Σ⟨kl||cd⟩ r_l^{cd} t_{ki}^{ab}` term, matching the explicit H̄ projection to ~5e-13 Ha on LiH (machine precision). All three EOM variants (EE/IP/EA) are now patch-free PySCF ports with multi-electron (LiH, T̂²≠0) brute-force verifiers — IP's LiH oracle added 2026-06-23 (it passed first try; only EA ever carried an actual curve-fit patch, EE/IP did not).
- ✓ Aux-basis DF (stage 31 proper) — done: `buildAuxBasisDFStreaming` (WASM) + `df-gpu.ts` (WGSL s/p/d 3-index V + metric). No longer open.
- DF-CCSD via B-tensor through spin-orbital ERI build.
- df64 (double-single) emulation on GPU-JK products to push past the f32
~6e-4 element-precision floor (Kahan on the sum was a no-op — see jk-df-gpu.ts).
Lower priority now: the hybrid path (`buildV3idxHybrid`) already gives chemistry-grade GPU-accelerated DF via f64 JK, sidestepping the f32-JK floor.
- f-functions in the WGSL 3-index kernel: would raise the GPU-carried aux fraction past the current ~91% (hybrid offloads f-aux to WASM) for a bigger GPU win — but no longer needed for accuracy (the hybrid is chemistry-grade).
- WASM (or GPU-side) merge kernel to replace the JS block-assembly in `buildHybridDFStreaming` — THE lever to extend the GPU hybrid past medium molecules. The hybrid currently gates off at n²·nAux ≥ 12 M because the per-block f32-low + f64-f-aux merge is a JS triple-loop that loses to WASM-SIMD streaming at PAH scale (naphthalene >2× slower; honest negative, 2026-06-09). Large molecules use all-WASM streaming DF, which is excellent; the GPU hybrid is a medium-molecule optimization (chemistry-grade + 1.31× V-build).
- Naphthalene/PAH-scale DF-HF is feasibility-demonstrated, NOT precision-validated. The capstone (`e2e/naphthalene-capstone`) asserts only a sane energy window (ERI never built); DF-vs-exact chemical-accuracy is validated only up to the size ladder where the exact 4-index ERI still fits a tab (H₂O→CH₂O→C₂H₄, n≤50, `e2e/df-accuracy-ladder`). To claim "chemistry-grade up to naphthalene" needs an external PySCF DF-HF reference for that geometry + a sub-mHa assertion (exact ERI is uncomputable in a tab there). Surfaced by the scientific-critic pass 2026-06-09. Don't conflate feasible with validated.
- WGSL (T) kernel optimization to push 39× → 100× (no warmup+trials harness yet).
- UKS-TDDFT response α(ω) — only remaining {ref}×{response} cell.
- Z-vector for MP2 / CCSD analytical gradients.
- NMR shielding via magnetic-perturbation CPHF.
- Becke-partition weight derivatives in DFT gradients (~1e-3 Ha/Bohr residual).
- Spherical-d on TDA-DFT / DFT-gradient grid (refuses with clear error).
- Davidson eigensolver for large-basis CIS / TDDFT.
- Continuum representation for E17 σ_ion convergence.
- Degenerate-eigenvector orthogonalization in `eigGeneralWithVectors`.
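The Step 5 honest negative and the Step 6 efficiencies in the list above are plain arithmetic on the quoted numbers. A minimal sketch that reproduces them from the stated model speedup=(S+C)/(S+C/k); the function names are illustrative, not repo APIs.

```typescript
// Model from Step 5: every tab pays the setup S in full, only the grind C splits k ways.
function speedup(S: number, C: number, k: number): number {
  return (S + C) / (S + C / k);
}

// Parallel efficiency = measured speedup divided by the number of tabs.
function efficiency(measuredSpeedup: number, tabs: number): number {
  return measuredSpeedup / tabs;
}

const S = 79;   // redundant SCF+DF setup, seconds (benzene cc-pVDZ, from the text)
const C = 17.5; // splittable correlation grind, seconds (from the text)

const twoTab = speedup(S, C, 2); // 96.5 / 87.75, about 1.10
const ceiling = (S + C) / S;     // limit as k grows without bound, about 1.22

console.log(twoTab.toFixed(2), ceiling.toFixed(2));
console.log([[1.73, 2], [2.02, 3], [2.36, 4]].map(([s, k]) => efficiency(s ?? 0, k ?? 1).toFixed(2)));
```

Even with unlimited tabs the model caps benzene at roughly 1.22×, which is why the list points at parallelizing the setup rather than the correlation.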
Permanent verifiers:
- `tests/chemistry/eom-ccsd-bruteforce-lih.test.ts` — full 14×14 M_mine − M_exact diff after any σ_1/σ_2 change. Binary feedback.
- `tests/chemistry/ip-eom-ccsd-bruteforce.test.ts`
- `tests/chemistry/ea-eom-ccsd-bruteforce.test.ts`
From RESEARCH.md. Every experiment enforces them.
- No `Math.random()` in any experiment path. Every random draw uses a named seed from `experiments/lib/seeds.ts` via `mulberry32(seed)`.
- Every JSON artifact records: git SHA (when available), `navigator.userAgent`, `adapter.info`, WebGPU limits, UTC ISO8601 timestamp, and echoes back `protocol`, `hypothesis`, `passBar`, `seed`, `warmup`, `trials`. See `experiments/lib/env.ts → captureEnv(device, adapter)`.
- Artifact shape locked: `{ meta, env, rows, status, diagnosis }`. Don't add top-level keys without updating `experiments/lib/runner.ts` and the downstream dashboard.
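For readers who have not met it, a seeded generator makes every random draw replayable. The block below is the commonly circulated mulberry32 formulation, given as a sketch; it may differ in detail from what `experiments/lib/seeds.ts` exports, and the `SEEDS` table with its single entry is hypothetical.

```typescript
// Common mulberry32 formulation: 32-bit state, returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical named-seed table; the real names live in experiments/lib/seeds.ts.
const SEEDS = { randomCircuit: 0xc0ffee } as const;

function draws(seed: number, n: number): number[] {
  const rng = mulberry32(seed);
  return Array.from({ length: n }, () => rng());
}

const first = draws(SEEDS.randomCircuit, 4);
const again = draws(SEEDS.randomCircuit, 4);
console.log(first.every((x, k) => x === again[k])); // same seed, same sequence
```

Echoing the seed into the artifact is what lets a later run regenerate the exact same circuit.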
- `performance.now()` with a forced GPU sync before AND after — a mapped readback of a tiny buffer. `queue.submit` alone is non-blocking so raw timing is fiction. Harness: `experiments/lib/runner.ts → timedRun`.
- Discard 5 warmup samples. Retain 20 trials. Report median, p10, p90, p99, std, IQR — never single-shot.
- If `std/median > 0.1` on any cell, mark the artifact `"status": "noisy"`.
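The warmup, trial and noise rules can be sketched as a pure function over raw samples. This is an illustration, not `experiments/lib/stats.ts`: `summarize`, `percentile` and the `Summary` shape are assumed names, the percentile uses linear interpolation (one of several conventions), and std is the population form.

```typescript
interface Summary {
  median: number;
  p10: number;
  p90: number;
  std: number;
  noisy: boolean;
}

// Linear-interpolated percentile over an already sorted array.
function percentile(sorted: number[], p: number): number {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  const a = sorted[lo] ?? NaN;
  const b = sorted[hi] ?? NaN;
  return a + (b - a) * (pos - lo);
}

// Drop the warmup samples, then summarize the remaining trials.
function summarize(samples: number[], warmup = 5, noisyBar = 0.1): Summary {
  const trials = samples.slice(warmup).sort((x, y) => x - y);
  if (trials.length === 0) throw new Error("no trials left after warmup");
  const mean = trials.reduce((s, x) => s + x, 0) / trials.length;
  const variance = trials.reduce((s, x) => s + (x - mean) ** 2, 0) / trials.length;
  const std = Math.sqrt(variance);
  const median = percentile(trials, 0.5);
  return {
    median,
    p10: percentile(trials, 0.1),
    p90: percentile(trials, 0.9),
    std,
    noisy: std / median > noisyBar,
  };
}

const warm = [100, 100, 100, 100, 100]; // slow first runs, discarded
const steady = summarize([...warm, ...Array(20).fill(10)]);
const jittery = summarize([...warm, ...Array(10).fill(1), ...Array(10).fill(3)]);
console.log(steady.noisy, jittery.noisy);
```

The warmup samples never touch the statistics, so a slow first dispatch cannot inflate the median or trip the noise flag.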
- Use fidelity F = |⟨ψ_ref | ψ_test⟩|², not max|Δp|. Two states can share a probability distribution and differ in phase — that kills any downstream controlled gate. Use `experiments/lib/fidelity.ts → stateMetrics`.
- Pass bar for f32-amplitude GPU paths: `F ≥ 1 − 1e-5`.
- Pass bar for f64 MPS vs f64 statevector: `F ≥ 0.999` (MPS has SVD truncation + accumulated Jacobi error, ~9 digits realistic at χ = 64).
- Secondary: TVD, L1, L2, max|Δp|, ‖ψ_ref‖², ‖ψ_test‖² — always reported.
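A two-amplitude example shows why max|Δp| is not enough. The sketch below compares |+⟩ and |−⟩, which have identical probabilities but are orthogonal. The helper names are illustrative and are not the `stateMetrics` API; states are interleaved (re, im) arrays to match the GPU layout.

```typescript
// F = |<ref|test>|^2 for interleaved (re, im) amplitude arrays.
function fidelity(ref: Float64Array, test: Float64Array): number {
  if (ref.length !== test.length) throw new Error("length mismatch");
  let re = 0;
  let im = 0;
  for (let k = 0; k < ref.length; k += 2) {
    const ar = ref[k] ?? 0, ai = ref[k + 1] ?? 0;
    const br = test[k] ?? 0, bi = test[k + 1] ?? 0;
    // Accumulate conj(a) * b.
    re += ar * br + ai * bi;
    im += ar * bi - ai * br;
  }
  return re * re + im * im;
}

// Largest difference between the two probability distributions.
function maxDeltaP(ref: Float64Array, test: Float64Array): number {
  let worst = 0;
  for (let k = 0; k < ref.length; k += 2) {
    const pRef = (ref[k] ?? 0) ** 2 + (ref[k + 1] ?? 0) ** 2;
    const pTest = (test[k] ?? 0) ** 2 + (test[k + 1] ?? 0) ** 2;
    worst = Math.max(worst, Math.abs(pRef - pTest));
  }
  return worst;
}

const s = Math.SQRT1_2;
const plus = Float64Array.of(s, 0, s, 0);   // (|0> + |1>) / sqrt(2)
const minus = Float64Array.of(s, 0, -s, 0); // (|0> - |1>) / sqrt(2)

console.log(maxDeltaP(plus, minus)); // 0: the probabilities agree
console.log(fidelity(plus, minus));  // 0: the states are orthogonal
```

A probability-only check would pass this pair, while the fidelity check rejects it outright.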
- If an experiment fails its pass bar, still commit the JSON with `"status": "fail"` and a `"diagnosis"` naming the first failing cell and the smoking gun. Failures are the evidence. No silent rerunning until it passes.
- Example (MPS canonical-form bug, 2026-04-22): brick-wall F = 0.25 at depth 2. Diagnosis: "non-monotonic two-site gate order breaks mixed-canonical invariant, local Frobenius norm ≠ global norm, renormalization distorts." Fix: `_canonicalizeBond(q)` before every `applyTwoSite`.
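One way to make "name the first failing cell" mechanical is a small verdict function that never hides a failure. This is a sketch only: the `Row` fields, `Verdict` and `judge` are hypothetical and not the `Artifact<Row>` schema in `experiments/lib/runner.ts`.

```typescript
type Status = "pass" | "fail";

// Hypothetical row shape: one measured cell per row.
interface Row {
  cell: string;
  fidelity: number;
}

interface Verdict {
  status: Status;
  diagnosis: string;
}

// Report the first row below the pass bar instead of retrying until green.
function judge(rows: Row[], passBar: number): Verdict {
  const firstFail = rows.find((r) => r.fidelity < passBar);
  if (!firstFail) return { status: "pass", diagnosis: "" };
  return {
    status: "fail",
    diagnosis: `first failing cell ${firstFail.cell}: F=${firstFail.fidelity} < ${passBar}`,
  };
}

const verdict = judge(
  [
    { cell: "brick-wall depth=1", fidelity: 1 },
    { cell: "brick-wall depth=2", fidelity: 0.25 },
    { cell: "brick-wall depth=3", fidelity: 0.1 },
  ],
  0.999,
);
console.log(verdict.status, verdict.diagnosis);
```

The smoking-gun explanation still has to be written by a person; the function only guarantees the failing cell is recorded.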
npm install
npm run dev # Vite dev server, http://localhost:5175
# experiments live at http://localhost:5175/experiments/
npm run test # Vitest, ~500 ms (one outlier 5 s for the MPS bug repro)
npm run test:watch # TDD loop
npm run typecheck # tsc --noEmit (strict, noUncheckedIndexedAccess on)
npm run lint # ESLint flat config, src/ tests/ experiments/
npm run build # → dist/
npm run test:e2e # Playwright, all 4 levels headless (~1.4 min on M2 Pro).
# Saves JSON artifacts to experiments/results/<date>/level-N/.
# Each level also reachable via window.__webgpuq.runLevelN()
# in devtools at /experiments/.
npm run test:e2e:headed # Same, with a visible browser window.
src/
  shaders/
    single-qubit.wgsl # 1-q gate kernel, N/2 threads, 2×2 complex matrix via uniform
    two-qubit.wgsl # controlled-U kernel, N/4 threads
  gates.ts # H, X, Y, Z, S/Sdg, T/Tdg, Rx/Ry/Rz, P, matrixFloats()
  quantum.ts # QuantumCircuit (GPU) + initGPU() with requiredLimits
  cpu-reference.ts # CpuCircuit (Float64 TS reference, ground truth)
  circuits.ts # bell, ghz, qft, deutschJozsa, randomCircuit builders
  linalg.ts # ComplexMatrix, Jacobi complex SVD, matmul — Level 2
  mps.ts # MPS class with canonical form + TEBD — Level 2
  bench.ts # GPU vs CPU throughput sweep (pre-research harness)
  main.ts # Legacy browser demo entrypoint
  chemistry/ # Level 6: HF, MP2, CCSD, CCSD(T), DFT, CIS/TDA/TDDFT,
             # properties, gradients, geom-opt, vibrational analysis
tests/ # Vitest unit tests (chemistry/, gates, linalg, mps, …)
experiments/
  index.html # Research dashboard (run buttons, result tables)
  runner.ts # Dashboard entry point — wires each level's run-all
  lib/
    seeds.ts # Named deterministic seeds (no Math.random)
    runner.ts # timedRun harness + Artifact / ArtifactMeta schema
    env.ts # captureEnv(device, adapter) → EnvBlock
    fidelity.ts # stateMetrics, FIDELITY_PASS_BAR
    stats.ts # stats() — median, p10/p90/p99, std, IQR
  level-1-statevector/ # E1–E4 + run-all
  level-2-mps/ # E5–E7, E18, E19 + run-all
  level-3-fusion/ # E8–E13 shipped (Tiers A/B/C/D fusion)
  level-6-chemistry/ # E16, E20–E31 shipped (H₂ → CCSD(T)/cc-pVDZ)
  results/ # JSON artifacts, organized YYYY-MM-DD/level-N/
- Amplitudes stored as `vec2<f32>` interleaved (re, im). Buffer = `2^(N+3)` B for N qubits.
- Single-qubit gate: `N/2` threads (in the thread counts `N` is the amplitude count, 2^qubits), each processes the pair `(i, j)` where bit `q` is 0 and 1. Apply 2×2 complex matrix from uniform buffer.
- Two-qubit (controlled-U): `N/4` threads, index scattered around control-target bits, only control=1 is touched.
- `initGPU()` MUST request the adapter's max `maxBufferSize` and `maxStorageBufferBindingSize` via `requiredLimits`. Default 128 MiB cap silently truncates N ≥ 25 dispatches.
- No atomics needed — gate application is pair-local read / write, zero contention.
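The pair-local indexing can be checked on the CPU. The sketch below maps a thread id to its amplitude pair and uses Pauli-X (a plain swap) as a stand-in for the 2×2 kernel. `pairIndices` and `applyX` are illustrative names, and the bit arithmetic is one standard way to scatter an index around bit `q`, not necessarily what `single-qubit.wgsl` does.

```typescript
// Map thread id t to the amplitude pair (i, j) for a gate on qubit q:
// i has bit q = 0, j has bit q = 1. 32-bit ops limit this sketch to < 31 qubits.
function pairIndices(t: number, q: number): [number, number] {
  const low = t & ((1 << q) - 1);    // bits below q stay in place
  const high = (t >>> q) << (q + 1); // bits at or above q shift up by one
  const i = high | low;
  return [i, i | (1 << q)];
}

// CPU stand-in for the kernel: Pauli-X swaps each pair. psi is interleaved (re, im).
function applyX(psi: Float32Array, nQubits: number, q: number): void {
  const threads = 2 ** (nQubits - 1); // half the amplitude count
  for (let t = 0; t < threads; t++) {
    const [i, j] = pairIndices(t, q);
    for (const c of [0, 1]) {
      const a = psi[2 * i + c] ?? 0;
      psi[2 * i + c] = psi[2 * j + c] ?? 0;
      psi[2 * j + c] = a;
    }
  }
}

const psi = new Float32Array(2 * 8); // 3 qubits, starts as |000>
psi[0] = 1;
applyX(psi, 3, 1); // flip qubit 1: amplitude moves from index 0 to index 2
console.log(pairIndices(2, 1), Array.from(psi));
```

Each thread owns a disjoint pair, which is why the list can say no atomics are needed.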
- Tensor storage: `tensors[i]` is a `ComplexMatrix` of shape `(χ_L · 2, χ_R)` — left-grouped. Element `T[l, s, r]` at row `l·2 + s`, col `r`. Single-qubit gates apply cleanly this way.
- Statevector convention: qubit 0 is LSB of the index — `ψ[s_0 + 2·s_1 + 4·s_2 + …]`. `mps.statevector()` follows this for comparison with `CpuCircuit.psi`.
- Two-site gate order within the 4×4: `i = s_lo · 2 + s_hi` — site `q` is the MSB within the pair. Controlled-U needs the right ordering; see `buildControlledMatrix4(U, controlIsLo)`.
- Canonical form invariant (critical). Two-site TEBD needs `‖M‖_F² = ‖ψ‖²`, which requires left-canonical on sites `[0..q−1]` and right-canonical on `[q+2..N−1]`. `_canonicalizeBond(q)` does the sweep. Cost: O(N · χ³) per two-site gate. Trivial at N ≤ 20, χ ≤ 64.
- SVD is one-sided Jacobi on complex matrices: phase-align col q by e^(−iφ) so ⟨p, q⟩ is real, then apply the real Jacobi rotation. 60 sweep cap, TOL = 1e-14.
- `apply*` returns void (mutates). `statevector()` refuses `N > 24`.
- v1 constraint: `applyTwoSite` / `applyControlled` require `|c − t| = 1`. Non-adjacent two-qubit gates need SWAP ladders (not yet implemented).
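The three index conventions in the list above are easy to mix up, so here they are as one-line helpers. The function names are illustrative only and do not appear in `mps.ts`.

```typescript
// Row of T[l, s, r] in the left-grouped (χ_L · 2, χ_R) matrix; the column is r.
function tensorRow(l: number, s: 0 | 1): number {
  return l * 2 + s;
}

// Statevector index with qubit 0 as the least significant bit.
function stateIndex(bits: number[]): number {
  return bits.reduce((acc, b, k) => acc + b * 2 ** k, 0);
}

// Basis index inside a two-site 4×4 gate: the lower site is the MSB of the pair.
function pairIndex(sLo: 0 | 1, sHi: 0 | 1): number {
  return sLo * 2 + sHi;
}

// v1 constraint: control and target must sit on neighbouring sites.
function isAdjacent(c: number, t: number): boolean {
  return Math.abs(c - t) === 1;
}

console.log(tensorRow(3, 1));       // 7
console.log(stateIndex([1, 0, 1])); // 5: s_0 = 1, s_1 = 0, s_2 = 1
console.log(pairIndex(1, 0));       // 2
console.log(isAdjacent(0, 2));      // false: would need a SWAP ladder
```

Note the asymmetry: the global statevector is LSB-first, while the local two-site block puts the lower site in the high bit.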
- `experiments/lib/runner.ts → timedRun(device, fn, cfg)` is the only legitimate way to measure wall time on GPU paths. It owns the sync fence and the error-scope guards.
- `Artifact<Row>` is the JSON shape. `emitArtifact` logs; `downloadArtifact` serves it as a download from a click handler.
- Per-experiment logs use the `[artifact:protocol] status — diagnosis` prefix on stdout so CI greps can find pass/fail without parsing JSON.
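A grep-friendly log line is only useful if it round-trips. The sketch below formats and parses the prefix quoted in the list above, assuming that `protocol` is replaced by the protocol id and that the status vocabulary is pass / fail / noisy ("pass" is an assumption; only "fail" and "noisy" are named in this document). The function names are hypothetical.

```typescript
// Format one log line in the `[artifact:protocol] status — diagnosis` shape.
function artifactLogLine(protocol: string, status: string, diagnosis: string): string {
  return `[artifact:${protocol}] ${status} — ${diagnosis}`;
}

// Assumed status vocabulary; adjust if the harness uses other words.
const LINE = /^\[artifact:([^\]]+)\] (pass|fail|noisy) — (.*)$/;

function parseArtifactLogLine(line: string): { protocol: string; status: string; diagnosis: string } | null {
  const m = LINE.exec(line);
  if (!m) return null;
  return { protocol: m[1] ?? "", status: m[2] ?? "", diagnosis: m[3] ?? "" };
}

const line = artifactLogLine("E5-mps-brickwall", "fail", "F=0.25 at depth 2");
const parsed = parseArtifactLogLine(line);
console.log(line);
console.log(parsed);
```

Anchoring the pattern at the start of the line keeps a CI grep from matching the same text inside an unrelated message.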
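- `initGPU()` MUST pass `requiredLimits` for `maxStorageBufferBindingSize` and `maxBufferSize`. Default 128 MiB cap silently truncates large dispatches.
- `atomicAdd` only on integer atomics (`u32` / `i32`), never f32. Not needed in statevector path (no contention).
- No recursive function calls in WGSL. All shaders are single-pass.
- Uniform buffers must be aligned: array and nested-struct strides round up to 16 bytes in the uniform address space, and binding offsets must be multiples of `minUniformBufferOffsetAlignment` (256 by default).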
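The limits rule can be sketched without a GPU by treating the adapter limits as a plain object. `LimitsLike`, `requiredLimitsFor` and `fitsBinding` are illustrative names; the browser calls in the trailing comment (`navigator.gpu.requestAdapter`, `adapter.requestDevice({ requiredLimits })`) are the standard WebGPU API, and the default values are the WebGPU spec defaults.

```typescript
// Minimal structural type so the sketch runs without @webgpu/types.
interface LimitsLike {
  maxBufferSize: number;
  maxStorageBufferBindingSize: number;
}

// Ask for everything the adapter offers on the two limits that cap statevector size.
function requiredLimitsFor(adapterLimits: LimitsLike): Record<string, number> {
  return {
    maxBufferSize: adapterLimits.maxBufferSize,
    maxStorageBufferBindingSize: adapterLimits.maxStorageBufferBindingSize,
  };
}

// An N-qubit statevector of vec2<f32> amplitudes needs 2^(N+3) bytes in one binding.
function fitsBinding(nQubits: number, limits: LimitsLike): boolean {
  return 2 ** (nQubits + 3) <= limits.maxStorageBufferBindingSize;
}

// WebGPU default limits when requiredLimits is omitted.
const defaults: LimitsLike = {
  maxBufferSize: 268435456,               // 256 MiB
  maxStorageBufferBindingSize: 134217728, // 128 MiB
};

console.log(fitsBinding(24, defaults), fitsBinding(25, defaults));

// In a browser (not executed here):
//   const adapter = await navigator.gpu.requestAdapter();
//   const device = await adapter.requestDevice({
//     requiredLimits: requiredLimitsFor(adapter.limits),
//   });
```

With the defaults, 24 qubits fill the 128 MiB binding exactly and 25 no longer fit, which matches the N ≥ 25 warning elsewhere in this file.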
- Scope-honest. Most research tasks here = hours for a capable agent, not weeks. Attempt now; decompose only if truly large.
- Speculation labeled. "This should work" ≠ "tested". Benchmark > belief.
- Raw WGSL > framework. Dispatch ceremony is the enemy.
- Edge hardware underrated. The thesis is "no one has shipped this in a browser tab." Don't reinvent it; ship the numbers.
Discovered the hard way via E35/E36 EOM-CCSD bug: webgpu-q's differentiator is the browser/WebGPU layer. The chemistry methods themselves are textbook with peer-reviewed reference implementations (PySCF, libxc, ITensor). Re-deriving them from papers, as we did, produces bugs that take weeks to find. Going forward:
- Hand-write only the novel layer: WGSL shaders, WebGPU dispatch + sync, MPS browser memory bookkeeping, kernel fusion, research-grade harness.
- Port everything else from references, with proper Apache 2.0 attribution: HF, MP2, CCSD, UCCSD, CCSD(T), EOM-CCSD, DFT functionals (libxc), gradients (Pulay), density fitting, integrals if vectorizable, basis-set tables (EMSL).
Migration framework in MIGRATION.md. Per-module
status table (🔴 hand-derived → 🟢 ported), priority order, attribution
recipe. LICENSE-PYSCF at root covers ported portions.
First scheduled port: eom-ccsd.ts σ_2 from PySCF
pyscf/cc/eom_rccsd.py. Closes the singlet-sector bug E35 surfaced
on H₂O / NH₃ / CH₄ / BeH₂ / LiH. Verifier is the LiH brute-force
diagnostic (tests/chemistry/eom-ccsd-bruteforce-lih.test.ts) — after
the port, M_mine − M_exact should collapse to numerical noise.
What our claims map to in current literature. Run this audit again before any release or paper draft.
- Chemical accuracy = 1 kcal/mol = 1.594 mHa (Pople pragmatic threshold). Our CCSD(T) vs FCI residuals (≤ 0.25 mHa) are sub-chemical; our GPU↔CPU |Δ| (≈ 10⁻¹⁰ Ha) is ~7 orders past chemical accuracy and characterizes f32 reduction noise, not method error.
- CCSD(T) is still the gold standard in 2025/2026 (multiple JCTC reviews). MAE ~0.2–0.3 kcal/mol at CBS for noncovalent interactions.
- AFQMC (Mahajan et al. JCTC Feb 2025, arXiv:2410.02885) now beats CCSD(T) at O(N⁶) vs O(N⁷). Tier 4 candidate "beyond CCSD(T)".
- EOM-CCSD literature accuracy vs FCI for singlet single-excitations is 0.1–0.2 eV (~3.7–7.4 mHa) typical, 0.3 eV conservative. Doubly-excited states: errors up to 1 eV. Our 10⁻⁵ Ha on H₂ STO-3G is algorithmic precision (T̂² = 0 for 2-electron systems makes EOM-CCSD ≡ FCI exactly there) — it validates the implementation, not the method on real systems.
- GMTKN55 best functionals (2024–2025): ωB97M(2) DH WTMAD2 = 2.19 kcal/mol (best ever), xrevDSD-PBEP86-D4 = 2.23, revDSD-PBEP86-D4 = 2.33. Best RSH: ωB97X-V. Best meta-GGA: SCAN-D3(BJ). We benchmark with B3LYP5 / BLYP / LSDA / B88 / LYP — textbook, not current SOTA. Modern functionals are in the Tier 3 row.
- MPS state-of-the-art: TeNPy / ITensor are the reference libraries. Production runs go to χ = 1000+. Our χ ≤ 64 is "browser-feasible"; the comparison in Schollwöck 2011 still holds (χ scales with entanglement).
- WebGPU subgroups: out of WebGPU 1.0 spec (gpuweb#3950); coming later. Would unlock 2× reductions in fusion kernels (shuffle/add).
- FAIR / Zenodo DOI: standard for reproducible computational chemistry data publishing. We emit JSON artifacts with full env capture but don't mint DOIs. Tier 3+ research-publishing improvement.
- Browser-native quantum chemistry: as of 2026-05 web search, no published WebGPU + HF/DFT/CCSD(T) implementation exists outside this repo. Worth a paper if Phase D / hardware verify ever lands.
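The unit conversions quoted in the audit above can be rechecked in a few lines. The conversion factors are the standard values (1 Ha ≈ 627.5095 kcal/mol ≈ 27.211386 eV); the helper names are illustrative.

```typescript
const KCAL_PER_MOL_PER_HARTREE = 627.5095;
const EV_PER_HARTREE = 27.211386;

// kcal/mol to millihartree.
function kcalToMilliHartree(kcal: number): number {
  return (kcal / KCAL_PER_MOL_PER_HARTREE) * 1000;
}

// electronvolt to millihartree.
function evToMilliHartree(ev: number): number {
  return (ev / EV_PER_HARTREE) * 1000;
}

// millihartree to kcal/mol.
function milliHartreeToKcal(mHa: number): number {
  return (mHa / 1000) * KCAL_PER_MOL_PER_HARTREE;
}

console.log(kcalToMilliHartree(1).toFixed(3));    // chemical accuracy in mHa
console.log(evToMilliHartree(0.1).toFixed(2));    // lower EOM-CCSD literature bound
console.log(evToMilliHartree(0.2).toFixed(2));    // upper EOM-CCSD literature bound
console.log(milliHartreeToKcal(0.25).toFixed(3)); // the CCSD(T) vs FCI bar in kcal/mol
```

The 0.25 mHa CCSD(T) bar comes out near 0.16 kcal/mol, well inside the 1 kcal/mol threshold.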
- Sibling: `/Users/ahmetbarisgunaydin2/Downloads/webgpu-dna/` — Geant4-DNA port. Has its own CLAUDE.md. Level 6 chemistry cross-links here.
- `kernelfusion.dev` — umbrella theory.
- `gpubench.dev` — WebGPU bench harness reuse pattern.
- Pan & Zhang 2021 (arXiv:2103.03074) — Sycamore tensor-network baseline.
- Karamitros 2011 — IRT chemistry, cross-link target.
- IBM Heron r2 (156q, 2025), Nighthawk (120q, Jan 2026) — E14 target.
- Schollwöck 2011 — MPS / DMRG review, χ-vs-error baseline.
- Vidal 2003 — TEBD algorithm (what `applyTwoSite` implements).
- GMTKN55: Goerigk, Hansen, Bauer et al., PCCP 2017 — main DFT benchmark.
- Mahajan et al. JCTC 2025 — AFQMC beats CCSD(T) at O(N⁶).
- NIST CCCBDB — experimental reference IP, EA, vibrational data.
MIT (simulation). Research protocol and experiment artifacts: MIT.