
Commit b4ae593

docs(gpu): Phase 4c follow-up — wild + wild_cluster in guide and README; drop unverified Efron 1979 attribution
Self-review pass on the v1.14 GPU docs:

1. Citation hygiene (CLAUDE.md §10). The Phase 4 GPU guide contained a textual ``(Efron 1979)`` attribution for pairs bootstrap that is **not** in paper.bib and was pulled from training-corpus memory. Per the zero-hallucination rule, drop it. ``cameron2008bootstrap`` is the only bootstrap citation we use; it has a verified DOI in paper.bib.
2. Phase 4c surface coverage. The ``Wild cluster bootstrap`` line in the GPU guide's "Future GPU candidates" list is no longer future work; it already shipped in 4782152. Move it from the future list into the body description as a sibling of pairs / cluster, with the same score-formulation note we put in CHANGELOG.md and paper.md.
3. Activation table + README accelerator bullet now list all four bootstrap variants (pairs, cluster, wild, wild cluster) instead of just two.

No code changes; no test impact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a87d788 commit b4ae593

2 files changed

Lines changed: 17 additions & 7 deletions


README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -339,7 +339,7 @@ StatsPAI is **not** a wrapper for R. We independently re-implement every algorit
 - **One result object, one API surface.** Every estimator — from `regress()` to `callaway_santanna()` to `causal_forest()` to `notears()` — returns a `CausalResult` with the same `.summary()` / `.plot()` / `.to_latex()` / `.cite()` interface. R users juggle 20+ incompatible S3 classes; StatsPAI users juggle one.
 - **Scope no single R or Python package matches.** DID + RD + Synth + Matching + DML + Meta-learners + TMLE + Neural Causal + Causal Discovery + Policy Learning + Conformal + Bunching + Spillover + Matrix Completion — all consistent, all under `sp.*`.
 - **Agent-native by design.** Self-describing schemas (`list_functions()`, `describe_function()`, `function_schema()`) make StatsPAI the first econometrics toolkit built for LLM-driven research workflows. No other package — in any language — offers this.
-- **Accelerator-ready where it matters.** Selected workloads can opt into accelerator backends without changing the public API: neural causal estimators route through PyTorch CUDA/MPS via `STATSPAI_TORCH_DEVICE`; the HDFE residualizer exposes `backend="jax"`; `sp.fast.feols_jax` runs end-to-end OLS on XLA; and **`sp.fast.feols_jax_bootstrap`** uses `jax.vmap` to lift pairs / cluster bootstrap into a single batched device program 10–100x faster on CUDA / TPU than a sequential CPU loop at B ≥ 1000. See [GPU acceleration guide](docs/guides/gpu_acceleration.md). This is not a universal GPU-speed claim; most StatsPAI estimators are CPU-only by design (and that's the right choice for them).
+- **Accelerator-ready where it matters.** Selected workloads can opt into accelerator backends without changing the public API: neural causal estimators route through PyTorch CUDA/MPS via `STATSPAI_TORCH_DEVICE`; the HDFE residualizer exposes `backend="jax"`; `sp.fast.feols_jax` runs end-to-end OLS on XLA; and **`sp.fast.feols_jax_bootstrap`** uses `jax.vmap` to lift four bootstrap variants — pairs, cluster, wild, and wild cluster into a single batched device program, 10–100x faster on CUDA / TPU than a sequential CPU loop at B ≥ 1000. See [GPU acceleration guide](docs/guides/gpu_acceleration.md). This is not a universal GPU-speed claim; most StatsPAI estimators are CPU-only by design (and that's the right choice for them).
 - **Publication pipeline out of the box.** Word + Excel + LaTeX + HTML + Markdown export from every estimator, not a separate `modelsummary`-style dance.
 
 If a method exists in R, we aim to match or exceed its feature set in Python — and then add what Python can uniquely offer: sklearn integration, opt-in JAX/PyTorch accelerator backends, and agent-native schemas.
```
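The README's claim that `jax.vmap` lifts the bootstrap into one batched device program rests on a simple observation from the guide: a pairs-bootstrap draw is a WLS fit whose weights are the multinomial resampling counts, so all B draws can be solved at once. The sketch below shows that batching with NumPy, assuming plain OLS without fixed effects; `pairs_bootstrap_batched`, `ols` and `hc1_se` are hypothetical names, not `sp.fast.feols_jax_bootstrap`'s implementation. Under JAX one would write the single-draw solve and wrap it in `jax.vmap`; here a batched `einsum` plays that role so the sketch runs with NumPy alone.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def hc1_se(X, y):
    """HC1 (heteroskedasticity-robust) standard errors for OLS."""
    n, k = X.shape
    u = y - X @ ols(X, y)
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (u**2)[:, None])
    return np.sqrt(np.diag(n / (n - k) * bread @ meat @ bread))


def pairs_bootstrap_batched(X, y, B, seed=0):
    """All B pairs-bootstrap draws as one batched WLS solve.

    Each draw's multinomial resampling counts are used as per-row
    weights, so no rows are physically copied.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # (B, n): how many times each row is picked in each draw
    w = rng.multinomial(n, np.full(n, 1.0 / n), size=B).astype(float)
    # Batched normal equations: X'WX is (B, k, k), X'Wy is (B, k)
    XtWX = np.einsum("bi,ij,il->bjl", w, X, X)
    XtWy = np.einsum("bi,ij,i->bj", w, X, y)
    # Trailing axis makes b a stack of (k, 1) matrices for the batched solve
    return np.linalg.solve(XtWX, XtWy[..., None])[..., 0]


# Usage: heteroskedastic data, so HC1 is the relevant comparison
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + np.abs(x))
draws = pairs_bootstrap_batched(X, y, B=2000)
boot_se = draws.std(axis=0, ddof=1)
```

With errors whose variance grows with |x|, the pairs-bootstrap standard errors should land close to the HC1 standard errors, which is the asymptotic target the guide names for this variant.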

docs/guides/gpu_acceleration.md

Lines changed: 16 additions & 6 deletions
```diff
@@ -22,7 +22,7 @@
 | Neural IV: Deep IV (Hartford et al. 2017) | `sp.deepiv` | PyTorch | same env var |
 | HDFE demean (alternating projection) | `sp.fast.demean(backend="jax")` | JAX / XLA | install `jax[cuda]` |
 | OLS / WLS with HDFE | `sp.fast.feols_jax` | JAX / XLA | install `jax[cuda]` |
-| **Bootstrap (pairs / cluster)** | `sp.fast.feols_jax_bootstrap` | JAX / XLA `vmap` | install `jax[cuda]` |
+| **Bootstrap (pairs / cluster / wild / wild_cluster)** | `sp.fast.feols_jax_bootstrap` | JAX / XLA `vmap` | install `jax[cuda]` |
 
 The CPU paths (`sp.fast.demean`, `sp.fast.feols`, `sp.fast.fepois`,
 `sp.fast.boottest`, `sp.iv`, `sp.did`, `sp.rd`, `sp.synth`, …) all
```
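The four variants in the table's bootstrap row differ only in what each draw multiplies row by row: resampling counts (pairs, cluster) that become WLS weights, or Rademacher signs (wild, wild_cluster) that flip residuals, as the guide's variant descriptions put it. A minimal NumPy sketch of generating those per-draw multipliers follows, assuming 1-D cluster labels and equal-probability resampling; `bootstrap_multipliers` is a hypothetical helper, not part of StatsPAI's API.

```python
import numpy as np


def bootstrap_multipliers(variant, cluster, B, seed=0):
    """Per-draw row multipliers for the four bootstrap variants.

    Returns a (B, n) array. For "pairs" and "cluster" the entries are
    resampling counts used as WLS weights; for "wild" and
    "wild_cluster" they are Rademacher signs applied to residuals.
    """
    rng = np.random.default_rng(seed)
    # Integer cluster codes 0..G-1, one per row
    _, codes = np.unique(np.asarray(cluster), return_inverse=True)
    codes = codes.ravel()
    n, G = codes.size, codes.max() + 1
    if variant == "pairs":
        # Resample rows: multinomial counts over n rows
        return rng.multinomial(n, np.full(n, 1.0 / n), size=B).astype(float)
    if variant == "cluster":
        # Resample clusters; every row inherits its cluster's count
        counts = rng.multinomial(G, np.full(G, 1.0 / G), size=B)
        return counts[:, codes].astype(float)
    if variant == "wild":
        # Independent sign per row
        return rng.choice([-1.0, 1.0], size=(B, n))
    if variant == "wild_cluster":
        # One sign per cluster, shared by all rows in that cluster
        signs = rng.choice([-1.0, 1.0], size=(B, G))
        return signs[:, codes]
    raise ValueError(f"unknown variant: {variant!r}")


# Usage: 8 clusters of 5 rows each
cluster = np.repeat(np.arange(8), 5)
m = {
    v: bootstrap_multipliers(v, cluster, B=100)
    for v in ("pairs", "cluster", "wild", "wild_cluster")
}
```

Keeping every variant in the same (B, n) shape is what lets a single batched kernel consume any of them; only the downstream use (weights versus residual signs) differs.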
````diff
@@ -44,14 +44,25 @@ time`; on CPU JAX it's still ~equal to a numpy sequential bootstrap
 (JIT overhead amortises around B ≈ 100). The speedup curve crosses
 favourably very quickly.
 
-**Pairs bootstrap** (Efron 1979): each draw resamples *rows* with
-replacement; multinomial counts become per-row WLS weights. Asymptotic
-target: HC1 standard errors.
+**Pairs bootstrap**: each draw resamples *rows* with replacement;
+multinomial counts become per-row WLS weights. Asymptotic target:
+HC1 standard errors.
 
 **Cluster bootstrap** (Cameron, Gelbach & Miller 2008 §III.A): each
 draw resamples *clusters* with replacement; observations in a cluster
 sampled k times get weight k. Asymptotic target: CR1 standard errors.
 
+**Wild bootstrap**: each draw assigns independent Rademacher signs
+``η_i ∈ {-1, +1}`` per row and uses the *score formulation*
+``β* = β̂ + (X'WX)⁻¹ X'W (η ⊙ û)``, mathematically identical to
+refitting on ``y* = X β̂ + η ⊙ û`` but with one mat-vec per
+iteration instead of a full QR.
+
+**Wild cluster bootstrap** (Cameron, Gelbach & Miller 2008 §III.B):
+same score formulation as wild, but the Rademacher signs are drawn
+*per cluster*. The standard tool for few-cluster inference (G < 30,
+especially G < 10) where cluster bootstrap can over-reject.
+
 ```python
 import statspai as sp
 
````
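The wild and wild cluster paragraphs claim the score formulation is mathematically identical to refitting on y* = Xβ̂ + η ⊙ û. The identity holds because (X'WX)⁻¹X'W X = I, so the refit's β* = (X'WX)⁻¹X'W y* splits into β̂ plus the score term. The NumPy sketch below precomputes (X'WX)⁻¹X'W once, applies it to all B sign vectors in one matrix product, and lets you check draws against an explicit WLS refit. `wild_bootstrap_score`, `refit` and the `cluster` argument are hypothetical, not `sp.fast.feols_jax_bootstrap`'s signature, and fixed effects are omitted.

```python
import numpy as np


def wild_bootstrap_score(X, y, B, w=None, cluster=None, seed=0):
    """Wild (or wild cluster) bootstrap via the score formulation.

    beta* = beta_hat + (X'WX)^{-1} X'W (eta * u_hat), computed for all B
    draws with one (B, n) @ (n, k) product. Returns (beta_hat, draws, eta).
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    XtW = X.T * w                               # (k, n): X'W
    beta_hat = np.linalg.solve(XtW @ X, XtW @ y)
    u_hat = y - X @ beta_hat
    A = np.linalg.solve(XtW @ X, XtW)           # (k, n): (X'WX)^{-1} X'W, computed once
    if cluster is None:
        eta = rng.choice([-1.0, 1.0], size=(B, n))  # one sign per row
    else:
        _, codes = np.unique(np.asarray(cluster), return_inverse=True)
        codes = codes.ravel()
        # one sign per cluster, broadcast to that cluster's rows
        eta = rng.choice([-1.0, 1.0], size=(B, codes.max() + 1))[:, codes]
    draws = beta_hat + (eta * u_hat) @ A.T      # (B, k)
    return beta_hat, draws, eta


def refit(X, y_star, w):
    """Reference path: full WLS refit on a bootstrap outcome."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y_star * sw, rcond=None)[0]


# Usage: weighted regression with 6 clusters of 10 rows
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)
cluster = np.repeat(np.arange(6), 10)
beta_hat, draws, eta = wild_bootstrap_score(X, y, B=200, w=w, cluster=cluster)
```

Each draw costs one product with the precomputed (k, n) matrix rather than a fresh factorisation, which is the "one mat-vec per iteration instead of a full QR" saving the guide describes; passing `cluster=None` gives the row-level wild bootstrap.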
```diff
@@ -162,12 +173,11 @@ Performance Shaders) when CUDA is unavailable.
 | Bayesian causal (PyMC) | NumPyro / JAX backend optional | Routing to GPU works *via PyMC*; we don't reimplement. |
 
 Future GPU candidates (open issues welcome):
+
 - **Permutation tests / placebo studies**`vmap` over permutations is
 the obvious follow-up to bootstrap.
 - **DML cross-fitting** — k-fold parallel nuisance fits.
 - **Synthetic control matrix completion** — large-K SVD on GPU.
-- **Wild cluster bootstrap (Cameron-Gelbach-Miller §III.B)**
-Phase 4c; closely related to the existing pairs / cluster bootstrap.
 - **Causal forest training** — wire `xgboost` / `cuml` for tree fits.
 
 ---
```
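The future-candidates list calls a `vmap` over permutations the natural follow-up to the bootstrap. Below is a NumPy sketch of the batched shape such a test could take, assuming a single-regressor slope as the test statistic; `permutation_slope_test` is hypothetical and not a StatsPAI function, and `numpy.random.Generator.permuted` stands in for per-draw shuffling.

```python
import numpy as np


def permutation_slope_test(x, y, B=999, seed=0):
    """Two-sided permutation test for a simple regression slope.

    All B permutations of y are built as one (B, n) array and their
    slopes computed in a single matrix product, the batched shape a
    vmap over permutations would have.
    """
    rng = np.random.default_rng(seed)
    xc = x - x.mean()

    def slope(yy):
        # Works for a single (n,) outcome or a (B, n) stack of outcomes
        return (yy - yy.mean(axis=-1, keepdims=True)) @ xc / (xc @ xc)

    t_obs = slope(y)
    # Each row is an independent shuffle of y
    y_perm = rng.permuted(np.tile(y, (B, 1)), axis=1)
    t_perm = slope(y_perm)  # (B,)
    # Count the observed arrangement as one of the B + 1 possibilities
    p = (1 + np.sum(np.abs(t_perm) >= np.abs(t_obs))) / (B + 1)
    return t_obs, p


# Usage: a strong true slope should give the smallest attainable p-value
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y_signal = 2.0 * x + rng.normal(size=100)
t, p = permutation_slope_test(x, y_signal)
```

As with the bootstrap, the per-permutation work is independent, so the whole (B, n) stack can be evaluated in one batched program; the "+1" in numerator and denominator keeps the p-value strictly positive.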
