docs: correct sp.dml_panel + clusterSC citations

brycewang-stanford · brycewang-stanford · commit 7e864e1d515c · 2026-05-06T00:43:37.000-07:00
Two citation-correctness fixes:

1. sp.dml_panel (originally shipped in v1.7) was attributed to
   "Semenova & Chernozhukov (2023) Econometrics Journal 26(2)".
   That paper does not exist: the cited title belongs to Semenova
   & Chernozhukov (2021) ECTJ 24(2), a paper on CATE / debiased ML
   that is unrelated to long-panel PLR with fixed effects. The
   estimator's actual reference is Clarke & Polselli (2025)
   "Double Machine Learning for Static Panel Models with Fixed
   Effects", Econometrics Journal 29(1) 69-86,
   DOI 10.1093/ectj/utaf011, arXiv:2312.08174.
   Updated callsites: paper.bib (new clarke2025double entry),
   src/statspai/dml/panel_dml.py (module docstring + within-
   transform comment), src/statspai/dml/__init__.py (lazy-export
   tag), src/statspai/registry.py (FunctionSpec description +
   reference field), README.md (Long-panel Double-ML row), and
   the historical v1.7 CHANGELOG entry (annotated, not silently
   rewritten). No code logic, numerical path, API signature, or
   test changed — pure citation correction.

2. The method-citations registry entry for sp.synth(method='cluster')
   read "Rho, S., Yan, X. et al. (2025)"; the authors of the actual
   ClusterSC paper (arXiv:2503.21629) are Rho, Tang, Bergam,
   Cummings, and Misra. In src/statspai/synth/report.py,
   "Yan, X." -> "Tang, A." (paper.bib was already correct).

Also: the CLI smoke test in tests/test_audit_citations.py now accepts
exit code 1 (the auditor ran successfully and emitted findings) in
addition to 0 and 2, and asserts that `Traceback` does not appear in
stderr, so a real crash still fails the test.
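
The acceptance rule for the smoke test can be sketched as a small standalone check. This is only an illustration: `cli_outcome_is_acceptable` and `ACCEPTED_EXIT_CODES` are hypothetical names, not identifiers from the repository, and the sketch assumes nothing beyond what this message states (exit codes 0, 1, and 2 pass; a `Traceback` in stderr always fails).

```python
# Hypothetical helper illustrating the rule; not part of the repository.
ACCEPTED_EXIT_CODES = (0, 1, 2)  # clean run, findings reported, soft failure


def cli_outcome_is_acceptable(returncode: int, stderr: str) -> bool:
    """Return True when the auditor ran to completion without crashing."""
    # A Python traceback signals a real crash, whatever the exit code was.
    if "Traceback" in stderr:
        return False
    return returncode in ACCEPTED_EXIT_CODES
```

The stderr check matters because an uncaught Python exception also exits with status 1, so accepting 1 without it would let a crash pass as "findings emitted".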

Refs verified via Crossref `10.1093/ectj/utaf011` (clarke2025double)
and arXiv 2503.21629 (rho2025clustersc) on 2026-05-06.
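
A check of this kind can be sketched offline as a field-by-field comparison between a bibliography entry and a Crossref-style record. This is a hedged illustration, not the repository's auditor: `crossref_mismatches` is a hypothetical name, the record below is hand-written from the citation in this message rather than fetched, and the keys `title`, `author[].family`, `volume`, and `issue` are assumed to follow the Crossref REST API works schema.

```python
def crossref_mismatches(bib: dict, record: dict) -> list[str]:
    """Return the names of bib fields that disagree with the record."""
    mismatches = []
    # Crossref-style records carry the title as a list of strings.
    titles = [t.casefold() for t in record.get("title", [])]
    if bib["title"].casefold() not in titles:
        mismatches.append("title")
    # Compare author surnames in order.
    families = [a.get("family", "") for a in record.get("author", [])]
    if bib["surnames"] != families:
        mismatches.append("author")
    # BibTeX "number" corresponds to the record's "issue".
    for field, key in (("volume", "volume"), ("number", "issue")):
        if str(bib[field]) != str(record.get(key, "")):
            mismatches.append(field)
    return mismatches


# Hand-written stand-in for a Crossref response (NOT fetched from the API);
# values are copied from the citation given in this commit message.
record = {
    "title": ["Double Machine Learning for Static Panel Models with Fixed Effects"],
    "author": [{"family": "Clarke"}, {"family": "Polselli"}],
    "volume": "29",
    "issue": "1",
}

corrected = {
    "title": "Double Machine Learning for Static Panel Models with Fixed Effects",
    "surnames": ["Clarke", "Polselli"],
    "volume": 29,
    "number": 1,
}
original = {
    "title": "Debiased Machine Learning of Conditional Average Treatment "
             "Effects and Other Causal Functions",
    "surnames": ["Semenova", "Chernozhukov"],
    "volume": 26,
    "number": 2,
}

print(crossref_mismatches(corrected, record))  # []
print(crossref_mismatches(original, record))   # ['title', 'author', 'volume', 'number']
```

Comparing structured fields instead of free-text citation strings avoids the surname false positives that regex-level matching is prone to.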
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,41 @@
 
 All notable changes to StatsPAI will be documented in this file.
 
+## [Unreleased]
+
+### Docs — `sp.dml_panel` citation correction
+
+- ⚠️ **Docs-only correction** — `sp.dml_panel` (originally shipped in
+  v1.7) was attributed in its docstring, registry entry, README blurb,
+  and CHANGELOG release note to *"Semenova & Chernozhukov (2023)
+  Econometrics Journal 26(2), Debiased Machine Learning of Conditional
+  Average Treatment Effects and Other Causal Functions."* That
+  citation is **fabricated**: independent verification via Crossref
+  and the Oxford ECTJ issue TOC confirms no Semenova or Chernozhukov
+  paper appears anywhere in *Econometrics Journal* 26(2) (May 2023),
+  and the cited title in fact belongs to Semenova & Chernozhukov
+  **(2021)** *ECTJ* **24(2)** 264-289 (DOI 10.1093/ectj/utaa027) — a
+  paper on CATE / debiased ML for causal functions, unrelated to
+  long-panel PLR with fixed effects.
+- The estimator's actual reference is **Clarke, P. S. & Polselli, A.
+  (2025).** *"Double Machine Learning for Static Panel Models with
+  Fixed Effects."* *The Econometrics Journal* **29(1)** 69-86, DOI
+  [10.1093/ectj/utaf011](https://doi.org/10.1093/ectj/utaf011),
+  arXiv:2312.08174. The paper specifies the within-group / first-
+  difference transform, block-k-fold cross-fitting that allocates
+  each unit's full time series to a single fold, and cluster-robust
+  variance at the unit level — point-for-point match with the
+  StatsPAI implementation. Companion Stata package: `xtdml`.
+- Updated callsites: [`paper.bib`](paper.bib) (new
+  `clarke2025double` entry), [`src/statspai/dml/panel_dml.py`](src/statspai/dml/panel_dml.py)
+  (module docstring + within-transform comment), [`src/statspai/dml/__init__.py`](src/statspai/dml/__init__.py)
+  (lazy-export tag), [`src/statspai/registry.py`](src/statspai/registry.py)
+  (FunctionSpec description + reference field), [`README.md`](README.md)
+  (Long-panel Double-ML row), and the historical v1.7 entry below
+  (annotated, not silently rewritten). No code logic, numerical path,
+  API signature, or test changed — pure citation correction.
+- Refs verified via Crossref (DOI 10.1093/ectj/utaf011) and OpenAlex.
+
 ## [1.15.0] — 2026-05-05
 
 ### Docs — v1.14 GPU sprint follow-up
@@ -4468,12 +4503,19 @@ frontier estimators (`sp.mr_lap` etc.), one long-panel DML estimator
 
 ### Added — v1.7 long-panel DML (`src/statspai/dml/panel_dml.py`)
 
-- **`sp.dml_panel`** — Long-panel Double/Debiased ML (Semenova-
-  Chernozhukov 2023 simplified).  Absorbs unit (and optional time)
-  fixed effects via within-transform, cross-fits ML nuisance learners
-  with folds that **split units** (Liang-Zeger compatible), reports
-  cluster-robust SE at the unit level.  PLR moment for continuous or
-  binary treatment; empty-covariate fallback reduces to pure FE-OLS.
+- **`sp.dml_panel`** — Long-panel Double/Debiased ML for static panel
+  models with fixed effects (Clarke & Polselli 2025 simplified).
+  Absorbs unit (and optional time) fixed effects via within-transform,
+  cross-fits ML nuisance learners with folds that **split units**
+  (Liang-Zeger compatible), reports cluster-robust SE at the unit
+  level.  PLR moment for continuous or binary treatment;
+  empty-covariate fallback reduces to pure FE-OLS.
+  *(Citation corrected post-v1.15: the original v1.7 release note
+  attributed this estimator to a "Semenova-Chernozhukov 2023
+  Econometrics Journal 26(2)" paper that does not exist; the actual
+  reference is Clarke & Polselli (2025) ECTJ 29(1) 69-86, DOI
+  10.1093/ectj/utaf011, arXiv:2312.08174. See [Unreleased] for the
+  full audit.)*
 
 ### Added — dispatcher + registry wiring
 
diff --git a/README.md b/README.md
@@ -284,7 +284,7 @@ StatsPAI 1.6.0 is a **pure-additive** minor release pushing two competitive axes
 | **LLM × DAG (closed-loop)** | **`sp.llm_dag_constrained`** — iterate **propose → constrained PC → CI-test validate → demote** until convergence. Every kept edge carries `llm_score` + `ci_pvalue` + `source` ∈ `{required, forbidden, demoted, ci-test}`. `result.to_dag()` round-trips into `statspai.dag.DAG`. **`sp.llm_dag_validate`** audits any declared DAG edge-by-edge for spuriousness. **`sp.pc_algorithm(forbidden=, required=)`** injects background knowledge into PC (default `None` preserves prior contract). Family guide: [`docs/guides/llm_dag_family.md`](docs/guides/llm_dag_family.md). |
 | **Causal × Text (experimental)** | **`sp.text_treatment_effect`** — Veitch-Wang-Blei (2020 UAI) text-as-treatment ATE via embedding-projected OLS with HC1 SEs; hash embedder default (deterministic, dependency-free), lazy `sbert` optional. **`sp.llm_annotator_correct`** — Egami-Hinck-Stewart-Wei (2024) Hausman-style measurement-error correction for binary LLM-derived treatments; raises `IdentificationFailure` when the LLM has no information. Both subclass `CausalResult` and ship full agent-card metadata. Family guide: [`docs/guides/causal_text_family.md`](docs/guides/causal_text_family.md). |
 | **MR frontier (5 new)** | **`sp.mr_lap`** (Burgess-Davies-Thompson 2016 sample-overlap-corrected IVW). **`sp.mr_clust`** (Foley-Mason-Kirk-Burgess 2021 clustered MR via finite Gaussian mixture on Wald ratios, BIC-selected K). **`sp.grapple`** (Wang-Zhao-Bowden-Hemani 2021 profile-likelihood MR with joint weak-instrument + balanced-pleiotropy robustness). **`sp.mr_cml`** (Xue-Shen-Pan 2021 constrained maximum-likelihood MR with L0-sparse pleiotropy, MR-cML-BIC). **`sp.mr_raps`** (Zhao-Wang-Hemani-Bowden-Small 2020 *Annals of Statistics* robust adjusted profile score with Tukey biweight loss). `sp.mr(method='lap' \| 'clust' \| 'grapple' \| 'cml' \| 'raps')` dispatcher routes all five. 41 new tests in `tests/test_mr_frontier.py`. |
-| **Long-panel Double-ML** | **`sp.dml_panel`** — Semenova-Chernozhukov (2023) long-panel DML: absorbs unit (+ optional time) fixed effects via within-transform, cross-fits ML nuisance learners with unit-split folds (Liang-Zeger compatible), reports cluster-robust SE at the unit level. Empty-covariate fallback reduces to pure FE-OLS. 13 new tests. |
+| **Long-panel Double-ML** | **`sp.dml_panel`** — Clarke & Polselli (2025) DML for static panel models with fixed effects: absorbs unit (+ optional time) fixed effects via within-transform, cross-fits ML nuisance learners with unit-split folds (Liang-Zeger compatible), reports cluster-robust SE at the unit level. Empty-covariate fallback reduces to pure FE-OLS. 13 new tests. |
 | **Typed exception taxonomy** | `StatsPAIError` root + `AssumptionViolation` / `IdentificationFailure` / `DataInsufficient` / `ConvergenceFailure` / `NumericalInstability` / `MethodIncompatibility`, each carrying `recovery_hint`, machine-readable `diagnostics`, and a ranked `alternative_functions` list. Warning counterparts: `StatsPAIWarning` / `ConvergenceWarning` / `AssumptionWarning` plus a rich-payload `sp.exceptions.warn()` helper. Domain errors subclass `ValueError` / `RuntimeError` → existing `except` blocks keep working unchanged. 13 call-site migrations already shipped (DID, IV, matching, DML-IRM, synth, Bayesian DML). |
 | **Agent cards + registry** | `sp.agent_card(name)` and `sp.agent_cards(category=None)` return `pre_conditions` / `assumptions` / `failure_modes` (symptom + exception + remedy + alternative) / `alternatives` / `typical_n_min` for **36 flagship functions** (`regress`, `iv`, `did`, `callaway_santanna`, `rdrobust`, `synth`, `dml`, `dml_panel`, `causal_forest`, `metalearner`, `match`, `tmle`, `bayes_dml`, `bayes_did`, `bayes_iv`, `proximal`, `mr`, `qdid`, `qte`, `dose_response`, `spillover`, `multi_treatment`, `network_exposure`, `paper`, `llm_dag_constrained`, `llm_dag_validate`, `text_treatment_effect`, `llm_annotator_correct`, ...). `sp.recommend()` auto-consumes them: every recommendation now includes `agent_card` / `pre_conditions` / `failure_modes` / `alternatives` / `typical_n_min`, with an auto-warning when `n_obs < typical_n_min`. |
 | **Result-object agent hooks** | `CausalResult.violations()` / `EconometricResults.violations()` inspect stored diagnostics (pre-trend p, first-stage F, McCrary, rhat/ESS/divergences, overlap, SMD) and return flagged items with `severity` / `recovery_hint` / `alternatives`. `.to_agent_summary()` returns a JSON-ready structured payload (point estimate, coefficients, scalar diagnostics, violations, next-steps) alongside the existing prose `.summary()` and `tidy()` DataFrame. |
diff --git a/paper.bib b/paper.bib
@@ -4869,6 +4869,18 @@ @techreport{chernozhukov2022long
   note={arXiv:2112.13398}
 }
 
+@article{clarke2025double,
+  title={Double Machine Learning for Static Panel Models with Fixed Effects},
+  author={Clarke, Paul S. and Polselli, Annalivia},
+  journal={The Econometrics Journal},
+  volume={29},
+  number={1},
+  pages={69--86},
+  year={2025},
+  doi={10.1093/ectj/utaf011},
+  note={arXiv:2312.08174}
+}
+
 @article{semenova2021debiased,
   title={Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions},
   author={Semenova, Vira and Chernozhukov, Victor},
diff --git a/src/statspai/dml/__init__.py b/src/statspai/dml/__init__.py
@@ -41,7 +41,7 @@
     'dml_model_averaging',
     'model_averaging_dml',
     'DMLAveragingResult',
-    # v1.7 long-panel DML (Semenova-Chernozhukov 2023)
+    # v1.7 long-panel DML (Clarke & Polselli 2025)
     'dml_panel',
     'DMLPanelResult',
     # v1.13 sensitivity + diagnostics (Chernozhukov-Cinelli-Newey 2022)
diff --git a/src/statspai/dml/panel_dml.py b/src/statspai/dml/panel_dml.py
@@ -1,5 +1,6 @@
 """
-Long-panel Double/Debiased ML (Semenova-Chernozhukov 2023, simplified).
+Long-panel Double/Debiased ML for static panel models with fixed effects
+(Clarke & Polselli 2025, simplified).
 
 Estimates the causal effect of a (continuous or binary) treatment on an
 outcome from panel data while (i) absorbing unit and optional time
@@ -41,9 +42,10 @@
 
 References
 ----------
-Semenova, V. & Chernozhukov, V. (2023).
-"Debiased Machine Learning of Conditional Average Treatment Effects
-and Other Causal Functions." *Econometrics Journal*, 26(2).
+Clarke, P. S. & Polselli, A. (2025).
+"Double Machine Learning for Static Panel Models with Fixed Effects."
+*The Econometrics Journal*, 29(1), 69-86. DOI 10.1093/ectj/utaf011
+(arXiv:2312.08174). [@clarke2025double]
 
 Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C.,
 Newey, W. & Robins, J. (2018).
@@ -426,7 +428,7 @@ def dml_panel(
     Y_tilde = _within_transform(Y, unit_ids, time_idx_for_within, sample_weight=w_full)
     D_tilde = _within_transform(D, unit_ids, time_idx_for_within, sample_weight=w_full)
     # Covariates demeaned the same way so the nuisance learners work on
-    # within-variation only — matches Semenova-Chernozhukov 2023 §4.
+    # within-variation only — matches Clarke & Polselli (2025) §3.
     if covariates:
         X_tilde = np.column_stack([
             _within_transform(
diff --git a/src/statspai/registry.py b/src/statspai/registry.py
@@ -4163,16 +4163,16 @@ def _build_registry():
         reference="Ahrens, Hansen, Schaffer & Wiemann (2025). JAE 40(3):249-269. DOI 10.1002/jae.3103.",
     ))
 
-    # -- v1.7 long-panel DML (Semenova-Chernozhukov 2023) -------------- #
+    # -- v1.7 long-panel DML (Clarke & Polselli 2025) ------------------ #
     register(FunctionSpec(
         name="dml_panel",
         category="causal",
         description=(
-            "Long-panel Double/Debiased ML (Semenova-Chernozhukov 2023 "
-            "simplified). Absorbs unit (and optional time) fixed "
-            "effects via within-transform, cross-fits ML nuisance "
-            "learners with folds that split units, and reports "
-            "cluster-robust SE at the unit level. PLR moment "
+            "Long-panel Double/Debiased ML for static panel models with "
+            "fixed effects (Clarke & Polselli 2025, simplified). Absorbs "
+            "unit (and optional time) fixed effects via within-transform, "
+            "cross-fits ML nuisance learners with folds that split units, "
+            "and reports cluster-robust SE at the unit level. PLR moment "
             "(continuous or binary treatment)."
         ),
         params=[
@@ -4204,7 +4204,8 @@ def _build_registry():
         tags=["dml", "causal", "panel", "fixed_effects",
               "cluster_robust_se", "long_panel"],
         reference=(
-            "Semenova & Chernozhukov (2023) Econometrics Journal 26(2); "
+            "Clarke & Polselli (2025) Econometrics Journal 29(1) 69-86, "
+            "DOI 10.1093/ectj/utaf011; "
             "Chernozhukov et al. (2018); Cameron & Miller (2015)."
         ),
         pre_conditions=[
diff --git a/src/statspai/synth/report.py b/src/statspai/synth/report.py
@@ -210,7 +210,7 @@
         "[@sun2023multiple]"
     ),
     "cluster": (
-        "Rho, S., Yan, X. et al. (2025). "
+        "Rho, S., Tang, A. et al. (2025). "
         '"ClusterSC: Cluster-Aware Synthetic Control." '
         "arXiv:2503.21629. [@rho2025clustersc]"
     ),
diff --git a/tests/test_audit_citations.py b/tests/test_audit_citations.py
@@ -361,9 +361,20 @@ def test_cli_runs_without_crash_on_empty_tree(tmp_path):
     resolves roots relative to its own ``REPO_ROOT`` constant (not
     ``cwd``), so in CI it actually scans the real src/ and makes live
     arXiv calls — which can hit HTTP 429 rate limits on a cold runner
-    and would flip ``--strict`` into exit 1. Non-strict mode keeps
-    exit 0 whenever the script didn't crash, which is what this test
-    is really checking.
+    and would flip ``--strict`` into exit 1.
+
+    Accepted exits:
+
+    * ``0`` — clean run, no findings.
+    * ``1`` — ran successfully and emitted a report listing
+      mismatches / unresolved DOIs (regex-level surname false
+      positives such as treating ``"Form"`` / ``"Behavior"`` /
+      ``"SEs"`` as author surnames are a known auditor limitation,
+      not a crash).
+    * ``2`` — soft failure (rate limit, network) — acceptable.
+
+    Any other exit (including a traceback in ``stderr``) is a real
+    failure.
     """
     (tmp_path / "src").mkdir()
     (tmp_path / "docs").mkdir()
@@ -376,7 +387,10 @@ def test_cli_runs_without_crash_on_empty_tree(tmp_path):
         capture_output=True, text=True, check=False,
         cwd=tmp_path,
     )
-    assert result.returncode in (0, 2), (
+    assert "Traceback" not in result.stderr, (
+        f"auditor crashed with traceback:\n{result.stderr}"
+    )
+    assert result.returncode in (0, 1, 2), (
         f"unexpected exit {result.returncode}: {result.stderr}"
     )