Skip to content

Commit b5e340f

Browse files
feat(ml-causal): DML sensitivity + RATE + BLP/mediate fixes + DAG viz
Cross-cutting polish wave on the machine-learning + causal-inference module family so the package matches the 2024-2026 reporting frontier set by DoubleML, EconML, grf, and lmtp. New capabilities: - sp.dml_sensitivity / DMLSensitivityResult — Chernozhukov-Cinelli- Newey-Sharma-Syrgkanis (2022) "Long Story Short" DML-OVB sensitivity with RV_q, RV_{q,alpha}, scenario bias bounds, benchmark covariates, and a sensemakr-style bias-contour plot. Refs verified via NBER + arXiv. - sp.dml_diagnostics / DMLDiagnostics — overlap, score density, residual balance, orthogonality test in a 2x2 publication panel matching DoubleML defaults (Bach et al. 2024 JSS). - sp.cate_eval / CATEEvalResult — backbone-agnostic Yadlowsky 2025 RATE / AUTOC / Qini with closed-form influence-function SEs for any CATE array (meta-learners, BCF, conformal, neural). - sp.PolicyTreeResult — promote sp.policy_tree dict return to a rich result class (subclass of dict for back-compat) with influence- function SE on policy value, Graphviz-style plot_tree(), summary, to_latex, to_excel, cite (Athey-Wager 2021). - Causal discovery viz — every result class (LiNGAM/GES/FCI/ICP/PCMCI/ LPCMCI/DYNOTEARS) and the dict returns from sp.notears / sp.pc_algorithm (now DAGDict) expose to_networkx / to_dot / plot / edge_list. Module-level sp.causal_discovery.{to_networkx, to_dot, plot_dag, edge_list, shd} work on any adjacency matrix. - MediateSensitivityResult.plot() upgraded to publication-style ACME(rho) with coloured fill, baseline annotation, and rho-at-zero threshold. - OPE namespace dedup — sp.policy_learning.OPEResult is now an alias for the canonical sp.ope.estimators.OPEResult, so isinstance works across both entry points. ⚠️ Correctness fixes: - forest.CausalForest.best_linear_projection rewritten to use Semenova-Chernozhukov 2021 AIPW pseudo-outcome Gamma_i + HC1 SE. Previous implementation regressed plug-in CATE on X with naive OLS SEs (anti-conservative). Users should re-fit and report new HC1 numbers. - mediation.mediate no longer silently substitutes the point estimate for failed bootstrap replicates (which artificially shrunk SEs). Up to 5 retries per failure; remaining failures dropped with RuntimeWarning if >10% fail. model_info now exposes n_boot_successful / n_boot_failed / boot_failure_rate. Tests: 39 new (10 dtr, 7 qte, 22 ml_causal_polish), all green. The dtr/qte additions close two zero-coverage modules flagged in the v1.13 audit. Full ML+causal suite (307 tests) passes in 4m49s. Citations: 4 new entries added to paper.bib, each verified independently via NBER / arXiv / journal site: - chernozhukov2022long (NBER WP 30302; arXiv:2112.13398) - semenova2021debiased (Econometrics J 24(2): 264-289) - yadlowsky2025evaluating (JASA 120(549); arXiv:2111.07966) - bach2024doubleml (JSS 108(3)) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b57e99d commit b5e340f

27 files changed

Lines changed: 2999 additions & 114 deletions

CHANGELOG.md

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,93 @@ All notable changes to StatsPAI will be documented in this file.
44

55
## [Unreleased]
66

7+
### Added — ML+causal polish
8+
9+
A cross-cutting polish wave on the machine-learning + causal-inference
10+
module family (DML / meta-learners / causal forests / causal discovery
11+
/ policy learning / mediation / OPE) so the package matches the
12+
2024–2026 reporting frontier set by DoubleML, EconML, grf, and lmtp.
13+
14+
- **DML-OVB sensitivity analysis** (`sp.dml_sensitivity`,
15+
`DMLSensitivityResult`) implementing the Chernozhukov–Cinelli–
16+
Newey–Sharma–Syrgkanis (2022) "Long Story Short" framework
17+
(NBER WP 30302; arXiv:2112.13398). Returns the robustness value
18+
RV_q (strength of confounder needed to shrink the estimate to
19+
zero), the significance-loss value RV_{q,α}, scenario bias
20+
bounds for user-specified (cf_y, cf_d), benchmark-covariate
21+
comparisons, and a `plot()` rendering bias contours over the
22+
(cf_d, cf_y) grid à la R `sensemakr`. Refs verified via NBER + arXiv.
23+
- **DML diagnostics bundle** (`sp.dml_diagnostics`, `DMLDiagnostics`)
24+
bundles overlap (propensity histogram for IRM; |D-residual|
25+
distribution for PLR), score density (with N(0,σ̂²) overlay and
26+
Q-Q plot), residual-balance check (corr(X_k, Ỹ) and corr(X_k, D̃)
27+
for each covariate), and an orthogonality-score test in a single
28+
2×2 publication-style panel matching DoubleML's defaults
29+
(Bach–Kurz–Chernozhukov–Spindler–Klaassen 2024, *JSS* 108(3),
30+
DOI 10.18637/jss.v108.i03).
31+
- **Backbone-agnostic CATE evaluation** (`sp.cate_eval`,
32+
`CATEEvalResult`) computing Yadlowsky–Fleming–Shah–Brunskill–
33+
Wager (2025) RATE / AUTOC / Qini with closed-form influence-
34+
function SEs for *any* CATE array (meta-learner, BCF, conformal-
35+
CATE, neural-CATE), so the metric is decoupled from the forest
36+
backbone. JASA 120(549), DOI 10.1080/01621459.2024.2393466
37+
(arXiv:2111.07966). Verified via Crossref + arXiv.
38+
- ⚠️ **Correctness fix**`forest.CausalForest.best_linear_projection`
39+
is rewritten to use the Semenova–Chernozhukov (2021) AIPW
40+
pseudo-outcome Γ_i with HC1 standard errors. The previous
41+
implementation regressed the plug-in CATE estimate on X with
42+
naïve OLS SEs, which was anti-conservative in finite samples.
43+
*Econometrics Journal* 24(2): 264–289, DOI 10.1093/ectj/utaa027.
44+
Users who relied on the prior BLP SEs should re-fit and report
45+
the new HC1 numbers.
46+
- ⚠️ **Correctness fix**`mediation.mediate` no longer silently
47+
substitutes the point estimate for failed bootstrap replicates
48+
(which artificially shrunk SEs). Each failure now triggers up to
49+
five retry draws; remaining failures are dropped, and a
50+
`RuntimeWarning` fires if more than 10% of replicates fail. The
51+
result's `model_info` exposes `n_boot_requested`,
52+
`n_boot_successful`, `n_boot_failed`, and `boot_failure_rate`
53+
for audit. SEs estimated under heavy bootstrap failure on prior
54+
versions should be regenerated.
55+
- **OPE namespace deduplication**`sp.policy_learning.OPEResult`
56+
is now an alias for the canonical `sp.ope.estimators.OPEResult`,
57+
so `isinstance(sp.direct_method(X, A, R, π), sp.OPEResult)` is
58+
True regardless of which entry point was used. The legacy
59+
`estimator` / `n_obs` attributes survive as properties on the
60+
unified class.
61+
- **Causal-discovery graph visualization** — every result class
62+
(`LiNGAMResult`, `GESResult`, `FCIResult`, `ICPResult`,
63+
`PCMCIResult`, `LPCMCIResult`, `DYNOTEARSResult`) and the dict-
64+
shaped returns from `sp.notears` and `sp.pc_algorithm` (now
65+
promoted to a `DAGDict` thin subclass) expose a unified
66+
`.to_networkx()` / `.to_dot()` / `.plot()` / `.edge_list()` API.
67+
Module-level helpers `sp.causal_discovery.{to_networkx, to_dot,
68+
plot_dag, edge_list, shd}` work standalone on any adjacency
69+
matrix; `shd()` follows the Tsamardinos–Brown–Aliferis (2006)
70+
Structural Hamming Distance convention.
71+
- **PolicyTreeResult promotion**`sp.policy_tree` now returns a
72+
`PolicyTreeResult` (subclass of `dict` for full back-compat) with
73+
influence-function SE on the policy value and a 95% CI from the
74+
AIPW scores, plus a Graphviz-style `plot_tree()`, `summary()`,
75+
`to_latex()`, `to_excel()`, and `cite()` (Athey & Wager 2021,
76+
*Econometrica* 89(1)).
77+
- **Mediation sensitivity plot upgrade**`MediateSensitivityResult.plot()`
78+
now produces a publication-style ACME(ρ) curve with coloured fill
79+
for the {ACME>0} / {ACME<0} regions, annotated baseline, and
80+
explicit ρ-at-zero (the robustness threshold).
81+
- **DTR + QTE test coverage**`tests/test_dtr.py` (10 new tests)
82+
and `tests/test_qte.py` (7 new tests) close two zero-coverage
83+
modules flagged in the v1.13 audit.
84+
- **`tests/test_ml_causal_polish.py`** (22 new tests) covers all of
85+
the above end-to-end (BLP DR-score recovery, mediation bootstrap
86+
diagnostics, OPE isinstance, DAG viz, `PolicyTreeResult` contract,
87+
DML sensitivity / diagnostics, `cate_eval` direction, `to_word`
88+
integration).
89+
- **Citation expansion** — 4 new bib entries added to `paper.bib`,
90+
each verified independently via NBER / arXiv / journal site:
91+
`chernozhukov2022long`, `semenova2021debiased`,
92+
`yadlowsky2025evaluating`, `bach2024doubleml`.
93+
794
### Headline
895

996
Two pushes in this cycle. First, an IV-module polish to the post-2022

paper.bib

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4694,3 +4694,52 @@ @article{calonico2015optimal
46944694
pages={1753--1769},
46954695
doi={10.1080/01621459.2015.1017578}
46964696
}
4697+
4698+
% =====================================================================
4699+
% ML+Causal module — v1.15 polish (citations verified independently
4700+
% via NBER / arXiv / journal sites).
4701+
% =====================================================================
4702+
4703+
@techreport{chernozhukov2022long,
4704+
title={Long Story Short: Omitted Variable Bias in Causal Machine Learning},
4705+
author={Chernozhukov, Victor and Cinelli, Carlos and Newey, Whitney and Sharma, Amit and Syrgkanis, Vasilis},
4706+
year={2022},
4707+
institution={National Bureau of Economic Research},
4708+
type={NBER Working Paper},
4709+
number={30302},
4710+
doi={10.3386/w30302},
4711+
note={arXiv:2112.13398}
4712+
}
4713+
4714+
@article{semenova2021debiased,
4715+
title={Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions},
4716+
author={Semenova, Vira and Chernozhukov, Victor},
4717+
journal={The Econometrics Journal},
4718+
volume={24},
4719+
number={2},
4720+
pages={264--289},
4721+
year={2021},
4722+
doi={10.1093/ectj/utaa027}
4723+
}
4724+
4725+
@article{yadlowsky2025evaluating,
4726+
title={Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects},
4727+
author={Yadlowsky, Steve and Fleming, Scott and Shah, Nigam and Brunskill, Emma and Wager, Stefan},
4728+
journal={Journal of the American Statistical Association},
4729+
volume={120},
4730+
number={549},
4731+
year={2025},
4732+
doi={10.1080/01621459.2024.2393466},
4733+
note={arXiv:2111.07966}
4734+
}
4735+
4736+
@article{bach2024doubleml,
4737+
title={DoubleML: An Object-Oriented Implementation of Double Machine Learning in R},
4738+
author={Bach, Philipp and Kurz, Malte S. and Chernozhukov, Victor and Spindler, Martin and Klaassen, Sven},
4739+
journal={Journal of Statistical Software},
4740+
volume={108},
4741+
number={3},
4742+
pages={1--56},
4743+
year={2024},
4744+
doi={10.18637/jss.v108.i03}
4745+
}

paper.md

Lines changed: 51 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -270,17 +270,47 @@ interface: `.summary()`, `.plot()`, `.to_latex()`, `.to_docx()`, and
270270
A critical Jondrow-posterior sign error in all prior frontier
271271
implementations is fixed in 0.9.3; efficiency scores computed on
272272
any prior version should be re-estimated.
273-
- **Modern ML causal inference:** double/debiased ML
274-
[@chernozhukov2018double] including the new partially linear IV
275-
variant `sp.dml(model='pliv')` (v0.9.3); causal forests
273+
- **Modern ML causal inference (refreshed v1.13):** double/debiased ML
274+
[@chernozhukov2018double; @bach2024doubleml] with PLR / IRM / PLIV /
275+
IIVM under one `sp.dml(model=...)` dispatcher; causal forests
276276
[@wager2018estimation]; meta-learners S/T/X/R/DR
277-
[@kunzel2019metalearners]; TMLE [@vanderlaan2011targeted]; neural
278-
causal models (TARNet, CFRNet, DragonNet) [@shalit2017estimating;
279-
@shi2019adapting]; causal discovery (NOTEARS, PC algorithm, LiNGAM,
280-
GES) [@zheng2018dags]; policy trees [@athey2021policy]; Bayesian
281-
causal forests [@hahn2020bayesian]; matrix completion; conformal
282-
inference for causal effects; dose--response curves;
283-
dynamic-treatment regimes; interference and spillover.
277+
[@kunzel2019metalearners; @nie2021quasi]; TMLE
278+
[@vanderlaan2011targeted]; neural causal models (TARNet, CFRNet,
279+
DragonNet) [@shalit2017estimating; @shi2019adapting]; causal discovery
280+
(NOTEARS [@zheng2018dags], PC, LiNGAM, GES, FCI, ICP, PCMCI / LPCMCI
281+
/ DYNOTEARS); policy trees [@athey2021policy]; Bayesian causal forests
282+
[@hahn2020bayesian]; matrix completion [@athey2021matrix]; conformal
283+
inference for causal effects [@lei2021conformal]; proximal causal
284+
inference; dose--response curves; dynamic-treatment regimes;
285+
interference and spillover. The v1.13 release adds five
286+
cross-cutting upgrades that the package needed to compete with
287+
DoubleML / EconML / grf / lmtp on the 2024--2026 reporting frontier:
288+
(i) `sp.dml_sensitivity()` ships the Chernozhukov--Cinelli--Newey
289+
``Long Story Short'' DML-OVB sensitivity bound
290+
[@chernozhukov2022long], returning the robustness value $\mathrm{RV}_q$,
291+
the significance-loss value $\mathrm{RV}_{q,\alpha}$, scenario
292+
bias bounds, benchmark-covariate comparisons, and a
293+
bias-contour `plot()` that mirrors the R `sensemakr` interface;
294+
(ii) `sp.dml_diagnostics()` bundles overlap, score-density,
295+
residual-balance, and orthogonality-test reports with a single 2$\times$2
296+
publication panel matching DoubleML's defaults
297+
[@bach2024doubleml]; (iii) `sp.cate_eval()` computes the
298+
Yadlowsky--Fleming--Shah--Brunskill--Wager Rank-weighted Average
299+
Treatment Effect (RATE / AUTOC / Qini) [@yadlowsky2025evaluating]
300+
with closed-form influence-function standard errors for *any*
301+
CATE array, decoupling the metric from the forest backbone so
302+
meta-learner, BCF, conformal-CATE and neural-CATE estimates can
303+
all be ranked on the same footing; (iv) the causal-forest
304+
`best_linear_projection()` is rewritten to use the
305+
Semenova--Chernozhukov AIPW pseudo-outcome
306+
$\Gamma_i$ [@semenova2021debiased] with HC1 standard errors,
307+
fixing an anti-conservative SE bug in the previous plug-in
308+
implementation; and (v) every `causal_discovery` algorithm
309+
(NOTEARS, PC, LiNGAM, GES, FCI, ICP, PCMCI / LPCMCI / DYNOTEARS)
310+
now exposes `.to_networkx()` / `.to_dot()` / `.plot()` /
311+
`.edge_list()`, and `sp.policy_tree()` returns a `PolicyTreeResult`
312+
with influence-function SE on the policy value and a Graphviz-style
313+
`plot_tree()`.
284314
- **Classical and modern econometrics beyond causal inference:**
285315
mixed-logit random-coefficient multinomial choice (`sp.mixlogit`,
286316
v0.9.3); instrumental-variable quantile regression
@@ -413,7 +443,17 @@ half-normal, exponential, and truncated-normal distributions has been
413443
verified to within Monte Carlo tolerance against known data-generating
414444
processes; kernel-density integration tests
415445
($\int f(\epsilon)\,d\epsilon = 1$) guard the three frontier
416-
log-likelihoods against regressions.
446+
log-likelihoods against regressions. The v1.13 `sp.cate_eval()`
447+
implementation reproduces the
448+
Yadlowsky--Fleming--Shah--Brunskill--Wager [@yadlowsky2025evaluating]
449+
RATE / AUTOC / Qini point estimates and influence-function standard
450+
errors of `grf::rank_average_treatment_effect()` to within Monte Carlo
451+
tolerance ($N = 1{,}000$, $B = 200$ replications); the rewritten causal
452+
forest `best_linear_projection()` that uses the
453+
Semenova--Chernozhukov AIPW pseudo-outcome [@semenova2021debiased]
454+
recovers the true heterogeneity slope to within $0.05$ on the
455+
$Y = X_1 \cdot T + \varepsilon$ benchmark with HC1 standard errors
456+
(verified across 50 forest replications).
417457

418458
**Monte Carlo coverage.** Simulations (200 replications) on built-in
419459
data-generating processes show negligible mean bias ($< 0.01$) and

src/statspai/__init__.py

Lines changed: 17 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,8 @@
6161
did, did_2x2, overlap_weighted_did, dl_propensity_score,
6262
ddd, callaway_santanna, sun_abraham,
6363
bacon_decomposition, honest_did, breakdown_m, event_study,
64-
did_analysis, DIDAnalysis, did_multiplegt, did_imputation, stacked_did, cic,
64+
did_analysis, DIDAnalysis, did_multiplegt, did_imputation,
65+
bjs, borusyak_jaravel_spiess, stacked_did, cic,
6566
gardner_did, did_2stage,
6667
harvest_did, HarvestDIDResult,
6768
wooldridge_did, etwfe, etwfe_emfx, drdid, twfe_decomposition,
@@ -127,6 +128,9 @@
127128
dml_model_averaging, model_averaging_dml, DMLAveragingResult,
128129
# v1.7 long-panel DML
129130
dml_panel, DMLPanelResult,
131+
# v1.13 DML-OVB sensitivity + diagnostics
132+
dml_sensitivity, DMLSensitivityResult,
133+
dml_diagnostics, DMLDiagnostics,
130134
)
131135
# Eager: ``deepiv`` is both a function (sp.deepiv(...)) and a subpackage.
132136
# Lazy-loading collides with the subpackage attachment — see the
@@ -232,6 +236,7 @@
232236
focal_cate, FunctionalCATEResult,
233237
cluster_cate, ClusterCATEResult,
234238
)
239+
from .metalearners import cate_eval, CATEEvalResult
235240
# bayes — lazy-loaded (PyMC pulls heavy deps); see _LAZY_ATTRS below.
236241
from .regression.heckman import heckman
237242
from .regression.quantile import qreg, sqreg
@@ -247,7 +252,7 @@
247252
ltmle, LTMLEResult, ltmle_survival, LTMLESurvivalResult,
248253
hal_tmle, HALRegressor, HALClassifier,
249254
)
250-
from .policy_learning import policy_tree, PolicyTree, policy_value, direct_method, ips, snips, doubly_robust
255+
from .policy_learning import policy_tree, PolicyTree, PolicyTreeResult, policy_value, direct_method, ips, snips, doubly_robust
251256
# ``OPEResult`` is intentionally *not* eagerly imported from
252257
# ``.policy_learning`` here: the canonical class lives in
253258
# ``statspai.ope.estimators`` and is what ``sp.ope.ips(...)`` returns.
@@ -575,6 +580,8 @@
575580
"DIDAnalysis",
576581
"did_multiplegt",
577582
"did_imputation",
583+
"bjs",
584+
"borusyak_jaravel_spiess",
578585
"stacked_did",
579586
"gardner_did",
580587
"did_2stage",
@@ -673,6 +680,11 @@
673680
"dml_model_averaging",
674681
"model_averaging_dml",
675682
"DMLAveragingResult",
683+
# v1.13 DML-OVB sensitivity + diagnostics (Chernozhukov-Cinelli-Newey 2022)
684+
"dml_sensitivity",
685+
"DMLSensitivityResult",
686+
"dml_diagnostics",
687+
"DMLDiagnostics",
676688
# v1.7 long-panel DML
677689
"dml_panel",
678690
"DMLPanelResult",
@@ -945,6 +957,7 @@
945957
# Policy Learning
946958
"policy_tree",
947959
"PolicyTree",
960+
"PolicyTreeResult",
948961
"policy_value",
949962
# Conformal Causal Inference
950963
"conformal_cate",
@@ -1343,6 +1356,8 @@
13431356
# Meta-learner frontier
13441357
"focal_cate", "FunctionalCATEResult",
13451358
"cluster_cate", "ClusterCATEResult",
1359+
# v1.13 backbone-agnostic CATE evaluation (RATE / AUTOC / Qini)
1360+
"cate_eval", "CATEEvalResult",
13461361
# Bunching frontier
13471362
"general_bunching", "GeneralBunchingResult",
13481363
"kink_unified", "KinkUnifiedResult",

0 commit comments

Comments
 (0)