Commit 4eb680f

chore(release): cut v1.13.1
Bump version 1.14.0 → 1.13.1 and consolidate the [Unreleased] + [1.14.0] + [1.13.0] CHANGELOG sections into a single [1.13.1] — 2026-05-05 entry.

Reason: TestPyPI's 1.13.0 slot was occupied by the earlier untagged "cut v1.13.0" build, so we ship 1.13.1 to keep TestPyPI and PyPI artifacts in lockstep. PyPI 1.13.1 will be the first published 1.13.x with the stability tiers + R/Stata parity dossier + cold-start sklearn surgery + weak-IV preflight gate + CS-DiD REG IF-scaling correctness fix all bundled.

README headlines, bibtex version, parity-table captions, and the re-pinned DID numerical fixtures' history notes are aligned to 1.13.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a1e52a6 commit 4eb680f

10 files changed

Lines changed: 235 additions & 219 deletions

File tree

CHANGELOG.md

Lines changed: 178 additions & 189 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 20 additions & 6 deletions
@@ -124,10 +124,24 @@ StatsPAI's focus is **causal inference** — and on this axis we aim to be the m

 ---

-**📦 v1.14.0 (2026-05-04) — External-validity dossier + cold-start surgery**
-
-v1.14 ships a 36-module R parity harness (`tests/r_parity/`), a 21-module
-Stata parity harness (`tests/stata_parity/`), 4 canonical-dataset
+**📦 v1.13.1 (2026-05-05) — Stability tiers + external-validity dossier + cold-start surgery**
+
+v1.13 stamps every `FunctionSpec` with a `stability` tier (`stable` /
+`experimental` / `deprecated`) plus per-function `limitations`,
+surfaced through `sp.describe_function`, `sp.list_functions(stability=...)`,
+the `statspai list` CLI, and the LLM-facing `sp.function_schema`;
+`sp.recommend` / `sp.causal` / `sp.paper` default to dropping
+`experimental` / `deprecated` entries unless `allow_experimental=True`
+is passed. Eight high-impact estimators (`aipw`, `aggte`,
+`pretrends_test`, `sensitivity_rr`, `mccrary_test`, `oster_bounds`,
+`wild_cluster_bootstrap`, `rd_honest`) are upgraded from
+auto-registered stubs to hand-written specs. A weak-instrument
+preflight gate in `sp.preflight(... "ivreg", formula=...)` flags
+first-stage F below the Staiger–Stock (1997) / Stock–Yogo (2005)
+thresholds, and `sp.recommend(... design='iv')` adaptively reorders
+LIML / AR ahead of 2SLS on weak first stages. v1.13 also ships a
+36-module R parity harness (`tests/r_parity/`), a 21-module Stata
+parity harness (`tests/stata_parity/`), 4 canonical-dataset
 original-paper replays (Card 1995, Callaway–Sant'Anna `mpdta`, Abadie
 Basque, LaLonde NSW + PSID-1 — all bit-equal to the published headline
 numbers), a Track-C performance harness (HDFE / CS-DiD / SCM / DML
@@ -145,7 +159,7 @@ submodules (down from 245). **⚠️ Correctness fix** —
 `sp.callaway_santanna(method='reg')` had a latent influence-function
 scaling bug; `'ipw'` and `'dr'` are unchanged but **re-run any
 v1.10–v1.13 CS-DiD analyses that used `method='reg'`**. Full notes in
-[`CHANGELOG.md`](CHANGELOG.md) under `[1.14.0]`.
+[`CHANGELOG.md`](CHANGELOG.md) under `[1.13.1]`.

 **📦 v1.12.2 (2026-05-01) — ML routing for `sp.causal_question` + shared robustness battery + weighted PLIV/IIVM**

@@ -1250,7 +1264,7 @@ resolves to the latest version):
 author = {Wang, Biaoyue},
 title = {StatsPAI: The Agent-Native Causal Inference \& Econometrics Toolkit for Python},
 year = {2026},
-version = {1.14.0},
+version = {1.13.1},
 doi = {10.5281/zenodo.19933900},
 url = {https://doi.org/10.5281/zenodo.19933900},
 license = {MIT},
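
The default filtering that the README hunk describes (non-stable entries dropped unless `allow_experimental=True`) can be pictured with a small sketch. Everything here is hypothetical: `Spec`, `REGISTRY`, `list_specs`, and the `new_estimator` / `old_estimator` entries are illustrative stand-ins, not StatsPAI's `FunctionSpec` or `sp.list_functions`. Only the three tier names and the `allow_experimental` flag come from the diff, and whether the flag also re-admits `deprecated` entries is an assumption read off the README wording.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a registry entry; NOT StatsPAI's FunctionSpec.
@dataclass(frozen=True)
class Spec:
    name: str
    stability: str = "stable"  # 'stable' | 'experimental' | 'deprecated'
    limitations: tuple = ()


# Illustrative registry; the two non-stable entries are made up.
REGISTRY = [
    Spec("aipw"),
    Spec("rd_honest"),
    Spec("new_estimator", stability="experimental",
         limitations=("illustrative entry only",)),
    Spec("old_estimator", stability="deprecated"),
]


def list_specs(registry, allow_experimental=False):
    """Return names, keeping only 'stable' entries unless explicitly allowed."""
    if allow_experimental:
        # Assumption: the flag re-admits every non-stable tier.
        return [spec.name for spec in registry]
    return [spec.name for spec in registry if spec.stability == "stable"]


print(list_specs(REGISTRY))
print(list_specs(REGISTRY, allow_experimental=True))
```

Making the conservative path the default means a caller has to opt in before a recommendation can include an estimator that carries documented limitations.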

README_CN.md

Lines changed: 22 additions & 9 deletions
@@ -46,13 +46,26 @@ StatsPAI focuses on **causal inference**: on this axis, our goal is to become

 ---

-**📦 v1.14.0 (2026-05-04) — External-validity dossier + cold-start surgery**
-
-Adds a 36-module R parity harness (`tests/r_parity/`), a 21-module Stata
-parity harness (`tests/stata_parity/`), original-paper replays on 4 datasets
-(Card / `mpdta` / Basque / LaLonde NSW + PSID-1, all bit-equal to the
-published numbers), a Track-C performance harness (log-log scaling for HDFE / CS-DiD / SCM / DML),
-and empirical 95% CI coverage at B=1000 under `tests/coverage_monte_carlo/`
+**📦 v1.13.1 (2026-05-05) — Stability tiers + external-validity dossier + cold-start surgery**
+
+v1.13 stamps every `FunctionSpec` with a `stability` tier
+(`stable` / `experimental` / `deprecated`) plus per-function `limitations`,
+surfaced end to end through `sp.describe_function` / `sp.list_functions(stability=...)` /
+the `statspai list` CLI / the LLM-facing `sp.function_schema` description;
+`sp.recommend` / `sp.causal` / `sp.paper` drop `experimental` /
+`deprecated` entries by default unless `allow_experimental=True` is passed explicitly. `aipw` /
+`aggte` / `pretrends_test` / `sensitivity_rr` / `mccrary_test` /
+`oster_bounds` / `wild_cluster_bootstrap` / `rd_honest`: these 8 frequently used
+estimators are upgraded from auto-registered to hand-written specs. `sp.preflight(...
+"ivreg", formula=...)` adds a weak-instrument preflight gate that emits a structured warning when the first-stage F falls below
+the Staiger–Stock (1997) / Stock–Yogo (2005) thresholds;
+`sp.recommend(... design='iv')` adaptively ranks LIML / AR ahead of
+2SLS on weak first stages. Also new: a 36-module R parity harness
+(`tests/r_parity/`), a 21-module Stata parity harness
+(`tests/stata_parity/`), original-paper replays on 4 datasets (Card / `mpdta` /
+Basque / LaLonde NSW + PSID-1, all bit-equal to the published numbers), a Track-C
+performance harness (log-log scaling for HDFE / CS-DiD / SCM / DML),
+and empirical 95% CI coverage at B=1000 under `tests/coverage_monte_carlo/`
 (OLS 0.952 / 2×2 DiD 0.955 / strong IV 0.962, all inside the 99% Wilson band
 [0.935, 0.967]), plus the 900-trial CausalAgentBench prompt suite
 (mock mode is ready; `--api` switches on API mode in one step). Three new top-level meta APIs:
@@ -64,7 +77,7 @@ StatsPAI's external-validity evidence without leaving the REPL. On the cold-start side, `statspai
 statspai` takes the sklearn submodule count from 245 → **0**. `sp.callaway_santanna(method='reg')`
 fixes a latent influence-function scaling bug (the IPW / DR paths are unaffected); **re-run any
 CS-DiD analyses from v1.10–v1.13 that used `method='reg'`**. Full release notes are in
-[`CHANGELOG.md`](CHANGELOG.md) under `[1.14.0]`
+[`CHANGELOG.md`](CHANGELOG.md) under `[1.13.1]`

 **📦 v1.12.2 (2026-05-01) — ML routing for `sp.causal_question` + shared robustness battery + weighted PLIV/IIVM**

@@ -515,7 +528,7 @@ sp.__citation__ # equivalent to sp.citation("bibtex")
 author = {Wang, Biaoyue},
 title = {StatsPAI: The Agent-Native Causal Inference \& Econometrics Toolkit for Python},
 year = {2026},
-version = {1.14.0},
+version = {1.13.1},
 doi = {10.5281/zenodo.19933900},
 url = {https://doi.org/10.5281/zenodo.19933900},
 license = {MIT},

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "StatsPAI"
-version = "1.14.0"
+version = "1.13.1"
 description = "The Agent-Native Causal Inference & Econometrics Toolkit for Python"
 readme = "README.md"
 license = {text = "MIT"}

src/statspai/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 >>> sp.outreg2(result, filename="results.xlsx")
 """

-__version__ = "1.14.0"
+__version__ = "1.13.1"
 __author__ = "Biaoyue Wang"
 __email__ = "brycew6m@stanford.edu"
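
Because this commit edits the same version string by hand in `pyproject.toml` and `src/statspai/__init__.py`, a small consistency check is one way to catch drift between them. The sketch below is not part of the commit: `find_version` and `versions_agree` are hypothetical helpers, and the regex assumes the version is a double-quoted literal at the start of a line, as it is in the two hunks above.

```python
import re

# Matches `version = "x.y.z"` or `__version__ = "x.y.z"` at the start of a line.
# Assumption: double-quoted literals only, as in the hunks in this commit.
VERSION_RE = re.compile(
    r'^\s*(?:__version__|version)\s*=\s*"([^"]+)"', re.MULTILINE
)


def find_version(text: str) -> str:
    """Return the first version literal found in the given file contents."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group(1)


def versions_agree(*texts: str) -> bool:
    """True when every file's contents declare the same version string."""
    return len({find_version(text) for text in texts}) == 1


# Inline stand-ins for the two files touched by this commit.
pyproject = '[project]\nname = "StatsPAI"\nversion = "1.13.1"\n'
init_py = '__version__ = "1.13.1"\n__author__ = "Biaoyue Wang"\n'

print(versions_agree(pyproject, init_py))
```

A real check would read the files from disk. It would also need its own pattern for the bibtex `version = {…}` field in the READMEs, which uses braces rather than quotes.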

tests/coverage_monte_carlo/FINDINGS.md

Lines changed: 3 additions & 3 deletions
@@ -66,9 +66,9 @@ Findings interpretation:
 Stock–Yogo critical values, HC1 ignores the weak-instrument bias of
 2SLS, so CIs miss truth more often than nominal 0.95. Recovery
 routes for users are LIML (`method='liml'`) or Anderson–Rubin
-inference (`sp.iv(.., inference='ar')`); both are on the v1.14
-roadmap as automatic fall-backs in the design-detect / preflight
-pipeline.
+inference (`sp.iv(.., inference='ar')`); both are wired into the
+v1.13 design-detect / preflight pipeline as automatic fall-backs
+via the new `first_stage_strength` gate.
 - **CS-DiD passes the heterogeneity stress test.** This is the
 designed behaviour of Callaway–Sant'Anna 2021: cell-by-cell ATT(g, t)
 estimation with the simple-ATT aggregation as an equally-weighted
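
To make the first-stage gate mentioned in this hunk concrete, here is a minimal sketch of a first-stage strength check for a single instrument. It is not the `first_stage_strength` implementation: `first_stage_f` and `is_weak` are hypothetical names, the F statistic is the homoskedastic one, and the cutoff of 10 is the common Staiger–Stock rule of thumb rather than a Stock–Yogo critical value. The data are built deterministically so the noise is exactly orthogonal to the instrument.

```python
def first_stage_f(x, z):
    """Homoskedastic first-stage F for one instrument (the squared t-statistic)."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    slope = s_zx / s_zz
    intercept = x_bar - slope * z_bar
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    sigma2 = rss / (n - 2)  # residual variance with two fitted parameters
    return slope ** 2 * s_zz / sigma2


def is_weak(f_stat, threshold=10.0):
    """Flag a weak first stage; 10 is the Staiger-Stock rule of thumb."""
    return f_stat < threshold


# Deterministic toy data: z and the noise are mean-zero and orthogonal,
# so the fitted slope equals the true coefficient and F = coef**2 * (n - 2).
n = 200
z = [(-1.0, 1.0, -1.0, 1.0)[i % 4] for i in range(n)]
noise = [(1.0, 1.0, -1.0, -1.0)[i % 4] for i in range(n)]
strong_x = [1.0 * zi + ei for zi, ei in zip(z, noise)]
weak_x = [0.1 * zi + ei for zi, ei in zip(z, noise)]

print(first_stage_f(strong_x, z), is_weak(first_stage_f(strong_x, z)))
print(first_stage_f(weak_x, z), is_weak(first_stage_f(weak_x, z)))
```

When a check like this flags the first stage, the recovery routes named in the hunk (LIML or Anderson–Rubin inference) are the ones to prefer over plain 2SLS.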

tests/r_parity/compare.py

Lines changed: 2 additions & 2 deletions
@@ -529,7 +529,7 @@ def render_tex(modules: list[str]) -> str:
 "% AUTO-GENERATED by tests/r_parity/compare.py\n"
 "% Re-run after any module change to refresh.\n"
 "\\begin{longtable}{p{0.10\\linewidth}p{0.27\\linewidth}p{0.40\\linewidth}p{0.16\\linewidth}}\n"
-"\\caption{Track A parity headline at \\statspai{} 1.13.0 vs the "
+"\\caption{Track A parity headline at \\statspai{} 1.13.1 vs the "
 "canonical \\proglang{R} reference on the calibrated replicas. The "
 "``Worst diff'' column reports the worst residual gap across the "
 "module's headline rows (point estimates only; per-row SE diffs "
@@ -610,7 +610,7 @@ def render_tex_3way(modules: list[str]) -> str:
 "\\small\n"
 "\\setlength{\\tabcolsep}{2pt}\n"
 "\\begin{longtable}{@{}p{0.055\\linewidth}p{0.205\\linewidth}p{0.30\\linewidth}p{0.30\\linewidth}p{0.10\\linewidth}@{}}\n"
-"\\caption{Track A parity headline at \\statspai{} 1.13.0 against the canonical "
+"\\caption{Track A parity headline at \\statspai{} 1.13.1 against the canonical "
 "\\proglang{R} reference \\emph{and} (where one exists) the canonical \\proglang{Stata} "
 "reference, on the calibrated replicas. The ID column is the two-digit module prefix; "
 "the two diff columns report the worst residual "

tests/stata_parity/README.md

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ the reason and the 3-way table prints it explicitly:
 The remaining modules (23-36, minus 25/28/30) currently have the
 status "Stata harness not yet built": a Stata sibling is feasible
 (many of them — `stcox`, `melogit`, `var`, `lpirf`, `xtreg`,
-`sfpanel`, etc. — are reachable) but is outside the v1.13.0 scope.
+`sfpanel`, etc. — are reachable) but is outside the v1.13.1 scope.

 ## Running

tests/test_cs_report_smoke.py

Lines changed: 2 additions & 2 deletions
@@ -141,8 +141,8 @@ def test_breakdown_M_all_strictly_positive(demo_report):
     assert (demo_report.breakdown["breakdown_M_star"] > 0).all()
     # Most event times should remain robust at one SE on this DGP.
     # We allow at most one boundary event-time to fall short because
-    # the v1.14 simple-ATT influence-function scaling fix
-    # (CHANGELOG ## [1.14.0]) made the SEs larger and therefore
+    # the v1.13 simple-ATT influence-function scaling fix
+    # (CHANGELOG ## [1.13.1]) made the SEs larger and therefore
     # makes the m_star >= se criterion stricter. Pre-fix this
     # assertion was `.all()`; post-fix the right contract is
     # "essentially all".

tests/test_did_numerical_fixtures.py

Lines changed: 5 additions & 5 deletions
@@ -10,9 +10,9 @@
 The fixtures are checked to 4 decimal places so bit-level floating-point
 differences across BLAS backends do not cause spurious failures.

-History note (CHANGELOG ``[1.14.0]``): the SEs in ``PINNED_ATT_GT``,
+History note (CHANGELOG ``[1.13.1]``): the SEs in ``PINNED_ATT_GT``,
 ``PINNED_EVENT_STUDY``, and the overall-ATT SE in
-``test_overall_att_matches_pinned`` were re-pinned in v1.14 to absorb
+``test_overall_att_matches_pinned`` were re-pinned in v1.13 to absorb
 the simple-ATT influence-function scaling fix
 (``Fix CS-DiD parity inference``). Each group-time IF is now
 multiplied by ``n_total / n_relevant`` when embedded in the full unit
@@ -59,7 +59,7 @@ def cs_fixture():

 # (group, time) -> (att, se). Generated from the current implementation.
 PINNED_ATT_GT = {
-    # Re-pinned 2026-05-05 after the v1.14 simple-ATT IF-scaling fix.
+    # Re-pinned 2026-05-05 after the v1.13 simple-ATT IF-scaling fix.
     # ATT point estimates unchanged; SEs grew by the
     # n_total/n_relevant correction.
     (3, 1): (-0.583666, 0.552161),
@@ -102,7 +102,7 @@ def test_att_gt_matches_pinned_values(cs_fixture):

 def test_overall_att_matches_pinned(cs_fixture):
     assert cs_fixture.estimate == pytest.approx(1.282166, abs=1e-4)
-    # SE re-pinned to 0.289142 (was 0.101724 pre-v1.14) following the
+    # SE re-pinned to 0.289142 (was 0.101724 pre-v1.13) following the
     # simple-ATT IF-scaling fix; see module docstring.
     assert cs_fixture.se == pytest.approx(0.289142, abs=1e-4)

@@ -112,7 +112,7 @@ def test_overall_att_matches_pinned(cs_fixture):
 # --------------------------------------------------------------------------- #

 PINNED_EVENT_STUDY = {
-    # Re-pinned 2026-05-05 after the v1.14 simple-ATT IF-scaling fix.
+    # Re-pinned 2026-05-05 after the v1.13 simple-ATT IF-scaling fix.
     -6: (0.082153, 0.602161),
     -5: (0.284830, 0.536783),
     -4: (0.135512, 0.419463),
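
The history note in this file says each group-time influence function (IF) is multiplied by `n_total / n_relevant` when embedded in the full unit vector. A toy example shows why that factor changes standard errors. It uses a plain subsample mean rather than the ATT(g, t) estimator, and `embed_if` and `se_from_if` are hypothetical helpers, not StatsPAI functions. How the pre-fix code normalized its IFs is not stated in the diff, so the unscaled branch here is only an illustration of the failure mode.

```python
import math


def embed_if(deviations, relevant, n_total, rescale):
    """Place subsample IF values into a length-n_total vector (zeros elsewhere)."""
    psi = [0.0] * n_total
    factor = n_total / len(relevant) if rescale else 1.0
    for idx, dev in zip(relevant, deviations):
        psi[idx] = factor * dev
    return psi


def se_from_if(psi):
    """SE implied by full-sample IF values: sqrt(sum(psi**2)) / n."""
    return math.sqrt(sum(p * p for p in psi)) / len(psi)


# 12 units in total; only 4 of them are relevant to this cell.
n_total = 12
relevant = [0, 3, 5, 8]
y = [1.0, 2.0, 3.0, 6.0]
mean_y = sum(y) / len(y)
deviations = [yi - mean_y for yi in y]

# Benchmark: SE of the subsample mean computed directly on the 4 units.
direct_se = math.sqrt(sum(d * d for d in deviations)) / len(y)

se_unscaled = se_from_if(embed_if(deviations, relevant, n_total, rescale=False))
se_scaled = se_from_if(embed_if(deviations, relevant, n_total, rescale=True))

print(direct_se, se_unscaled, se_scaled)
```

In this toy setup the rescaled embedding reproduces the direct subsample SE, while the unscaled one is smaller by the factor `n_relevant / n_total`. That is the same direction as the re-pinned fixtures, where point estimates are unchanged and SEs grow.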
