chore(release): bump to v1.15.2 — MCP strict-JSON + dual-track replicate + packaging hardening

brycewang-stanford · claude · brycewang-stanford · commit 34981eb3c4b0 · 2026-05-17T15:46:59.000-07:00
Patch release on top of v1.15.1 with no estimator numerical change.
Three independent hardening tracks land together:

- sp.agent.mcp_server is now strict-JSON-clean over the wire — native
  NaN / ±Infinity floats are walked to null before serialisation so
  RFC 8259 parsers (Claude Desktop included) never see a token they
  reject. Covered by tests/test_mcp_nan_inf.py.
- sp.replicate graduates four canonical replications — Card (1995),
  Abadie-Diamond-Hainmueller (2010), Lalonde (1986) / DW (1999), Lee
  (2008) — from single-track stubs to full classic + modern recipes
  on bundled real CSVs with pinned golden numbers.
- Release packaging tightens — wheel smoke tests fail loudly on
  ImportError, py.typed ships in the wheel, the result _repr_html_
  path escapes user-controlled cells (notebook XSS-safety), explicit
  formulaic dependency, modern PEP 639 license metadata, and a new
  [text] extra wires sentence-transformers for sp.causal_text.

Bumps pyproject.toml + __init__.py + CITATION.cff + docs/index.md
+ README BibTeX banner. README / README_CN / MIGRATION grow a
1.15.2 banner / migration section. CHANGELOG.md gains the [1.15.2]
entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
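
The float-cleaning step in the first bullet above can be sketched as follows. This is a minimal illustration, not the code in `src/statspai/agent/mcp_server.py`: the helper name `clean_floats` and the sample payload are hypothetical, and only the behaviour stated in the commit message is assumed (recursive replacement of NaN / ±Infinity with `null` across `dict` / `list` / `tuple`, then serialisation with `allow_nan=False`).

```python
import json
import math


def clean_floats(obj):
    """Recursively replace NaN / +Infinity / -Infinity floats with None.

    Hypothetical stand-in for the project's `_clean_floats`; the real
    helper may handle more container or scalar types.
    """
    if isinstance(obj, float):
        return obj if math.isfinite(obj) else None
    if isinstance(obj, dict):
        return {key: clean_floats(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, which is also what json.dumps does with them.
        return [clean_floats(item) for item in obj]
    return obj


# Illustrative payload with degenerate floats (not real estimator output).
payload = {"coef": 1.5, "se": float("nan"), "ci": (float("-inf"), float("inf"))}

# allow_nan=False makes json.dumps raise ValueError if a non-finite float
# slips through, so the cleaning step cannot silently regress.
wire = json.dumps(clean_floats(payload), allow_nan=False)
print(wire)  # {"coef": 1.5, "se": null, "ci": [null, null]}
```

Pairing the walk with `allow_nan=False` turns a missed case into a server-side exception rather than a token the client cannot parse.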
diff --git a/.gitignore b/.gitignore
@@ -191,6 +191,7 @@ audit_report.json
 # Paper JSS and JOSS submissions 
 Paper-JSS/
 Paper-JOSS/
+Paper-AgentBench/
 
 # Temp files
 MORNING_REPORT_2026-04-29.md
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,140 @@ All notable changes to StatsPAI will be documented in this file.
 
 ## [Unreleased]
 
+## [1.15.2] — 2026-05-17
+
+### Headline
+
+Patch release on top of v1.15.1. **No estimator numerical paths change.**
+Three independent hardening tracks land together:
+
+1. **Agent-native infrastructure** — `sp.agent.mcp_server` is now strict-
+   JSON-clean over the wire (no `NaN` / `Infinity` literals reaching
+   Claude Desktop / RFC 8259 parsers), agent schema metadata extraction
+   is more complete, and a new `text` extra makes
+   `sentence-transformers` an explicit opt-in for the v1.6 `causal_text`
+   surface instead of a soft import surprise.
+2. **`sp.replicate` dual-track** — Card (1995), Abadie-Diamond-
+   Hainmueller (2010), Lalonde (1986) / DW (1999), and Lee (2008)
+   replications are promoted from single-track stubs to full
+   **classic + modern** recipes that ship with the original public-
+   domain CSVs and pinned golden numbers.
+3. **Release packaging** — wheel smoke tests now **fail loudly** on
+   import error, `py.typed` ships in the wheel for downstream
+   `mypy --strict` consumers, the result `_repr_html_` path escapes
+   user-controlled cell content (notebook XSS-safety), and the
+   `formulaic` dependency that `sp.spatial` parses formulas with is
+   declared explicitly instead of relying on `linearmodels`'s
+   transitive resolution.
+
+### Added — `text` optional extra
+
+- New `[project.optional-dependencies] text = ["sentence-transformers>=2.2.0"]`
+  in [`pyproject.toml`](pyproject.toml). The v1.6 `sp.causal_text` MVP
+  used to lazy-import `sentence_transformers` and raise an opaque
+  `ImportError` on first call. `pip install statspai[text]` now wires
+  the dependency explicitly; the lazy import still triggers a clear
+  pointer to the extra when missing.
+
+### Added — `sp.replicate` dual-track guides for four canonical papers
+
+- **Card (1995)** — proximity-to-college IV for returns to schooling.
+- **Abadie, Diamond & Hainmueller (2010)** — California Proposition 99
+  synthetic-control.
+- **Lalonde (1986) / Dehejia-Wahba (1999)** — NSW + PSID-1 (MatchIt
+  subset, `n=614`) propensity-score matching.
+- **Lee (2008)** — US Senate RD (`n=1390`) bandwidth-selected jump.
+
+  Each entry now ships a **classic track** (the estimator the paper
+  used: 2SLS for Card, weighted synthetic-control for Abadie, naive
+  OLS / adjusted OLS / 1:1 NN PSM for Lalonde, local-linear CCT RD
+  for Lee) **and a modern track** (DML PLR + entropy balancing,
+  bias-corrected robust RD, multi-method synth-compare). Real CSVs
+  land under [`src/statspai/datasets/data/`](src/statspai/datasets/data/)
+  (`card_1995.csv`, `california_prop99.csv`, `lalonde_matchit.csv`,
+  `lee_2008_senate.csv`) and `sp.datasets.nsw_lalonde` /
+  `sp.datasets.lee_2008_senate` gain a `simulated=False` real-data
+  branch with published-paper benchmarks exposed via `df.attrs`. The
+  Lalonde classic track now reproduces DW (1999) Table 3-4 within a
+  $5 drift tolerance; the Lee CCT track returns Conv `7.414` and
+  bias-corrected robust `7.507` (SE `1.741`, `h=17.754`), matching the
+  R `rdrobust` reference. All 13 BibTeX keys cited across the four
+  entries are verified in [`paper.bib`](paper.bib).
+
+### Fixed — MCP wire format is strict-JSON-clean
+
+- [`sp.agent.mcp_server`](src/statspai/agent/mcp_server.py) used to
+  serialise responses with `json.dumps(..., default=_json_default)`,
+  which **does not** intercept native Python `float('nan')` /
+  `float('inf')` — those become the non-standard literals `NaN` /
+  `Infinity` in the JSON output, which RFC 8259 parsers (including
+  Claude Desktop's `JSON.parse`) reject with errors like
+  `"No number after minus sign"`. Responses now pass through
+  `_clean_floats` (recursively replaces `NaN` / `±Infinity` with
+  `null` across `dict` / `list` / `tuple` containers) and serialise
+  with `allow_nan=False`, so the server can never emit a JSON token a
+  strict parser refuses. Covered by a 273-line regression suite
+  [`tests/test_mcp_nan_inf.py`](tests/test_mcp_nan_inf.py).
+- Agent schema metadata extraction (`sp.function_schema`,
+  `sp.describe_function`) now surfaces more signature detail for
+  registry entries built from auto-introspection.
+- Stability-tier audit (`scripts/stability_audit.py`) accounts for
+  evidence files more precisely; new
+  [`tests/test_agent_schema.py`](tests/test_agent_schema.py) locks
+  the schema metadata fields agents rely on.
+
+### Fixed — result HTML escaping (notebook XSS-safety)
+
+- [`CausalResult._repr_html_`](src/statspai/core/results.py) (and the
+  surrounding rich-display helpers) now route every user-derived cell
+  through `html.escape`. Previously, any string column whose contents
+  contained `<` / `>` / `&` / `"` would interpolate raw into the
+  rendered HTML, opening a path for notebook XSS when a result was
+  displayed in Jupyter / VS Code / nbviewer. New regression test:
+  [`tests/test_results_html_escape.py`](tests/test_results_html_escape.py).
+
+### Fixed — release packaging hygiene
+
+- `pyproject.toml` bumps the build requirement to
+  `setuptools>=77.0.0` and migrates `license = {text = "MIT"}` to the
+  modern PEP 639 `license = "MIT"` + `license-files = ["LICENSE"]`
+  pair. This implicitly drops the deprecated
+  `License :: OSI Approved :: ...` classifier.
+- [`MANIFEST.in`](MANIFEST.in) now includes `src/statspai/py.typed`
+  and the sdist test fixtures so `pip install --no-binary :all:` and
+  `mypy --strict` both behave correctly on the published artifacts.
+- [`.github/workflows/build-wheels.yml`](.github/workflows/build-wheels.yml)
+  and [`.github/workflows/ci-cd.yml`](.github/workflows/ci-cd.yml):
+  wheel smoke tests now **fail the job** on `ImportError` instead of
+  swallowing it as a warning. Releases that silently ship a broken
+  wheel are no longer possible from a green CI run.
+- New explicit dependency `formulaic>=0.6.0` in `dependencies`
+  (`sp.spatial.*` parses Wilkinson formulas through it; relying on
+  `linearmodels`'s transitive resolution broke when downstream users
+  pinned older `linearmodels`).
+
+### Docs
+
+- JOSS submission [`paper.md`](paper.md) is rewritten for the Scott
+  Rozelle review pass — tighter scope statement, cleaner schema
+  description, explicit AI-use disclosure, 12 May 2026 submission
+  date. Cited bibliography entries in [`paper.bib`](paper.bib) are
+  refreshed to match.
+- README / README\_CN add the hero banner image
+  ([`docs/logo/readme-1.png`](docs/logo/readme-1.png)).
+- Track-C performance comparison table
+  ([`tests/perf/results/perf_table.tex`](tests/perf/results/perf_table.tex))
+  switches to `\scriptsize` with package-name macros and a
+  direction-aware "Winner" column; log-log figure regenerated to match.
+
+### Internal
+
+- Perf-benchmark harness factored — new
+  [`tests/perf/_common.py`](tests/perf/_common.py) shared utilities;
+  `tests/perf/05_feols_jax_bootstrap_bench.py` rewritten on top.
+- Full-suite validation snapshot refreshed in
+  [`test_results_full_suite.md`](test_results_full_suite.md).
+
 ## [1.15.1] — 2026-05-07
 
 ### Headline
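
The HTML-escaping fix in the `[1.15.2]` entry above can be illustrated with a short sketch. It is not the code in `src/statspai/core/results.py`: the function `render_table` and its row format are hypothetical, and the only thing taken from the changelog is that every user-derived cell is passed through `html.escape` before interpolation.

```python
import html


def render_table(rows):
    """Render rows of cells as an HTML table, escaping every cell.

    Hypothetical helper for illustration; the real `_repr_html_`
    produces richer markup.
    """
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row)
        + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


# A string cell carrying markup is rendered as inert text, not as a tag.
rendered = render_table([["<script>alert(1)</script>", 0.42]])
print(rendered)
```

`html.escape` also escapes double and single quotes by default (`quote=True`), which matters if a cell value is ever placed inside an attribute.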
diff --git a/CITATION.cff b/CITATION.cff
@@ -16,8 +16,8 @@ authors:
     given-names: "Biaoyue"
     affiliation: "Stanford REAP, Stanford University"
     email: "brycew6m@stanford.edu"
-version: "1.14.0"
-date-released: "2026-05-04"
+version: "1.15.2"
+date-released: "2026-05-17"
 license: MIT
 # Zenodo *concept* DOI — always resolves to the latest archived version.
 # For a specific-version DOI (e.g. for replication packages), follow the
diff --git a/MIGRATION.md b/MIGRATION.md
@@ -7,6 +7,43 @@ Internal version-to-version migrations are at the top; the long-form
 
 <a id="sp-rdrobust-bwselect-cct-r-parity-opt-in"></a>
 
+## v1.15.1 → v1.15.2 — strict-JSON MCP wire, dual-track replicate, packaging
+
+**No estimator numerical path changes.** Three classes of consumers
+should take note:
+
+- **`sp.agent.mcp_server` clients** (Claude Desktop / Codex / any
+  RFC 8259-strict JSON parser). v1.15.1 could leak the non-standard
+  literals `NaN` / `Infinity` / `-Infinity` into responses whenever an
+  estimator surfaced a degenerate float (`np.nan` standard errors on a
+  singular covariate, `inf` log-likelihood on a saturated model, etc.).
+  v1.15.2 walks all containers before `json.dumps` and serialises with
+  `allow_nan=False`, replacing those values with `null`. **Action**:
+  usually none. Strict parsers that previously failed now succeed;
+  lenient parsers see `null` where they used to see `NaN`. If your
+  downstream JSON Schema explicitly typed those fields as `number`,
+  widen them to `["number", "null"]`.
+
+- **`sp.causal_text` users.** The MVP relied on a soft import of
+  `sentence-transformers`. v1.15.2 adds an explicit
+  `pip install statspai[text]` extra. The lazy import path is
+  preserved, but the `ImportError` message now points at the extra
+  instead of suggesting a bare `pip install sentence-transformers`.
+
+- **`sp.replicate` users.** Entries for Card (1995), Abadie-Diamond-
+  Hainmueller (2010), Lalonde (1986) / DW (1999), and Lee (2008) now
+  return classic + modern recipes computed on the bundled real CSVs
+  instead of single-track simulated stubs. If you were pinning to the
+  v1.15.1 simulated numbers in CI, switch to the published-paper
+  benchmarks now exposed via `df.attrs['paper_original']` (see
+  `sp.datasets.nsw_lalonde(simulated=False)` and
+  `sp.datasets.lee_2008_senate(simulated=False)`).
+
+Existing `sp.rdrobust` / `sp.nbreg` / `sp.xtnbreg` / `sp.menbreg`
+call sites carry over unchanged from v1.15.1.
+
+---
+
 ## v1.15.0 → v1.15.1 — `sp.rdrobust(bwselect='cct')` R-parity opt-in
 
 **No breaking change.** `sp.rdrobust` keeps `bwselect='mserd'` (StatsPAI's
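
The lazy-import behaviour described for `sp.causal_text` in the v1.15.1 → v1.15.2 section above follows a common pattern, sketched here under assumptions. The helper `require_extra` and the exact message wording are hypothetical; the migration note only states that the import stays lazy and that the `ImportError` now points at the `text` extra.

```python
import importlib


def require_extra(module_name, extra, package="statspai"):
    """Import an optional dependency, or raise an ImportError naming the extra.

    Hypothetical helper; StatsPAI's actual wording and structure may differ.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install {package}[{extra}]"
        ) from exc


# Called at first use rather than at package import time, so users who
# never touch the optional feature do not pay for the dependency, e.g.:
# st = require_extra("sentence_transformers", "text")
```

Chaining with `from exc` keeps the original traceback available for debugging while the top-level message tells the user which extra to install.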
diff --git a/README.md b/README.md
@@ -128,6 +128,25 @@ StatsPAI's focus is **causal inference** — and on this axis we aim to be the m
 
 ---
 
+**📦 v1.15.2 (2026-05-17) — Strict-JSON MCP wire + dual-track replicate guides + release-packaging hardening**
+
+Patch release on top of v1.15.1 with **no estimator numerical change**.
+Three independent hardening tracks land together: (1) `sp.agent.mcp_server`
+now produces strict-JSON-clean output — native `NaN` / `±Infinity` floats
+are walked to `null` before serialisation so RFC 8259 parsers
+(including Claude Desktop) never see a token they reject; (2)
+`sp.replicate` graduates four canonical replications — Card (1995),
+Abadie-Diamond-Hainmueller (2010) California Prop 99, Lalonde (1986) /
+DW (1999), and Lee (2008) Senate RD — from single-track stubs to full
+**classic + modern** recipes on bundled real CSVs with pinned golden
+numbers; (3) release packaging tightens — wheel smoke tests fail loudly
+on `ImportError`, `py.typed` ships in the wheel, the result
+`_repr_html_` path escapes user-controlled cells (notebook XSS-safety),
+and a new `[text]` extra makes `sentence-transformers` an explicit
+opt-in for `sp.causal_text`. Install with
+`pip install --upgrade statspai`. Full notes in
+[`CHANGELOG.md`](CHANGELOG.md) under `[1.15.2]`.
+
 **📦 v1.15.1 (2026-05-07) — R-parity RD opt-in + negative-binomial implementation notes**
 
 Patch release preparing the PyPI cut after v1.15.0. `sp.rdrobust`
@@ -1344,7 +1363,7 @@ resolves to the latest version):
   author       = {Wang, Biaoyue},
   title        = {StatsPAI: The Agent-Native Causal Inference \& Econometrics Toolkit for Python},
   year         = {2026},
-  version      = {1.15.1},
+  version      = {1.15.2},
   doi          = {10.5281/zenodo.19933900},
   url          = {https://doi.org/10.5281/zenodo.19933900},
   license      = {MIT},
diff --git a/README_CN.md b/README_CN.md
@@ -50,6 +50,23 @@ StatsPAI 聚焦**因果推断**——在这条主线上，我们的目标是成
 
 ---
 
+**📦 v1.15.2（2026-05-17）— MCP 严格 JSON + Replicate 双轨指南 + 发布打包加固**
+
+v1.15.1 之上的 patch 发布，**估计器数值路径零变化**，三条独立的加固
+线一起落地：（1）`sp.agent.mcp_server` 现在产出严格合规的 JSON——
+原生 `NaN` / `±Infinity` 浮点会在序列化前递归替换为 `null`，让 RFC
+8259 严格解析器（Claude Desktop 在内）不再因为非标准 token 报错；
+（2）`sp.replicate` 把四篇经典论文复现——Card (1995)、Abadie-
+Diamond-Hainmueller (2010) 加州 Prop 99、Lalonde (1986) / DW
+(1999)、Lee (2008) 参议员 RD——从单轨 stub 升级为**经典轨 + 现代轨**
+完整菜谱，配套打包了原始公共域 CSV 和锁定的金标准数字；（3）发布
+打包收紧——wheel 冒烟测试在 `ImportError` 时**失败而不是吞错**，
+`py.typed` 标记打入 wheel，结果 `_repr_html_` 路径转义用户控制单元
+（notebook XSS 防御），新增 `[text]` extra 让 `sentence-transformers`
+成为 `sp.causal_text` 的显式可选依赖。升级命令：
+`pip install --upgrade statspai`。完整发布说明见
+[`CHANGELOG.md`](CHANGELOG.md) `[1.15.2]`。
+
 **📦 v1.15.1（2026-05-07）— RD 的 R 精确复现路径 + 负二项回归实现说明**
 
 这是 v1.15.0 之后的 patch 发布准备版。`sp.rdrobust` 新增
@@ -564,7 +581,7 @@ sp.__citation__                 # 与 sp.citation("bibtex") 等价
   author       = {Wang, Biaoyue},
   title        = {StatsPAI: The Agent-Native Causal Inference \& Econometrics Toolkit for Python},
   year         = {2026},
-  version      = {1.15.1},
+  version      = {1.15.2},
   doi          = {10.5281/zenodo.19933900},
   url          = {https://doi.org/10.5281/zenodo.19933900},
   license      = {MIT},
diff --git a/docs/index.md b/docs/index.md
@@ -171,7 +171,7 @@ method that returns the correct BibTeX entry) and this package:
   title   = {StatsPAI: A Unified, Agent-Native Python Toolkit for
              Causal Inference and Applied Econometrics},
   year    = {2026},
-  version = {1.15.1},
+  version = {1.15.2},
   url     = {https://github.com/brycewang-stanford/StatsPAI}
 }
 ```
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "StatsPAI"
-version = "1.15.1"
+version = "1.15.2"
 description = "The Agent-Native Causal Inference & Econometrics Toolkit for Python"
 readme = "README.md"
 license = "MIT"
diff --git a/src/statspai/__init__.py b/src/statspai/__init__.py
@@ -22,7 +22,7 @@
 >>> sp.outreg2(result, filename="results.xlsx")
 """
 
-__version__ = "1.15.1"
+__version__ = "1.15.2"
 __author__ = "Biaoyue Wang"
 __email__ = "brycew6m@stanford.edu"
 
