Commit 3285e38

fix(joss): clean review follow-ups
1 parent 476d6f9 commit 3285e38

9 files changed

Lines changed: 73 additions & 65 deletions

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -4,6 +4,15 @@ All notable changes to StatsPAI will be documented in this file.
 
 ## [Unreleased]
 
+### Fixed
+
+- Cleaned up JOSS review follow-ups: removed two uncited duplicate BibTeX
+  entries that caused editorialbot DOI suggestions, aligned the AKM
+  shift-share citation key / DOI metadata, and refreshed v1.15.6 wording in
+  reviewer-facing docs and README release callouts.
+- `tools/audit_citations.py` now treats transient HTTP/socket/SSL timeouts as
+  unresolved citation lookups instead of leaking Python tracebacks.
+
 ## [1.15.6] — 2026-05-24
 
 ### Changed — Co-authorship, JOSS submission readiness
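
The second changelog entry describes a soft-failure pattern: a transient network error becomes an unresolved lookup instead of an uncaught exception. The following is a minimal sketch of that idea only. `soft_lookup`, `flaky`, and the shortened `TRANSIENT` tuple are hypothetical names for illustration and are not part of `tools/audit_citations.py`.

```python
import sys
import urllib.error

# Hypothetical, shortened tuple of error classes (illustration only).
TRANSIENT = (urllib.error.URLError, TimeoutError, ConnectionError)


def soft_lookup(fetch, ident):
    """Return fetch(ident), or None if a transient network error occurs."""
    try:
        return fetch(ident)
    except TRANSIENT as exc:
        # Report one line on stderr instead of letting a traceback escape.
        print(f"[lookup] network error {exc!r} for {ident}", file=sys.stderr)
        return None


def flaky(_ident):
    """Stand-in for an HTTP call that times out."""
    raise TimeoutError("read operation timed out")


result = soft_lookup(flaky, "10.1234/example")  # unresolved, no traceback
```

Returning `None` (or skipping the entry) keeps the caller's result set free of partial data, so an unresolved citation can be reported separately from a confirmed mismatch.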

README.md

Lines changed: 5 additions & 6 deletions
@@ -126,13 +126,12 @@ StatsPAI's focus is **causal inference** — and on this axis we aim to be the m
 
 **StatsPAI at a glance**: 1,018 registered functions in the live agent registry · 80 submodules · ~249k LOC (core) + ~86k LOC (tests). All four numbers are reproducible from the canonical generator (`python scripts/registry_stats.py`); the per-module table in [`docs/stats.md`](docs/stats.md) is regenerated from the same script. For the full coverage matrix (23 method families) and cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md).
 
-**📦 v1.15.5 (2026-05-21) — Agent-card coverage ratchet**
+**📦 v1.15.6 (2026-05-24) — JOSS readiness and citation metadata**
 
-StatsPAI now ships a CI-ratcheted agent-card coverage audit, generated
-baseline cards for the 1,018-function registry, and inherited agent-card
-metadata for canonical estimator variants. This release is registry /
-metadata infrastructure: estimator numerical paths are unchanged. Full
-notes in [`CHANGELOG.md`](CHANGELOG.md) under `[1.15.5]`.
+StatsPAI now ships updated co-author and citation metadata, a JOSS reviewer
+guide, a validation dossier, and paper/reference fixes for the 1,018-function
+registry. Estimator numerical paths are unchanged. Full notes in
+[`CHANGELOG.md`](CHANGELOG.md) under `[1.15.6]`.
 
 ---
 

README_CN.md

Lines changed: 4 additions & 5 deletions
@@ -48,13 +48,12 @@ StatsPAI 聚焦**因果推断**——在这条主线上,我们的目标是成
 
 **StatsPAI 一句话概览**:live agent registry 中有 1,018 个注册函数 · 80 个子模块 · ~249k 行核心代码 + ~86k 行测试。这四个数字都可以由唯一的生成器 (`python scripts/registry_stats.py`) 现场复算;[`docs/stats.md`](docs/stats.md) 中的按模块拆分表也由同一个脚本回写。完整覆盖矩阵(23 个方法家族)以及跨生态行数对比,详见 [`docs/stats.md`](docs/stats.md)
 
-**📦 v1.15.5(2026-05-21)— Agent-card 覆盖率 ratchet**
+**📦 v1.15.6(2026-05-24)— JOSS 准备与引用元数据**
 
-StatsPAI 现在带有 CI ratchet 的 agent-card 覆盖率审计、面向
-1,018 个注册函数的自动 baseline cards,以及 canonical 估计器变体的
-继承式 agent-card 元数据。本版本是 registry / metadata 基础设施更新:
+StatsPAI 现在带有更新后的共同作者与引用元数据、JOSS reviewer guide、
+validation dossier,以及面向 1,018 个注册函数的论文 / reference 修复。
 估计器数值路径不变。完整发布说明见 [`CHANGELOG.md`](CHANGELOG.md)
-`[1.15.5]`
+`[1.15.6]`
 
 ---
 

docs/index.md

Lines changed: 3 additions & 3 deletions
@@ -11,9 +11,9 @@ mixed-effects, modern ML causal inference, the full three-school
 modules (bridging theorems, fairness, surrogates, PCMCI, TMLE survival,
 etc.), and publication-ready output in Word / Excel / LaTeX / HTML.
 
-> **Current release: v1.15.5 (2026-05-21)**Agent-card coverage
-> ratchet, generated baseline cards, inherited estimator-family metadata,
-> and refreshed 1,018-function registry statistics. See the
+> **Current release: v1.15.6 (2026-05-24)**JOSS readiness update:
+> co-author and citation metadata, reviewer guide, validation dossier,
+> and paper/reference fixes for the 1,018-function registry. See the
 > [changelog](changelog.md) for detail.
 
 ```python

docs/joss_validation_dossier.md

Lines changed: 3 additions & 4 deletions
@@ -9,14 +9,14 @@ research software. It is intentionally factual and reproducible.
 - Package archive: <https://doi.org/10.5281/zenodo.19933900>
 - PyPI: <https://pypi.org/project/StatsPAI/>
 - License: MIT, with a plain-text `LICENSE` file in the repository.
-- Current release at the time of this dossier: `1.15.5`, released on
-  2026-05-21.
+- Current release at the time of this dossier: `1.15.6`, released on
+  2026-05-24.
 - Public GitHub repository creation date: 2025-07-26.
 
 ## Software Scope
 
 StatsPAI exposes a unified Python interface for causal inference and applied
-econometrics. As of release `1.15.5`, the registry reports 1,018 public
+econometrics. As of release `1.15.6`, the registry reports 1,018 public
 functions across 80 submodules:
 
 ```bash
@@ -100,4 +100,3 @@ python -m twine check dist/*
 
 For a shorter package-level check, use the reviewer guide in
 `docs/joss_reviewer_guide.md`.
-

paper.bib

Lines changed: 0 additions & 17 deletions
@@ -3192,16 +3192,6 @@ @misc{did_misclassified2025
   doi={10.48550/arXiv.2507.20415}
 }
 
-@article{adao2019shift,
-  title={Shift-Share Designs: Theory and Inference},
-  author={Ad{\\~a}o, Rodrigo and Koles{\\'a}r, Michal and Morales, Eduardo},
-  journal={Quarterly Journal of Economics},
-  volume={134},
-  number={4},
-  pages={1949--2010},
-  year={2019}
-}
-
 @article{santanna2020doubly,
   title={Doubly Robust Difference-in-Differences Estimators},
   author={Sant'Anna, Pedro H.C. and Zhao, Jun},
@@ -4278,13 +4268,6 @@ @article{runge2020discovering
   doi={10.48550/arXiv.2003.03685}
 }
 
-@article{schwab2014ltmle,
-  title={{ltmle}: Longitudinal Targeted Maximum Likelihood Estimation},
-  author={Schwab, Joshua and Lendle, Samuel and Petersen, Maya and van der Laan, Mark J.},
-  journal={R package and accompanying working paper},
-  year={2014}
-}
-
 @inproceedings{tran2023inferring,
   title={Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments},
   author={Tran, Allen and Bibaut, Aur{\'e}lien and Kallus, Nathan},

src/statspai/bartik/adao_correction.py

Lines changed: 3 additions & 2 deletions
@@ -436,14 +436,15 @@ def shift_share_se(
 
 # Register citation
 CausalResult._CITATIONS["adao_correction"] = (
-    "@article{adao2019shift,\n"
+    "@article{ado2019shift,\n"
     " title={Shift-Share Designs: Theory and Inference},\n"
     " author={Ad{\\~a}o, Rodrigo and Koles{\\'a}r, Michal and "
     "Morales, Eduardo},\n"
     " journal={Quarterly Journal of Economics},\n"
     " volume={134},\n"
     " number={4},\n"
     " pages={1949--2010},\n"
-    " year={2019}\n"
+    " year={2019},\n"
+    " doi={10.1093/qje/qjz025}\n"
     "}"
 )
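
One way to confirm that an embedded BibTeX string carries the intended key and DOI is to extract both with regular expressions. This is a rough sketch, assuming the string has the shape shown in the hunk above; the two patterns are simplifications for illustration, not a full BibTeX parser, and nothing here is taken from the StatsPAI test suite.

```python
import re

# The citation string as it reads after this commit (copied from the diff).
bibtex = (
    "@article{ado2019shift,\n"
    " title={Shift-Share Designs: Theory and Inference},\n"
    " author={Ad{\\~a}o, Rodrigo and Koles{\\'a}r, Michal and "
    "Morales, Eduardo},\n"
    " journal={Quarterly Journal of Economics},\n"
    " volume={134},\n"
    " number={4},\n"
    " pages={1949--2010},\n"
    " year={2019},\n"
    " doi={10.1093/qje/qjz025}\n"
    "}"
)

# Entry key: the text between "@type{" and the first comma.
key_match = re.match(r"@\w+\{([^,\s]+),", bibtex)
# DOI field: assumes a flat value with no nested braces.
doi_match = re.search(r"doi\s*=\s*\{([^}]+)\}", bibtex)

citation_key = key_match.group(1) if key_match else None
doi = doi_match.group(1) if doi_match else None
print(citation_key, doi)
```

The same two values could then be compared against the matching entry in `paper.bib` to catch key or DOI drift between the code and the bibliography.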

tests/test_audit_citations.py

Lines changed: 28 additions & 20 deletions
@@ -130,6 +130,31 @@ def test_doi_regex_stops_at_sentence_period():
     assert m.group("id") == "10.1234/xyz"
 
 
+# ---------------------------------------------------------------------------
+# Network failures — must degrade to unresolved, not traceback
+# ---------------------------------------------------------------------------
+
+
+def test_verify_crossref_timeout_is_soft_failure(monkeypatch, capsys):
+    def boom(*args, **kwargs):
+        raise TimeoutError("read operation timed out")
+
+    monkeypatch.setattr(ac, "_http_get", boom)
+
+    assert ac.verify_crossref(["10.1234/example"]) == {}
+    assert "TimeoutError" in capsys.readouterr().err
+
+
+def test_verify_arxiv_timeout_is_soft_failure(monkeypatch, capsys):
+    def boom(*args, **kwargs):
+        raise TimeoutError("read operation timed out")
+
+    monkeypatch.setattr(ac, "_http_get", boom)
+
+    assert ac.verify_arxiv(["2408.12345"]) == {}
+    assert "TimeoutError" in capsys.readouterr().err
+
+
 # ---------------------------------------------------------------------------
 # Name helpers
 # ---------------------------------------------------------------------------
@@ -373,8 +398,7 @@ def test_cli_runs_without_crash_on_empty_tree(tmp_path):
       not a crash).
     * ``2`` — soft failure (rate limit, network) — acceptable.
 
-    Any other exit (including a traceback in ``stderr``) is a real
-    failure.
+    Any other exit, and any traceback in ``stderr``, is a real failure.
     """
     (tmp_path / "src").mkdir()
     (tmp_path / "docs").mkdir()
@@ -387,25 +411,9 @@ def test_cli_runs_without_crash_on_empty_tree(tmp_path):
         capture_output=True, text=True, check=False,
         cwd=tmp_path,
     )
-    # Traceback is acceptable IFF it terminates in a known transient
-    # network failure (the auditor calls arXiv / Crossref / DataCite
-    # live and a cold CI runner can hit HTTP 429 or socket-level
-    # timeouts even on a no-citation tree, because REPO_ROOT scanning
-    # still picks up the real repo's src/ and docs/ before the empty
-    # tmp_path overrides reach it). A traceback whose tail names
-    # something other than these classes is a real auditor crash.
-    _known_transient_errors = (
-        "TimeoutError", "socket.timeout",
-        "URLError", "RemoteDisconnected",
-        "ConnectionResetError", "ConnectionAbortedError",
-        "HTTPError: 429", "HTTPError: 503",
-        "HTTPError: 504", "ssl.SSLError",
+    assert "Traceback" not in result.stderr, (
+        f"auditor crashed with traceback:\n{result.stderr}"
     )
-    if "Traceback" in result.stderr:
-        assert any(err in result.stderr for err in _known_transient_errors), (
-            f"auditor crashed with traceback (not a known transient "
-            f"network error):\n{result.stderr}"
-        )
     assert result.returncode in (0, 1, 2), (
         f"unexpected exit {result.returncode}: {result.stderr}"
     )

tools/audit_citations.py

Lines changed: 18 additions & 8 deletions
@@ -34,9 +34,11 @@
 
 import argparse
 import hashlib
+import http.client
 import html
 import json
 import re
+import socket
 import ssl
 import sys
 import time
@@ -67,6 +69,14 @@
 DEFAULT_ROOTS = ("src", "docs")
 DEFAULT_OUT = REPO_ROOT / "audit_report.md"
 USER_AGENT = "statspai-citation-audit/1.0 (mailto:brycew6m@stanford.edu)"
+_TRANSIENT_NETWORK_ERRORS = (
+    urllib.error.URLError,
+    TimeoutError,
+    socket.timeout,
+    ConnectionError,
+    http.client.HTTPException,
+    ssl.SSLError,
+)
 
 ARXIV_RE = re.compile(
     r"""
@@ -437,8 +447,8 @@ def verify_arxiv(ids: list[str], refresh: bool = False) -> dict[str, PaperMeta]:
         )
         try:
             xml_bytes = _http_get(url, refresh=refresh, sleep=3.0)
-        except urllib.error.URLError as e:
-            print(f"[arxiv] HTTP error {e!r} for chunk starting {chunk[0]}",
+        except _TRANSIENT_NETWORK_ERRORS as e:
+            print(f"[arxiv] HTTP/network error {e!r} for chunk starting {chunk[0]}",
                   file=sys.stderr)
             continue
         root = ET.fromstring(xml_bytes)
@@ -491,8 +501,8 @@ def verify_nber(ids: list[str], refresh: bool = False) -> dict[str, PaperMeta]:
             html_text = _http_get(url, refresh=refresh, sleep=1.0).decode(
                 "utf-8", errors="replace"
             )
-        except urllib.error.URLError as e:
-            print(f"[nber] HTTP error {e!r} for w{wp}", file=sys.stderr)
+        except _TRANSIENT_NETWORK_ERRORS as e:
+            print(f"[nber] HTTP/network error {e!r} for w{wp}", file=sys.stderr)
             continue
         authors: list[str] = []
         title = ""
@@ -530,8 +540,8 @@ def _verify_datacite_one(doi: str, refresh: bool = False) -> Optional[PaperMeta]
     url = f"https://api.datacite.org/dois/{urllib.parse.quote(doi, safe='')}"
     try:
         raw = _http_get(url, refresh=refresh, sleep=0.5)
-    except urllib.error.URLError as e:
-        print(f"[datacite] HTTP error {e!r} for {doi}", file=sys.stderr)
+    except _TRANSIENT_NETWORK_ERRORS as e:
+        print(f"[datacite] HTTP/network error {e!r} for {doi}", file=sys.stderr)
         return None
     try:
         obj = json.loads(raw)
@@ -570,8 +580,8 @@ def verify_crossref(dois: list[str], refresh: bool = False) -> dict[str, PaperMe
                 crossref_404 = True
             else:
                 continue
-        except urllib.error.URLError as e:
-            print(f"[crossref] HTTP error {e!r} for {doi}", file=sys.stderr)
+        except _TRANSIENT_NETWORK_ERRORS as e:
+            print(f"[crossref] HTTP/network error {e!r} for {doi}", file=sys.stderr)
             continue
         if crossref_404:
             # Fall through to DataCite — Zenodo / Figshare / Dryad
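
As a quick sanity check on the new tuple, the sketch below tests whether the exception classes behind the names the old CLI test tolerated are all caught by it. The tuple is copied from the hunk above. The `previously_tolerated` list is an assumed mapping from those string names to standard-library classes, so treat it as illustrative rather than as code from the repository.

```python
import http.client
import socket
import ssl
import urllib.error

# Copied from the tuple added in this commit.
TRANSIENT_NETWORK_ERRORS = (
    urllib.error.URLError,
    TimeoutError,
    socket.timeout,
    ConnectionError,
    http.client.HTTPException,
    ssl.SSLError,
)

# Assumed class equivalents of the strings the old test used to tolerate.
previously_tolerated = [
    TimeoutError,
    socket.timeout,
    urllib.error.URLError,
    urllib.error.HTTPError,  # stands in for the 429 / 503 / 504 cases
    http.client.RemoteDisconnected,
    ConnectionResetError,
    ConnectionAbortedError,
    ssl.SSLError,
]

# issubclass accepts a tuple, which matches how `except (A, B, ...)` works.
coverage = {
    cls.__name__: issubclass(cls, TRANSIENT_NETWORK_ERRORS)
    for cls in previously_tolerated
}
for name, covered in coverage.items():
    print(f"{name:24s} covered={covered}")
```

Because `urllib.error.HTTPError` subclasses `URLError`, clause order matters where a function handles HTTP status codes separately: a dedicated `except urllib.error.HTTPError` branch has to appear before the broader tuple, as the context lines of the `verify_crossref` hunk suggest it does.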
