@@ -22,7 +22,7 @@ affiliations:
     ror: 00f54p054
   - name: StatsPAI Inc., United States
     index: 2
-date: 9 May 2026
+date: 12 May 2026
 bibliography: paper.bib
 ---
 
@@ -34,21 +34,23 @@ for estimating, diagnosing, comparing, and reporting models that are
 usually spread across many specialized packages or proprietary
 statistical environments. The package currently exposes more than 950
 registered functions across more than 80 submodules, covering classical
-regression, instrumental variables, panel data, difference-in-differences,
+regression, instrumental variable analysis, panel data, difference-in-differences,
 regression discontinuity, synthetic control, matching,
 stochastic frontier analysis, mixed-effects models, decomposition
 methods, sensitivity analysis, and modern machine-learning estimators
 for heterogeneous treatment effects.
 
-The package is designed for policy evaluation, social science research,
-and other empirical workflows where researchers must move between
-research design, estimation, diagnostics, robustness checks, and
-publication tables. A common result contract gives users `.summary()`,
+The package is designed for policy evaluation, social science and
+public health research, and other empirical workflows where researchers
+must move between research design, estimation, diagnostics, robustness
+checks, and publication tables. A common result contract gives users `.summary()`,
 `.plot()`, `.to_latex()`, `.to_docx()`, and `.cite()` methods where
 appropriate. `StatsPAI` is also agent-native: registered functions
-expose machine-readable schemas and structured failure metadata so that
-LLM-driven research assistants can discover estimators, choose among
-alternatives, and surface assumptions without parsing free-form prose.
+expose machine-readable schemas (structured descriptions of each
+function's arguments and outputs that programs can parse directly) and
+structured failure metadata so that LLM-driven research assistants can
+discover estimators, choose among alternatives, and surface assumptions
+without parsing free-form prose.
 The source code is available at
 [https://github.com/brycewang-stanford/statspai](https://github.com/brycewang-stanford/statspai).
 
@@ -73,7 +75,7 @@ through estimation, robustness, and publication output.
 `StatsPAI` addresses this gap for graduate students, applied
 economists, policy researchers, and data scientists who want a
 Python-native workflow without giving up the breadth of Stata or the
-methodological depth of R. Its goal is not to replace every specialized
+methodological depth of R. The goal of StatsPAI is not to replace every specialized
 implementation. Instead, it provides a coherent empirical workspace:
 shared formula conventions, common result objects, consistent export
 methods, citations attached to estimators, and validation metadata that
@@ -124,8 +126,9 @@ across implementations.
 The package is implemented mainly in Python on top of NumPy, SciPy,
 Pandas, statsmodels, scikit-learn, and linearmodels. This keeps the
 installation path familiar for Python users and supports Python 3.9 and
-newer. Optional accelerator backends are used only where they materially
-change the computation: PyTorch for neural causal estimators, JAX for
+newer versions of Python. Optional accelerator backends are used only
+where they materially change the computation: PyTorch for neural causal
+estimators, JAX for
 selected bootstrap and linear algebra workloads, and a Rust/PyO3 kernel
 for high-dimensional fixed-effect and cluster-variance routines. This
 keeps the default package inspectable while allowing heavy workloads to
@@ -149,9 +152,10 @@ The near-term research impact is a more reproducible empirical workflow
 for applied policy evaluation. Because methods share one interface,
 researchers can compare estimators on the same data, export tables with
 the same metadata, and record the citations and assumptions attached to
-each analysis. Early use in Stanford REAP research workflows has shown
-the value of the package for rapid policy-evaluation prototyping, while
-the agent-native registry supports a second use case: AI-assisted
+each analysis. Early use in research workflows of the Rural Education
+Action Program at Stanford University has shown the value of the
+package for rapid policy-evaluation prototyping, while the agent-native
+registry supports a second use case: AI-assisted
 replication and robustness analysis in which statistical tools are
 discovered and invoked through explicit schemas rather than informal
 prompts.
@@ -161,9 +165,9 @@ prompts.
 Generative AI tools, including Claude and OpenAI/Codex, were used to
 draft portions of the documentation, assist with code generation, and
 revise this manuscript. The corresponding author reviewed AI-generated
-text, checked citations and software claims against repository evidence,
-and retained responsibility for the correctness of the package and this
-paper.
+text and checked citations and software claims against repository
+evidence. All of the authors take responsibility for the correctness of
+the package and this paper.
 
 # Acknowledgements
 