feat(rlasso): faithful hdm::rlassologit port (logistic rigorous Lasso)

The binary-outcome analogue of rlasso. hdm::rlassologit delegates its
penalized fit to glmnet's binomial lasso at a single data-driven lambda;
StatsPAI reproduces glmnet directly rather than substituting sklearn:

- _glmnet_logit_lasso: IRLS outer loop + weighted coordinate-descent
  inner loop, 1/n-scaled deviance objective, population-variance
  standardization, glmnet's pmin probability clamp. Matches R glmnet 4.1:
  selected support EXACT, coefficients ~1e-6 (glmnet's own tolerance).
- rlassologit + RLassoLogitFit: post=True refits an unpenalized logistic
  glm on the selected set (IRLS Newton) -> coefficients/intercept/
  residuals match hdm to ~1e-9; post=False keeps the glmnet fit (~1e-6).
  predict(type='response'/'link'); §3 result contract (cite/to_latex).
- RlassologitClassifier: a GENUINE (calibrated) logistic propensity for
  Double-ML, wired as ml_m='rlassologit'; sp.dml(model='irm',
  ml_m='rlassologit') uses it instead of the linear-probability
  RlassoClassifier.
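The two fitting steps named in the list above (an IRLS outer loop wrapped around weighted coordinate descent, then an unpenalized Newton refit on the selected support) can be sketched roughly as follows. This is a minimal illustration, not StatsPAI's `_glmnet_logit_lasso`: the names `soft_threshold`, `logit_lasso_sketch` and `logit_newton` are hypothetical, the probability clamp is a plain clip standing in for glmnet's rule, and glmnet's own convergence criteria and active-set tricks are not reproduced.

```python
import numpy as np

def soft_threshold(z, g):
    """Lasso shrinkage operator: sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def logit_lasso_sketch(X, y, lam, n_outer=100, n_inner=1000, tol=1e-9, eps=1e-5):
    """Minimise -(1/n) * loglik + lam * ||b||_1 on standardized columns (hypothetical helper)."""
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)    # np.std uses ddof=0: population variance
    Z = (X - mu) / sd                         # assumes no constant columns
    b0, b = np.log(y.mean() / (1.0 - y.mean())), np.zeros(p)
    for _ in range(n_outer):                  # IRLS outer loop
        eta = b0 + Z @ b
        prob = np.clip(1.0 / (1.0 + np.exp(-eta)), eps, 1.0 - eps)  # simplified clamp (assumption)
        w = prob * (1.0 - prob)               # IRLS weights
        z = eta + (y - prob) / w              # working response
        b0_old, b_old = b0, b.copy()
        for _ in range(n_inner):              # weighted coordinate-descent inner loop
            r = z - b0 - Z @ b
            delta = np.sum(w * r) / np.sum(w)  # exact weighted update of the intercept
            b0 += delta
            r -= delta
            max_change = abs(delta)
            for j in range(p):
                zj = Z[:, j]
                rho = np.sum(w * zj * (r + zj * b[j])) / n
                new = soft_threshold(rho, lam) / (np.sum(w * zj * zj) / n)
                if new != b[j]:
                    r -= zj * (new - b[j])
                    max_change = max(max_change, abs(new - b[j]))
                    b[j] = new
            if max_change < tol:
                break
        if max(abs(b0 - b0_old), np.max(np.abs(b - b_old))) < tol:
            break
    coef = b / sd                             # map back to the original column scale
    return b0 - coef @ mu, coef

def logit_newton(X, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression with intercept via Newton steps (hypothetical helper)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(A @ beta)))
        w = prob * (1.0 - prob)
        step = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (y - prob))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta[0], beta[1:]
```

Under these assumptions, a post-Lasso style fit would call `logit_lasso_sketch` once at the chosen lambda, take the columns with nonzero coefficients, and pass only those columns to `logit_newton`.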
Subtlety fixed vs a naive port: hdm's default penalty list carries c=1.1
explicitly, so the post=FALSE -> c=0.5 switch is dead code on a default
call; penalty=None uses c=1.1 regardless of post (unlike rlasso).
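The default-penalty subtlety can be illustrated with a small sketch. The helper name `resolve_penalty` and the dict-based `penalty` argument are hypothetical, chosen only to mirror the behaviour described above; the assumption that a caller-supplied penalty without `c` still reaches the `post` switch follows R's list semantics and is not confirmed for StatsPAI.

```python
import math

DEFAULT_C = 1.1  # the value carried explicitly by the default penalty list

def resolve_penalty(n, post=True, penalty=None):
    """Return (c, gamma) for a hypothetical rlassologit-style call."""
    if penalty is None:
        # Default call: c is already present, so `post` is never consulted.
        penalty = {"c": DEFAULT_C}
    c = penalty.get("c")
    if c is None:
        # Reached only when the caller passes a penalty without `c` (assumption).
        c = 1.1 if post else 0.5
    gamma = penalty.get("gamma")
    if gamma is None:
        gamma = 0.1 / math.log(n)
    return c, gamma
```

With this reading, `resolve_penalty(100, post=False)` keeps `c = 1.1`, while `resolve_penalty(100, post=False, penalty={"gamma": 0.05})` falls through to `c = 0.5`.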
The high-dim logistic *effect* (rlassologitEffect) is intentionally
deferred (a separate multi-day parity exercise) and documented as such.

Coverage: test_rlassologit_parity.py (5 hdm/glmnet pins, via
_generate_rlassologit.R) + 5 behavioural tests in test_rlasso.py.

Citations: only chernozhukov2016hdm (verified in paper.bib); glmnet/BCW
described in prose, no unverified bib keys (§10). Lazy-import contract
intact (import statspai pulls 0 sklearn submodules). registry 1125->1127.

(CHANGELOG also carries the earlier dispatcher note + a complementary
bch-deprecation doc paragraph that were sitting in the working tree.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
README.md: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ StatsPAI's focus is **causal inference**. The grid below summarizes method-famil
**Legend**: B = broad API coverage within this comparison table; Y = implemented entry points; P = partial, scattered, or single-algorithm support; N = no first-class entry point. These are API-breadth labels, not validation tiers.
-**StatsPAI at a glance**: 1,125 registered functions in the live agent registry · 87 submodules · 333k LOC (core) + 178k LOC (tests). All four numbers are reproducible from the canonical generator (`python scripts/registry_stats.py`); the per-module table in [`docs/stats.md`](docs/stats.md) is regenerated from the same script. For the API-breadth matrix (23 method families) and cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md).
+**StatsPAI at a glance**: 1,127 registered functions in the live agent registry · 87 submodules · 333k LOC (core) + 178k LOC (tests). All four numbers are reproducible from the canonical generator (`python scripts/registry_stats.py`); the per-module table in [`docs/stats.md`](docs/stats.md) is regenerated from the same script. For the API-breadth matrix (23 method families) and cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md).
**Validation tiers matter**: `stability="stable"` means the public API is SemVer-stable; it does not by itself mean R/Stata/paper parity. Use `sp.list_functions(validation_status="certified")` for cross-language or published-reference evidence, and inspect `sp.describe_function(name)["limitations"]` before production use. See [`docs/guides/stability.md`](docs/guides/stability.md).
docs/stats.md: 2 additions & 2 deletions
@@ -90,7 +90,7 @@ Sorted by LOC. This table is generated from the live source tree by `python scri
|`causal_text`| 1,457 | 4 | 4 |
|`target_trial`| 1,457 | 7 | 9 |
|`mediation`| 1,454 | 4 | 6 |
-|`rlasso`| 1,442 | 5 | 6 |
+|`rlasso`| 2,128 | 6 | 8 |
|`bunching`| 1,437 | 5 | 8 |
|`fairness`| 1,418 | 3 | 6 |
|`power`| 1,404 | 3 | 12 |
@@ -126,7 +126,7 @@ Sorted by LOC. This table is generated from the live source tree by `python scri
|`checks`| 152 | 2 | 0 |
|`causal`| 111 | 1 | 0 |
|`schemas`| 0 | 0 | 0 |
-|**Total**| **335,702** | **692** | **1125** |
+|**Total**| **336,450** | **693** | **1127** |
## 3 · Causal-inference coverage matrix (full)

Legend: B = broad API coverage within this comparison table; Y = implemented entry points; P = partial, scattered, or single-algorithm support; N = no first-class entry point. These are API-breadth labels, not validation tiers.
"description": "``threshold`` -- coefficients below it are zeroed (default None).",
+        "type": "object"
+      },
+      "intercept": {
+        "default": true,
+        "description": "Include an intercept.",
+        "type": "boolean"
+      },
+      "penalty": {
+        "description": "Overrides for ``c`` (slack; default 1.1 for ``post=True``, else 0.5), ``gamma`` (default ``0.1/log n``) and ``lambda`` (raw penalty; bypasses the data-driven level).",
+        "type": "object"
+      },
+      "post": {
+        "default": true,
+        "description": "If ``True``, refit the selected support by *unpenalized* logistic regression (post-Lasso); else keep the glmnet-penalized fit.",
+        "type": "boolean"
+      },
+      "y": {
+        "description": "Outcome variable column name or outcome array.",
+        "enum": [
+          "0",
+          "1"
+        ],
+        "type": "string"
+      }
+    },
+    "required": [
+      "X",
+      "y"
+    ],
+    "type": "object"
+  },
+  "name": "rlassologit"
+},
{
"description": "Robust / unconstrained Synthetic Control. Assumptions: A convex (or regularized) combination of donor units reproduces the treated unit's pre-treatment outcome path; No interference: the treatment does not affect the donor units (SUTVA); No anticipation before the treatment date. Pre-conditions: Panel of one or more treated units plus an untreated donor pool, observed over time; Pre-treatment window long enough to fit donor weights (rule of thumb: more pre-periods than donors used); Outcome observed for every unit in every period. Failure modes: Large pre-treatment RMSPE -- the synthetic unit fails to track the treated unit before treatment -> Add donors / predictors, lengthen the pre-period, or use a bias-corrected estimator (sdid, augsynth); Placebo / permutation inference shows the estimate is not extreme relative to donors -> Report the placebo distribution honestly; the effect may not be distinguishable from noise. Alternatives: sp.sdid, sp.augsynth, sp.gsynth, sp.callaway_santanna. Typical minimum N: 15.",