brycewang-stanford
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 23 additions & 5 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/README_CN.md b/README_CN.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/guides/rigorous_lasso_hdm.md b/docs/guides/rigorous_lasso_hdm.md
Lines changed: 38 additions & 13 deletions
diff --git a/docs/reference/index.md b/docs/reference/index.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/stats.md b/docs/stats.md
Lines changed: 2 additions & 2 deletions
diff --git a/schemas/functions.json b/schemas/functions.json
Lines changed: 87 additions & 0 deletions
diff --git a/schemas/index.json b/schemas/index.json
Lines changed: 2 additions & 2 deletions
diff --git a/schemas/tools.json b/schemas/tools.json
Lines changed: 50 additions & 0 deletions
diff --git a/src/statspai/__init__.py b/src/statspai/__init__.py
Lines changed: 4 additions & 0 deletions
@@ -28,11 +28,27 @@ All notable changes to StatsPAI will be documented in this file.
     (coef 0.2274, SE 0.2466) — **resolving the previously-tracked
     divergence** where the older `iv.bch_post_lasso_iv` was ~17× off
     (0.013). All four selection regimes (Z-only, X-only, both, none)
-    match `hdm` to ~1e-6 on a well-conditioned design.
+    match `hdm` to ~1e-6 on a well-conditioned design. Also routable
+    through the IV family dispatcher: `sp.iv(method='rlasso', ...)`
+    (instrument selection by default; double selection when `exog=`
+    controls are passed).
   - `sp.RlassoRegressor` / `sp.RlassoClassifier` — scikit-learn-compatible
     adapters so the rigorous Lasso can serve as a Double-ML nuisance
     learner: `sp.dml(model='plr', ml_g='rlasso', ml_m='rlasso')` now
     works (clone-safe across cross-fitting folds).
+  - `sp.rlassologit` — the **logistic** rigorous (post-)Lasso, a faithful
+    port of `hdm::rlassologit`. hdm delegates the penalized fit to glmnet's
+    binomial lasso at a single data-driven `λ`; StatsPAI reproduces glmnet
+    directly (IRLS + weighted coordinate descent, `1/n` deviance,
+    population-variance standardization, `pmin` clamp). The selected
+    support matches glmnet **exactly**; engine coefficients match R glmnet
+    4.1 to ~1e-6; `post=True` coefficients/residuals (unpenalized logistic
+    refit on the selected set) match `hdm` to ~1e-9. `sp.RlassologitClassifier`
+    is a *genuine* (calibrated) logistic propensity for Double-ML —
+    `sp.dml(model='irm', ml_m='rlassologit')` — unlike the linear-probability
+    `RlassoClassifier`. The high-dim logistic *effect* (`rlassologitEffect`)
+    is intentionally deferred (a separate parity exercise). 5 hdm/glmnet
+    parity pins in `test_rlassologit_parity.py`.
   Coverage: `tests/reference_parity/test_rlasso_parity.py` (17 pins vs
   `hdm`, generated by `_generate_rlasso.R` — including `rlasso_effects`
   multi-target and a tight `sp.dml(ml_g='rlasso')` pin against a manual
@@ -77,10 +93,12 @@ All notable changes to StatsPAI will be documented in this file.
   Chiang–Kato–Ma–Sasaki (*JBES* 40(3), 2022, doi
   `10.1080/07350015.2021.1895815`) references to `paper.bib`; cite `hdm`
   from the post-Lasso IV / RD-lasso modules that implement its methods.
-  The DML guide now documents the relationship to `hdm` openly, including
-  that `sp.lasso_iv` / `bch_post_lasso_iv` can select fewer instruments
-  than `hdm::rlassoIV` on weak-instrument designs (a tracked roadmap item,
-  not a silent divergence).
+  The DML guide documents the relationship to `hdm` openly. (The original
+  hand-rolled `bch_post_lasso_iv` under-selected instruments vs
+  `hdm::rlassoIV` on weak-instrument designs; this is now resolved by the
+  dedicated `sp.rlasso` port — `sp.rlasso_iv` reproduces `hdm::rlassoIV`
+  exactly — and `bch_post_lasso_iv` is deprecated. See the `sp.rlasso`
+  entry above.)
 
 ### Deprecated
 
 
@@ -138,7 +138,7 @@ StatsPAI's focus is **causal inference**. The grid below summarizes method-famil
 
 **Legend**: B = broad API coverage within this comparison table; Y = implemented entry points; P = partial, scattered, or single-algorithm support; N = no first-class entry point. These are API-breadth labels, not validation tiers.
 
-**StatsPAI at a glance**: 1,125 registered functions in the live agent registry · 87 submodules · 333k LOC (core) + 178k LOC (tests). All four numbers are reproducible from the canonical generator (`python scripts/registry_stats.py`); the per-module table in [`docs/stats.md`](docs/stats.md) is regenerated from the same script. For the API-breadth matrix (23 method families) and cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md).
+**StatsPAI at a glance**: 1,127 registered functions in the live agent registry · 87 submodules · 333k LOC (core) + 178k LOC (tests). All four numbers are reproducible from the canonical generator (`python scripts/registry_stats.py`); the per-module table in [`docs/stats.md`](docs/stats.md) is regenerated from the same script. For the API-breadth matrix (23 method families) and cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md).
 
 **Validation tiers matter**: `stability="stable"` means the public API is SemVer-stable; it does not by itself mean R/Stata/paper parity. Use `sp.list_functions(validation_status="certified")` for cross-language or published-reference evidence, and inspect `sp.describe_function(name)["limitations"]` before production use. See [`docs/guides/stability.md`](docs/guides/stability.md).
 
 
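@@ -46,7 +46,7 @@ StatsPAI focuses on **causal inference**. The table below describes API coverage at the method-family level
 
 **Legend**: B = broad API coverage within this table; Y = implemented entry points; P = partial, scattered, or single-algorithm support; N = no first-class entry point. These are API-breadth labels only, not validation tiers.
 
-**StatsPAI at a glance**: 1,125 registered functions in the live agent registry · 87 submodules · 333k lines of core code + 178k lines of tests. All four numbers can be recomputed on the spot by the single canonical generator (`python scripts/registry_stats.py`); the per-module breakdown table in [`docs/stats.md`](docs/stats.md) is also written back by the same script. For the API-breadth matrix of 23 method families and the cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md). These coverage numbers describe API breadth and do not mean that every function has R/Stata numerical validation; for production use, check `validation_status`.
+**StatsPAI at a glance**: 1,127 registered functions in the live agent registry · 87 submodules · 333k lines of core code + 178k lines of tests. All four numbers can be recomputed on the spot by the single canonical generator (`python scripts/registry_stats.py`); the per-module breakdown table in [`docs/stats.md`](docs/stats.md) is also written back by the same script. For the API-breadth matrix of 23 method families and the cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md). These coverage numbers describe API breadth and do not mean that every function has R/Stata numerical validation; for production use, check `validation_status`.
 
 **📦 v1.19.0 (2026-06-20): cross-engine validation, data MCP normalization, social network analysis**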
 
 
@@ -140,19 +140,44 @@ two are not numerically interchangeable.
 
 Ported and parity-tested against `hdm`: `rlasso`, `rlassoEffect` /
 `rlassoEffects` (single and multi-target), `rlassoIV` (all four selection
-regimes), `tsls`, and the data-driven `lambdaCalculation` for the
-homoskedastic and heteroskedastic (X-independent) penalties.
-
-**Not yet ported — `rlassologit`** (the *logistic* rigorous Lasso and its
-`rlassologitEffect(s)`). `hdm::rlassologit` delegates the penalized fit to
-`glmnet::glmnet(family="binomial")` at a single data-driven `λ`. Reproducing
-it faithfully means matching glmnet's logistic-lasso solution — its
-standardization, intercept handling and objective scaling differ from
-scikit-learn's L1 logistic regression at a fixed `λ` — which is a separate
-parity exercise. Rather than ship an unvalidated approximation (the very
-failure mode this module was built to avoid), it is intentionally left out
-until it can be pinned against `glmnet`. For a binary treatment under
-Double-ML, use `sp.dml(model='irm', ...)` with a genuine classifier.
+regimes), `tsls`, `rlassologit` (the logistic rigorous Lasso), and the
+data-driven `lambdaCalculation` for the homoskedastic and heteroskedastic
+(X-independent) penalties.
+
+### `sp.rlassologit` — the logistic rigorous Lasso
+
+`hdm::rlassologit` is the binary-outcome analogue of `rlasso`: its
+penalized fit is `glmnet(family="binomial", alpha=1, lambda=λ,
+standardize=TRUE)` at a single data-driven `λ`. StatsPAI reproduces
+glmnet's binomial lasso at that `λ` **directly** — an IRLS outer loop, a
+weighted coordinate-descent inner loop, the `1/n`-scaled deviance
+objective, population-variance standardization and glmnet's `pmin`
+probability clamp — rather than substituting scikit-learn's L1 logistic
+(whose objective/standardization differ at a fixed `λ`).
+
+```python
+fit = sp.rlassologit(X, y, post=True)   # y binary
+fit.predict(X, type="response")          # probabilities  (or "link" = log-odds)
+sp.RlassologitClassifier()               # genuine logistic propensity for sp.dml
+```
+
+Parity (vs `hdm` 0.3.2 / `glmnet` 4.1): the **selected support matches
+exactly**; the glmnet engine's coefficients match to ~1e-6 (glmnet's own
+convergence tolerance — no tighter ground truth exists); and `post=True`
+(the default) coefficients/intercept/residuals — coming from an
+*unpenalized* logistic refit on the selected set — match to ~1e-9.
+
+`sp.RlassologitClassifier` is the principled binary nuisance learner for
+Double-ML: `sp.dml(model='irm', ml_m='rlassologit')` uses a *calibrated*
+logistic propensity (unlike the linear-probability `RlassoClassifier`).
+
+**Not yet ported — `rlassologitEffect(s)`**, the high-dimensional
+*logistic treatment effect* (BCW double-selection with `√σ²`-weighting and
+a max-of-two sandwich variance). It layers several `rlassologit` /
+`rlasso` fits plus a glm with non-obvious internals; it is a separate
+multi-day parity exercise and is intentionally left out rather than
+shipped unvalidated. For a binary-treatment causal effect, use
+`sp.dml(model='irm', ml_m='rlassologit')`.
 
 **X-dependent penalty simulation** (`penalty={"X.dependent.lambda": True}`)
 is implemented but matches `hdm` only *in distribution* — R's
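
The guide hunk above names the ingredients of the glmnet-style fit: an IRLS outer loop, a weighted coordinate-descent inner loop, a `1/n`-scaled deviance, and population-variance standardization. The following is a minimal NumPy sketch of that general technique at one fixed `λ`, not the StatsPAI or glmnet source. The function names, the tolerances, and the simple probability clip (`eps`) are assumptions made for illustration; glmnet's actual `pmin` rule and convergence criteria may differ.

```python
import numpy as np


def soft_threshold(a, lam):
    """Scalar soft-thresholding operator S(a, lam)."""
    return np.sign(a) * max(abs(a) - lam, 0.0)


def logistic_lasso(X, y, lam, outer=100, inner=1000, tol=1e-10, eps=1e-9):
    """L1-penalized logistic fit at one fixed lam (illustrative sketch only).

    Minimizes -(1/n) * loglik + lam * ||beta||_1 on columns standardized with
    the population (ddof=0) standard deviation; the intercept is unpenalized.
    Returns (intercept, coefficients) on the original scale of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)       # np.std defaults to ddof=0
    Xs = (X - mu) / sd
    b0 = np.log(y.mean() / (1.0 - y.mean()))     # null-model intercept
    beta = np.zeros(p)
    for _ in range(outer):                       # IRLS outer loop
        b0_prev, beta_prev = b0, beta.copy()
        eta = b0 + Xs @ beta
        # Simplified clamp (assumption): keeps the weights strictly positive.
        prob = np.clip(1.0 / (1.0 + np.exp(-eta)), eps, 1.0 - eps)
        w = prob * (1.0 - prob)                  # IRLS weights
        r = (y - prob) / w                       # working residual z - eta
        for _ in range(inner):                   # weighted coordinate descent
            biggest = 0.0
            for j in range(p):
                xj = Xs[:, j]
                denom = np.dot(w, xj * xj) / n
                rho = np.dot(w * xj, r) / n + denom * beta[j]
                new = soft_threshold(rho, lam) / denom
                step = new - beta[j]
                if step != 0.0:
                    r -= step * xj               # keep the residual in sync
                    beta[j] = new
                    biggest = max(biggest, abs(step))
            step0 = np.dot(w, r) / w.sum()       # unpenalized intercept update
            r -= step0
            b0 += step0
            biggest = max(biggest, abs(step0))
            if biggest < tol:
                break
        if max(abs(b0 - b0_prev), np.max(np.abs(beta - beta_prev))) < tol:
            break
    coef = beta / sd                             # undo the standardization
    return b0 - np.dot(mu, coef), coef
```

A fixed point of this iteration satisfies the lasso optimality conditions on the standardized scale: the `1/n`-scaled score equals `±λ` on selected columns and is at most `λ` in absolute value elsewhere. That is one way to check such a solver without a reference implementation.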
 
@@ -1,6 +1,6 @@
 # API Reference — Overview
 
-StatsPAI exposes 1,125 registered public functions under a single
+StatsPAI exposes 1,127 registered public functions under a single
 `import statspai as sp` namespace. Reference pages are grouped by
 methodological area:
 
 
@@ -90,7 +90,7 @@ Sorted by LOC. This table is generated from the live source tree by `python scri
 | `causal_text` | 1,457 | 4 | 4 |
 | `target_trial` | 1,457 | 7 | 9 |
 | `mediation` | 1,454 | 4 | 6 |
-| `rlasso` | 1,442 | 5 | 6 |
+| `rlasso` | 2,128 | 6 | 8 |
 | `bunching` | 1,437 | 5 | 8 |
 | `fairness` | 1,418 | 3 | 6 |
 | `power` | 1,404 | 3 | 12 |
@@ -126,7 +126,7 @@ Sorted by LOC. This table is generated from the live source tree by `python scri
 | `checks` | 152 | 2 | 0 |
 | `causal` | 111 | 1 | 0 |
 | `schemas` | 0 | 0 | 0 |
-| **Total** | **335,702** | **692** | **1125** |
+| **Total** | **336,450** | **693** | **1127** |
 ## 3 · Causal-inference coverage matrix (full)
 
 Legend: B = broad API coverage within this comparison table; Y = implemented entry points; P = partial, scattered, or single-algorithm support; N = no first-class entry point. These are API-breadth labels, not validation tiers.
 
@@ -32548,6 +32548,56 @@
       "type": "object"
     }
   },
+  {
+    "description": "Logistic rigorous (post-)Lasso -- a faithful port of ``hdm::rlassologit``.",
+    "name": "rlassologit",
+    "parameters": {
+      "properties": {
+        "X": {
+          "description": "Feature matrix or covariate DataFrame.",
+          "type": "string"
+        },
+        "colnames": {
+          "description": "Column names (default ``V1..Vp``).",
+          "items": {
+            "type": "string"
+          },
+          "type": "array"
+        },
+        "control": {
+          "description": "``threshold`` -- coefficients below it are zeroed (default None).",
+          "type": "object"
+        },
+        "intercept": {
+          "default": true,
+          "description": "Include an intercept.",
+          "type": "boolean"
+        },
+        "penalty": {
+          "description": "Overrides for ``c`` (slack; default 1.1 for ``post=True``, else 0.5), ``gamma`` (default ``0.1/log n``) and ``lambda`` (raw penalty; bypasses the data-driven level).",
+          "type": "object"
+        },
+        "post": {
+          "default": true,
+          "description": "If ``True``, refit the selected support by *unpenalized* logistic regression (post-Lasso); else keep the glmnet-penalized fit.",
+          "type": "boolean"
+        },
+        "y": {
+          "description": "Outcome variable column name or outcome array.",
+          "enum": [
+            "0",
+            "1"
+          ],
+          "type": "string"
+        }
+      },
+      "required": [
+        "X",
+        "y"
+      ],
+      "type": "object"
+    }
+  },
   {
     "description": "Rigorous (post-)Lasso as a scikit-learn regressor.",
     "name": "RlassoRegressor",
@@ -32659,6 +32709,43 @@
       "type": "object"
     }
   },
+  {
+    "description": "Logistic rigorous-Lasso classifier -- a genuine (calibrated) propensity.",
+    "name": "RlassologitClassifier",
+    "parameters": {
+      "properties": {
+        "c": {
+          "description": "c parameter (Optional[float]).",
+          "type": "number"
+        },
+        "clip": {
+          "default": 1e-05,
+          "description": "clip parameter (float).",
+          "type": "number"
+        },
+        "gamma": {
+          "description": "gamma parameter (Optional[float]).",
+          "type": "number"
+        },
+        "intercept": {
+          "default": true,
+          "description": "intercept parameter (bool).",
+          "type": "boolean"
+        },
+        "lambda_": {
+          "description": "lambda_ parameter (Optional[float]).",
+          "type": "number"
+        },
+        "post": {
+          "default": true,
+          "description": "post parameter (bool).",
+          "type": "boolean"
+        }
+      },
+      "required": [],
+      "type": "object"
+    }
+  },
   {
     "description": "Estimate OLS / IV with high-dimensional fixed effects via pyfixest. Validation: certified parity evidence.",
     "name": "feols",
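
The `penalty` description in the `rlassologit` schema entry above states the defaults in prose: `c` is 1.1 when `post=True` and 0.5 otherwise, `gamma` is `0.1/log n`, and a supplied `lambda` bypasses the data-driven level. As a reading aid, the sketch below spells out that resolution order. `resolve_penalty` is a hypothetical helper, not a StatsPAI function, and reading `log` as the natural logarithm (as in R) is an assumption.

```python
import math


def resolve_penalty(n, post=True, penalty=None):
    """Fill in the documented defaults for the ``penalty`` overrides.

    Hypothetical helper for illustration; it only mirrors the defaults quoted
    in the schema description and does not compute the data-driven lambda.
    """
    opts = dict(penalty or {})
    c = opts.get("c", 1.1 if post else 0.5)        # slack constant
    gamma = opts.get("gamma", 0.1 / math.log(n))   # natural log (assumption)
    raw_lambda = opts.get("lambda")                # None means data-driven
    return {"c": c, "gamma": gamma, "lambda": raw_lambda}


defaults = resolve_penalty(n=500)
overridden = resolve_penalty(n=500, post=False, penalty={"lambda": 0.02})
```

Here `defaults["lambda"]` stays `None`, which marks the data-driven path, while `overridden` carries the raw penalty through unchanged.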
 
@@ -1,8 +1,8 @@
 {
   "counts": {
     "agent_cards": 376,
-    "functions": 1125,
-    "tools": 510
+    "functions": 1127,
+    "tools": 511
   },
   "files": [
     "tools.json",
 
@@ -19092,6 +19092,56 @@
     },
     "name": "rlasso_iv"
   },
+  {
+    "description": "Logistic rigorous (post-)Lasso -- a faithful port of ``hdm::rlassologit``.",
+    "input_schema": {
+      "properties": {
+        "X": {
+          "description": "Feature matrix or covariate DataFrame.",
+          "type": "string"
+        },
+        "colnames": {
+          "description": "Column names (default ``V1..Vp``).",
+          "items": {
+            "type": "string"
+          },
+          "type": "array"
+        },
+        "control": {
+          "description": "``threshold`` -- coefficients below it are zeroed (default None).",
+          "type": "object"
+        },
+        "intercept": {
+          "default": true,
+          "description": "Include an intercept.",
+          "type": "boolean"
+        },
+        "penalty": {
+          "description": "Overrides for ``c`` (slack; default 1.1 for ``post=True``, else 0.5), ``gamma`` (default ``0.1/log n``) and ``lambda`` (raw penalty; bypasses the data-driven level).",
+          "type": "object"
+        },
+        "post": {
+          "default": true,
+          "description": "If ``True``, refit the selected support by *unpenalized* logistic regression (post-Lasso); else keep the glmnet-penalized fit.",
+          "type": "boolean"
+        },
+        "y": {
+          "description": "Outcome variable column name or outcome array.",
+          "enum": [
+            "0",
+            "1"
+          ],
+          "type": "string"
+        }
+      },
+      "required": [
+        "X",
+        "y"
+      ],
+      "type": "object"
+    },
+    "name": "rlassologit"
+  },
   {
     "description": "Robust / unconstrained Synthetic Control. Assumptions: A convex (or regularized) combination of donor units reproduces the treated unit's pre-treatment outcome path; No interference: the treatment does not affect the donor units (SUTVA); No anticipation before the treatment date. Pre-conditions: Panel of one or more treated units plus an untreated donor pool, observed over time; Pre-treatment window long enough to fit donor weights (rule of thumb: more pre-periods than donors used); Outcome observed for every unit in every period. Failure modes: Large pre-treatment RMSPE -- the synthetic unit fails to track the treated unit before treatment -> Add donors / predictors, lengthen the pre-period, or use a bias-corrected estimator (sdid, augsynth); Placebo / permutation inference shows the estimate is not extreme relative to donors -> Report the placebo distribution honestly; the effect may not be distinguishable from noise. Alternatives: sp.sdid, sp.augsynth, sp.gsynth, sp.callaway_santanna. Typical minimum N: 15.",
     "input_schema": {
 
@@ -894,8 +894,10 @@
     rlasso_effect,
     rlasso_effects,
     rlasso_iv,
+    rlassologit,
     RlassoRegressor,
     RlassoClassifier,
+    RlassologitClassifier,
 )
 
 # High-dimensional fixed effects (pyfixest backend)
@@ -1651,8 +1653,10 @@
     "rlasso_effect",
     "rlasso_effects",
     "rlasso_iv",
+    "rlassologit",
     "RlassoRegressor",
     "RlassoClassifier",
+    "RlassologitClassifier",
     # High-dimensional FE (pyfixest backend, optional)
     "feols",
     "fepois",