|
| 1 | +# TanML: Automated Model Validation Toolkit for Tabular Machine Learning |
| 2 | + |
| 6 | + |
| 7 | +**TanML** validates tabular ML models with a zero-config **Streamlit UI** and exports an audit-ready, **editable Word report (.docx)**. It covers data quality, correlation/VIF, performance, explainability (SHAP), and robustness/stress tests—built for regulated settings (MRM, credit risk, insurance, etc.). |
| 8 | + |
| 9 | +* **Status:** Beta (`0.x`) |
| 10 | +* **License:** MIT |
| 11 | +* **Python:** 3.8–3.12 |
| 12 | +* **OS:** Linux / macOS / Windows (incl. WSL) |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +## Why TanML? |
| 17 | + |
| 18 | +* **Zero-config UI:** launch the app, upload data, and run the validation with one click; no YAML needed. |
| 19 | +* **Audit-ready outputs:** tables and plots, plus a polished DOCX that stakeholders can edit. |
| 20 | +* **Regulatory alignment:** covers validation themes common in Model Risk Management reviews (e.g., SR 11-7-style guidance). |
| 21 | +* **Works with your stack:** scikit-learn, XGBoost/LightGBM/CatBoost, etc. |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Install |
| 26 | + |
| 27 | +```bash |
| 28 | +pip install tanml |
| 29 | +``` |
| 30 | + |
| 31 | +## Quick Start (UI) |
| 32 | + |
| 33 | +```bash |
| 34 | +tanml ui |
| 35 | +``` |
| 36 | + |
| 37 | +* Opens at **[http://127.0.0.1:8501](http://127.0.0.1:8501)** |
| 38 | +* **Upload limit ~1 GB** (preconfigured) |
| 39 | +* **Telemetry disabled by default** |
| 40 | + |
| 41 | +### In the app |
| 42 | + |
| 43 | +1. **Load data** — upload a cleaned CSV/XLSX/Parquet (optional: raw or separate Train/Test). |
| 44 | +2. **Select target & features** — target auto-suggested; features default to all non-target columns. |
| 45 | +3. **Pick a model** — choose library/algorithm (scikit-learn, XGBoost, LightGBM, CatBoost) and tweak params. |
| 46 | +4. **Run validation** — click **▶️ Refit & validate**. |
| 47 | +5. **Export** — click **⬇️ Download report** to get a **DOCX** (auto-selects classification/regression template). |
| 48 | + |
| 49 | +**Outputs** |
| 50 | + |
| 51 | +* Report: `./.ui_runs/<session>/tanml_report_*.docx` |
| 52 | +* Artifacts (CSV/PNGs): `./.ui_runs/<session>/artifacts/*` |
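
To pick up the newest report from a script, a minimal sketch could scan the output folder. It assumes only the path pattern listed above; the helper name `latest_report` is hypothetical and not part of TanML.

```python
from pathlib import Path
from typing import Optional


def latest_report(root: str = ".ui_runs") -> Optional[Path]:
    """Return the most recently modified tanml_report_*.docx under root, or None."""
    # Pattern taken from the Outputs list: <root>/<session>/tanml_report_*.docx
    reports = list(Path(root).glob("*/tanml_report_*.docx"))
    if not reports:
        return None
    # Modification time is used as a stand-in for "newest run".
    return max(reports, key=lambda p: p.stat().st_mtime)
```

Calling `latest_report()` from the directory where `tanml ui` was started returns a `Path` to the report, or `None` if no validation has been exported yet.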
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +## What TanML Checks |
| 57 | + |
| 58 | +* **Raw Data (optional):** rows/cols, missingness, duplicates, constant columns |
| 59 | +* **Data Quality & EDA:** summaries, distributions |
| 60 | +* **Correlation & Multicollinearity:** heatmap, top-pairs CSV, **VIF** table |
| 61 | +* **Performance** |
| 62 | + |
| 63 | + * **Classification:** AUC, PR-AUC, KS, decile lift, confusion matrix |
| 64 | + * **Regression:** R², MAE, MSE/RMSE, error stats |
| 65 | +* **Explainability:** SHAP (auto explainer; configurable background size) |
| 66 | +* **Robustness/Stress Tests:** perturb features and report the resulting change in each metric |
| 67 | +* **Model Metadata:** model class, hyperparameters, features, training info |
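
As an illustration of one metric from the list, the KS statistic for a binary classifier is the largest gap between the cumulative score distributions of the two classes. The sketch below is a plain-Python rendering of that definition, assuming labels coded as 0/1. It is not TanML's implementation, and `ks_statistic` is a hypothetical name.

```python
from typing import Sequence


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Max gap between the empirical score CDFs of positives and negatives."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Need at least one positive and one negative label.")

    pairs = sorted(zip(scores, y_true))
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        # Consume all rows sharing this score so ties are handled together.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        # Compare the two cumulative fractions at this score threshold.
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j
    return best
```

A value near 1 means the scores separate the classes almost perfectly; a value near 0 means the two score distributions are nearly indistinguishable.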
| 68 | + |
| 69 | +--- |
| 70 | + |
| 71 | +## Optional CLI Flags |
| 72 | + |
| 73 | +Most users only need `tanml ui`. The flags below help on shared machines and servers: |
| 74 | + |
| 75 | +```bash |
| 76 | +# Share on LAN |
| 77 | +tanml ui --public |
| 78 | + |
| 79 | +# Different port |
| 80 | +tanml ui --port 9000 |
| 81 | + |
| 82 | +# Headless (server/CI; no auto-open browser) |
| 83 | +tanml ui --headless |
| 84 | + |
| 85 | +# Larger upload limit (e.g., 2 GB) |
| 86 | +tanml ui --max-mb 2048 |
| 87 | +``` |
| 88 | + |
| 89 | +Environment variable equivalents (Linux/macOS, bash): |
| 90 | + |
| 91 | +```bash |
| 92 | +TANML_SERVER_ADDRESS=0.0.0.0 TANML_PORT=9000 TANML_MAX_MB=2048 tanml ui |
| 93 | +``` |
| 94 | + |
| 95 | +Windows PowerShell: |
| 96 | + |
| 97 | +```powershell |
| 98 | +$env:TANML_SERVER_ADDRESS="0.0.0.0"; $env:TANML_PORT="9000"; $env:TANML_MAX_MB="2048"; tanml ui |
| 99 | +``` |
| 100 | + |
| 101 | +**Defaults:** address `127.0.0.1`, port `8501`, limit `1024 MB`, telemetry **OFF**. |
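
For scripted launches, the same settings can be passed from Python through the environment. This is a minimal sketch, assuming `tanml` is on `PATH` and using only the variable names and defaults shown above. `build_env` and `launch_ui` are hypothetical helper names, not part of TanML.

```python
import os
import subprocess
from typing import Dict, Optional


def build_env(address: str = "127.0.0.1", port: int = 8501, max_mb: int = 1024,
              base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and set the TANML_* variables documented above."""
    env = dict(os.environ if base is None else base)
    env["TANML_SERVER_ADDRESS"] = address
    env["TANML_PORT"] = str(port)
    env["TANML_MAX_MB"] = str(max_mb)
    return env


def launch_ui(**settings) -> subprocess.Popen:
    """Start `tanml ui` as a child process with the given settings."""
    return subprocess.Popen(["tanml", "ui"], env=build_env(**settings))
```

For example, `launch_ui(address="0.0.0.0", port=9000, max_mb=2048)` mirrors the bash one-liner above; call `.terminate()` on the returned process to stop the server.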
| 102 | + |
| 103 | +--- |
| 104 | + |
| 105 | +## Templates |
| 106 | + |
| 107 | +TanML ships two DOCX templates, one for classification and one for regression (packaged in both the wheel and the sdist): |
| 108 | + |
| 109 | +* `tanml/report/templates/report_template_cls.docx` |
| 110 | +* `tanml/report/templates/report_template_reg.docx` |
| 111 | + |
| 112 | +--- |
| 113 | + |
| 114 | +## Troubleshooting |
| 115 | + |
| 116 | +* **Page didn’t open?** Visit `http://127.0.0.1:8501` or run `tanml ui --port 9000`. |
| 117 | +* **Large CSVs are slow/heavy?** Prefer **Parquet**; loading a large CSV into a DataFrame can use several GB of RAM. |
| 118 | +* **Artifacts missing?** Check `./.ui_runs/<session>/artifacts/`. |
| 119 | +* **Corporate networks:** use `tanml ui --public` to share on LAN. |
| 120 | + |
| 121 | +--- |
| 122 | + |
| 123 | +## License & Citation |
| 124 | + |
| 125 | +**License:** MIT. See [LICENSE](https://github.com/tdlabs-ai/tanml/blob/main/LICENSE). |
| 126 | +SPDX-License-Identifier: MIT |
| 127 | + |
| 128 | +© 2025 Tanmay Sah and Dolly Sah. You may use, modify, and distribute this software with appropriate attribution. |
| 129 | + |
| 130 | +### How to cite |
| 131 | + |
| 132 | +If TanML helps your work or publications, please cite: |
| 133 | + |
| 134 | +> Sah, T., & Sah, D. (2025). *TanML: Automated Model Validation Toolkit for Tabular Machine Learning* [Software]. Available at https://github.com/tdlabs-ai/tanml |
| 135 | + |
| 136 | +Or in BibTeX (version-agnostic): |
| 137 | + |
| 138 | +```bibtex |
| 139 | +@misc{tanml, |
| 140 | + author = {Sah, Tanmay and Sah, Dolly}, |
| 141 | + title = {TanML: Automated Model Validation Toolkit for Tabular Machine Learning}, |
| 142 | + year = {2025}, |
| 143 | + note = {Software; MIT License}, |
| 144 | + url = {https://github.com/tdlabs-ai/tanml} |
| 145 | +} |
| 146 | +``` |
| 147 | + |
| 148 | +A machine-readable citation file (`CITATION.cff`) is included for citation tools and GitHub’s “Cite this repository” button. |