Commit a5d052a

feat: initial public repo (PyPI v0.1.7)
0 parents  commit a5d052a

54 files changed

Lines changed: 6331 additions & 0 deletions

.gitignore

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# venv & editors
.venv/
.venv_wsl/
.vscode/
.idea/

# builds & packaging
dist/
build/
*.egg-info/

# caches
__pycache__/
*.pyc
.pytest_cache/
.ipynb_checkpoints/

# local outputs (don’t commit)
.ui_runs/
reports/
data/
models/
catboost_info/

# OS junk
.DS_Store
Thumbs.db

# secrets
.env
.streamlit/secrets.toml

CITATION.cff

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
cff-version: 1.2.0
message: "If you use TanML, please cite this repository as follows."
title: "TanML: Automated Model Validation Toolkit for Tabular Machine Learning"
license: MIT
repository-code: https://github.com/tdlabs-ai/tanml
keywords:
  - model validation
  - model risk management
  - model governance
  - SR 11-7
  - tabular ML
  - credit risk
  - insurance analytics
  - explainability
  - XAI
  - SHAP
  - stress testing
  - reporting
  - docx
  - streamlit
  - xgboost
  - lightgbm
  - catboost

authors:
  - family-names: Sah
    given-names: Tanmay
    orcid: https://orcid.org/0009-0004-8583-2208
  - family-names: Sah
    given-names: Dolly

preferred-citation:
  type: software
  title: "TanML: Automated Model Validation Toolkit for Tabular Machine Learning"
  authors:
    - family-names: Sah
      given-names: Tanmay
      orcid: https://orcid.org/0009-0004-8583-2208
    - family-names: Sah
      given-names: Dolly
  repository-code: https://github.com/tdlabs-ai/tanml
  license: MIT
  year: 2025

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Tanmay Sah and Dolly Sah

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

MANIFEST.in

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
include README.md
include LICENSE
include pyproject.toml
include tanml/report/templates/*.docx

README.md

Lines changed: 148 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,148 @@
1+
# TanML: Automated Model Validation Toolkit for Tabular Machine Learning
2+
3+
[![Cite this repo](https://img.shields.io/badge/Cite-this_repo-blue)](https://github.com/tdlabs-ai/tanml#license--citation)
4+
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
5+
[![Downloads](https://pepy.tech/badge/tanml)](https://pepy.tech/project/tanml)
6+
7+
**TanML** validates tabular ML models with a zero-config **Streamlit UI** and exports an audit-ready, **editable Word report (.docx)**. It covers data quality, correlation/VIF, performance, explainability (SHAP), and robustness/stress tests—built for regulated settings (MRM, credit risk, insurance, etc.).
8+
9+
* **Status:** Beta (`0.x`)
10+
* **License:** MIT
11+
* **Python:** 3.8–3.12
12+
* **OS:** Linux / macOS / Windows (incl. WSL)
13+
14+
---
15+
16+
## Why TanML?
17+
18+
* **Zero-config UI:** launch Streamlit, upload data, click **Run**—no YAML needed.
19+
* **Audit-ready outputs:** tables/plots + a polished DOCX your stakeholders can edit.
20+
* **Regulatory alignment:** supports common Model Risk Management themes (e.g., SR 11-7 style).
21+
* **Works with your stack:** scikit-learn, XGBoost/LightGBM/CatBoost, etc.
22+
23+
---
24+
25+
## Install
26+
27+
```bash
28+
pip install tanml
29+
```
30+
31+
## Quick Start (UI)
32+
33+
```bash
34+
tanml ui
35+
```
36+
37+
* Opens at **[http://127.0.0.1:8501](http://127.0.0.1:8501)**
38+
* **Upload limit ~1 GB** (preconfigured)
39+
* **Telemetry disabled by default**
40+
41+
### In the app
42+
43+
1. **Load data** — upload a cleaned CSV/XLSX/Parquet (optional: raw or separate Train/Test).
44+
2. **Select target & features** — target auto-suggested; features default to all non-target columns.
45+
3. **Pick a model** — choose library/algorithm (scikit-learn, XGBoost, LightGBM, CatBoost) and tweak params.
46+
4. **Run validation** — click **▶️ Refit & validate**.
47+
5. **Export** — click **⬇️ Download report** to get a **DOCX** (auto-selects classification/regression template).
48+
49+
**Outputs**
50+
51+
* Report: `./.ui_runs/<session>/tanml_report_*.docx`
52+
* Artifacts (CSV/PNGs): `./.ui_runs/<session>/artifacts/*`
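
As a hedged illustration of the output layout above, the most recent report could be located with a few lines of Python. The helper name `latest_report` is hypothetical and not part of TanML; the sketch only assumes the `./.ui_runs/<session>/tanml_report_*.docx` pattern listed above.

```python
from pathlib import Path
from typing import Optional


def latest_report(root: str = ".ui_runs") -> Optional[Path]:
    """Return the most recently modified report under root, or None.

    Hypothetical helper (not part of TanML). It assumes reports follow the
    <root>/<session>/tanml_report_*.docx pattern described in the README.
    """
    base = Path(root)
    if not base.is_dir():
        return None
    reports = list(base.glob("*/tanml_report_*.docx"))
    if not reports:
        return None
    # Pick the file with the newest modification time.
    return max(reports, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_report() or "no report found yet")
```

Run from the directory where `tanml ui` was launched, this prints the path of the newest report, or a short message if no run has produced one yet.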

---

## What TanML Checks

* **Raw Data (optional):** rows/cols, missingness, duplicates, constant columns
* **Data Quality & EDA:** summaries, distributions
* **Correlation & Multicollinearity:** heatmap, top-pairs CSV, **VIF** table
* **Performance**

  * **Classification:** AUC, PR-AUC, KS, decile lift, confusion
  * **Regression:** R², MAE, MSE/RMSE, error stats
* **Explainability:** SHAP (auto explainer; configurable background size)
* **Robustness/Stress Tests:** feature perturbations → delta-metrics
* **Model Metadata:** model class, hyperparameters, features, training info
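
The robustness item above ("feature perturbations → delta-metrics") can be pictured with a small sketch. This is not TanML's implementation: the function names, the Gaussian-noise perturbation, the toy model, and the use of accuracy as the metric are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], int]


def accuracy(predict: Predictor, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return hits / len(y)


def perturb_feature(X: Sequence[Row], col: int, scale: float,
                    rng: random.Random) -> List[Row]:
    """Copy X, adding zero-mean Gaussian noise (std = scale) to one column."""
    noisy = []
    for row in X:
        new_row = list(row)
        new_row[col] += rng.gauss(0.0, scale)
        noisy.append(new_row)
    return noisy


def stress_delta(predict: Predictor, X: Sequence[Row], y: Sequence[int],
                 col: int, scale: float, seed: int = 0) -> float:
    """Metric after perturbation minus metric before (negative = degradation)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    stressed = accuracy(predict, perturb_feature(X, col, scale, rng), y)
    return stressed - baseline


def toy_predict(row: Row) -> int:
    """Toy stand-in for a fitted model: thresholds feature 0, ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


X = [[i / 10, 1.0] for i in range(10)]
y = [toy_predict(row) for row in X]  # labels the toy model reproduces exactly

# Feature 1 is ignored by the toy model; feature 0 drives its predictions.
delta_ignored = stress_delta(toy_predict, X, y, col=1, scale=5.0)
delta_used = stress_delta(toy_predict, X, y, col=0, scale=5.0)
print(delta_ignored, delta_used)
```

Because the toy model ignores feature 1, perturbing it leaves accuracy unchanged, while perturbing feature 0 can only lower accuracy from its perfect baseline. A report built on this idea would tabulate one delta per feature and perturbation size.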

---

## Optional CLI Flags

Most users just run `tanml ui`. These help on teams/servers:

```bash
# Share on LAN
tanml ui --public

# Different port
tanml ui --port 9000

# Headless (server/CI; no auto-open browser)
tanml ui --headless

# Larger limit (e.g., 2 GB)
tanml ui --max-mb 2048
```

Env var equivalents (Linux/macOS bash):

```bash
TANML_SERVER_ADDRESS=0.0.0.0 TANML_PORT=9000 TANML_MAX_MB=2048 tanml ui
```

Windows PowerShell:

```powershell
$env:TANML_SERVER_ADDRESS="0.0.0.0"; $env:TANML_PORT="9000"; $env:TANML_MAX_MB="2048"; tanml ui
```

**Defaults:** address `127.0.0.1`, port `8501`, limit `1024 MB`, telemetry **OFF**.
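
To make the relationship between the environment variables and the defaults concrete, here is a minimal sketch of how such settings could be resolved. `UiSettings` and `settings_from_env` are hypothetical names, not TanML's actual code; only the variable names and default values are taken from the text above, and how the real CLI orders flags against environment variables is not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class UiSettings:
    """Hypothetical container holding the defaults listed in the README."""
    address: str = "127.0.0.1"
    port: int = 8501
    max_mb: int = 1024


def settings_from_env(env: Mapping[str, str]) -> UiSettings:
    """Overlay TANML_* environment variables on top of the defaults."""
    defaults = UiSettings()
    return UiSettings(
        address=env.get("TANML_SERVER_ADDRESS", defaults.address),
        port=int(env.get("TANML_PORT", defaults.port)),
        max_mb=int(env.get("TANML_MAX_MB", defaults.max_mb)),
    )


if __name__ == "__main__":
    # Reads the real process environment when run as a script.
    print(settings_from_env(os.environ))
```

Passing the mapping in explicitly, rather than reading `os.environ` inside the function, keeps the resolution logic easy to test.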

---

## Templates

TanML ships DOCX templates (packaged in wheel & sdist):

* `tanml/report/templates/report_template_cls.docx`
* `tanml/report/templates/report_template_reg.docx`

---

## Troubleshooting

* **Page didn’t open?** Visit `http://127.0.0.1:8501` or run `tanml ui --port 9000`.
* **Large CSVs are slow/heavy?** Prefer **Parquet**; CSV → DataFrame can use several GB RAM.
* **Artifacts missing?** Check `./.ui_runs/<session>/artifacts/`.
* **Corporate networks:** use `tanml ui --public` to share on LAN.

---

## License & Citation

**License:** MIT. See [LICENSE](https://github.com/tdlabs-ai/tanml/blob/main/LICENSE).
SPDX-License-Identifier: MIT

© 2025 Tanmay Sah and Dolly Sah. You may use, modify, and distribute this software with appropriate attribution.

### How to cite

If TanML helps your work or publications, please cite:

> Sah, T., & Sah, D. (2025). *TanML: Automated Model Validation Toolkit for Tabular Machine Learning* [Software]. Available at https://github.com/tdlabs-ai/tanml

Or in BibTeX (version-agnostic):

```bibtex
@misc{tanml,
  author = {Sah, Tanmay and Sah, Dolly},
  title = {TanML: Automated Model Validation Toolkit for Tabular Machine Learning},
  year = {2025},
  note = {Software; MIT License},
  url = {https://github.com/tdlabs-ai/tanml}
}
```

A machine-readable citation file (`CITATION.cff`) is included for citation tools and GitHub’s “Cite this repository” button.

docs/README-PyPI.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
# TanML: Automated Model Validation Toolkit for Tabular Machine Learning

[![Cite this repo](https://img.shields.io/badge/Cite-this_repo-blue)](https://github.com/tdlabs-ai/tanml#license--citation)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://pepy.tech/badge/tanml)](https://pepy.tech/project/tanml)

**TanML** validates tabular ML models with a zero-config **Streamlit UI** and exports an audit-ready, **editable Word report (.docx)**. It covers data quality, correlation/VIF, performance, explainability (SHAP), and robustness/stress tests, and is built for regulated settings (MRM, credit risk, insurance, etc.).

* **Status:** Beta (`0.x`)
* **License:** MIT
* **Python:** 3.8–3.12
* **OS:** Linux / macOS / Windows (incl. WSL)

---

## Why TanML?

* **Zero-config UI:** launch Streamlit, upload data, click **Run**; no YAML needed.
* **Audit-ready outputs:** tables/plots + a polished DOCX your stakeholders can edit.
* **Regulatory alignment:** supports common Model Risk Management themes (e.g., SR 11-7 style).
* **Works with your stack:** scikit-learn, XGBoost/LightGBM/CatBoost, etc.

---

## Install

```bash
pip install tanml
```

## Quick Start (UI)

```bash
tanml ui
```

* Opens at **[http://127.0.0.1:8501](http://127.0.0.1:8501)**
* **Upload limit ~1 GB** (preconfigured)
* **Telemetry disabled by default**

### In the app

1. **Load data:** upload a cleaned CSV/XLSX/Parquet (optional: raw or separate Train/Test).
2. **Select target & features:** target auto-suggested; features default to all non-target columns.
3. **Pick a model:** choose library/algorithm (scikit-learn, XGBoost, LightGBM, CatBoost) and tweak params.
4. **Run validation:** click **▶️ Refit & validate**.
5. **Export:** click **⬇️ Download report** to get a **DOCX** (auto-selects classification/regression template).

**Outputs**

* Report: `./.ui_runs/<session>/tanml_report_*.docx`
* Artifacts (CSV/PNGs): `./.ui_runs/<session>/artifacts/*`

---

## What TanML Checks

* **Raw Data (optional):** rows/cols, missingness, duplicates, constant columns
* **Data Quality & EDA:** summaries, distributions
* **Correlation & Multicollinearity:** heatmap, top-pairs CSV, **VIF** table
* **Performance**

  * **Classification:** AUC, PR-AUC, KS, decile lift, confusion
  * **Regression:** R², MAE, MSE/RMSE, error stats
* **Explainability:** SHAP (auto explainer; configurable background size)
* **Robustness/Stress Tests:** feature perturbations → delta-metrics
* **Model Metadata:** model class, hyperparameters, features, training info
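
Among the classification metrics listed above, KS is the one least often spelled out. The following is a minimal sketch, assuming the common definition (the largest gap between the cumulative score distributions of positives and negatives). `ks_statistic` is a hypothetical name, and TanML's own computation may differ.

```python
from typing import Sequence


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Kolmogorov-Smirnov separation between positive and negative scores.

    Hypothetical helper (not TanML's code). Labels must be 0 or 1, and both
    classes must be present.
    """
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("KS needs at least one positive and one negative")

    pairs = sorted(zip(scores, y_true))
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all rows sharing this score before comparing the CDFs,
        # so tied scores do not create an artificial gap.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j
    return best


if __name__ == "__main__":
    print(ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```

Scores that separate the classes perfectly give a KS of 1.0, and scores that carry no information (for example, all identical) give 0.0.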

---


## Templates

TanML ships DOCX templates (packaged in wheel & sdist):

* `tanml/report/templates/report_template_cls.docx`
* `tanml/report/templates/report_template_reg.docx`

---


## License & Citation

**License:** MIT. See [LICENSE](https://github.com/tdlabs-ai/tanml/blob/main/LICENSE).
SPDX-License-Identifier: MIT

© 2025 Tanmay Sah and Dolly Sah. You may use, modify, and distribute this software with appropriate attribution.

### How to cite

If TanML helps your work or publications, please cite:

> Sah, T., & Sah, D. (2025). *TanML: Automated Model Validation Toolkit for Tabular Machine Learning* [Software]. Available at https://github.com/tdlabs-ai/tanml

Or in BibTeX (version-agnostic):

```bibtex
@misc{tanml,
  author = {Sah, Tanmay and Sah, Dolly},
  title = {TanML: Automated Model Validation Toolkit for Tabular Machine Learning},
  year = {2025},
  note = {Software; MIT License},
  url = {https://github.com/tdlabs-ai/tanml}
}
```

A machine-readable citation file (`CITATION.cff`) is included for citation tools and GitHub’s “Cite this repository” button.
