This document provides an overview of TanML's architecture, folder structure, and component interactions.
This diagram illustrates the end-to-end lifecycle of a TanML workflow, from raw data ingestion to the final audit report.
flowchart LR
A["📁 Data Ingestion"] --> B["📊 Profiling"]
B --> C["🔧 Preprocessing"]
C --> D["📈 Feature Tech"]
D --> E["🤖 Model Build"]
E --> F["🎯 Validation"]
F --> G["📝 Audit Report"]
tanml/
├── __init__.py # Package exports & version
├── cli/ # Command-line interface
│ └── main.py # Entry point for `tanml` command
│
├── models/ # ML Model Management
│ └── registry.py # Model factory, task inference, estimator builders
│
├── checks/ # Validation & Quality Checks
│ ├── meta.py # Dataset metadata checks
│ ├── drift.py # Feature drift detection
│ └── stability.py # Model stability analysis
│
├── analysis/ # Statistical Analysis
│ ├── stats.py # Statistical metrics & tests
│ ├── shap_utils.py # SHAP explainability
│ └── cluster.py # Cluster coverage analysis
│
├── utils/ # Utilities
│ └── data_loader.py # Multi-format data loading (CSV, Excel, Parquet, etc.)
│
└── ui/ # Streamlit Web Interface
├── app.py # Main Streamlit app & routing
├── config.py # UI configuration & styling
├── narratives.py # Auto-generated insights text
├── glossary.py # Term definitions
│
├── assets/ # Static files (logo, images)
│
├── pages/ # Page modules
│ ├── setup.py # Home page
│ ├── profiling.py # Data Profiling
│ ├── preprocessing.py # Data Preprocessing
│ ├── ranking.py # Feature Power Ranking
│ ├── model_dev.py # Model Development
│ └── evaluation/ # Model Evaluation (modular tabs)
│ ├── main.py # Tab orchestration
│ └── tabs/ # Extensible tab system
│ ├── registry.py # Tab registration
│ ├── metrics.py # Performance metrics
│ ├── drift.py # Drift analysis
│ ├── stress.py # Stress testing
│ ├── explainability.py # SHAP analysis
│ └── benchmark.py # Model comparison
│
├── components/ # Reusable UI components
│ └── forms.py # Model configuration forms
│
├── services/ # Business logic services
│ ├── data.py # Data upload & management
│ ├── session.py # Session state utilities
│ └── cv.py # Cross-validation runner
│
└── reports/ # Report generation
├── generators.py # DOCX report builders
└── templates/ # Report templates (if any)
- Purpose: Factory pattern for ML models
- Features:
list_models(): Available algorithms by librarybuild_estimator(): Create model instancesinfer_task_from_target(): Auto-detect classification/regression
- Purpose: Extensible evaluation tabs
- Pattern: Registry-based plugin architecture
- Adding new tabs: Create module in
tabs/, register with@register_tab
- Purpose: Generate audit-ready DOCX reports
- Reports:
_generate_dev_report_docx(): Model Development report_generate_eval_report_docx(): Model Evaluation report_generate_ranking_report_docx(): Feature Ranking report
- data.py: File upload handling, data persistence
- session.py: Streamlit session state management
- cv.py: Repeated cross-validation with metrics collection
| Layer | Technology |
|---|---|
| Web Framework | Streamlit |
| ML Libraries | Scikit-learn, XGBoost, LightGBM, CatBoost, Statsmodels |
| Explainability | SHAP |
| Data Processing | pandas, NumPy |
| Visualization | Matplotlib, Seaborn, Altair |
| Report Generation | python-docx |
| Data Formats | CSV, Excel, Parquet, Stata, SPSS, SAS |
| Entry Point | Command | Description |
|---|---|---|
| CLI | tanml ui |
Launches the TanML UI |
| CLI with options | tanml ui --port 8080 |
Launch with custom options |
| Import | import tanml |
Use as Python library |
See CONTRIBUTING.md for guidelines on adding new features, tabs, or models.