RiskLens

An end-to-end consumer credit risk modeling pipeline - Probability of Default, Loss Given Default, and Expected Loss - built on Lending Club data, with drift monitoring, a live dashboard, and MRM-style model documentation.

Live dashboard · Model Development Document

What it is

RiskLens is a complete credit risk modeling pipeline on ~1.35M resolved Lending Club loans (2007-2018). It estimates the Expected Loss of an unsecured consumer loan at origination by modeling its three standard components separately - Probability of Default, Loss Given Default, and Exposure at Default - then composing them as EL = PD × LGD × EAD.
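
As a rough illustration of that composition, the sketch below multiplies per-loan PD, LGD, and EAD and rolls the result up to a portfolio loss rate. The `Loan` dataclass, its field names, and the example numbers are invented for illustration and are not taken from the RiskLens code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Loan:
    """Hypothetical per-loan risk estimates (not the RiskLens schema)."""

    pd: float   # probability of default, in [0, 1]
    lgd: float  # loss given default as a fraction of exposure, in [0, 1]
    ead: float  # exposure at default, in dollars


def expected_loss(loan: Loan) -> float:
    """EL = PD x LGD x EAD for a single loan."""
    return loan.pd * loan.lgd * loan.ead


def portfolio_el_rate(loans: list[Loan]) -> float:
    """Total expected loss divided by total exposure."""
    total_ead = sum(loan.ead for loan in loans)
    if total_ead == 0:
        raise ValueError("portfolio has no exposure")
    return sum(expected_loss(loan) for loan in loans) / total_ead


# Made-up example loans, for illustration only
loans = [
    Loan(pd=0.10, lgd=0.60, ead=10_000.0),
    Loan(pd=0.25, lgd=0.80, ead=5_000.0),
]
```

Calling `portfolio_el_rate(loans)` gives the exposure-weighted loss rate, the same kind of quantity as the portfolio EL rate reported under Key results.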

The project covers the full lifecycle a credit risk team works through: leakage-controlled feature engineering, an out-of-time validated PD model with an interpretable baseline, calibration analysis, survival/vintage curves, Bayesian uncertainty quantification, production drift monitoring (PSI/CSI), a containerized scoring API, a public dashboard, and a Model Risk Management (MRM) style model development document.

Key results

All metrics are measured on a strict out-of-time test set (2018 Q1-Q2 originations) held out from training and model selection.

| Metric | Value | Notes |
| --- | --- | --- |
| PD test AUC (XGBoost) | 0.719 | Inside the 0.68-0.74 no-leakage range for origination-only features |
| PD test AUC (logistic baseline) | 0.695 | Interpretable reference model |
| PD test KS / Brier | 0.329 / 0.139 | Credit-standard separation and accuracy |
| PD test ECE (raw XGBoost) | 0.017 | Already well-calibrated; uncalibrated model recommended |
| Portfolio Expected Loss | $3.97B predicted vs $3.91B observed | Across $19.40B funded exposure |
| EL rate error | 32 bps | 20.47% predicted vs 20.14% observed |
| Survival concordance (Cox PH) | 0.670 | Time-to-default ranking |
| Max feature drift (CSI) | 0.13 (interest rate) | 2015-2016 vs 2013-2014 baseline |
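
For readers unfamiliar with the KS, Brier, and ECE figures above, here is a minimal standard-library sketch of how such metrics are commonly computed. It is not the project's evaluation code. The 10 equal-width bins used for ECE are an assumption, since the binning scheme is not stated here.

```python
from __future__ import annotations


def ks_statistic(y_true: list[int], y_score: list[float]) -> float:
    """Largest gap between the score CDFs of defaulters and non-defaulters."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one default and one non-default")
    pairs = sorted(zip(y_score, y_true))
    cum_pos = cum_neg = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        if label:
            cum_pos += 1
        else:
            cum_neg += 1
        # Compare the CDFs only once a run of tied scores is complete
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best


def brier_score(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    if not y_true:
        raise ValueError("empty input")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: list[int], y_prob: list[float], n_bins: int = 10
) -> float:
    """Weighted mean |observed rate - mean prediction| over equal-width bins."""
    if not y_true:
        raise ValueError("empty input")
    bins: list[list[tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / len(y_true) * abs(observed - predicted)
    return ece
```

All three functions take the 0/1 default labels and the predicted probabilities as plain lists. In practice a library such as scikit-learn would usually cover AUC and the Brier score.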

The PD test AUC of 0.719 sits inside the range expected for a clean, leakage-free build: a 115-column blacklist of post-origination fields guards against target leakage.
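
A leakage guard of this kind might look like the sketch below, which filters out post-origination columns and fails loudly when the test AUC is implausibly high. The four column names are an illustrative subset of Lending Club post-origination fields, not the project's 115-column list, and both function names are hypothetical. The 0.85 threshold is the tripwire value quoted under Methodology highlights.

```python
from __future__ import annotations

# Illustrative subset of post-origination fields; the project's actual
# 115-column list is not reproduced here.
POST_ORIGINATION_COLUMNS = frozenset(
    {"total_pymnt", "recoveries", "last_pymnt_d", "out_prncp"}
)

# Tripwire quoted in this README: a test AUC above this suggests leakage
AUC_TRIPWIRE = 0.85


def drop_leaky_columns(
    columns: list[str], blocked: frozenset[str] = POST_ORIGINATION_COLUMNS
) -> list[str]:
    """Return the columns available at origination, preserving their order."""
    return [name for name in columns if name not in blocked]


def check_auc_tripwire(test_auc: float, threshold: float = AUC_TRIPWIRE) -> None:
    """Raise if the test AUC is too high for origination-only features."""
    if test_auc > threshold:
        raise RuntimeError(
            f"test AUC {test_auc:.3f} exceeds {threshold}: possible target leakage"
        )
```

For example, `check_auc_tripwire(0.719)` returns silently, while an AUC of 0.93 would raise and stop the run before results are reported.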

Architecture

flowchart TD
    A[Raw Lending Club data<br/>2007-2018, ~2.26M loans] --> B[Feature pipeline<br/>leakage blacklist, derived features]
    B --> C[Vintage split<br/>train 2013-16 / val 2017 / test 2018]
    C --> D[PD models<br/>logistic baseline + XGBoost]
    C --> E[LGD model<br/>two-stage hurdle]
    C --> F[Survival model<br/>Cox PH + vintage curves]
    C --> G[Bayesian hierarchical<br/>partial pooling]
    D --> H[Calibration + SHAP]
    D --> I[Expected Loss<br/>EL = PD x LGD x EAD]
    E --> I
    B --> J[Drift monitoring<br/>PSI / CSI / change-point]
    I --> K[Streamlit dashboard]
    J --> K
    I --> L[FastAPI scoring service<br/>Dockerized]
    H --> M[MRM model document]
    F --> M
    G --> M
    J --> M
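
The vintage split node in the flowchart can be sketched as a function of origination date. The windows follow this README (train 2013-16, validation 2017, test 2018 Q1-Q2). The function name is hypothetical, and returning `None` for loans outside every window is an assumption of this sketch rather than documented behaviour.

```python
from __future__ import annotations

from datetime import date


def vintage_split(issue_date: date) -> str | None:
    """Assign a loan to a split by origination date; None if it is excluded."""
    year, month = issue_date.year, issue_date.month
    if 2013 <= year <= 2016:
        return "train"
    if year == 2017:
        return "val"
    if year == 2018 and month <= 6:  # test set is 2018 Q1-Q2 originations
        return "test"
    return None  # assumption: everything else is left out of all three splits
```

Splitting on the origination date alone, rather than sampling rows at random, is what keeps later vintages out of training.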

Repository structure

risklens/
├── src/risklens/          Core package
│   ├── data/              Loading, vintage splits, dataset builders
│   ├── features/          Feature pipeline, leakage blacklist
│   ├── models/            PD, LGD, survival, Bayesian, Expected Loss
│   ├── monitoring/        PSI, CSI, change-point detection
│   └── evaluation/        Metrics, calibration plots, SHAP
├── scripts/               Numbered pipeline 01-18 (run in order)
├── notebooks/             EDA and model-development notebooks
├── app/                   Streamlit dashboard + FastAPI service
├── sql/                   SQL feature definitions and monitoring queries
├── tests/                 28-test suite (pytest)
├── docs/                  MRM document, case studies, figures
├── configs/               YAML model configurations
└── Dockerfile             Containerized inference service

Running the pipeline

RiskLens uses uv for dependency management.

# Install dependencies
uv sync

# Run the pipeline in order (scripts are numbered 01-18)
uv run python scripts/01_download_data.py
uv run python scripts/02_build_features.py
uv run python scripts/03_eda.py
uv run python scripts/04_train_pd.py
uv run python scripts/05_train_xgboost.py
# ... continue through scripts/18_detect_changepoints.py

Each script reads from a YAML config and writes versioned artifacts to artifacts/, so any result is reproducible from configuration.

Dashboard

uv run streamlit run app/dashboard.py

Or use the live deployment.

Scoring API

docker build -t risklens-api .
docker run -p 8000:8000 risklens-api
# POST loan features to http://localhost:8000/score
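
A client call against the running container could look like the following standard-library sketch. Only the URL and the POST method come from this README. The payload field names are hypothetical, and the request and response schemas of the service are not documented here, so check them against the API before use.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

# Host, port, and path as given in this README
SCORE_URL = "http://localhost:8000/score"


def build_score_request(
    features: dict[str, Any], url: str = SCORE_URL
) -> urllib.request.Request:
    """Build a JSON POST request carrying one loan's features."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_loan(
    features: dict[str, Any], url: str = SCORE_URL, timeout: float = 10.0
) -> Any:
    """Send the request and parse the JSON reply (response schema not assumed)."""
    request = build_score_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payload: field names are illustrative, not the service's schema
example_features = {"loan_amnt": 12000, "int_rate": 13.5, "dti": 18.2}
request = build_score_request(example_features)
```

With the container running, `score_loan(example_features)` would send the request. Building the request separately makes it easy to inspect the payload without a live server.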

Tests

uv run pytest tests/ -v

Methodology highlights

Out-of-time validation. Train/validation/test are split strictly by origination vintage (2013-16 / 2017 / 2018), not randomly - this prevents future information leaking into training and gives an honest estimate of forward-looking performance.
Leakage control. A 115-column blacklist removes all post-origination fields; a test AUC above ~0.85 is treated as a leakage tripwire.
Calibration tested, not assumed. The raw XGBoost model proved already well-calibrated (ECE 0.017); isotonic calibration did not improve it, so the uncalibrated model is recommended.
Uncertainty quantification. A Bayesian hierarchical model with partial pooling produces credible intervals on segment default rates across 318 grade-by-vintage cells.
Leading-indicator monitoring. Bayesian change-point detection shows that the interest-rate distribution shifted 8 quarters before default rates responded, which demonstrates the early-warning value of input drift monitoring.
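
The input-drift measures behind this monitoring, PSI and CSI, share one formula: the sum over bins of (actual share - expected share) × ln(actual share / expected share), with CSI applying it to an individual feature. The sketch below is a minimal standard-library version. The caller-supplied bin edges and the `eps` floor for empty bins are assumptions of this sketch, not the project's settings.

```python
from __future__ import annotations

import math
from bisect import bisect_right


def bin_shares(values: list[float], edges: list[float]) -> list[float]:
    """Fraction of values in each bin defined by sorted interior edges."""
    if not values:
        raise ValueError("empty sample")
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    return [count / len(values) for count in counts]


def stability_index(
    expected: list[float],
    actual: list[float],
    edges: list[float],
    eps: float = 1e-6,
) -> float:
    """PSI/CSI between a baseline sample and a more recent sample."""
    total = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # floor avoids log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total
```

Each term is non-negative, so the index is zero only when the binned distributions match. A commonly quoted rule of thumb treats values below 0.1 as stable and values above 0.25 as a significant shift.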

Documentation

Model Development Document - MRM-style 25-page document covering data, methodology, validation, limitations, monitoring, and governance.
COVID drift case study - distribution-shift detection in the lending portfolio.
Data dictionary - feature definitions and target specification.

Built on public Lending Club data as a portfolio demonstration of credit risk modeling and model risk documentation practice. Not deployed in production credit decisioning.
