This project uses machine learning to predict whether a loan applicant is likely to default. The solution is built as an end-to-end ML system using PostgreSQL, XGBoost, and FastAPI.
Financial institutions face significant losses when customers fail to repay loans. The objective of this project is to predict customer default risk and classify applicants as:
- High Risk
- Low Risk

This helps lenders make better credit approval decisions.
Home Credit Default Risk Dataset
- Python
- PostgreSQL
- Pandas
- NumPy
- Scikit-Learn
- XGBoost
- FastAPI
- Swagger UI
- Git
- GitHub
CSV Files → PostgreSQL Database → Feature Engineering → Feature Store → Data Preprocessing → XGBoost Model → Model Serialization → FastAPI API → Real-Time Predictions
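
To make the Feature Engineering and Feature Store stages more concrete, here is a minimal sketch of collapsing a table with many rows per customer into one row per customer. It is an illustration only: it uses pandas rather than SQL, the rows are made up, and the output feature names (`bureau_loan_count`, `bureau_credit_mean`, `bureau_credit_max`) are hypothetical. `SK_ID_CURR` and `AMT_CREDIT_SUM` are column names from the Home Credit dataset, but whether this project aggregates them this way is an assumption.

```python
import pandas as pd

# Made-up rows standing in for a child table with several records per customer.
bureau = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 3, 3, 3],
    "AMT_CREDIT_SUM": [1000.0, 3000.0, 500.0, 200.0, 400.0, 600.0],
})

# Aggregate to one row per customer (hypothetical feature names).
customer_features = (
    bureau.groupby("SK_ID_CURR")
    .agg(
        bureau_loan_count=("AMT_CREDIT_SUM", "size"),
        bureau_credit_mean=("AMT_CREDIT_SUM", "mean"),
        bureau_credit_max=("AMT_CREDIT_SUM", "max"),
    )
    .reset_index()
)
```

The same aggregation could equally be written as a `GROUP BY` query in PostgreSQL; the point is only that the feature store holds one row per applicant.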
- Loaded 11 source files into PostgreSQL
- Created feature engineering pipeline
- Built customer-level feature store
- Missing value handling
- Label Encoding
- Train/Test Split
- Logistic Regression
- Random Forest
- XGBoost
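
The preprocessing and training steps listed above can be sketched roughly as follows. This is not the project's actual code: the data is synthetic, the column names are hypothetical, and a Random Forest is used so the example depends only on scikit-learn. `xgboost.XGBClassifier` exposes the same `fit` / `predict_proba` interface and could be swapped in. The sketch also computes imputation medians on the training split only, which is a design choice to avoid leaking test-set statistics.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the customer-level feature table (hypothetical columns).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "credit_amount": rng.normal(200_000, 50_000, n),
    "contract_type": rng.choice(["Cash loans", "Revolving loans"], n),
    "target": rng.integers(0, 2, n),  # 1 = default, 0 = repaid
})
df.loc[df.index % 10 == 0, "income"] = np.nan  # inject some missing values

# Label encoding: map each category string to an integer code.
df["contract_type"] = LabelEncoder().fit_transform(df["contract_type"])

# Train/test split, stratified so both splits keep the same default rate.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Missing value handling: fill numeric gaps with medians from the training split.
numeric_cols = ["income", "credit_amount"]
medians = X_train[numeric_cols].median()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[numeric_cols] = X_train[numeric_cols].fillna(medians)
X_test[numeric_cols] = X_test[numeric_cols].fillna(medians)

# Fit a classifier and score the held-out applicants.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
default_proba = model.predict_proba(X_test)[:, 1]  # probability of default
```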
Metrics used:
- Accuracy
- Precision
- Recall
- F1 Score
- ROC-AUC
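
As a small illustration of how these metrics are computed with scikit-learn, the sketch below scores ten made-up applicants. The labels, scores, and the 0.5 decision threshold are assumptions chosen for the example, not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth (1 = default) and predicted default probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.8, 0.9, 0.55]

# Threshold-based metrics need hard labels; 0.5 is an assumed cut-off.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC-AUC is threshold-free, so it takes the raw scores.
    "roc_auc": roc_auc_score(y_true, y_score),
}
```

Because defaults are usually much rarer than repayments, accuracy alone can look good for a model that rarely flags anyone; recall and ROC-AUC tend to be more informative for this kind of problem.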
- FastAPI REST API
- Swagger Documentation
- Real-time customer risk prediction
- Dockerization
- Hyperparameter Tuning
- Model Monitoring
- CI/CD Pipeline
- Cloud Deployment
Sanjay Sekar
Data Scientist
