
Ritesh-456/gold-price-ltv-analysis

Gold Price Forecasting and Loan-to-Value (LTV) Risk Modelling Using Machine Learning and Deep Learning Techniques

1. Introduction

Gold-backed lending is one of the fastest-growing secured loan sectors in India, with Non-Banking Financial Companies (NBFCs) and banks relying heavily on gold price stability to ensure the safety of their collateralized portfolios. However, gold prices exhibit high sensitivity to global market conditions, macroeconomic indicators, and commodity price fluctuations. Sudden declines in gold value directly increase the Loan-to-Value (LTV) ratio, exposing lenders to elevated default and margin-call risk.

To address this challenge, this project develops a comprehensive data-driven risk assessment and forecasting framework capable of predicting future gold prices, analyzing their impact on loan portfolios, and explaining the underlying drivers behind these predictions.


2. Problem Statement

Financial institutions face significant operational and credit risks when gold prices exhibit volatility.
Key questions addressed in this research include:

  • How will gold prices move in the next 30–90 days?
  • What is the expected LTV position of a loan portfolio under normal and stressed market conditions?
  • Which macroeconomic factors most strongly influence gold price trends?
  • How can machine learning models provide interpretable and actionable insights?

The goal is to build an end-to-end analytical system that integrates data preprocessing, feature engineering, ML forecasting, explainability, and real-time analytics.


3. Objectives

Primary Objectives

  • Forecast short-term gold price movements using classical, machine learning, and deep learning models.
  • Estimate future LTV ratios for a synthetic gold loan portfolio.
  • Identify high-risk loans under multiple stress scenarios.
  • Explain predictions using SHAP values.

Secondary Objectives

  • Build a fully functional multi-tab Streamlit dashboard for interactive risk monitoring.
  • Deploy GPU-optimized models for improved training and inference performance.

4. Methodology

The project follows a structured data science lifecycle.


4.1 Data Acquisition

Three heterogeneous datasets were collected:

  • Daily MCX India gold prices (2014–2025)
  • Gold ETF and macroeconomic indicators (USO, USD Index, S&P 500, Dow Jones, commodity prices)
  • World Gold Council historical gold data (1978–2023)

4.2 Data Processing and Standardization

  • Cleaning missing and inconsistent values
  • Converting date fields to a uniform time index
  • Merging the multi-source datasets
  • Interpolating missing macroeconomic indicators
  • Generating a unified daily dataset of 1,659 observations × 98 features
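
The merge-and-interpolate steps above can be sketched roughly as follows. This is a minimal illustration rather than the repository's actual pipeline: the function name `build_daily_dataset` and the column names `Date`, `gold_close`, and `usd_index` are assumptions, and time-weighted interpolation is only one reasonable choice for filling macro gaps.

```python
import pandas as pd


def build_daily_dataset(gold: pd.DataFrame, macro: pd.DataFrame) -> pd.DataFrame:
    """Merge gold prices with macro indicators on a shared daily index.

    Hypothetical helper: both frames are assumed to carry a 'Date' column
    and to have no other column names in common.
    """
    indexed = []
    for df in (gold, macro):
        df = df.copy()
        df["Date"] = pd.to_datetime(df["Date"])  # uniform time index
        indexed.append(df.set_index("Date").sort_index())
    gold_idx, macro_idx = indexed

    # Left join keeps every gold trading day; missing macro values become NaN.
    merged = gold_idx.join(macro_idx, how="left")

    # Interpolate macro gaps by elapsed time, then fill any NaNs left at the edges.
    macro_cols = list(macro_idx.columns)
    merged[macro_cols] = (
        merged[macro_cols].interpolate(method="time").ffill().bfill()
    )
    return merged


# Tiny illustrative inputs (made-up numbers, not project data).
gold = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "gold_close": [63000.0, 63250.0, 63100.0, 63400.0],
})
macro = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-04"],
    "usd_index": [101.0, 104.0],
})
daily = build_daily_dataset(gold, macro)
print(daily)
```

A left join on the gold series is used here so that the target variable defines the calendar; an outer join would be the alternative if macro-only days also needed to be kept.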

4.3 Feature Engineering

Advanced time-series engineering was performed:

  • Lag features: 1, 2, 7, 14, 30 days
  • Moving averages: MA7, MA14, MA30, MA60
  • Rolling volatility: STD7, STD14, STD30, STD60
  • Trend direction flags
  • Date-derived features: Year, Month, Quarter, Day-of-Week
  • Cross-market features from USO, USD Index, GDX, Silver, Oil, Bond yields

This enriched feature space significantly improved the models' predictive capacity.
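
As a rough sketch of how the lag, moving-average, volatility, and date features listed above might be built with pandas: the function name, the `gold_close` column, and the `trend_up` definition (MA7 above MA30) are assumptions made for illustration. Shifting the series by one day before rolling, so that each window sees only past prices, is likewise a design choice of this sketch and not something the project states.

```python
import numpy as np
import pandas as pd


def add_time_series_features(df: pd.DataFrame, price_col: str = "gold_close") -> pd.DataFrame:
    """Add lag, rolling, trend, and calendar features to a daily price frame.

    Hypothetical helper; expects a DatetimeIndex.
    """
    out = df.copy()
    price = out[price_col]

    # Lag features: the price 1, 2, 7, 14, and 30 days earlier.
    for lag in (1, 2, 7, 14, 30):
        out[f"lag_{lag}"] = price.shift(lag)

    # Rolling statistics over past prices only (assumption: shift by one day
    # so the current day's price never leaks into its own features).
    past = price.shift(1)
    for window in (7, 14, 30, 60):
        out[f"MA{window}"] = past.rolling(window).mean()
        out[f"STD{window}"] = past.rolling(window).std()

    # Trend flag (assumed definition): short average above the longer one.
    out["trend_up"] = (out["MA7"] > out["MA30"]).astype(int)

    # Calendar features derived from the index.
    out["Year"] = out.index.year
    out["Month"] = out.index.month
    out["Quarter"] = out.index.quarter
    out["DayOfWeek"] = out.index.dayofweek

    # Rows without a full 60-day history contain NaNs and are dropped.
    return out.dropna()


# Synthetic, steadily rising prices purely for demonstration.
prices = pd.DataFrame(
    {"gold_close": 60000.0 + 10.0 * np.arange(100)},
    index=pd.date_range("2024-01-01", periods=100, freq="D"),
)
features = add_time_series_features(prices)
print(features.shape)
```

Because the longest window is 60 days and is applied to the shifted series, the first 60 rows are lost to warm-up.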


5. Modelling Approach


5.1 Classical Models

  • Linear Regression
    • Served as a baseline.
    • Demonstrated limitations due to non-linear relationships.

5.2 Machine Learning Models

Random Forest Regressor

  • Captures non-linear interactions.
  • Provides feature-level interpretability.

XGBoost Regressor (GPU Accelerated)

  • Trained using NVIDIA RTX 3050 (CUDA)
  • Delivered superior generalization
  • Exported to JSON format to ensure compatibility with XGBoost ≥ 3.1
  • Used for SHAP explainability

5.3 Deep Learning

LSTM (Long Short-Term Memory) Network

  • Input sequence length: 30 days
  • 2-layer architecture with 128 hidden units
  • Learned temporal dependencies effectively
  • Produced 30–90 day forward forecasts
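
A 30-day input sequence means each training sample is a window of 30 consecutive observations paired with the value that follows it. The sketch below shows one way to build such windows with NumPy; the function name `make_sequences` and the single-feature layout are assumptions, and the actual preprocessing in `lstm_model_def.py` may differ. The output shape `(samples, seq_len, features)` matches what a PyTorch `nn.LSTM` with `batch_first=True` expects.

```python
import numpy as np


def make_sequences(series, seq_len: int = 30):
    """Turn a 1-D series into (window, next value) pairs for sequence models.

    Hypothetical helper: returns X of shape (samples, seq_len, 1) and
    y of shape (samples,).
    """
    values = np.asarray(series, dtype=np.float32)
    if values.ndim != 1 or len(values) <= seq_len:
        raise ValueError("series must be 1-D and longer than seq_len")

    # All windows of length seq_len; drop the last one because it has no target.
    windows = np.lib.stride_tricks.sliding_window_view(values, seq_len)[:-1]

    X = windows[..., np.newaxis].copy()  # add a feature axis, make it writable
    y = values[seq_len:]                 # the value right after each window
    return X, y


# Demonstration on a simple ramp so the alignment is easy to check by eye.
series = np.arange(100, dtype=np.float32)
X, y = make_sequences(series, seq_len=30)
print(X.shape, y.shape)
```

In practice the series would usually be scaled before windowing, with the scaler fitted on the training split only.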

All models were evaluated using RMSE, MAE, and R².
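
For reference, the three metrics can be computed directly from their definitions. This is a small NumPy sketch with made-up numbers; the function name `regression_metrics` is illustrative, and the project itself may well use the equivalent scikit-learn functions.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Return RMSE, MAE, and R² for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    rmse = float(np.sqrt(np.mean(err ** 2)))   # penalizes large errors more
    mae = float(np.mean(np.abs(err)))          # average absolute error
    ss_res = float(np.sum(err ** 2))           # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                 # share of variance explained
    return {"RMSE": rmse, "MAE": mae, "R2": r2}


# Made-up values, only to show the call.
metrics = regression_metrics([100, 110, 120, 130], [102, 108, 121, 129])
print(metrics)
```

RMSE and MAE are in the same units as the gold price, while R² is unitless, so the first two are the easier ones to relate to LTV impact.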


6. Portfolio LTV Risk Modelling

A synthetic loan portfolio was generated with:

  • Gold purity
  • Loan amount
  • Disbursal date
  • Ratio of principal to purity-adjusted gold value
  • Historical gold price at issuance

Risk Metrics Computed

  • Current LTV
  • Forecasted LTV (using ML predictions)
  • Stress-scenario LTVs under gold price drops of 5%, 10%, and 20%
  • High-risk loan detection

This enabled lenders to anticipate margin call triggers.
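
The LTV and stress-test calculation can be illustrated with a short sketch. It assumes LTV is the loan amount divided by the purity-adjusted value of the pledged gold, and it flags a loan as high-risk when LTV exceeds an illustrative threshold of 0.75. The column names (`weight_g`, `purity`, `loan_amount`), the threshold, and the function name are assumptions and are not taken from the repository.

```python
import pandas as pd


def ltv_under_shocks(
    portfolio: pd.DataFrame,
    price_per_gram: float,
    shocks=(0.0, -0.05, -0.10, -0.20),
    threshold: float = 0.75,
) -> pd.DataFrame:
    """Compute current and stressed LTV per loan and flag threshold breaches.

    Hypothetical helper: price_per_gram is the price of pure gold, and
    'purity' is a fraction (for example 0.916 for 22K).
    """
    out = portfolio.copy()
    pure_grams = out["weight_g"] * out["purity"]

    for shock in shocks:
        pct = int(round(abs(shock) * 100))
        label = "ltv_current" if pct == 0 else f"ltv_shock_{pct}"
        # Collateral value after applying the price shock.
        collateral_value = pure_grams * price_per_gram * (1.0 + shock)
        out[label] = out["loan_amount"] / collateral_value
        out[f"{label}_high_risk"] = out[label] > threshold
    return out


# Two made-up loans for demonstration.
portfolio = pd.DataFrame({
    "loan_id": ["A", "B"],
    "weight_g": [50.0, 40.0],
    "purity": [0.916, 0.75],
    "loan_amount": [200000.0, 150000.0],
})
result = ltv_under_shocks(portfolio, price_per_gram=7000.0)
print(result.filter(like="ltv_").round(3))
```

To obtain a forecasted LTV instead of a stressed one, the same function could be called with `shocks=(0.0,)` and a model-predicted price passed as `price_per_gram`.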


7. Explainability & Model Interpretation

To keep the modelling transparent and regulator-friendly, SHAP (SHapley Additive exPlanations) was used.

Key drivers revealed:

  • USD Index
  • USO Close
  • S&P500 Close
  • GDX Gold Miner ETF
  • Lagged Adj Close values

SHAP ensured the system is interpretable and suitable for financial environments requiring auditability.


8. Dashboard Implementation

A multi-tab Streamlit application was developed to integrate all analytical components.

Tab 1: LTV Analysis

  • Portfolio distribution
  • LTV metrics
  • Stress-test evaluation
  • Risk flagging

Tab 2: ML Forecast & Prediction

  • Single-day predictions (RF / XGB / LSTM)
  • Editable input panel
  • 30–90 day LSTM forward forecast

Tab 3: Explainability

  • SHAP summary plot
  • Feature importance rankings

This dashboard acts as a prototype risk analytics tool for fintechs, NBFCs, and banks.


9. Results

  • XGBoost produced the strongest predictive performance.
  • LSTM successfully captured long-term temporal patterns and provided stable forecasts.
  • SHAP validated that the model relies on meaningful macroeconomic indicators.
  • Stress testing correctly highlighted loans with potential LTV breaches.

10. Conclusion

This project demonstrates a complete real-world financial risk modelling system integrating:

  • Multi-source data
  • Advanced engineered features
  • Machine learning & deep learning approaches
  • Model explainability
  • Portfolio stress testing
  • Dashboard deployment

It accurately mirrors workflows used by NBFCs, banks, and fintech companies that depend on gold-backed lending.


11. Future Enhancements

  • Integrate real-time MCX/live gold price API
  • Add ARIMA / SARIMA for statistical comparison
  • Deploy dashboard on cloud with automated updates
  • Implement reinforcement learning for dynamic LTV thresholds
  • Expand modelling to multi-collateral risk systems
  • Introduce PostgreSQL database for portfolio storage

12. Technologies Used

  • Python, Pandas, NumPy
  • Scikit-learn, XGBoost, PyTorch
  • SHAP, Statsmodels
  • Streamlit
  • Matplotlib, Seaborn
  • CUDA GPU (RTX 3050)
  • Jupyter Notebooks

13. Repository Structure

gold-price-ltv-analysis/
│
├── Dataset/
├── models/
├── outputs/
├── Notebooks/
│
├── app.py
├── lstm_model_def.py
├── model_loader.py
├── requirements.txt
└── README.md

About

This is a Gold Price Impact and LTV Risk Analysis dashboard. I built it to understand how companies like Rupeek or Muthoot assess lending risk. The project involves 45 years of gold price data, forecasting future prices, simulating a loan portfolio of 500 customers, and calculating real-time LTV.
