Gold Price Forecasting and Loan-to-Value (LTV) Risk Modelling Using Machine Learning and Deep Learning Techniques
Gold-backed lending is one of the fastest-growing secured loan sectors in India, with Non-Banking Financial Companies (NBFCs) and banks relying heavily on gold price stability to ensure the safety of their collateralized portfolios. However, gold prices exhibit high sensitivity to global market conditions, macroeconomic indicators, and commodity price fluctuations. Sudden declines in gold value directly increase the Loan-to-Value (LTV) ratio, exposing lenders to elevated default and margin-call risk.
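The relationship can be written out explicitly. The symbols here are chosen for illustration: L is the outstanding principal, W the pledged gold weight in grams, p the purity as a fraction, P the price per gram, and s a fractional price drop.

```latex
\mathrm{LTV} = \frac{L}{W \, p \, P},
\qquad
\mathrm{LTV}_{s} = \frac{L}{W \, p \, P \, (1 - s)} = \frac{\mathrm{LTV}}{1 - s}
```

For example, a loan at 70% LTV rises to 0.70 / (1 − 0.10) ≈ 77.8% after a 10% price drop, even though the outstanding principal has not changed.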
To address this challenge, this project develops a comprehensive data-driven risk assessment and forecasting framework capable of predicting future gold prices, analyzing their impact on loan portfolios, and explaining the underlying drivers behind these predictions.
Financial institutions face significant operational and credit risks when gold prices exhibit volatility.
Key questions addressed in this research include:
- How will gold prices move in the next 30–90 days?
- What is the expected LTV position of a loan portfolio under normal and stressed market conditions?
- Which macroeconomic factors most strongly influence gold price trends?
- How can machine learning models provide interpretable and actionable insights?
The goal is to build an end-to-end analytical system that integrates data preprocessing, feature engineering, ML forecasting, explainability, and real-time analytics.
- Forecast short-term gold price movements using classical, machine learning, and deep learning models.
- Estimate future LTV ratios for a synthetic gold loan portfolio.
- Identify high-risk loans under multiple stress scenarios.
- Explain model predictions using SHAP values.
- Build a fully functional multi-tab Streamlit dashboard for interactive risk monitoring.
- Deploy GPU-optimized models for faster training and inference.
The project follows a structured data science lifecycle.
Three heterogeneous datasets were collected:
- Daily MCX India gold prices (2014–2025)
- Gold ETF and macroeconomic indicators (USO, USD Index, S&P 500, Dow Jones, commodity prices)
- World Gold Council historical gold data (1978–2023)
Preprocessing steps:
- Cleaning missing and inconsistent values
- Converting date fields to a uniform time index
- Merging multi-source datasets
- Interpolating missing macroeconomic indicators
- Generating a unified daily dataset with 1659 observations × 98 features
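The merge-and-interpolate steps could look roughly like the following pandas sketch. It makes three assumptions: each source is already loaded into a DataFrame with a `Date` column, the gold target column is named `Adj Close`, and macro columns do not share names with gold columns. `build_daily_dataset` and the `USD_Index` column are illustrative names, not the project's actual code.

```python
import pandas as pd


def _to_daily_index(df: pd.DataFrame) -> pd.DataFrame:
    """Parse the Date column, drop duplicate days, and sort by date."""
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    df = df.drop_duplicates(subset="Date", keep="first")
    return df.set_index("Date").sort_index()


def build_daily_dataset(gold: pd.DataFrame, macro: pd.DataFrame,
                        target_col: str = "Adj Close") -> pd.DataFrame:
    """Hypothetical helper: align gold prices and macro indicators on one daily index."""
    gold_df = _to_daily_index(gold)
    macro_df = _to_daily_index(macro)
    # Outer join keeps macro readings from days without a gold quote,
    # so they can still anchor the interpolation below
    merged = gold_df.join(macro_df, how="outer")
    # Time-weighted linear interpolation fills gaps between known macro readings
    cols = list(macro_df.columns)
    merged[cols] = merged[cols].interpolate(method="time", limit_direction="both")
    # Rows without a gold price cannot serve as training targets
    return merged.dropna(subset=[target_col])
```

Using `method="time"` weights the interpolation by calendar distance, which matters when weekends and holidays leave uneven gaps between observations.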
Advanced time-series feature engineering was performed:
- Lag features: 1, 2, 7, 14, 30 days
- Moving averages: MA7, MA14, MA30, MA60
- Rolling volatility: STD7, STD14, STD30, STD60
- Trend direction flags
- Date-derived features: Year, Month, Quarter, Day-of-Week
- Cross-market features from USO, USD Index, GDX, Silver, Oil, Bond yields
This enriched feature space significantly improved model predictive capacity.
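The features listed above can be sketched in pandas as follows. The sketch assumes a DataFrame indexed by a `DatetimeIndex` with the price in an `Adj Close` column. Rolling volatility is computed here as the rolling standard deviation of the price level, and the trend flag is defined as MA7 above MA30. The README does not specify either definition, so both are assumptions. Cross-market features are left out because they come straight from the merged columns.

```python
import pandas as pd

LAGS = (1, 2, 7, 14, 30)    # lag lengths listed in the README
WINDOWS = (7, 14, 30, 60)   # moving-average / volatility windows listed in the README


def add_time_series_features(df: pd.DataFrame, price_col: str = "Adj Close") -> pd.DataFrame:
    """Hypothetical helper: add lag, rolling, trend, and calendar features."""
    out = df.copy()
    price = out[price_col]
    for lag in LAGS:
        out[f"lag_{lag}"] = price.shift(lag)
    for w in WINDOWS:
        out[f"MA{w}"] = price.rolling(w).mean()
        out[f"STD{w}"] = price.rolling(w).std()  # rolling std of the price level
    # Assumed trend flag: 1 when the short-term average sits above the longer one
    out["trend_up"] = (out["MA7"] > out["MA30"]).astype(int)
    # Calendar features read from the DatetimeIndex
    out["Year"] = out.index.year
    out["Month"] = out.index.month
    out["Quarter"] = out.index.quarter
    out["DayOfWeek"] = out.index.dayofweek
    # The first 59 rows lack a full 60-day window, so drop incomplete rows
    return out.dropna()
```

Every feature uses only current and past values (`shift` and trailing `rolling` windows), which keeps future information out of the training rows.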
The following models were developed:
- Linear Regression
  - Served as a baseline.
  - Demonstrated limitations due to non-linear relationships.
- Random Forest
  - Captures non-linear interactions.
  - Provides feature-level interpretability.
- XGBoost
  - Trained on an NVIDIA RTX 3050 GPU (CUDA)
  - Delivered superior generalization
  - Exported to JSON format to ensure compatibility with XGBoost ≥ 3.1
  - Used for SHAP explainability
- LSTM
  - Input sequence length: 30 days
  - 2-layer architecture with 128 hidden units
  - Learned temporal dependencies effectively
  - Produced 30–90 day forward forecasts
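A 30-day input window implies a sliding-window dataset. The NumPy sketch below builds `(samples, 30, 1)` inputs, which is the layout a PyTorch `nn.LSTM(input_size=1, hidden_size=128, num_layers=2, batch_first=True)` expects for a univariate series. It also shows recursive multi-step forecasting, one common way to extend a one-step model to a 30–90 day horizon. The README does not say which multi-step strategy the project uses, and `predict_next` is a hypothetical stand-in for the trained model.

```python
from typing import Callable

import numpy as np

SEQ_LEN = 30  # input sequence length stated in the README


def make_windows(series: np.ndarray, seq_len: int = SEQ_LEN):
    """Slice a 1-D series into (samples, seq_len, 1) inputs and next-day targets."""
    series = np.asarray(series, dtype=float)
    if len(series) <= seq_len:
        raise ValueError("series must be longer than seq_len")
    X = np.stack([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]
    # Trailing feature axis of size 1 gives (batch, seq, feature) for batch_first LSTMs
    return X[..., np.newaxis], y


def recursive_forecast(predict_next: Callable[[np.ndarray], float],
                       history: np.ndarray, horizon: int,
                       seq_len: int = SEQ_LEN) -> np.ndarray:
    """Roll a one-step model forward `horizon` days, feeding each prediction back in."""
    window = [float(v) for v in np.asarray(history)[-seq_len:]]
    preds = []
    for _ in range(horizon):
        x = np.asarray(window[-seq_len:], dtype=float).reshape(1, seq_len, 1)
        nxt = float(predict_next(x))
        preds.append(nxt)
        window.append(nxt)  # the prediction becomes the newest input
    return np.asarray(preds)
```

Because each prediction is fed back as input, errors compound over long horizons. The usual alternative is a direct strategy that outputs all horizon steps at once.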
All models were evaluated using RMSE, MAE, and R².
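For reference, the three metrics follow the standard definitions. RMSE is the square root of scikit-learn's `mean_squared_error`, and the other two match `mean_absolute_error` and `r2_score`. The NumPy sketch below spells them out; the function name is illustrative.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """RMSE, MAE, and R² for a set of point forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = float(np.sum(err ** 2))                    # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),  # penalises large misses more heavily
        "MAE": float(np.mean(np.abs(err))),         # average absolute miss, in price units
        # R² is undefined when the actual values are constant
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }
```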
A synthetic loan portfolio was generated with:
- Gold purity
- Loan amount
- Disbursal date
- Principal-to-purity-adjusted gold value
- Historical gold price at issuance
- Current LTV
- Forecasted LTV (using ML predictions)
- Stress-scenario LTVs for 5%, 10%, and 20% gold price drops
- High-risk loan detection
This lets lenders anticipate margin-call triggers.
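The LTV and stress-scenario columns can be derived from a few loan fields. The sketch below assumes each loan records its principal, its pledged gold weight in grams (a field the README does not list), and its purity as a fraction. `GoldLoan`, `ltv`, and `stress_test` are hypothetical names. The 75% high-risk threshold is only a placeholder; replace it with the LTV ceiling that applies to the lender.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class GoldLoan:
    loan_id: str
    principal: float   # outstanding loan amount
    gold_grams: float  # pledged gold weight (hypothetical field)
    purity: float      # pure-gold fraction, e.g. 22/24 for 22-karat


def ltv(loan: GoldLoan, price_per_gram: float) -> float:
    """Loan-to-value: principal divided by purity-adjusted collateral value."""
    collateral_value = loan.gold_grams * loan.purity * price_per_gram
    return loan.principal / collateral_value


def stress_test(loans: Sequence[GoldLoan], price_per_gram: float,
                shocks: Sequence[float] = (0.05, 0.10, 0.20),
                threshold: float = 0.75) -> List[Dict[str, object]]:
    """Current and shocked LTVs per loan, flagging any LTV above `threshold`."""
    report = []
    for loan in loans:
        row: Dict[str, object] = {"loan_id": loan.loan_id,
                                  "LTV_current": ltv(loan, price_per_gram)}
        for s in shocks:
            # A price drop of s shrinks collateral value, which pushes LTV up
            row[f"LTV_drop_{round(s * 100)}pct"] = ltv(loan, price_per_gram * (1 - s))
        ltvs = [v for k, v in row.items() if k.startswith("LTV_")]
        row["high_risk"] = any(v > threshold for v in ltvs)
        report.append(row)
    return report
```

To get a forecasted LTV, pass a model's forecasted price to `ltv` in place of today's price.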
To ensure transparency and regulator-friendly modelling, SHAP values were computed for the XGBoost model.
Key drivers revealed:
- USD Index
- USO Close
- S&P500 Close
- GDX Gold Miner ETF
- Lagged Adj Close values
SHAP makes the system interpretable and suitable for financial environments that require auditability.
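Rankings like the one above are usually based on the mean absolute SHAP value of each feature. This is the same quantity that SHAP's bar summary plot displays. The minimal NumPy sketch below assumes a SHAP matrix of shape `(n_samples, n_features)`, such as the one `shap.TreeExplainer(model).shap_values(X)` returns for an XGBoost regressor. `rank_features_by_shap` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

import numpy as np


def rank_features_by_shap(shap_values: np.ndarray,
                          feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Order features by mean |SHAP|, largest global contribution first."""
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2 or shap_values.shape[1] != len(feature_names):
        raise ValueError("expected an (n_samples, n_features) SHAP matrix")
    # Absolute values stop positive and negative contributions from cancelling
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]
```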
A multi-tab Streamlit application was developed to integrate all analytical components. It provides:
- Portfolio distribution
- LTV metrics
- Stress-test evaluation
- Risk flagging
- Single-day predictions (RF / XGB / LSTM)
- Editable input panel
- 30–90 day LSTM forward forecast
- SHAP summary plot
- Feature importance rankings
This dashboard acts as a prototype risk analytics tool for fintechs, NBFCs, and banks.
- XGBoost produced the strongest predictive performance.
- LSTM successfully captured long-term temporal patterns and provided stable forecasts.
- SHAP validated that the model relies on meaningful macroeconomic indicators.
- Stress testing correctly highlighted loans with potential LTV breaches.
This project demonstrates a complete real-world financial risk modelling system integrating:
- Multi-source data
- Advanced engineered features
- Machine learning & deep learning approaches
- Model explainability
- Portfolio stress testing
- Dashboard deployment
It mirrors the workflows used by NBFCs, banks, and fintech companies that depend on gold-backed lending.
- Integrate real-time MCX/live gold price API
- Add ARIMA / SARIMA for statistical comparison
- Deploy dashboard on cloud with automated updates
- Implement reinforcement learning for dynamic LTV thresholds
- Expand modelling to multi-collateral risk systems
- Introduce PostgreSQL database for portfolio storage
- Python, Pandas, NumPy
- Scikit-learn, XGBoost, PyTorch
- SHAP, Statsmodels
- Streamlit
- Matplotlib, Seaborn
- CUDA GPU (RTX 3050)
- Jupyter Notebooks
gold-price-ltv-analysis/
│
├── Dataset/
├── models/
├── outputs/
├── Notebooks/
│
├── app.py
├── lstm_model_def.py
├── model_loader.py
├── requirements.txt
└── README.md