This project builds and compares machine learning and deep learning models to predict short-term clinical deterioration using synthetic electronic health records (EHR) generated by Synthea. The goal is to explore whether routinely collected EHR data can support early warning systems that flag patients at higher risk in the near future.
The work combines:
- classical ML baselines (Logistic Regression, Random Forest)
- sequence-based deep learning (LSTM)
- feature engineering from multi-table EHR (patients, encounters, observations)
- evaluation using ROC-AUC, PR-AUC, precision, recall and confusion matrices
- Task: binary classification
- Outcome: did the patient experience deterioration within 30 days of the index encounter (yes/no)?
- Data: synthetic EHR from Synthea (patients, encounters, conditions, observations)
- Unit of prediction: encounter / patient-trajectory snapshots
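The README does not say which events count as "deterioration". Here is a minimal sketch of building `target_30d`. It assumes Synthea's CSV column names (`Id`, `PATIENT`, `START`, `ENCOUNTERCLASS` in `encounters.csv`). It also uses a hypothetical definition in which any later emergency or inpatient encounter counts as a deterioration event. The helper `add_target_30d` is illustrative, not the project's code.

```python
import pandas as pd

# HYPOTHETICAL event definition: a deterioration event is any later emergency
# or inpatient encounter. The project's real definition is not stated.
DETERIORATION_CLASSES = {"emergency", "inpatient"}


def add_target_30d(encounters: pd.DataFrame, horizon_days: int = 30) -> pd.DataFrame:
    """Add target_30d: 1 if the same patient has a deterioration-type encounter
    starting within (0, horizon_days] days after this encounter, else 0."""
    enc = encounters.copy()
    enc["START"] = pd.to_datetime(enc["START"])

    # All deterioration events, keyed by patient
    events = enc.loc[enc["ENCOUNTERCLASS"].isin(DETERIORATION_CLASSES), ["PATIENT", "START"]]
    events = events.rename(columns={"START": "EVENT_START"})

    # Pair each encounter with every event of the same patient (NaT if none)
    pairs = enc[["Id", "PATIENT", "START"]].merge(events, on="PATIENT", how="left")
    delta = pairs["EVENT_START"] - pairs["START"]

    # Strictly after the index encounter and no later than the horizon
    in_window = (delta > pd.Timedelta(0)) & (delta <= pd.Timedelta(days=horizon_days))
    hit = in_window.groupby(pairs["Id"]).any()

    enc["target_30d"] = enc["Id"].map(hit).astype(int)
    return enc


encounters = pd.DataFrame({
    "Id": ["e1", "e2", "e3", "e4"],
    "PATIENT": ["p1", "p1", "p1", "p2"],
    "START": ["2020-01-01", "2020-01-20", "2020-03-01", "2020-01-01"],
    "ENCOUNTERCLASS": ["ambulatory", "emergency", "wellness", "ambulatory"],
})
labeled = add_target_30d(encounters)
print(labeled[["Id", "target_30d"]])
```

In the toy data only `e1` is labelled positive, because the emergency visit `e2` starts 19 days later. The strict `> 0` comparison stops an emergency encounter from labelling itself.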
The aim is to simulate a simplified early warning AI system that could, in principle, help clinicians prioritize monitoring and follow-up.
Source: Synthea synthetic EHR
Key tables used:
- `patients`: demographics
- `encounters`: visit history
- `conditions`: active diagnoses / comorbidities
- `observations`: labs / vitals / measurements
Example engineered features:
- `age`
- `active_conditions`: count of active diagnoses
- `past_encounters_30d`, `past_encounters_60d`, `past_encounters_90d`: number of prior encounters in each look-back window
- `obs_total_before`: total observations before the index encounter
- `obs_30d`, `obs_60d`, `obs_90d`: observation counts in recency windows
- `target_30d`: indicator of whether a deterioration event occurs within 30 days
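The windowed counts can be computed with a self-join on patient. This is a sketch, not the project's `preprocess.py`. The helper `count_prior_events` is hypothetical, and it assumes Synthea-style `encounters` (`START`) and `observations` (`DATE`) columns. `age` and `active_conditions` would need similar joins against `patients` and `conditions` and are left out for brevity.

```python
import pandas as pd


def count_prior_events(index_enc, events, time_col, prefix, windows=(30, 60, 90)):
    """For each index encounter, count the same patient's events that occur
    strictly before it: in total and within each look-back window (days)."""
    ev = events[["PATIENT", time_col]].rename(columns={time_col: "EVENT_TIME"})
    pairs = index_enc[["Id", "PATIENT", "START"]].merge(ev, on="PATIENT", how="left")
    delta = pairs["START"] - pairs["EVENT_TIME"]  # NaT when the patient has no events
    before = delta > pd.Timedelta(0)              # excludes the encounter itself

    counts = {f"{prefix}_total_before": before.groupby(pairs["Id"]).sum()}
    for w in windows:
        in_window = before & (delta <= pd.Timedelta(days=w))
        counts[f"{prefix}_{w}d"] = in_window.groupby(pairs["Id"]).sum()

    out = pd.DataFrame({"Id": index_enc["Id"].to_numpy()})
    for name, series in counts.items():
        out[name] = out["Id"].map(series).astype(int)
    return out


enc = pd.DataFrame({
    "Id": ["e1", "e2", "e3"],
    "PATIENT": ["p1", "p1", "p1"],
    "START": pd.to_datetime(["2020-01-01", "2020-01-20", "2020-03-01"]),
})
obs = pd.DataFrame({
    "PATIENT": ["p1", "p1"],
    "DATE": pd.to_datetime(["2019-12-25", "2020-02-15"]),
})

enc_feats = count_prior_events(enc, enc, "START", prefix="past_encounters")
obs_feats = count_prior_events(enc, obs, "DATE", prefix="obs")
features = enc_feats.merge(obs_feats, on="Id")
print(features)
```

The self-join is easy to read, but its size grows with encounters × events per patient. For large extracts, a per-patient sorted search (for example `numpy.searchsorted`) would likely scale better.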
For sequence modelling, a fixed window of 10 past encounters per patient is constructed, padded where necessary.
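Below is one way to build the padded input. It assumes each patient contributes one sequence of their most recent encounters and that shorter histories are left-padded with zeros. The README only says "padded where necessary", so left padding and the hypothetical `build_sequences` helper are assumptions.

```python
import numpy as np
import pandas as pd


def build_sequences(df, feature_cols, max_len=10, patient_col="PATIENT", time_col="START"):
    """Return (patient_ids, X, mask) with X of shape (n_patients, max_len, n_features).

    Encounters are sorted by time and only each patient's latest max_len are kept.
    Shorter histories are left-padded with zeros; mask marks the real time steps."""
    groups = list(df.sort_values([patient_col, time_col]).groupby(patient_col, sort=True))
    X = np.zeros((len(groups), max_len, len(feature_cols)), dtype=np.float32)
    mask = np.zeros((len(groups), max_len), dtype=bool)
    for i, (_, g) in enumerate(groups):
        values = g[feature_cols].to_numpy(dtype=np.float32)[-max_len:]
        X[i, max_len - len(values):] = values   # most recent encounter ends up last
        mask[i, max_len - len(values):] = True
    ids = [pid for pid, _ in groups]
    return ids, X, mask


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "PATIENT": ["p1"] * 3 + ["p2"] * 12,
    "START": pd.date_range("2020-01-01", periods=15, freq="7D"),
    "age": rng.uniform(20, 90, 15),
    "obs_30d": rng.integers(0, 20, 15),
})
ids, X, mask = build_sequences(df, ["age", "obs_30d"])
print(X.shape)
```

Left padding keeps the most recent encounter at the final time step, which a unidirectional LSTM reads last. The boolean `mask` can be passed downstream so that padded steps are ignored.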
- Logistic Regression
- Random Forest
- (Optional: XGBoost / other tree ensembles)
These models operate on a single, engineered feature vector per encounter.
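The baselines can be set up with scikit-learn roughly as follows. This sketch runs on synthetic stand-in data from `make_classification`, not on the Synthea features, and it is not the project's training code. The class weighting shown (`class_weight="balanced"` / `"balanced_subsample"`) is one common remedy for the majority-class collapse discussed in the results. The README does not confirm that the project uses it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data with ~5% positives to mimic a rare 30-day outcome
X, y = make_classification(n_samples=4000, n_features=9, n_informative=5,
                           weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    # Scaling helps the regularised solver; class_weight upweights the rare class
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000)),
    # Per-bootstrap class weights and larger leaves discourage majority-only trees
    "Random Forest": RandomForestClassifier(
        n_estimators=300, min_samples_leaf=5, class_weight="balanced_subsample",
        random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # estimated P(target_30d = 1)
    results[name] = {
        "roc_auc": roc_auc_score(y_test, scores),
        "pr_auc": average_precision_score(y_test, scores),
        "predicted_positives": int(model.predict(X_test).sum()),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```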
- LSTM (Long Short-Term Memory)
Consumes a sequence of up to 10 past encounters per patient, giving an input of shape `(n_samples, 10, n_features)`.
This captures temporal dynamics in encounter history rather than just static snapshots.
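To make "temporal dynamics" concrete, the NumPy sketch below runs the standard LSTM recurrence over a `(n_samples, 10, n_features)` batch. It then maps the final hidden state to a risk score. The weights are random and untrained, and the README does not name the deep learning framework. The code therefore illustrates the computation rather than the project's model. The optional `mask` (an assumption, matching left-padded inputs) keeps the state unchanged on padded steps, much as a masking layer would in a framework.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(X, W, U, b, mask=None):
    """Single-layer LSTM over X of shape (n_samples, T, n_features).

    W: (n_features, 4*H), U: (H, 4*H), b: (4*H,), gates stacked as
    [input, forget, candidate, output]. Returns the final hidden state (n_samples, H).
    Where mask (n_samples, T) is False, the state is carried over unchanged."""
    n, T, _ = X.shape
    H = U.shape[0]
    h = np.zeros((n, H))
    c = np.zeros((n, H))
    for t in range(T):
        z = X[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, :H])            # input gate
        f = sigmoid(z[:, H:2 * H])       # forget gate
        g = np.tanh(z[:, 2 * H:3 * H])   # candidate cell update
        o = sigmoid(z[:, 3 * H:])        # output gate
        c_new = f * c + i * g            # cell state carries information across visits
        h_new = o * np.tanh(c_new)
        if mask is not None:
            keep = mask[:, t][:, None]
            c_new = np.where(keep, c_new, c)
            h_new = np.where(keep, h_new, h)
        h, c = h_new, c_new
    return h


# Random, untrained parameters purely for illustration
rng = np.random.default_rng(0)
n_features, H = 4, 8
W = rng.normal(0, 0.1, (n_features, 4 * H))
U = rng.normal(0, 0.1, (H, 4 * H))
b = rng.normal(0, 0.1, 4 * H)
w_out, b_out = rng.normal(0, 0.1, H), 0.0

X = rng.normal(size=(5, 10, n_features))  # (n_samples, 10, n_features)
risk = sigmoid(lstm_forward(X, W, U, b) @ w_out + b_out)
print(risk.shape)

# A 3-visit history, left-padded to 10 steps, with a mask over the real steps
X_short = rng.normal(size=(1, 3, n_features))
X_pad = np.zeros((1, 10, n_features))
X_pad[:, -3:] = X_short
mask = np.zeros((1, 10), dtype=bool)
mask[:, -3:] = True
```

Masked steps leave the state untouched. A 3-encounter history left-padded to length 10 therefore produces exactly the same final hidden state as the unpadded 3-step sequence.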
Estimated performance on the held-out test set:
| Model | ROC-AUC | PR-AUC | Precision | Recall |
|---|---|---|---|---|
| Logistic Regression | 0.811 | 0.259 | 0.147 | 0.773 |
| Random Forest | 0.781 | 0.161 | 0.000 | 0.000 |
| LSTM (sequence model) | 0.753 | 0.228 | 0.333 | 0.136 |
Interpretation:
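- Logistic Regression achieves the best discrimination (ROC-AUC 0.811, PR-AUC 0.259) and the highest recall (0.773), catching most patients who deteriorate. Its precision is low (0.147), however: roughly one alert in seven is a true positive.
- The LSTM sequence model achieves higher precision (0.333, i.e. fewer false alarms per alert) but much lower recall (0.136) at the default threshold, behaving more conservatively.
- Random Forest predicts only the majority (no-deterioration) class at the default threshold, so its precision and recall are both 0.000. Its ROC-AUC of 0.781 shows that its predicted probabilities still rank patients reasonably. The failure therefore points to class imbalance and hyperparameter sensitivity rather than an absence of signal.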
This illustrates a trade-off between sensitivity (recall) and alert burden (precision), which is central to deploying AI in clinical environments.
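The Random Forest row (ROC-AUC 0.781, precision and recall 0.000) shows how a reasonable ranker behaves when no score crosses the decision threshold. Below is a minimal scikit-learn sketch using toy scores rather than project outputs. `evaluate` is a hypothetical helper. PR-AUC is computed here as average precision, one common estimator; the README does not say which estimator it uses.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)


def evaluate(y_true, scores, threshold=0.5):
    """Threshold-free ranking metrics plus thresholded precision/recall."""
    y_pred = (scores >= threshold).astype(int)
    return {
        "roc_auc": roc_auc_score(y_true, scores),
        "pr_auc": average_precision_score(y_true, scores),
        # zero_division=0 reports 0.0 instead of warning when nothing is flagged
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "confusion": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }


# Toy scores where every probability is below 0.5 (illustrative only)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30, 0.28, 0.40])

for t in (0.5, 0.25):
    m = evaluate(y_true, scores, threshold=t)
    print(f"threshold={t}: ROC-AUC={m['roc_auc']:.4f} "
          f"precision={m['precision']:.2f} recall={m['recall']:.2f}")
```

At 0.5 nothing is flagged, so precision and recall are both 0 even though ROC-AUC is 0.9375. Lowering the threshold to 0.25 recovers both positives at a precision of 0.5. Choosing the threshold on a validation set, for example to reach a target recall, is one way to set the sensitivity/alert-burden trade-off explicitly.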
Suggested plots (stored under figures/):
- Feature distributions (age, comorbidities, encounter counts)
- Target distribution (class balance)
- Correlation/feature importance plots
- ROC & Precision–Recall curves for each model
- Confusion matrices for baseline vs LSTM
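Here is a sketch for the ROC and precision-recall figure, assuming matplotlib and scikit-learn. `plot_roc_pr` and the output filename `roc_pr_curves.png` are hypothetical. The dashed line in the PR panel marks the outcome prevalence. This is the expected PR-AUC of a random classifier and the natural baseline for reading the PR-AUC column.

```python
import os

import matplotlib
matplotlib.use("Agg")  # write image files without needing a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score, roc_curve)


def plot_roc_pr(y_true, scores_by_model, out_dir="figures", filename="roc_pr_curves.png"):
    """Draw ROC and precision-recall curves for several models side by side."""
    os.makedirs(out_dir, exist_ok=True)
    fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(11, 4.5))
    for name, scores in scores_by_model.items():
        fpr, tpr, _ = roc_curve(y_true, scores)
        precision, recall, _ = precision_recall_curve(y_true, scores)
        ax_roc.plot(fpr, tpr, label=f"{name} (AUC={roc_auc_score(y_true, scores):.3f})")
        ax_pr.plot(recall, precision,
                   label=f"{name} (AP={average_precision_score(y_true, scores):.3f})")
    # Reference lines: chance for ROC, outcome prevalence for PR
    ax_roc.plot([0, 1], [0, 1], linestyle="--", color="grey", linewidth=0.8)
    ax_pr.axhline(np.mean(y_true), linestyle="--", color="grey", linewidth=0.8)
    ax_roc.set(xlabel="False positive rate", ylabel="True positive rate", title="ROC")
    ax_pr.set(xlabel="Recall", ylabel="Precision", title="Precision-Recall")
    ax_roc.legend(loc="lower right")
    ax_pr.legend(loc="upper right")
    path = os.path.join(out_dir, filename)
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return path


# Toy scores for two hypothetical models (random, not project results)
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.05, size=2000)
path = plot_roc_pr(y, {
    "model A": y * 0.3 + rng.uniform(size=y.size),
    "model B": y * 0.1 + rng.uniform(size=y.size),
})
print(path)
```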
This project also includes a modular FastAPI backend that exposes the clinical deterioration prediction logic through REST endpoints.
Run the API locally from the project root (the command uses the Windows virtual-environment path):

```
.\.venv\Scripts\python.exe -m uvicorn clinical_deterioration_ai.api:app --reload --app-dir src
```

Project structure:

```
clinical-deterioration-ai-synthea/
├── data/
├── notebooks/
├── src/
│   └── clinical_deterioration_ai/
│       ├── __init__.py
│       ├── predictor.py
│       ├── preprocess.py
│       ├── model.py
│       ├── api.py
│       └── utils.py
├── pyproject.toml
├── README.md
└── docker-compose.yml
```