Clinical Deterioration AI – Synthea

This project builds and compares machine learning and deep learning models to predict short-term clinical deterioration using synthetic electronic health records (EHR) generated by Synthea. The goal is to explore whether routinely collected EHR data can support early warning systems that flag patients at higher risk in the near future.

The work combines:

classical ML baselines (Logistic Regression, Random Forest)
sequence-based deep learning (LSTM)
feature engineering from multi-table EHR (patients, encounters, conditions, observations)
evaluation using ROC-AUC, PR-AUC, precision, recall and confusion matrices

🎯 Problem

Task: Binary classification
Outcome: Does the patient experience a deterioration event within 30 days of the index encounter (yes/no)?
Data: Synthetic EHR from Synthea (patients, encounters, conditions, observations)
Unit of prediction: Encounter / patient-trajectory snapshots

The aim is to simulate a simplified early warning AI system that could, in principle, help clinicians prioritize monitoring and follow-up.
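The 30-day outcome can be made concrete with a small labelling helper. This is a minimal sketch, not the project's actual labelling code: the function name `label_within_horizon` is hypothetical, and what counts as a deterioration event is left to the caller because the definition is not spelled out here.

```python
from bisect import bisect_right
from datetime import date, timedelta


def label_within_horizon(index_date, event_dates, horizon_days=30):
    """Return 1 if any event falls in (index_date, index_date + horizon_days], else 0.

    `event_dates` is a hypothetical list of dates on which a deterioration
    event was recorded for the same patient.
    """
    events = sorted(event_dates)
    # Position of the first event strictly after the index date.
    first_after = bisect_right(events, index_date)
    if first_after == len(events):
        return 0
    return int(events[first_after] <= index_date + timedelta(days=horizon_days))


print(label_within_horizon(date(2020, 1, 1), [date(2020, 1, 15)]))
print(label_within_horizon(date(2020, 1, 1), [date(2020, 2, 15)]))
```

Only events strictly after the index date count, so information recorded at the index encounter itself never leaks into the label.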

🗂 Data & Feature Engineering

Source: Synthea synthetic EHR

Key tables used:

patients – demographics
encounters – visit history
conditions – active diagnoses / comorbidities
observations – labs / vitals / measurements

Example engineered features:

age
active_conditions (count of active diagnoses)
past_encounters_30d, past_encounters_60d, past_encounters_90d
obs_total_before (total observations before the index encounter)
obs_30d, obs_60d, obs_90d (recency windows)
target_30d – indicator if a deterioration event occurs within 30 days
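The recency-window counts can be sketched with pandas. This is an illustration under stated assumptions, not the project's preprocessing code: the function name and the column names `encounter_id`, `patient` and `start` are hypothetical, so map them to the columns in your Synthea export.

```python
import pandas as pd


def add_past_encounter_counts(encounters, windows=(30, 60, 90)):
    """Add past_encounters_{w}d columns: prior encounters of the same patient
    that started within the last w days (strictly before the current one)."""
    enc = encounters.copy()
    enc["start"] = pd.to_datetime(enc["start"])
    # Pair every encounter with every encounter of the same patient.
    pairs = enc.merge(enc[["patient", "start"]], on="patient", suffixes=("", "_prev"))
    gap_days = (pairs["start"] - pairs["start_prev"]).dt.total_seconds() / 86400
    for w in windows:
        # gap > 0 keeps only strictly earlier encounters, which avoids leakage.
        in_window = pairs[(gap_days > 0) & (gap_days <= w)]
        counts = in_window.groupby("encounter_id").size()
        enc[f"past_encounters_{w}d"] = (
            enc["encounter_id"].map(counts).fillna(0).astype(int)
        )
    return enc


demo = pd.DataFrame(
    {
        "encounter_id": ["e1", "e2", "e3", "e4"],
        "patient": ["p1", "p1", "p1", "p2"],
        "start": ["2020-01-01", "2020-01-20", "2020-03-15", "2020-01-05"],
    }
)
features = add_past_encounter_counts(demo)
print(features)
```

The self-merge is quadratic in the number of encounters per patient, which is fine for a sketch but may need a sorted, rolling approach on large cohorts. The same pattern applies to the observation windows (obs_30d, obs_60d, obs_90d).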

For sequence modelling, a fixed window of 10 past encounters per patient is constructed, padded where necessary.
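A fixed-length window with padding might look like the sketch below. The function name `build_sequence`, the choice to pad with zeros at the start of the window, and the accompanying mask are assumptions for illustration; the README does not say which padding scheme the project uses.

```python
import numpy as np


def build_sequence(encounter_features, n_features, window=10):
    """Return (seq, mask) for one patient.

    seq has shape (window, n_features) and holds the most recent encounters,
    oldest first, left-padded with zeros. mask marks the real (non-padded) rows.
    """
    x = np.asarray(encounter_features, dtype=np.float32).reshape(-1, n_features)
    seq = np.zeros((window, n_features), dtype=np.float32)
    mask = np.zeros(window, dtype=bool)
    recent = x[-window:]  # keep at most the last `window` encounters
    if len(recent):
        seq[-len(recent):] = recent
        mask[-len(recent):] = True
    return seq, mask


# A patient with only 3 past encounters and 2 features per encounter.
seq, mask = build_sequence(np.arange(6).reshape(3, 2), n_features=2)
print(seq.shape, int(mask.sum()))
```

Stacking the per-patient arrays with `np.stack` yields the (n_samples, 10, n_features) tensor that a sequence model expects.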

🤖 Models

Baseline (static feature) models

Logistic Regression
Random Forest
(Optional: XGBoost / other tree ensembles)

These models operate on a single, engineered feature vector per encounter.
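As a rough sketch of how such baselines could be fitted and scored with scikit-learn, the example below trains both models on a synthetic, imbalanced stand-in for the feature matrix. The data, the `class_weight="balanced"` setting and the hyperparameters are assumptions chosen for illustration; they are not the project's configuration and the scores are unrelated to the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for one engineered feature vector per encounter (NOT project data).
X = rng.normal(size=(2000, 6))
logits = 1.5 * X[:, 0] + X[:, 1] - 3.0  # negative intercept makes positives rare
y = (rng.random(2000) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logreg": LogisticRegression(class_weight="balanced", max_iter=1000),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = {
        "roc_auc": roc_auc_score(y_test, proba),
        # Average precision is a common estimate of the area under the PR curve.
        "pr_auc": average_precision_score(y_test, proba),
    }

print(scores)
```

Stratifying the split keeps the rare positive class represented in both partitions, which matters when the outcome is as imbalanced as a 30-day deterioration label tends to be.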

Sequence model (temporal)

LSTM (Long Short-Term Memory)
Consumes a sequence of up to 10 past encounters per patient: shape = (n_samples, 10, n_features)

This captures temporal dynamics in encounter history rather than just static snapshots.
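A minimal PyTorch sketch of a model with this input shape is shown below. The framework, the class name `EncounterLSTM`, the hidden size and the single-layer architecture are all assumptions; the README does not state which library or architecture the project uses.

```python
import torch
from torch import nn


class EncounterLSTM(nn.Module):
    """Hypothetical single-layer LSTM that maps an encounter sequence to one logit."""

    def __init__(self, n_features, hidden_size=32):
        super().__init__()
        # batch_first=True means inputs are (n_samples, seq_len, n_features).
        self.lstm = nn.LSTM(
            input_size=n_features, hidden_size=hidden_size, batch_first=True
        )
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # h_n has shape (num_layers, n_samples, hidden_size); take the last layer.
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1]).squeeze(-1)  # logits, shape (n_samples,)


torch.manual_seed(0)
model = EncounterLSTM(n_features=8)
batch = torch.randn(4, 10, 8)  # 4 samples, 10 encounters, 8 features (random data)
risk = torch.sigmoid(model(batch))  # predicted probabilities in [0, 1]
print(risk.shape)
```

If shorter histories are padded at the start of the window, the final hidden state corresponds to the most recent real encounter, which is why the sketch reads `h_n` rather than pooling over all time steps.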

📊 Results (example summary)

Estimated performance on the held-out test set:

Model                  ROC-AUC  PR-AUC  Precision  Recall
Logistic Regression    0.811    0.259   0.147      0.773
Random Forest          0.781    0.161   0.000      0.000
LSTM (sequence model)  0.753    0.228   0.333      0.136

Interpretation:

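Logistic Regression achieves the best discrimination (ROC-AUC 0.811, PR-AUC 0.259) and the highest recall (0.773), catching most high-risk patients, but its precision of 0.147 means most alerts are false alarms.
The LSTM sequence model achieves higher precision (0.333) but much lower recall (0.136) at the default threshold, behaving more conservatively.
Random Forest flags no true positives at the default threshold (precision and recall both 0.000), effectively collapsing to the majority class. Its ROC-AUC of 0.781 indicates that its ranking still carries signal, so the collapse reflects class imbalance and threshold or hyperparameter choices rather than an absence of signal.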

This illustrates a trade-off between sensitivity (recall) and alert burden (precision), which is central to deploying AI in clinical environments.
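The trade-off can be seen directly by sweeping the decision threshold over a set of predicted probabilities. The sketch below uses a hypothetical helper, `precision_recall_at`, and made-up scores; it reports 0.0 for precision when no positives are predicted, which is a convention chosen here rather than a property of the metric.

```python
import numpy as np


def precision_recall_at(y_true, y_score, threshold=0.5):
    """Precision and recall when scores >= threshold are flagged as positive."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score) >= threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    # Precision is undefined with no predicted positives; report 0.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels and scores for illustration only.
y_true = [0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.4, 0.35, 0.8]

for t in (0.3, 0.5, 0.9):
    print(t, precision_recall_at(y_true, y_score, threshold=t))
```

Lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 1.0 while precision drops from 1.0 to about 0.67, and a threshold of 0.9 flags nobody, giving zeros for both metrics.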

📈 Visualizations

Suggested plots (stored under figures/):

Feature distributions (age, comorbidities, encounter counts)
Target distribution (class balance)
Correlation/feature importance plots
ROC & Precision–Recall curves for each model
Confusion matrices for baseline vs LSTM

FastAPI Service

This project also includes a modular FastAPI backend that exposes the clinical deterioration prediction logic through REST endpoints.

Run the API

From the repository root, with the virtual environment in .venv (Windows path shown):

.\.venv\Scripts\python.exe -m uvicorn clinical_deterioration_ai.api:app --reload --app-dir src

Project Structure

clinical-deterioration-ai-synthea/
├── data/
├── notebooks/
├── src/
│   └── clinical_deterioration_ai/
│       ├── __init__.py
│       ├── predictor.py
│       ├── preprocess.py
│       ├── model.py
│       ├── api.py
│       └── utils.py
├── pyproject.toml
├── README.md
└── docker-compose.yml
