AbdulBari33/clinical-deterioration-ai-synthea

Clinical Deterioration AI – Synthea

This project builds and compares machine learning and deep learning models to predict short-term clinical deterioration using synthetic electronic health records (EHR) generated by Synthea. The goal is to explore whether routinely collected EHR data can support early warning systems that flag patients at higher risk in the near future.

The work combines:

  • classical ML baselines (Logistic Regression, Random Forest)
  • sequence-based deep learning (LSTM)
  • feature engineering from multi-table EHR (patients, encounters, conditions, observations)
  • evaluation using ROC-AUC, PR-AUC, precision, recall and confusion matrices

🎯 Problem

Task: Binary classification
Outcome: Did a patient experience deterioration within 30 days (yes/no)?
Data: Synthetic EHR from Synthea (patients, encounters, conditions, observations)
Unit of prediction: Encounter / patient-trajectory snapshots

The aim is to simulate a simplified early warning AI system that could, in principle, help clinicians prioritize monitoring and follow-up.


🗂 Data & Feature Engineering

Source: Synthea synthetic EHR

Key tables used:

  • patients – demographics
  • encounters – visit history
  • conditions – active diagnoses / comorbidities
  • observations – labs / vitals / measurements

Example engineered features:

  • age
  • active_conditions (count of active diagnoses)
  • past_encounters_30d, past_encounters_60d, past_encounters_90d
  • obs_total_before (total observations before the index encounter)
  • obs_30d, obs_60d, obs_90d (recency windows)
  • target_30d – indicator if a deterioration event occurs within 30 days
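The recency-window features above can be sketched with pandas. This is a minimal illustration, not the project's preprocessing code: the toy table is invented, the `PATIENT` and `START` column names are assumed to follow Synthea's CSV export, and `count_prior_encounters` is a hypothetical helper. The deterioration label (`target_30d`) is left out because the event definition is not stated here.

```python
import pandas as pd

# Toy encounters table (invented data). Column names are assumed to
# match Synthea's encounters.csv export.
encounters = pd.DataFrame({
    "PATIENT": ["p1", "p1", "p1", "p2", "p2"],
    "START": pd.to_datetime(
        ["2020-01-01", "2020-01-20", "2020-03-15", "2020-02-01", "2020-02-10"]
    ),
})


def count_prior_encounters(df: pd.DataFrame, window_days: int) -> pd.Series:
    """Hypothetical helper: for each encounter, count the same patient's
    encounters that started strictly before it and no more than
    `window_days` days earlier."""
    counts = []
    for _, row in df.iterrows():
        same_patient = df[df["PATIENT"] == row["PATIENT"]]
        window_start = row["START"] - pd.Timedelta(days=window_days)
        in_window = (same_patient["START"] >= window_start) & (
            same_patient["START"] < row["START"]
        )
        counts.append(int(in_window.sum()))
    return pd.Series(counts, index=df.index)


# One column per recency window, named as in the feature list above.
for days in (30, 60, 90):
    encounters[f"past_encounters_{days}d"] = count_prior_encounters(encounters, days)

print(encounters)
```

The row-by-row loop is quadratic per patient and is only meant to make the window logic explicit; a grouped or rolling computation would be preferable on a full Synthea export. The same pattern would apply to `obs_30d`, `obs_60d` and `obs_90d` using observation dates.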

For sequence modelling, a fixed window of 10 past encounters per patient is constructed, padded where necessary.
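A fixed-length window with padding could look like the sketch below. The function name `to_fixed_window`, the choice of zero as the padding value, and padding at the start of the sequence (so the most recent encounter is always the last timestep) are assumptions made for illustration; the README does not specify them.

```python
import numpy as np

WINDOW = 10  # fixed number of past encounters, as described above


def to_fixed_window(features, window: int = WINDOW) -> np.ndarray:
    """Return a (window, n_features) array of the most recent encounters.

    `features` has shape (n_encounters, n_features), oldest row first.
    Histories shorter than `window` are zero-padded at the start
    (an assumption, not confirmed by the source).
    """
    features = np.asarray(features, dtype=np.float32)
    n_features = features.shape[1]
    recent = features[-window:]                 # keep at most `window` latest rows
    out = np.zeros((window, n_features), dtype=np.float32)
    out[window - len(recent):] = recent         # right-align: newest row is last
    return out


# Two toy patients: one with 3 encounters, one with 12 (4 features each).
short_history = np.ones((3, 4))
long_history = np.arange(12 * 4).reshape(12, 4)

batch = np.stack([to_fixed_window(short_history), to_fixed_window(long_history)])
print(batch.shape)  # (n_samples, 10, n_features)
```

Stacking the per-patient windows yields the `(n_samples, 10, n_features)` tensor that a sequence model expects.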


🤖 Models

Baseline (static feature) models

  • Logistic Regression
  • Random Forest
  • (Optional: XGBoost / other tree ensembles)

These models operate on a single, engineered feature vector per encounter.
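As a rough sketch of a static-feature baseline, the example below fits a scikit-learn logistic regression on synthetic data and scores it with ROC-AUC and average precision (one common estimate of PR-AUC). The data generator, the use of `class_weight="balanced"`, and the split settings are illustrative assumptions, not the project's configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features (invented data):
# four numeric columns and a rare positive class.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))
true_logits = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 3.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logits))).astype(int)

# Stratify so the rare class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" is an assumption here: it reweights the loss
# so the minority class is not ignored.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

scores = clf.predict_proba(X_test)[:, 1]   # predicted probability of class 1
roc_auc = roc_auc_score(y_test, scores)
pr_auc = average_precision_score(y_test, scores)
print(f"ROC-AUC={roc_auc:.3f}  PR-AUC={pr_auc:.3f}")
```

Swapping in `RandomForestClassifier` from `sklearn.ensemble` would follow the same fit and `predict_proba` pattern.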

Sequence model (temporal)

  • LSTM (Long Short-Term Memory)
    Consumes a sequence of up to 10 past encounters per patient: shape = (n_samples, 10, n_features)

This captures temporal dynamics in encounter history rather than just static snapshots.
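The README does not name a deep learning framework, so the following is only a sketch of how an LSTM classifier over `(n_samples, 10, n_features)` input might be written, here in PyTorch. The class name `EncounterLSTM`, the hidden size, and the single-layer architecture are assumptions.

```python
import torch
from torch import nn


class EncounterLSTM(nn.Module):
    """Hypothetical single-layer LSTM classifier (illustrative only)."""

    def __init__(self, n_features: int, hidden_size: int = 32):
        super().__init__()
        # batch_first=True makes the input layout (n_samples, seq_len, n_features).
        self.lstm = nn.LSTM(
            input_size=n_features, hidden_size=hidden_size, batch_first=True
        )
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h_n has shape (num_layers, n_samples, hidden_size).
        _, (h_n, _) = self.lstm(x)
        # Use the final hidden state of the last layer as the sequence summary.
        return self.head(h_n[-1]).squeeze(-1)   # raw logits, shape (n_samples,)


model = EncounterLSTM(n_features=8)
x = torch.zeros(4, 10, 8)          # 4 samples, 10 encounters, 8 features
logits = model(x)
probs = torch.sigmoid(logits)      # deterioration probabilities in [0, 1]
print(logits.shape)
```

Returning logits rather than probabilities allows training with `nn.BCEWithLogitsLoss`, whose `pos_weight` argument is one way to account for class imbalance.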


📊 Results (example summary)

Estimated performance on the held-out test set:

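| Model                 | ROC-AUC | PR-AUC | Precision | Recall |
|-----------------------|---------|--------|-----------|--------|
| Logistic Regression   | 0.811   | 0.259  | 0.147     | 0.773  |
| Random Forest         | 0.781   | 0.161  | 0.000     | 0.000  |
| LSTM (sequence model) | 0.753   | 0.228  | 0.333     | 0.136  |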

Interpretation:

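  • Logistic Regression achieves the best discrimination (ROC-AUC 0.811, PR-AUC 0.259) and the highest recall (0.773), catching most high-risk patients but with low precision (0.147).
  • The LSTM sequence model achieves higher precision (0.333, fewer false alarms) but much lower recall (0.136) at the default threshold, behaving more conservatively.
  • Random Forest collapses to majority prediction under the current settings: a recall of 0.000 means it flags no true positives at the default threshold. Its ROC-AUC of 0.781 suggests the scores still rank patients reasonably, so the collapse likely reflects class imbalance, the decision threshold and hyperparameter sensitivity rather than a lack of signal.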

This illustrates a trade-off between sensitivity (recall) and alert burden (precision), which is central to deploying AI in clinical environments.
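The trade-off can be made concrete by sweeping the decision threshold over a set of scores. The labels and scores below are invented for illustration, and `precision_recall_at` is a hypothetical helper. Note that when no patient is flagged, precision is undefined (0/0) and is conventionally reported as 0, which would be consistent with the Random Forest row above.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold raises an alert."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    # With no alerts, precision is 0/0; report 0.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented example: 3 deteriorating patients among 10.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.42, 0.45, 0.60]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold raises recall at the cost of more false alarms; raising it does the opposite, until no alerts fire at all.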


📈 Visualizations

Suggested plots (stored under figures/):

  • Feature distributions (age, comorbidities, encounter counts)
  • Target distribution (class balance)
  • Correlation/feature importance plots
  • ROC & Precision–Recall curves for each model
  • Confusion matrices for baseline vs LSTM

FastAPI Service

This project also includes a modular FastAPI backend that exposes the clinical deterioration prediction logic through REST endpoints.

Run the API

On Windows, with a virtual environment in .venv, run from the repository root:

.\.venv\Scripts\python.exe -m uvicorn clinical_deterioration_ai.api:app --reload --app-dir src

Project Structure

clinical-deterioration-ai-synthea/
├── data/
├── notebooks/
├── src/
│   └── clinical_deterioration_ai/
│       ├── __init__.py
│       ├── predictor.py
│       ├── preprocess.py
│       ├── model.py
│       ├── api.py
│       └── utils.py
├── pyproject.toml
├── README.md
└── docker-compose.yml

About

AI-powered early warning system for predicting short-term clinical deterioration from synthetic EHR data using Python, FastAPI, and ML models.
