lxq472/tp6-healthcare-diabetes-risk-model

Healthcare Diabetes Risk Prediction

Project Overview

This project analyzes clinical patient data to predict diabetes risk using SQL, Python (Machine Learning), and Tableau visualization.

The goal is to simulate a real-world healthcare analytics workflow including:

  • Data cleaning & quality checks (SQL)
  • Risk modeling using Logistic Regression (Python)
  • Model evaluation (ROC, Accuracy, Classification metrics)
  • Risk segmentation (Low / Medium / High)
  • Interactive healthcare dashboard (Tableau)

Tools & Technologies

  • SQL (PostgreSQL)
  • Python (Pandas, Scikit-learn, Matplotlib, Seaborn)
  • Machine Learning (Logistic Regression)
  • Tableau (Data Visualization)
  • GitHub (Project Portfolio)

Dataset

Clinical dataset containing patient health indicators:

  • Glucose
  • BMI
  • Blood Pressure
  • Insulin
  • Age
  • Diabetes Outcome

Machine Learning Model

Logistic Regression was used to predict diabetes probability.
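As a minimal sketch of how a fitted logistic regression turns patient features into a probability: the model computes a weighted sum of the inputs and passes it through the sigmoid function. The intercept and coefficient values below are hypothetical placeholders, not the fitted values from this project, and only three of the dataset's features are used to keep the example short.

```python
import math

# Hypothetical coefficients for illustration only; the real values come from
# fitting the model to the project's data and are not listed in this README.
INTERCEPT = -8.0
COEFFICIENTS = {"glucose": 0.035, "bmi": 0.09, "age": 0.02}


def diabetes_probability(patient):
    """Return the logistic-regression probability for one patient record."""
    # Linear score: intercept plus the weighted sum of the feature values.
    score = INTERCEPT + sum(
        weight * patient[name] for name, weight in COEFFICIENTS.items()
    )
    # The sigmoid maps the score into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-score))


# Two made-up patient records: one with low readings, one with high readings.
low = diabetes_probability({"glucose": 90, "bmi": 22, "age": 30})
high = diabetes_probability({"glucose": 180, "bmi": 38, "age": 55})
```

With positive coefficients, higher glucose, BMI, or age each push the probability upward, which is why `high` comes out larger than `low`.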

Model Performance

  • Accuracy: ~0.80
  • ROC AUC: ~0.85
  • High Risk precision: described qualitatively as a strong clinical signal (no figure reported)
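The metrics above can be computed from true labels and predicted probabilities. The sketch below uses plain Python so that each definition is visible; in practice `sklearn.metrics` offers equivalent functions. The labels and probabilities are made-up toy values, and the 0.5 and 0.7 thresholds are assumptions, not settings taken from this project.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of patients whose thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, predictions) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def precision_at(y_true, y_prob, threshold):
    """Share of patients flagged at or above the threshold who are diabetic."""
    flagged = [t for t, p in zip(y_true, y_prob) if p >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")


# Toy data for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90]

acc = accuracy(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
high_risk_precision = precision_at(y_true, y_prob, threshold=0.7)
```

The pairwise AUC loop is quadratic in the number of patients, which is fine for a sketch but slower than a rank-based implementation on large datasets.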

The model was further used to create Risk Segments:

  • Low Risk
  • Medium Risk
  • High Risk
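One simple way to derive these segments is to bin the predicted probability with two cutoffs. The README does not state the cutoffs used, so the 0.30 and 0.70 values below are assumptions chosen for illustration, as is the function name `risk_segment`.

```python
# Assumed cutoffs; the project's actual thresholds are not given in this README.
LOW_CUTOFF = 0.30
HIGH_CUTOFF = 0.70


def risk_segment(probability):
    """Map a predicted diabetes probability to a risk segment label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    if probability < LOW_CUTOFF:
        return "Low Risk"
    if probability < HIGH_CUTOFF:
        return "Medium Risk"
    return "High Risk"


segments = [risk_segment(p) for p in (0.05, 0.30, 0.69, 0.70, 0.95)]
```

Each cutoff value belongs to the higher segment, so a probability of exactly 0.30 is labeled Medium Risk and exactly 0.70 is labeled High Risk.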

SQL Analysis

  • Data Quality Check
  • Diabetes prevalence
  • Feature comparison (average feature values by outcome)
  • Risk segment distribution
  • Probability sanity check
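Four of these checks can be sketched as queries run from Python. The example uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it is self-contained. The table name `patients`, its column names, and the five rows are hypothetical, and treating a glucose value of 0 as a missing measurement is an assumption about the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "glucose REAL, bmi REAL, outcome INTEGER, risk_probability REAL)"
)
# Made-up rows for illustration; not taken from the project's dataset.
rows = [
    (85, 22.0, 0, 0.08),
    (110, 27.5, 0, 0.25),
    (0, 31.0, 0, 0.40),  # glucose of 0 is assumed to mean "not measured"
    (150, 33.0, 1, 0.72),
    (180, 38.0, 1, 0.93),
]
conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", rows)

# Data quality check: count physiologically implausible zero readings.
zero_glucose = conn.execute(
    "SELECT COUNT(*) FROM patients WHERE glucose = 0"
).fetchone()[0]

# Diabetes prevalence: the mean of a 0/1 outcome column is the positive share.
prevalence = conn.execute("SELECT AVG(outcome) FROM patients").fetchone()[0]

# Average feature values by outcome, ignoring zero glucose readings.
avg_by_outcome = conn.execute(
    "SELECT outcome, AVG(NULLIF(glucose, 0)), AVG(bmi) "
    "FROM patients GROUP BY outcome ORDER BY outcome"
).fetchall()

# Probability sanity check: every score must be present and within [0, 1].
out_of_range = conn.execute(
    "SELECT COUNT(*) FROM patients "
    "WHERE risk_probability IS NULL "
    "OR risk_probability < 0 OR risk_probability > 1"
).fetchone()[0]

conn.close()
```

The same `SELECT` statements should run on PostgreSQL largely unchanged, since `AVG`, `NULLIF`, and `GROUP BY` are standard SQL.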

Tableau Dashboard

The dashboard visualizes:


Business / Healthcare Insight

  • High glucose and high BMI are strongly associated with diabetes in this dataset
  • Risk segmentation can help prioritize patients for early screening
  • The model can support preventive healthcare analytics
