This project analyzes clinical patient data to predict diabetes risk, combining SQL for data preparation, Python for machine learning, and Tableau for visualization.
The goal is to simulate a real-world healthcare analytics workflow, including:
- Data cleaning & quality checks (SQL)
- Risk modeling using Logistic Regression (Python)
- Model evaluation (ROC, Accuracy, Classification metrics)
- Risk segmentation (Low / Medium / High)
- Interactive healthcare dashboard (Tableau)
Tools and technologies:
- SQL (PostgreSQL)
- Python (Pandas, Scikit-learn, Matplotlib, Seaborn)
- Machine Learning (Logistic Regression)
- Tableau (Data Visualization)
- GitHub (Project Portfolio)
The clinical dataset contains the following patient health indicators:
- Glucose
- BMI
- Blood Pressure
- Insulin
- Age
- Diabetes Outcome
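A quality check over these columns could look like the sketch below. It is a minimal illustration, not the project's actual code: the sample rows, the snake_case column names, and the rule that a zero in a clinical measurement means "not recorded" are all assumptions.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed from the indicator list above.
patients = pd.DataFrame({
    "glucose": [148, 85, 0, 137, 116],
    "bmi": [33.6, 26.6, 23.3, 0.0, 25.6],
    "blood_pressure": [72, 66, 64, 40, 74],
    "insulin": [0, 0, 0, 168, 0],
    "age": [50, 31, 32, 33, 30],
    "outcome": [1, 0, 1, 1, 0],
})

# Assumption: a zero in these columns means "not recorded", not a real reading.
zero_as_missing = ["glucose", "bmi", "blood_pressure", "insulin"]

# Count zeros per suspicious column; columns outside the list get a count of 0.
zeros = (
    (patients[zero_as_missing] == 0)
    .sum()
    .reindex(patients.columns, fill_value=0)
)

# One row per column: explicit nulls and implausible zeros side by side.
quality_report = pd.DataFrame({
    "nulls": patients.isna().sum(),
    "zeros": zeros,
})

print(quality_report)
```

A report like this makes it easy to decide, per column, whether to impute, drop, or keep the suspicious values before modeling.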
Logistic Regression was used to predict diabetes probability.
Model performance:
- Accuracy: ~0.80
- ROC AUC: ~0.85
- High Risk precision: strong clinical signal (qualitative assessment)
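The modeling and evaluation steps can be sketched as follows. This is an illustrative example under stated assumptions: the data is synthetic (generated so that higher glucose and BMI raise the odds of a positive outcome), the feature subset, the scaling step, and the 75/25 split are choices made for the sketch, and its scores are unrelated to the figures reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): three features loosely shaped like
# glucose, BMI, and age. This is not the project's dataset.
rng = np.random.default_rng(42)
n = 500
glucose = rng.normal(120, 30, n)
bmi = rng.normal(32, 7, n)
age = rng.normal(35, 10, n)
X = np.column_stack([glucose, bmi, age])

# Synthetic labels: the log-odds of diabetes grow with glucose, BMI, and age.
log_odds = 0.04 * (glucose - 120) + 0.08 * (bmi - 32) + 0.02 * (age - 35) - 0.5
y = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Predicted probability of the positive class, plus hard 0/1 predictions.
proba = model.predict_proba(X_test)[:, 1]
pred = model.predict(X_test)

accuracy = accuracy_score(y_test, pred)
roc_auc = roc_auc_score(y_test, proba)

print(f"Accuracy: {accuracy:.2f}")
print(f"ROC AUC:  {roc_auc:.2f}")
print(classification_report(y_test, pred))
```

The predicted probabilities in `proba`, rather than the hard predictions, are what a downstream risk segmentation would consume.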
The model was further used to create Risk Segments:
- Low Risk
- Medium Risk
- High Risk
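One way to derive these segments from predicted probabilities is shown below. The cutoffs of 0.33 and 0.66 and the function name `risk_segment` are hypothetical; the thresholds the project actually uses are not stated here.

```python
def risk_segment(probability, low_cutoff=0.33, high_cutoff=0.66):
    """Map a predicted diabetes probability to a risk segment label.

    The default cutoffs are illustrative assumptions, not project values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if probability < low_cutoff:
        return "Low Risk"
    if probability < high_cutoff:
        return "Medium Risk"
    return "High Risk"


# Hypothetical model outputs, including values exactly on the cutoffs.
probabilities = [0.05, 0.33, 0.50, 0.66, 0.92]
segments = [risk_segment(p) for p in probabilities]

for p, segment in zip(probabilities, segments):
    print(f"{p:.2f} -> {segment}")
```

Values exactly on a cutoff fall into the higher segment, which errs on the side of flagging a patient for screening.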
The analysis queries cover:
- Data Quality Check
- Diabetes prevalence
- Feature comparison (average by outcome)
- Risk segment distribution
- Probability sanity check
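The summaries listed above can be approximated in pandas as a rough equivalent of grouped SQL aggregates. The scored table, its column names, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical scored patients: features, true outcome, model probability,
# and assigned risk segment. All values are made up for this sketch.
scored = pd.DataFrame({
    "glucose": [90, 105, 150, 170, 120, 98],
    "bmi": [22.0, 27.5, 35.1, 38.4, 30.2, 24.3],
    "outcome": [0, 0, 1, 1, 1, 0],
    "probability": [0.10, 0.25, 0.70, 0.90, 0.50, 0.15],
    "risk_segment": [
        "Low Risk", "Low Risk", "High Risk",
        "High Risk", "Medium Risk", "Low Risk",
    ],
})

# Diabetes prevalence: share of patients with a positive outcome.
prevalence = scored["outcome"].mean()

# Average feature values per outcome group (like AVG ... GROUP BY outcome).
feature_avg_by_outcome = scored.groupby("outcome")[["glucose", "bmi"]].mean()

# Number of patients in each risk segment.
segment_distribution = scored["risk_segment"].value_counts()

# Sanity checks: probabilities stay in [0, 1], and the mean probability
# should be higher for patients who actually have diabetes.
probabilities_valid = scored["probability"].between(0, 1).all()
probability_by_outcome = scored.groupby("outcome")["probability"].mean()

print(f"Prevalence: {prevalence:.0%}")
print(feature_avg_by_outcome)
print(segment_distribution)
print(probability_by_outcome)
```

If the mean predicted probability were not clearly higher in the positive-outcome group, that would suggest a problem with the scoring step or the join back to patient records.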
The dashboard visualizes:
- Diabetes prevalence
- Risk distribution
- Probability vs Outcome
- Feature comparison
- Clinical insights
- Tableau Public link: https://public.tableau.com/app/profile/nikolaos.giannoulis/viz/HealthcareDiabetesRiskPredictionMachineLearningClinicalAnalytics/DiabetesRiskPredictionClinicalAnalyticsDashboard
Key insights:
- High glucose and BMI strongly correlate with diabetes
- Risk segmentation can help early screening
- Model can support preventive healthcare analytics