Graph-based anomaly detection for Medicare Part B billing data. Identifies suspicious billing patterns in skin substitutes and other high-fraud-risk services.
This project is the strongest public technical/research anchor in this GitHub profile. It combines public healthcare data, graph construction, anomaly scoring, and validation against independent enforcement signals.
This tool uses graph neural networks to detect anomalous billing patterns in publicly available CMS data. Unlike traditional tabular analysis, graph-based methods reveal coordinated fraud networks — kickback rings, referral schemes, and geographic clusters that individual provider analysis would miss.
The model ranked six providers within its top 67 anomalies who were independently charged by DOJ. A seventh, ranked #318, was later excluded by HHS-OIG (LEIE):
| Rank | Provider | State | DOJ Action | Amount |
|---|---|---|---|---|
| #12 | Ira Denny, NP | AZ | Indicted Jun 2025 | $209M |
| #19 | Kinds | AZ | Indicted | Part of $1.2B scheme |
| #36 | Carlos Ching, MD | AZ | Guilty Plea 2024 | Part of $1.2B scheme |
| #55 | David Jenson, DPM | TX | Charged Jun 2025 | $45M |
| #61 | Bethany Jameson, NP | AZ | Guilty Plea 2024 | Part of $1.2B scheme |
| #67 | Gina Palacios, NP | AZ | Charged Jul 2025 | $28M |
| #318 | Alexander Frank | OK | LEIE Excluded Aug 2025 | False claims |
Detection rate: 6 of the top 67 anomalies (about 9%) were subject to independent DOJ action, approximately a 9x lift over random targeting.
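The arithmetic behind the detection rate can be checked with a short sketch. It counts the six table rows ranked #67 or better. The base rate is not stated in this README, so the value below is only what a 9x lift would imply.

```python
# Counts taken from the validation table: rows ranked #67 or better.
flagged_hits = 6
top_k = 67

# Share of the top-k anomalies with an independent DOJ action.
precision_at_k = flagged_hits / top_k

# Assumption: the README reports ~9x lift, so the base rate under random
# targeting is inferred here rather than taken from the source.
reported_lift = 9
implied_base_rate = precision_at_k / reported_lift

print(f"precision@{top_k}: {precision_at_k:.1%}")
print(f"implied base rate: {implied_base_rate:.2%}")
```

Under these numbers, roughly 1 in 100 randomly selected providers would need to have a comparable enforcement action for the 9x figure to hold.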
# Clone repository
git clone https://github.com/thatSandemaboy/medicare-fraud-detection.git
cd medicare-fraud-detection
# Setup environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Download CMS data (public API)
python scripts/download_data.py
# Build provider graph
python scripts/build_graph.py
# Run anomaly detection
python scripts/detect_anomalies.py
# View results
python scripts/generate_report.py
Output: results/anomaly_rankings.csv, with providers ranked by anomaly score.
All data is publicly available:
| Source | Description | Link |
|---|---|---|
| CMS Provider Utilization | Medicare Part B claims by provider/service | data.cms.gov |
| HHS-OIG LEIE | Excluded providers list | oig.hhs.gov/exclusions |
| NPI Registry | Provider details | npiregistry.cms.hhs.gov |
Builds a provider relationship graph:
- Nodes: Providers (NPI), Products (HCPCS codes), States
- Edges: Billing relationships, geographic co-location, product similarity
Provider A --[BILLS]--> Product X
Provider A --[LOCATED_IN]--> State Y
Provider A --[SAME_STATE]--> Provider B
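The three edge types above can be sketched without any dependencies. This is an illustrative sketch and not the code in scripts/build_graph.py. The input rows and NPIs are hypothetical, and the pairwise SAME_STATE rule is an assumption about how co-location edges are formed.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical billing rows: (provider NPI, HCPCS code, practice state).
rows = [
    ("1000000001", "Q4158", "AZ"),
    ("1000000002", "Q4158", "AZ"),
    ("1000000003", "Q4205", "TX"),
]

edges = set()  # (source, relation, target) triples
providers_by_state = defaultdict(set)

for npi, hcpcs, state in rows:
    edges.add((npi, "BILLS", hcpcs))
    edges.add((npi, "LOCATED_IN", state))
    providers_by_state[state].add(npi)

# Assumed rule: link every pair of providers that share a state.
for npis in providers_by_state.values():
    for a, b in combinations(sorted(npis), 2):
        edges.add((a, "SAME_STATE", b))

# Every endpoint of an edge is a node (providers, products, states).
nodes = {n for src, _, dst in edges for n in (src, dst)}
print(len(nodes), "nodes,", len(edges), "edges")
```

If co-location edges really are pairwise, their count grows quadratically with the number of providers per state, which could explain an edge count far larger than the node count.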
For each provider:
- Volume metrics: Total services, beneficiaries, average payment
- Billing patterns: Charge-to-payment ratio, specialty alignment
- Graph metrics: Degree centrality, PageRank, clustering coefficient
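A few of these features are simple enough to sketch in plain Python. The adjacency map and function names below are hypothetical. The definitions follow the standard ones for degree centrality and the local clustering coefficient. PageRank is omitted for brevity.

```python
# Hypothetical undirected provider graph as an adjacency map.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def degree_centrality(adj, node):
    """Fraction of the other nodes that `node` is directly connected to."""
    return len(adj[node]) / (len(adj) - 1)

def clustering_coefficient(adj, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    neighbors = sorted(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i, u in enumerate(neighbors)
        for v in neighbors[i + 1:]
        if v in adj[u]
    )
    return 2 * links / (k * (k - 1))

def charge_to_payment_ratio(total_charged, total_paid):
    """Submitted charges divided by payments; None when nothing was paid."""
    return total_charged / total_paid if total_paid else None

for node in sorted(adjacency):
    print(node, degree_centrality(adjacency, node),
          clustering_coefficient(adjacency, node))
```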
Unsupervised approach (no labeled fraud data required):
- Graph Autoencoder reconstruction error
- Heuristic scoring (volume outliers, payment anomalies)
- Combined score: 50% graph + 50% heuristic
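The 50/50 blend can be sketched as follows. Min-max scaling to the 0-1 range is an assumption made here so the two components are comparable. The README does not say how the scores are normalized, and the function names are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combined_scores(graph_errors, heuristic_scores, graph_weight=0.5):
    """Weighted blend of normalized graph and heuristic components."""
    g = min_max(graph_errors)
    h = min_max(heuristic_scores)
    return [graph_weight * gi + (1 - graph_weight) * hi
            for gi, hi in zip(g, h)]

# Hypothetical reconstruction errors and heuristic scores for three providers.
scores = combined_scores([0.10, 0.40, 0.90], [2.0, 8.0, 5.0])
print(scores)
```

Equal weighting means a provider can rank highly on either signal alone, so the weight is a natural parameter to vary when checking how stable the rankings are.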
Traditional analysis examines providers in isolation. Graph analysis reveals:
- Coordinated billing patterns across multiple providers
- Geographic clusters with unusual concentration
- Referral networks that may indicate kickback schemes
The Arizona providers flagged by this model weren't isolated outliers — they were connected through the Gehrke/King scheme. Graph analysis surfaced this connection.
├── data/
│ ├── raw/ # Downloaded CMS data
│ ├── processed/ # Cleaned datasets
│ └── graphs/ # NetworkX graph files
├── models/
│ └── autoencoder.pt # Trained GNN model
├── scripts/
│ ├── download_data.py # CMS API data retrieval
│ ├── build_graph.py # Graph construction
│ ├── detect_anomalies.py # Run anomaly detection
│ └── generate_report.py # Create output report
├── src/
│ ├── features/ # Feature engineering
│ ├── models/ # GNN model definitions
│ └── utils/ # Helper functions
├── results/
│ ├── anomaly_rankings.csv # Ranked provider list
│ └── figures/ # Visualizations
└── docs/
    └── USAGE.md # Detailed usage guide
See docs/USAGE.md for detailed guidance on:
- Running the model on specific service codes (e.g., urinalysis, DME)
- Interpreting anomaly scores
- Exporting results for case development
- Cross-referencing with LEIE and SAM exclusions
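Cross-referencing against LEIE amounts to a join on NPI. The sketch below uses in-memory stand-ins for both files. The LEIE column names (NPI, EXCLTYPE, EXCLDATE) and the all-zero placeholder for missing NPIs are assumptions about the OIG download format and should be checked against the actual file.

```python
import csv
import io

# Hypothetical stand-ins for results/anomaly_rankings.csv and an LEIE extract.
rankings_csv = io.StringIO(
    "npi,provider_name,anomaly_score\n"
    "1000000001,Provider One,0.91\n"
    "1000000002,Provider Two,0.42\n"
)
leie_csv = io.StringIO(
    "NPI,EXCLTYPE,EXCLDATE\n"
    "1000000001,1128a1,20250815\n"
    "0000000000,1128b4,20240101\n"
)

# Index exclusions by NPI, skipping rows whose NPI is the assumed
# all-zero placeholder for "unknown".
excluded = {
    row["NPI"]: row
    for row in csv.DictReader(leie_csv)
    if row["NPI"].strip("0")
}

# Keep ranked providers that also appear on the exclusion list.
matches = [
    {**row, "excl_date": excluded[row["npi"]]["EXCLDATE"]}
    for row in csv.DictReader(rankings_csv)
    if row["npi"] in excluded
]
print(matches)
```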
| Field | Description |
|---|---|
| npi | National Provider Identifier |
| provider_name | Provider name |
| state | Practice state |
| specialty | Provider specialty |
| total_services | Medicare services billed (year) |
| total_paid | Medicare payments received |
| anomaly_score | Combined anomaly score (0-1, higher = more suspicious) |
| graph_score | Graph-based anomaly component |
| heuristic_score | Rule-based anomaly component |
| risk_flags | Specific anomaly indicators |
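Given these fields, a ranked output file can be filtered with the standard library alone. The sample rows below are hypothetical, and `top_providers` is an illustrative helper, not part of the repository.

```python
import csv
import io

# Hypothetical three-row stand-in for results/anomaly_rankings.csv.
sample = io.StringIO(
    "npi,provider_name,state,anomaly_score,graph_score,heuristic_score\n"
    "1000000001,Provider One,AZ,0.91,0.88,0.94\n"
    "1000000002,Provider Two,TX,0.42,0.30,0.54\n"
    "1000000003,Provider Three,AZ,0.77,0.90,0.64\n"
)

def top_providers(handle, n, state=None):
    """Return the n highest-scoring rows, optionally limited to one state."""
    rows = [
        row for row in csv.DictReader(handle)
        if state is None or row["state"] == state
    ]
    # CSV values are strings, so convert before sorting numerically.
    rows.sort(key=lambda row: float(row["anomaly_score"]), reverse=True)
    return rows[:n]

top_az = top_providers(sample, n=2, state="AZ")
for row in top_az:
    print(row["npi"], row["provider_name"], row["anomaly_score"])
```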
The methodology works for any Medicare Part B service. To analyze a different service:
# Example: Urinalysis (CPT 81007)
python scripts/download_data.py --hcpcs 81007
python scripts/build_graph.py --input data/raw/81007_providers.csv
python scripts/detect_anomalies.py
Services with known fraud patterns:
- Skin substitutes: Q4100-Q4397
- Urinalysis: 81000-81099
- Genetic testing: 81161-81599
- DME (wheelchairs, etc.): E0100-E9999
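The ranges listed above can be turned into a small lookup for tagging codes by service category. This is a sketch. `SERVICE_RANGES` and `classify_hcpcs` are hypothetical names, and the comparison relies on HCPCS codes being five characters long so that string ordering matches range ordering.

```python
# Code ranges copied from the list above; labels are illustrative.
SERVICE_RANGES = {
    "Skin substitutes": ("Q4100", "Q4397"),
    "Urinalysis": ("81000", "81099"),
    "Genetic testing": ("81161", "81599"),
    "DME": ("E0100", "E9999"),
}

def classify_hcpcs(code):
    """Return the service category for a five-character code, or None."""
    code = code.strip().upper()
    if len(code) != 5:
        return None
    for label, (lo, hi) in SERVICE_RANGES.items():
        # Equal-length strings compare character by character,
        # which matches numeric order within a shared prefix.
        if lo <= code <= hi:
            return label
    return None

for code in ("Q4158", "81007", "99213"):
    print(code, classify_hcpcs(code))
```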
| Metric | Value |
|---|---|
| Providers analyzed | 2,659 |
| Graph nodes | 2,791 |
| Graph edges | 647,859 |
| Processing time | ~38 seconds |
| Memory usage | <2GB |
- GAO-17-710: Medicare Fraud Prevention System evaluation
- NBER Working Paper 30946: Unsupervised ML for Healthcare Fraud Detection
- Yoo et al. (2023): Medicare Fraud Detection Using Graph Analysis — IEEE Access
- OIG Skin Substitutes Report (Sept 2025)
- Executive Order 14243: Stopping Waste, Fraud, and Abuse
MIT License — See LICENSE for details.
Anthony Abavelim
GitHub: @thatSandemaboy