Medicare Fraud Detection

Graph-based anomaly detection for Medicare Part B billing data. Identifies suspicious billing patterns in skin substitutes and other high-fraud-risk services.

Portfolio Note

This is the strongest public technical and research project on this GitHub profile. It combines public healthcare data, graph construction, anomaly scoring, and validation against independent enforcement signals.

Overview

This tool uses graph neural networks to detect anomalous billing patterns in publicly available CMS data. Unlike traditional tabular analysis, graph-based methods reveal coordinated fraud networks — kickback rings, referral schemes, and geographic clusters that individual provider analysis would miss.

Validated Results

The model ranked 6 providers within its top 67 anomalies who were independently charged by DOJ. A seventh, at rank 318, was later excluded by HHS-OIG (LEIE):

Rank	Provider	State	DOJ Action	Amount
#12	Ira Denny, NP	AZ	Indicted Jun 2025	$209M
#19	Kinds	AZ	Indicted	Part of $1.2B scheme
#36	Carlos Ching, MD	AZ	Guilty Plea 2024	Part of $1.2B scheme
#55	David Jenson, DPM	TX	Charged Jun 2025	$45M
#61	Bethany Jameson, NP	AZ	Guilty Plea 2024	Part of $1.2B scheme
#67	Gina Palacios, NP	AZ	Charged Jul 2025	$28M
#318	Alexander Frank	OK	LEIE Excluded Aug 2025	False claims

Detection rate: 6 of the top 67 ranked anomalies (about 9%) were independently charged or had pleaded guilty, approximately 9x lift over random targeting.
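The detection-rate arithmetic can be checked directly from the ranks in the table. The sketch below counts table rows ranked at or below 67 as hits. The base rate is not reported in this README, so the value used here is an assumption: it is simply the rate a 9x lift would imply.

```python
# Precision at k: share of the top-k ranked providers with an enforcement action.
confirmed_ranks = [12, 19, 36, 55, 61, 67, 318]  # ranks copied from the table above
k = 67

hits = sum(1 for rank in confirmed_ranks if rank <= k)
precision_at_k = hits / k

# ASSUMPTION: the README does not state a base rate. About 1% is the value
# implied by the stated 9x lift and is used only for illustration.
assumed_base_rate = 0.01
lift = precision_at_k / assumed_base_rate

print(f"hits={hits}, precision@{k}={precision_at_k:.1%}, lift={lift:.1f}x")
```

The provider at rank 318 falls outside the top 67, so it does not contribute to the precision figure.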

Quick Start

# Clone repository
git clone https://github.com/thatSandemaboy/medicare-fraud-detection.git
cd medicare-fraud-detection

# Setup environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Download CMS data (public API)
python scripts/download_data.py

# Build provider graph
python scripts/build_graph.py

# Run anomaly detection
python scripts/detect_anomalies.py

# View results
python scripts/generate_report.py

Output: results/anomaly_rankings.csv — providers ranked by anomaly score.

Data Sources

All data is publicly available:

Source	Description	Link
CMS Provider Utilization	Medicare Part B claims by provider/service	data.cms.gov
HHS-OIG LEIE	Excluded providers list	oig.hhs.gov/exclusions
NPI Registry	Provider details	npiregistry.cms.hhs.gov

Methodology

1. Graph Construction

Builds a provider relationship graph:

Nodes: Providers (NPI), Products (HCPCS codes), States
Edges: Billing relationships, geographic co-location, product similarity

Provider A --[BILLS]--> Product X
Provider A --[LOCATED_IN]--> State Y
Provider A --[SAME_STATE]--> Provider B
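The three edge types above can be illustrated with a small sketch. It is not the repository's `build_graph.py`: the project structure indicates graphs are stored as NetworkX files, while this example uses plain Python sets to stay dependency-free. The row layout, node prefixes, and NPI values are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical billing rows: (npi, hcpcs_code, state). Values are made up.
billing_rows = [
    ("1000000001", "Q4101", "AZ"),
    ("1000000001", "Q4102", "AZ"),
    ("1000000002", "Q4101", "AZ"),
    ("1000000003", "Q4101", "TX"),
]

def build_edges(rows):
    """Return a set of (source, relation, target) triples."""
    edges = set()
    providers_by_state = defaultdict(set)
    for npi, hcpcs, state in rows:
        edges.add((f"provider:{npi}", "BILLS", f"product:{hcpcs}"))
        edges.add((f"provider:{npi}", "LOCATED_IN", f"state:{state}"))
        providers_by_state[state].add(npi)
    # SAME_STATE links every pair of providers sharing a state (one edge per pair).
    for npis in providers_by_state.values():
        for a, b in combinations(sorted(npis), 2):
            edges.add((f"provider:{a}", "SAME_STATE", f"provider:{b}"))
    return edges

edges = build_edges(billing_rows)
nodes = {node for src, _, dst in edges for node in (src, dst)}
print(len(nodes), "nodes,", len(edges), "edges")
```

Pairwise SAME_STATE edges grow quadratically with the number of providers in a state, which is one plausible reason the edge count in the Performance table (647,859) is so much larger than the node count (2,791).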

2. Feature Engineering

For each provider:

Volume metrics: Total services, beneficiaries, average payment
Billing patterns: Charge-to-payment ratio, specialty alignment
Graph metrics: Degree centrality, PageRank, clustering coefficient
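As a rough illustration of how a few of these features might be computed, the sketch below derives average payment, charge-to-payment ratio, and degree centrality for three made-up providers. The field names and the adjacency structure are assumptions for the example, not the layout used in `src/features/`. PageRank and clustering coefficient are omitted for brevity.

```python
# Hypothetical per-provider totals; field names and values are illustrative.
providers = {
    "A": {"total_services": 1200, "charged": 900_000.0, "paid": 300_000.0},
    "B": {"total_services": 150, "charged": 50_000.0, "paid": 30_000.0},
    "C": {"total_services": 90, "charged": 20_000.0, "paid": 15_000.0},
}
# Undirected provider-to-provider adjacency (for example, SAME_STATE links).
adjacency = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0) for node, nbrs in adj.items()}

def provider_features(stats, centrality):
    """Combine volume, billing, and graph features for one provider."""
    return {
        "avg_payment": stats["paid"] / stats["total_services"],
        "charge_to_payment": stats["charged"] / stats["paid"],
        "degree_centrality": centrality,
    }

centrality = degree_centrality(adjacency)
features = {npi: provider_features(s, centrality[npi]) for npi, s in providers.items()}
```

Provider A stands out on both the billing side (charges three times payments) and the graph side (connected to every other provider), which is the kind of combined signal the feature set is meant to capture.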

3. Anomaly Detection

Unsupervised approach (no labeled fraud data required):

Graph Autoencoder reconstruction error
Heuristic scoring (volume outliers, payment anomalies)
Combined score: 50% graph + 50% heuristic
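The 50/50 combination can be sketched as follows. The README does not say how the two components are scaled before mixing, so min-max normalization to [0, 1] is an assumption here, chosen because the output field `anomaly_score` is described as ranging from 0 to 1. The raw scores are made up.

```python
def min_max(values):
    """Rescale a dict of raw scores to [0, 1]; constant input maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: ((v - lo) / span if span else 0.0) for k, v in values.items()}

def combine(graph_raw, heuristic_raw, graph_weight=0.5):
    """Weighted mix of the normalized graph and heuristic components."""
    graph = min_max(graph_raw)
    heuristic = min_max(heuristic_raw)
    return {
        k: graph_weight * graph[k] + (1 - graph_weight) * heuristic[k]
        for k in graph_raw
    }

# Hypothetical raw component scores for three providers.
reconstruction_error = {"A": 0.90, "B": 0.10, "C": 0.50}
heuristic_points = {"A": 8, "B": 2, "C": 2}

anomaly_score = combine(reconstruction_error, heuristic_points)
ranking = sorted(anomaly_score, key=anomaly_score.get, reverse=True)
print(ranking)
```

Without some form of rescaling, whichever component has the larger numeric range would dominate the sum regardless of the nominal weights.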

Why Graph-Based?

Traditional analysis examines providers in isolation. Graph analysis reveals:

Coordinated billing patterns across multiple providers
Geographic clusters with unusual concentration
Referral networks that may indicate kickback schemes

The Arizona providers flagged by this model weren't isolated outliers — they were connected through the Gehrke/King scheme. Graph analysis surfaced this connection.
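One simple way to see how coordinated billing might show up in a graph is to compare the sets of products each provider bills. The sketch below uses Jaccard similarity for this. It is an assumption that the "product similarity" edges are built this way: the README does not define them, and the provider IDs, codes, and threshold are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical product sets billed by each provider.
billed = {
    "P1": {"Q4101", "Q4102", "Q4103"},
    "P2": {"Q4101", "Q4102", "Q4103"},
    "P3": {"Q4101", "Q4110"},
    "P4": {"81007"},
}

def similar_pairs(billed_products, threshold=0.5):
    """Return provider pairs whose product sets overlap at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(billed_products), 2):
        score = jaccard(billed_products[a], billed_products[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs

pairs = similar_pairs(billed)
print(pairs)
```

Each provider here looks unremarkable alone, but P1 and P2 bill an identical product mix, a relationship that only appears when providers are compared against each other.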

Project Structure

├── data/
│   ├── raw/                    # Downloaded CMS data
│   ├── processed/              # Cleaned datasets
│   └── graphs/                 # NetworkX graph files
├── models/
│   └── autoencoder.pt          # Trained GNN model
├── scripts/
│   ├── download_data.py        # CMS API data retrieval
│   ├── build_graph.py          # Graph construction
│   ├── detect_anomalies.py     # Run anomaly detection
│   └── generate_report.py      # Create output report
├── src/
│   ├── features/               # Feature engineering
│   ├── models/                 # GNN model definitions
│   └── utils/                  # Helper functions
├── results/
│   ├── anomaly_rankings.csv    # Ranked provider list
│   └── figures/                # Visualizations
└── docs/
    └── USAGE.md                # Detailed usage guide

For Investigators

See docs/USAGE.md for detailed guidance on:

Running the model on specific service codes (e.g., urinalysis, DME)
Interpreting anomaly scores
Exporting results for case development
Cross-referencing with LEIE and SAM exclusions
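Cross-referencing against an exclusion list amounts to a set lookup on NPI. The sketch below is a minimal version under two assumptions: the exclusion file has been reduced to a column of NPI strings, and missing NPIs appear as blanks or all-zero placeholders. The ranks and NPIs are made up, and name-based matching is not attempted.

```python
# Hypothetical ranked output (rank, NPI) and exclusion-list NPI column.
ranked = [(1, "1000000001"), (2, "1000000002"), (3, "1000000003")]
leie_npis = ["1000000003", "0000000000", " 1000000009 ", ""]

def normalize_npi(value):
    """Strip whitespace; return None for blanks and all-zero placeholders."""
    npi = value.strip()
    return npi if npi and set(npi) != {"0"} else None

def cross_reference(ranked_providers, exclusion_npis):
    """Return (rank, npi) pairs for ranked providers found in the exclusion list."""
    excluded = {n for n in map(normalize_npi, exclusion_npis) if n}
    return [(rank, npi) for rank, npi in ranked_providers if npi in excluded]

matches = cross_reference(ranked, leie_npis)
print(matches)
```

Dropping placeholder NPIs before building the lookup set avoids false matches between records that merely share a missing identifier.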

Key Output Fields

Field	Description
`npi`	National Provider Identifier
`provider_name`	Provider name
`state`	Practice state
`specialty`	Provider specialty
`total_services`	Medicare services billed (year)
`total_paid`	Medicare payments received
`anomaly_score`	Combined anomaly score (0-1, higher = more suspicious)
`graph_score`	Graph-based anomaly component
`heuristic_score`	Rule-based anomaly component
`risk_flags`	Specific anomaly indicators
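The ranked output can be filtered with the standard library alone. The sketch below reads an in-memory stand-in for results/anomaly_rankings.csv using a subset of the fields listed above. The rows are fabricated and the 0.8 threshold is an arbitrary example, not a recommended cutoff.

```python
import csv
import io

# Stand-in for results/anomaly_rankings.csv; rows are fabricated for illustration.
sample_csv = io.StringIO(
    "npi,provider_name,state,anomaly_score,graph_score,heuristic_score\n"
    "1000000001,Example Clinic A,AZ,0.91,0.88,0.94\n"
    "1000000002,Example Clinic B,TX,0.85,0.80,0.90\n"
    "1000000003,Example Clinic C,AZ,0.40,0.35,0.45\n"
    "1000000004,Example Clinic D,AZ,0.95,0.97,0.93\n"
)

def top_anomalies(rows, min_score=0.8, state=None):
    """Keep rows at or above min_score (optionally in one state), highest first."""
    selected = [
        r for r in rows
        if float(r["anomaly_score"]) >= min_score
        and (state is None or r["state"] == state)
    ]
    return sorted(selected, key=lambda r: float(r["anomaly_score"]), reverse=True)

rows = list(csv.DictReader(sample_csv))
flagged = top_anomalies(rows, min_score=0.8, state="AZ")
for row in flagged:
    print(row["npi"], row["provider_name"], row["anomaly_score"])
```

To run this against the real file, replace `sample_csv` with an open file handle for results/anomaly_rankings.csv.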

Extending to Other Services

The methodology works for any Medicare Part B service. To analyze a different service:

# Example: Urinalysis (CPT 81007)
python scripts/download_data.py --hcpcs 81007
python scripts/build_graph.py --input data/raw/81007_providers.csv
python scripts/detect_anomalies.py

Services with known fraud patterns:

Skin substitutes: Q4100-Q4397
Urinalysis: 81000-81099
Genetic testing: 81161-81599
DME (wheelchairs, etc.): E0100-E9999
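The code ranges above can be turned into a small lookup for deciding which service family a billed code belongs to. This is a sketch, not part of the repository. It assumes five-character codes that are either all digits or one letter followed by four digits, and it treats any other shape (for example, codes ending in a letter) as unmatched.

```python
# Ranges copied from the list above; the dictionary keys are hypothetical labels.
SERVICE_RANGES = {
    "skin_substitutes": ("Q4100", "Q4397"),
    "urinalysis": ("81000", "81099"),
    "genetic_testing": ("81161", "81599"),
    "dme": ("E0100", "E9999"),
}

def split_code(code):
    """Split a code into (letter_prefix, number); return None if the shape is unsupported."""
    code = code.strip().upper()
    if len(code) != 5:
        return None
    prefix, digits = ("", code) if code.isdecimal() else (code[0], code[1:])
    if not digits.isdecimal() or (prefix and not prefix.isalpha()):
        return None
    return prefix, int(digits)

def in_range(code, start, end):
    """True if code shares the range's prefix and its number lies within the bounds."""
    parsed, lo, hi = split_code(code), split_code(start), split_code(end)
    if parsed is None or lo is None or hi is None:
        return False
    return parsed[0] == lo[0] == hi[0] and lo[1] <= parsed[1] <= hi[1]

def classify(code):
    """Return the labels of every range that contains the code."""
    return [name for name, (start, end) in SERVICE_RANGES.items() if in_range(code, start, end)]

print(classify("Q4101"), classify("81007"), classify("99213"))
```

Comparing the letter prefix separately from the numeric part avoids the pitfalls of plain string comparison across codes with different prefixes.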

Performance

Metric	Value
Providers analyzed	2,659
Graph nodes	2,791
Graph edges	647,859
Processing time	~38 seconds
Memory usage	<2GB

References

DOJ Cases

Academic

GAO-17-710: Medicare Fraud Prevention System evaluation
NBER Working Paper 30946: Unsupervised ML for Healthcare Fraud Detection
Yoo et al. (2023): Medicare Fraud Detection Using Graph Analysis — IEEE Access

Policy

OIG Skin Substitutes Report (Sept 2025)
Executive Order 14243: Stopping Waste, Fraud, and Abuse

License

MIT License — See LICENSE for details.

Author

Anthony Abavelim
GitHub: @thatSandemaboy
