thatSandemaboy/medicare-fraud-detection

Medicare Fraud Detection

Graph-based anomaly detection for Medicare Part B billing data. Identifies suspicious billing patterns in skin substitutes and other high-fraud-risk services.

Portfolio Note

This project is the strongest public technical/research anchor in this GitHub profile. It combines public healthcare data, graph construction, anomaly scoring, and validation against independent enforcement signals.

Overview

This tool applies a graph neural network (a graph autoencoder) to publicly available CMS data to detect anomalous billing patterns. Unlike provider-by-provider tabular analysis, graph-based methods can surface coordinated fraud networks, such as kickback rings, referral schemes, and geographic clusters, that analysis of individual providers would miss.

Validated Results

The model ranked 6 providers who were independently charged by DOJ (or pleaded guilty) within its top 67 anomalies. A seventh, at rank #318, appears on the HHS-OIG exclusion list (LEIE):

| Rank | Provider | State | DOJ Action | Amount |
|------|----------|-------|------------|--------|
| #12 | Ira Denny, NP | AZ | Indicted Jun 2025 | $209M |
| #19 | Kinds | AZ | Indicted | Part of $1.2B scheme |
| #36 | Carlos Ching, MD | AZ | Guilty Plea 2024 | Part of $1.2B scheme |
| #55 | David Jenson, DPM | TX | Charged Jun 2025 | $45M |
| #61 | Bethany Jameson, NP | AZ | Guilty Plea 2024 | Part of $1.2B scheme |
| #67 | Gina Palacios, NP | AZ | Charged Jul 2025 | $28M |
| #318 | Alexander Frank | OK | LEIE Excluded Aug 2025 | False claims |

Detection rate: about 9% of the top 67 anomalies (6 of 67) were independently charged or pleaded guilty, approximately a 9x lift over random targeting.
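
The arithmetic behind the detection rate can be sketched as follows. The hit count (6) and cutoff (67) come from the ranks in the table above. The base rate is not stated in this README, so the 1% figure below is an assumption, chosen only because it is roughly what a 9x lift at about 9% precision would imply.

```python
def precision_at_k(hits: int, k: int) -> float:
    """Fraction of the top-k ranked providers with an independent enforcement action."""
    if k <= 0:
        raise ValueError("k must be positive")
    return hits / k


def lift(precision: float, base_rate: float) -> float:
    """How many times better than random targeting the ranking performs."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return precision / base_rate


# 6 providers in the table above have a rank of 67 or better.
precision = precision_at_k(hits=6, k=67)

# ASSUMPTION: the README does not state the base rate; 1% is a placeholder.
assumed_base_rate = 0.01

print(f"precision@67 = {precision:.1%}")
print(f"lift = {lift(precision, assumed_base_rate):.1f}x")
```

The lift figure is only as reliable as the base rate used, so it should be recomputed once the true share of charged providers in the analyzed population is known.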


Quick Start

# Clone repository
git clone https://github.com/thatSandemaboy/medicare-fraud-detection.git
cd medicare-fraud-detection

# Setup environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Download CMS data (public API)
python scripts/download_data.py

# Build provider graph
python scripts/build_graph.py

# Run anomaly detection
python scripts/detect_anomalies.py

# View results
python scripts/generate_report.py

Output: results/anomaly_rankings.csv — providers ranked by anomaly score.


Data Sources

All data is publicly available:

| Source | Description | Link |
|--------|-------------|------|
| CMS Provider Utilization | Medicare Part B claims by provider/service | data.cms.gov |
| HHS-OIG LEIE | Excluded providers list | oig.hhs.gov/exclusions |
| NPI Registry | Provider details | npiregistry.cms.hhs.gov |

Methodology

1. Graph Construction

Builds a provider relationship graph:

  • Nodes: Providers (NPI), Products (HCPCS codes), States
  • Edges: Billing relationships, geographic co-location, product similarity

Provider A --[BILLS]--> Product X
Provider A --[LOCATED_IN]--> State Y
Provider A --[SAME_STATE]--> Provider B
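
The three edge types shown above can be sketched with plain Python data structures. This is an illustrative sketch, not the code in scripts/build_graph.py: the record fields (`npi`, `hcpcs`, `state`) and the sample values are hypothetical, and the project itself stores its graphs as NetworkX files.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical billing records; real rows would come from the CMS download.
records = [
    {"npi": "1000000001", "hcpcs": "Q4101", "state": "AZ"},
    {"npi": "1000000002", "hcpcs": "Q4101", "state": "AZ"},
    {"npi": "1000000003", "hcpcs": "Q4102", "state": "TX"},
]


def build_edges(rows):
    """Return (source, relation, target) triples for the three edge types."""
    edges = set()
    providers_by_state = defaultdict(set)
    for row in rows:
        edges.add((row["npi"], "BILLS", row["hcpcs"]))
        edges.add((row["npi"], "LOCATED_IN", row["state"]))
        providers_by_state[row["state"]].add(row["npi"])
    # Link every pair of providers that share a state (undirected, stored once).
    for npis in providers_by_state.values():
        for a, b in combinations(sorted(npis), 2):
            edges.add((a, "SAME_STATE", b))
    return edges


edges = build_edges(records)
for edge in sorted(edges):
    print(edge)
```

Pairwise SAME_STATE edges grow quadratically with the number of providers in a state, which may explain why the edge count reported under Performance is far larger than the node count.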

2. Feature Engineering

For each provider:

  • Volume metrics: Total services, beneficiaries, average payment
  • Billing patterns: Charge-to-payment ratio, specialty alignment
  • Graph metrics: Degree centrality, PageRank, clustering coefficient
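
A minimal sketch of two of these features, the charge-to-payment ratio and degree centrality, in plain Python. The parameter names `submitted_charges` and `medicare_paid` are hypothetical, and the adjacency data is invented for illustration; the project's own feature code lives under src/features/ and may differ.

```python
def charge_to_payment_ratio(submitted_charges: float, medicare_paid: float) -> float:
    """Ratio of billed charges to Medicare payments; inf if nothing was paid."""
    if medicare_paid <= 0:
        return float("inf")
    return submitted_charges / medicare_paid


def degree_centrality(adjacency: dict) -> dict:
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


# Hypothetical undirected provider graph as an adjacency mapping.
adjacency = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A"},
}

centrality = degree_centrality(adjacency)
ratio = charge_to_payment_ratio(submitted_charges=500.0, medicare_paid=125.0)
print(centrality)
print(ratio)
```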

3. Anomaly Detection

Unsupervised approach (no labeled fraud data required):

  • Graph Autoencoder reconstruction error
  • Heuristic scoring (volume outliers, payment anomalies)
  • Combined score: 50% graph + 50% heuristic
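
The 50/50 combination can be sketched as below. This README does not say how the two components are normalized before mixing; min-max scaling to the 0-1 range is an assumption here, as are the function names and the sample numbers.

```python
def min_max_scale(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(graph_errors, heuristic_scores, graph_weight=0.5):
    """Weighted mix of scaled reconstruction errors and scaled heuristic scores."""
    if len(graph_errors) != len(heuristic_scores):
        raise ValueError("both score lists must have one entry per provider")
    scaled_graph = min_max_scale(graph_errors)
    scaled_heuristic = min_max_scale(heuristic_scores)
    return [
        graph_weight * g + (1.0 - graph_weight) * h
        for g, h in zip(scaled_graph, scaled_heuristic)
    ]


# Invented per-provider values: autoencoder reconstruction error and heuristic score.
graph_errors = [0.5, 4.5, 2.5]
heuristic_scores = [1.0, 2.0, 5.0]

scores = combined_scores(graph_errors, heuristic_scores)
print(scores)
```

With equal weights, a provider needs to stand out on both components to reach a score near 1, which limits the influence of either signal on its own.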

Why Graph-Based?

Traditional analysis examines providers in isolation. Graph analysis reveals:

  • Coordinated billing patterns across multiple providers
  • Geographic clusters with unusual concentration
  • Referral networks that may indicate kickback schemes

The Arizona providers flagged by this model weren't isolated outliers — they were connected through the Gehrke/King scheme. Graph analysis surfaced this connection.


Project Structure

├── data/
│   ├── raw/                    # Downloaded CMS data
│   ├── processed/              # Cleaned datasets
│   └── graphs/                 # NetworkX graph files
├── models/
│   └── autoencoder.pt          # Trained GNN model
├── scripts/
│   ├── download_data.py        # CMS API data retrieval
│   ├── build_graph.py          # Graph construction
│   ├── detect_anomalies.py     # Run anomaly detection
│   └── generate_report.py      # Create output report
├── src/
│   ├── features/               # Feature engineering
│   ├── models/                 # GNN model definitions
│   └── utils/                  # Helper functions
├── results/
│   ├── anomaly_rankings.csv    # Ranked provider list
│   └── figures/                # Visualizations
└── docs/
    └── USAGE.md                # Detailed usage guide

For Investigators

See docs/USAGE.md for detailed guidance on:

  • Running the model on specific service codes (e.g., urinalysis, DME)
  • Interpreting anomaly scores
  • Exporting results for case development
  • Cross-referencing with LEIE and SAM exclusions
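
Cross-referencing against an exclusion list reduces to matching on NPI. The sketch below is a hypothetical helper, not part of this repository. It assumes the exclusion file provides an NPI per record and uses `0000000000` where no NPI is on file; both assumptions should be checked against the current LEIE download.

```python
def cross_reference(ranked_npis, excluded_npis, placeholder="0000000000"):
    """Return ranked NPIs (original order kept) that also appear in the exclusion list."""
    excluded = set()
    for npi in excluded_npis:
        npi = npi.strip()
        # Skip blanks and the assumed "no NPI on file" placeholder.
        if npi and npi != placeholder:
            excluded.add(npi)
    return [npi for npi in ranked_npis if npi in excluded]


# Invented NPIs for illustration only.
ranked = ["1000000001", "1000000002", "1000000003"]
leie_npis = ["0000000000", "1000000003 ", "1000000009"]

matches = cross_reference(ranked, leie_npis)
print(matches)
```

Records without an NPI cannot be matched this way and would need a fallback such as name and state matching, which is more error-prone.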

Key Output Fields

| Field | Description |
|-------|-------------|
| npi | National Provider Identifier |
| provider_name | Provider name |
| state | Practice state |
| specialty | Provider specialty |
| total_services | Medicare services billed (year) |
| total_paid | Medicare payments received |
| anomaly_score | Combined anomaly score (0-1, higher = more suspicious) |
| graph_score | Graph-based anomaly component |
| heuristic_score | Rule-based anomaly component |
| risk_flags | Specific anomaly indicators |
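
A short sketch of reading a file with these fields and keeping the highest-scoring rows, using only the standard library. The sample rows, the 0.8 threshold, and the `volume_outlier` flag value are invented for illustration; only the column names come from the table above.

```python
import csv
import io

# Hypothetical sample in the documented column layout; all values are invented.
sample_csv = """npi,provider_name,state,specialty,total_services,total_paid,anomaly_score,graph_score,heuristic_score,risk_flags
1000000001,Example Provider A,AZ,Nurse Practitioner,1200,950000.00,0.91,0.88,0.94,volume_outlier
1000000002,Example Provider B,TX,Podiatry,300,42000.00,0.35,0.40,0.30,
"""


def top_anomalies(csv_file, threshold=0.8):
    """Return rows whose anomaly_score meets the threshold, highest first."""
    rows = [
        row for row in csv.DictReader(csv_file)
        if float(row["anomaly_score"]) >= threshold
    ]
    return sorted(rows, key=lambda row: float(row["anomaly_score"]), reverse=True)


flagged = top_anomalies(io.StringIO(sample_csv))
for row in flagged:
    print(row["npi"], row["state"], row["anomaly_score"], row["risk_flags"])
```

To run it against real output, pass an open file handle instead of the in-memory sample, for example `open("results/anomaly_rankings.csv", newline="")`.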

Extending to Other Services

The methodology works for any Medicare Part B service. To analyze a different service:

# Example: Urinalysis (CPT 81007)
python scripts/download_data.py --hcpcs 81007
python scripts/build_graph.py --input data/raw/81007_providers.csv
python scripts/detect_anomalies.py

Services with known fraud patterns:

  • Skin substitutes: Q4100-Q4397
  • Urinalysis: 81000-81099
  • Genetic testing: 81161-81599
  • DME (wheelchairs, etc.): E0100-E9999
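
The code ranges above can be turned into a simple lookup. This is a hypothetical helper rather than something the repository is known to provide. It assumes five-character codes made of an optional leading letter followed by digits, so codes with a trailing letter are treated as out of range.

```python
SERVICE_RANGES = {
    "skin_substitutes": ("Q4100", "Q4397"),
    "urinalysis": ("81000", "81099"),
    "genetic_testing": ("81161", "81599"),
    "dme": ("E0100", "E9999"),
}


def in_hcpcs_range(code: str, start: str, end: str) -> bool:
    """True if code lies in [start, end] and shares the range's letter prefix, if any."""
    code, start, end = code.strip().upper(), start.upper(), end.upper()
    if not (len(code) == len(start) == len(end) == 5):
        return False
    if start[0].isalpha():
        # Letter-prefixed range: the prefix must match before comparing digits.
        if code[0] != start[0] or end[0] != start[0]:
            return False
        code, start, end = code[1:], start[1:], end[1:]
    if not (code.isdigit() and start.isdigit() and end.isdigit()):
        return False
    return int(start) <= int(code) <= int(end)


def classify(code: str) -> list:
    """Names of all listed service groups whose range contains the code."""
    return [
        name for name, (start, end) in SERVICE_RANGES.items()
        if in_hcpcs_range(code, start, end)
    ]


for example in ("Q4101", "81007", "E0100", "99213"):
    print(example, classify(example))
```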

Performance

| Metric | Value |
|--------|-------|
| Providers analyzed | 2,659 |
| Graph nodes | 2,791 |
| Graph edges | 647,859 |
| Processing time | ~38 seconds |
| Memory usage | <2GB |

References

DOJ Cases

Academic

  • GAO-17-710: Medicare Fraud Prevention System evaluation
  • NBER Working Paper 30946: Unsupervised ML for Healthcare Fraud Detection
  • Yoo et al. (2023): Medicare Fraud Detection Using Graph Analysis — IEEE Access

Policy

  • OIG Skin Substitutes Report (Sept 2025)
  • Executive Order 14243: Stopping Waste, Fraud, and Abuse

License

MIT License — See LICENSE for details.


Author

Anthony Abavelim
GitHub: @thatSandemaboy
