Summary
The current model trains on synthetic or limited sample data. To validate real-world performance and establish honest benchmarks, we should integrate well-known public fraud detection datasets.
Motivation
- Real-world fraud data is far more imbalanced than typical synthetic data (~3.5% fraud in IEEE-CIS; ~0.17% in the ULB Credit Card Fraud Detection dataset)
- Benchmarking against known datasets allows comparison with published research
- Exposes edge cases that synthetic data misses (merchant category patterns, time-of-day effects)
Proposed Approach
Add data loader modules for:
Preprocessing pipeline:
- Handle missing values and categorical encoding for IEEE-CIS
- Implement stratified train/test splitting that preserves the fraud ratio in both splits
- Add feature engineering for temporal and aggregation features
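The stratified split can be sketched as follows. This is a minimal, dependency-free sketch that assumes binary integer labels (1 = fraud, 0 = legitimate). `stratified_split` is a hypothetical helper, not an existing module in this project. In practice, scikit-learn's `train_test_split(..., stratify=y)` does the same job.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Split row indices into (train, test) so each class keeps roughly the
    same share in both splits (e.g. the fraud ratio is preserved)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # local RNG: reproducible, no global state
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):  # fixed class order keeps results reproducible
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = [0] * 990 + [1] * 10  # 1% fraud
    train, test = stratified_split(labels, test_fraction=0.2)
    print(len(train), sum(labels[i] for i in train))  # 800 8
    print(len(test), sum(labels[i] for i in test))    # 200 2
```

With 10 fraud cases in 1,000 rows and `test_fraction=0.2`, the test split receives 2 fraud and 198 legitimate rows, so both splits keep the 1% ratio. For very small classes, `round` can send zero positives to the test split, so tiny samples would need a minimum-count guard.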
Benchmarking:
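- Report precision, recall, F1, and AUC-PR, not just AUC-ROC: under heavy class imbalance, AUC-ROC can stay high even when most flagged transactions are false positives, because the abundant true negatives keep the false-positive rate low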
- Compare against published baselines
- Document results in a benchmarks/ directory
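A minimal, dependency-free sketch of these metrics follows. It assumes binary labels (1 = fraud) and a model that outputs one fraud score per transaction. All function names are hypothetical. The toy data at the bottom shows why AUC-ROC alone is not enough: a single fraud case ranked behind five false alarms still earns a high AUC-ROC.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-PR summarised as (non-interpolated) average precision.

    Walks the ranking from highest to lowest score and averages the
    precision observed at each rank where a fraud case is retrieved.
    Tied scores are not grouped; they keep their input order.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random fraud case outscores a
    random legitimate one (ties count half). O(n_pos * n_neg): fine for a
    demo, too slow for full datasets."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 1 fraud case among 101 transactions, ranked 6th: 5 false alarms outrank it.
    y_true = [0] * 5 + [1] + [0] * 95
    scores = [1.0 - i / 200 for i in range(101)]
    print(f"AUC-ROC: {roc_auc(y_true, scores):.3f}")                # 0.950
    print(f"AUC-PR (AP): {average_precision(y_true, scores):.3f}")  # 0.167
```

In the toy ranking, AUC-ROC reports 0.95 while average precision is about 0.17, because five legitimate transactions outrank the only fraud case. For real benchmarks, scikit-learn's `sklearn.metrics.average_precision_score` and `roc_auc_score` are the usual choices. Its average precision groups tied scores, so values can differ slightly from this sketch when ties occur.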
Acceptance Criteria