Add support for real-world datasets (Kaggle IEEE-CIS, Credit Card Fraud) #1

## Summary

The current model trains on synthetic or limited sample data. To validate real-world performance and establish honest benchmarks, we should integrate well-known public fraud detection datasets.

## Motivation

- Real-world fraud data is far more imbalanced than typical synthetic data (~0.17% fraud in the Credit Card Fraud dataset, ~3.5% in IEEE-CIS)
- Benchmarking against known datasets allows comparison with published research
- Real data exposes edge cases that synthetic data misses (merchant category patterns, time-of-day effects)

## Proposed Approach

1. **Add data loader modules** for:
   - [Kaggle IEEE-CIS Fraud Detection](https://www.kaggle.com/c/ieee-fraud-detection) (~590K transactions)
   - [Kaggle Credit Card Fraud Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud) (~284K transactions, PCA-transformed)

2. **Preprocessing pipeline:**
   - Handle missing values and categorical encoding for IEEE-CIS
   - Implement stratified train/test splitting preserving fraud ratio
   - Add feature engineering for temporal and aggregation features

3. **Benchmarking:**
   - Report precision, recall, F1, and AUC-PR (not just AUC-ROC, which can look deceptively strong under heavy class imbalance)
   - Compare against published baselines
   - Document results in a `benchmarks/` directory
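
As a rough illustration of steps 2 and 3, here is a minimal standard-library sketch of a stratified split and the proposed metrics. The function names (`stratified_split`, `precision_recall_f1`, `average_precision`) are hypothetical and not part of the current codebase. A real implementation would more likely use scikit-learn's `train_test_split(..., stratify=y)` and `average_precision_score`. The AUC-PR here is computed as average precision and assumes scores are not tied.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class at test_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (fraud = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(y_true, scores):
    """Average precision: mean of precision at the rank of each positive.

    Assumes no tied scores; ties are kept in input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / n_pos
```

Splitting per class keeps the fraud ratio in the test set equal to the ratio in the full data (up to rounding), which matters when positives make up well under 1% of rows.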

## Acceptance Criteria

- [ ] Data loaders for both datasets with automatic download/caching
- [ ] Preprocessing handles missing values and encoding
- [ ] Benchmark results documented with precision, recall, F1, and AUC-PR
- [ ] README updated with dataset instructions and results table
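
The download/caching criterion could be approached along these lines. This is only a sketch: `load_dataset` and the `fetch` callback are hypothetical names, and the actual download step (for example through the Kaggle API, which needs user credentials) is deliberately left to the caller.

```python
import csv
from pathlib import Path


def load_dataset(name, cache_dir, fetch):
    """Return the rows of <cache_dir>/<name>.csv as a list of dicts.

    `fetch(path)` is a caller-supplied (hypothetical) download function.
    It is called only on a cache miss and must write the CSV to `path`.
    """
    path = Path(cache_dir) / f"{name}.csv"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

Passing the downloader in as a callback keeps the loader testable offline and lets each dataset supply its own download logic.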
