TrustCart — AI-Powered E-Commerce Fraud Detection

TrustCart is a real-time fraud detection system for e-commerce listings. It combines a trained XGBoost classifier, statistical anomaly scoring, Groq LLM reasoning, and semantic duplicate detection to surface risky listings from Google Shopping and eBay — before you buy.

Live Demo

Why I Built This

Online shopping fraud isn't a website problem — it's a listing problem. A legitimate site like eBay or Google Shopping can host thousands of fraudulent listings side by side with genuine ones. Existing tools don't solve this.

In July 2025, Mozilla shut down Fakespot — the closest tool to what I wanted. What remained were tools that either check whether a website is trustworthy (ScamAdviser, F-Secure) or analyse reviews for fakery (ReviewMeta) — neither of which tells you whether the specific iPhone listing you're about to click is a scam.

I wanted a tool that answers one question: is this particular listing safe to buy from? TrustCart is that tool.

What Already Exists — and Where It Falls Short

Tool	What it checks	Platform coverage	Still active?
Fakespot	Fake reviews	Amazon, eBay, Walmart	❌ Shut down Jul 2025
ReviewMeta	Fake reviews	Amazon only	✅
Camelizer	Price history / fake discounts	Amazon only	✅
Honey	Price comparison, coupons	Multi-platform	✅
ScamAdviser	Website reputation	Site-level only	✅
F-Secure	Website safety	Site-level only	✅
Counterfake	Counterfeit sellers	Enterprise SaaS	✅

Every buyer-facing tool on this list has at least one of these fundamental limitations:

Site-level, not listing-level — ScamAdviser tells you eBay.com is safe. It says nothing about the seller charging $49 for an "iPhone 15 Pro" in the listings.
Single platform — ReviewMeta and Camelizer cover only Amazon. That is where most complaints originate, but it leaves eBay and Google Shopping completely unchecked.
Reviews only — A listing can have zero reviews and still be fraudulent. Review-based tools are blind to brand-new scam listings.
No ML risk scoring — Price history trackers and coupon finders are savings tools, not fraud detectors. None of them run a trained classifier against listing features.

Why TrustCart Is Different

Of the buyer-facing tools above, TrustCart is the only one that operates at the individual listing level with a full ML pipeline: not review sentiment, not site reputation, not price history alone.

Three signal types, combined:

Signal	How TrustCart uses it	What others do
Statistical	Price percentile rank, outlier scoring, seller trust weights	Camelizer does price history (Amazon only)
ML Classification	XGBoost on 17 features — seller rating, sales volume, price percentile, condition, platform	Nobody else applies a trained classifier to individual listings
LLM Reasoning	Groq/LLaMA generates plain-English red flags and a buy recommendation	Fakespot had NLP for reviews; no tool explains listing-level fraud in plain English

What no competitor does at all:

Cross-platform duplicate detection — TF-IDF cosine similarity flags when the same listing appears across Google Shopping and eBay at different prices, exposing arbitrage scams and price manipulation.
Calibrated trust scoring — The LLM verdict gates the safety score. An AVOID recommendation hard-caps the score at 20%, so the displayed score can never contradict the LLM's own recommendation.
Multi-platform scraping in one query — One search returns ranked, risk-scored results from both Google Shopping and eBay simultaneously.

The gap Fakespot's shutdown left is real. TrustCart fills it — not for review analysis, but for the harder problem of per-listing fraud risk at the point of search.

How It Works

A search query triggers a four-stage pipeline:

Scraping — Listings fetched live from Google Shopping and eBay via SerpAPI
Statistical Scoring — Percentile-based price analysis, seller trust signals, and a weighted 4-factor rule model
XGBoost Classification — 17-feature gradient-boosted classifier assigns a fraud probability to each listing
Groq LLM Explanation — Plain-English fraud reasoning, red flags, and a buy recommendation for the 5 riskiest items per search

Results are deduplicated using TF-IDF cosine similarity to surface cross-platform price comparisons.
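As a sketch of the scraping step, assuming SerpAPI's `google_shopping` and `ebay` engines (the eBay engine takes its search term in eBay's own `_nkw` parameter), the request URLs for one query could be built as below. The function name is hypothetical, and only URL construction is shown so the snippet runs offline.

```python
from urllib.parse import urlencode

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"

def serpapi_urls(query: str, api_key: str) -> dict[str, str]:
    """Build one SerpAPI request URL per platform (hypothetical helper)."""
    google = {"engine": "google_shopping", "q": query, "api_key": api_key}
    # The eBay engine mirrors eBay's search URL, where the keyword param is `_nkw`.
    ebay = {"engine": "ebay", "_nkw": query, "api_key": api_key}
    return {
        "google_shopping": f"{SERPAPI_ENDPOINT}?{urlencode(google)}",
        "ebay": f"{SERPAPI_ENDPOINT}?{urlencode(ebay)}",
    }

urls = serpapi_urls("iphone 15 pro", "YOUR_SERPAPI_KEY")
```

Fetching each URL (for example with `requests.get(url, timeout=10).json()`) should return JSON in which, per SerpAPI's documentation, Google Shopping items sit under `shopping_results` and eBay items under `organic_results`; verify those keys against the current docs before relying on them.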

Model Performance

Benchmarked on 714 real scraped listings (Google Shopping + eBay), labeled via Groq LLM and validated against rule-based scores.

XGBoost vs. Rule-Based Baseline

Metric	XGBoost	Rule-Based	Delta (pp)
F1 Score	91.6%	84.5%	+7.1
Recall	98.0%	77.3%	+20.7
Precision	86.0%	93.2%	−7.2
Accuracy	88.8%	82.4%	+6.4
AUC (ROC)	92.4%	93.6%	−1.2

High recall (98%) is the priority — catching fraudulent listings matters more than the occasional false positive.
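F1 is the harmonic mean of precision and recall, so the F1 row of the benchmark table can be cross-checked against its precision and recall rows. A minimal check (the function name is chosen for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs taken from the XGBoost vs. rule-based table.
xgb_f1 = f1_score(0.860, 0.980)
rules_f1 = f1_score(0.932, 0.773)
print(f"XGBoost F1:    {xgb_f1:.3f}")    # 0.916
print(f"Rule-based F1: {rules_f1:.3f}")  # 0.845
```

Because the harmonic mean is pulled toward the smaller of the two values, XGBoost's large recall gain outweighs its smaller precision loss, which is why its F1 ends up higher.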

Per-Category F1 (XGBoost, real data)

Category	F1	Category	F1
Used Cars	100%	Gaming Laptop	95.2%
Luxury Watch	100%	Headphones	96.6%
Luxury Handbag	100%	PS5 / Console	96.6%
iPhone	91.6%	Books	85.6%
Hair Dryer	80.0%	Furniture	69.6%

Top Predictive Features (XGBoost Importance)

Rank	Feature	Importance
1	Seller Rating	60.0%
2	Quantity Sold (log)	14.8%
3	Product Rating	10.9%
4	Price Percentile	3.2%
5	Review Count (log)	2.8%
6	Log Price	2.4%
7	Condition (New)	1.9%
8	Seller Feedback %	1.8%
9	Platform (eBay)	1.1%

Architecture

User Query
    │
    ▼
┌─────────────────────────────────┐
│     FastAPI Backend             │
└──────────┬──────────────────────┘
           │
    ┌──────┴──────┐
    ▼             ▼
SerpAPI       Fraud Detection Pipeline
Scraping      │
│ Google      ├─ 1. Statistical Scoring
│ Shopping    │     Price: 50% | Seller: 25%
│ eBay        │     Attributes: 15% | History: 10%
│             │
│             ├─ 2. XGBoost Classifier (17 features)
│             │     seller_rating, quantity_sold,
│             │     price_percentile, rating, reviews,
│             │     seller_feedback_pct, platform,
│             │     condition, dynamic_trust flags
│             │
│             ├─ 3. Groq LLM Explanation
│             │     llama-3.1-8b-instant
│             │     Structured JSON · Cached
│             │
│             └─ 4. Duplicate Detection
│                   TF-IDF cosine similarity (0.82)
│                   Cross-platform pair matching
│
└─────────────────────────────────
    Frontend
    Tailwind CSS / Vanilla JS
    Glass-morphism UI
    Real-time animated stages

Tech Stack

Layer	Technology
Backend	FastAPI 0.128 (Python 3.11+)
ML Model	XGBoost 1.7+, scikit-learn
LLM	Groq API — LLaMA 3.1-8B Instant
Duplicate Detection	TF-IDF cosine similarity
Data Collection	SerpAPI (Google Shopping + eBay)
Frontend	HTML5, Tailwind CSS, Vanilla JavaScript
Deployment	Railway (CI/CD from GitHub)

Fraud Detection — Deep Dive

Stage 1 — Statistical Risk Scoring

Component	Weight	Signals
Price Analysis	50%	Percentile rank, outlier removal (>10× median), trusted seller discount
Seller Reputation	25%	Item rating, review count, seller rating, eBay feedback %, dynamic trust
Product Attributes	15%	Condition, title length and quality
Historical Patterns	10%	Platform risk, category-specific baselines
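The weighting in the table above can be sketched as a weighted sum of per-component risk scores. This is a minimal illustration, not TrustCart's implementation: it assumes each component is already normalized to [0, 1], that a lower price percentile means higher price risk, and it omits the trusted-seller discount; all function names and the sample scores are hypothetical.

```python
from statistics import median

# Component weights from the Stage 1 table; they sum to 1.0.
WEIGHTS = {"price": 0.50, "seller": 0.25, "attributes": 0.15, "history": 0.10}

def price_percentile(price: float, peer_prices: list[float]) -> float:
    """Fraction of comparable listings priced below `price`, after dropping
    outliers priced above 10x the median."""
    cutoff = 10 * median(peer_prices)
    kept = [p for p in peer_prices if p <= cutoff]
    return sum(1 for p in kept if p < price) / len(kept)

def statistical_risk(components: dict[str, float]) -> float:
    """Weighted sum of per-component risk scores, each assumed to be in [0, 1]."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())

peers = [999.0, 1049.0, 1099.0, 1150.0, 49.0, 25000.0]  # 25000 is dropped as an outlier
pct = price_percentile(49.0, peers)  # 0.0: no kept listing is cheaper
risk = statistical_risk({
    "price": 1.0 - pct,  # hypothetical mapping: cheaper than peers => riskier
    "seller": 0.6,       # illustrative component scores
    "attributes": 0.3,
    "history": 0.2,
})
print(f"{risk:.3f}")  # 0.715
```

Using the median rather than the mean for the outlier cutoff keeps a single absurd price (such as the 25,000 listing above) from inflating the cutoff it is being tested against.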

Dynamic trusted seller logic: eBay sellers with ≥1,000 ratings and ≥98% positive feedback are automatically trusted, so no hard-coded list of trusted sellers is needed.
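The rule above amounts to a simple predicate over seller metadata. A sketch, with a hypothetical `SellerInfo` record whose field names are illustrative rather than TrustCart's schema:

```python
from dataclasses import dataclass

@dataclass
class SellerInfo:
    # Hypothetical fields; names are for illustration only.
    platform: str                 # e.g. "ebay" or "google_shopping"
    rating_count: int             # number of seller ratings received
    positive_feedback_pct: float  # 0-100

def is_dynamically_trusted(seller: SellerInfo) -> bool:
    """eBay sellers with >= 1,000 ratings and >= 98% positive feedback are trusted."""
    return (
        seller.platform.lower() == "ebay"
        and seller.rating_count >= 1_000
        and seller.positive_feedback_pct >= 98.0
    )

print(is_dynamically_trusted(SellerInfo("ebay", 1_500, 99.1)))  # True
print(is_dynamically_trusted(SellerInfo("ebay", 999, 99.5)))    # False
```

Deriving trust from live feedback figures means a seller gains or loses trusted status as their record changes, without anyone maintaining an allowlist.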

Risk thresholds: LOW (< 0.25) · MEDIUM (0.25–0.55) · HIGH (≥ 0.55)
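The thresholds translate directly into a banding function; the sketch below (function name hypothetical) makes the boundary handling explicit: a score of exactly 0.25 is MEDIUM and exactly 0.55 is HIGH.

```python
def risk_level(score: float) -> str:
    """Map a 0-1 risk score to the LOW / MEDIUM / HIGH bands."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score must be in [0, 1], got {score}")
    if score < 0.25:
        return "LOW"
    if score < 0.55:
        return "MEDIUM"
    return "HIGH"

print(risk_level(0.10), risk_level(0.25), risk_level(0.55))  # LOW MEDIUM HIGH
```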

Stage 2 — XGBoost Classifier

Trained on 10,000 synthetic listings (35% fraud / 45% legitimate / 20% edge cases) and validated on 714 real scraped listings labeled via Groq LLM.

The classifier takes 17 input features spanning price, seller trust, platform, condition, and sales volume. Sales volume (quantity_sold) emerged as the second most important feature (14.8% importance): high-volume listings are a strong legitimacy signal.
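A sketch of turning one listing into a feature row follows. It covers only the nine features named in the importance table (the remaining eight of the 17 are not documented here), and it assumes `log1p` for the log-scaled features so that zero counts stay finite. The `Listing` record and its field names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Listing:
    # Hypothetical record; field names are illustrative, not TrustCart's schema.
    price: float
    price_percentile: float     # 0-1 rank among comparable listings (Stage 1)
    seller_rating: float
    product_rating: float
    review_count: int
    quantity_sold: int
    seller_feedback_pct: float  # e.g. eBay positive feedback, 0-100
    condition: str              # e.g. "New", "Used"
    platform: str               # e.g. "ebay", "google_shopping"

# The nine features from the importance table, in rank order.
FEATURE_NAMES = [
    "seller_rating", "log_quantity_sold", "product_rating", "price_percentile",
    "log_review_count", "log_price", "condition_new", "seller_feedback_pct",
    "platform_ebay",
]

def to_features(listing: Listing) -> list[float]:
    """Build one numeric row; log1p(0) == 0, so brand-new listings stay finite."""
    return [
        listing.seller_rating,
        math.log1p(listing.quantity_sold),
        listing.product_rating,
        listing.price_percentile,
        math.log1p(listing.review_count),
        math.log1p(listing.price),
        1.0 if listing.condition.strip().lower() == "new" else 0.0,
        listing.seller_feedback_pct,
        1.0 if listing.platform.strip().lower() == "ebay" else 0.0,
    ]

row = to_features(Listing(49.0, 0.02, 0.0, 0.0, 0, 0, 0.0, "New", "ebay"))
print(dict(zip(FEATURE_NAMES, row)))
```

A matrix of such rows could then be passed to a fitted `xgboost.XGBClassifier`, whose `predict_proba` returns per-class probabilities; the positive-class column would serve as the fraud probability.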

Stage 3 — Groq LLM Explanation

Groq generates a structured fraud analysis for the 5 riskiest items per search: scam probability, specific red flags, plain-English reasoning, and a buy recommendation. The LLM output also caps the trust score: an AVOID verdict limits the safety score to 20%, preventing contradictory results.
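The capping rule can be sketched as a function that reads the LLM's JSON and applies an upper bound to a 0-1 safety score. The JSON keys (`scam_probability`, `red_flags`, `reasoning`, `recommendation`) and the sample response are assumptions for illustration; only the AVOID-caps-at-20% behaviour comes from the description above.

```python
import json

AVOID_CAP = 0.20  # an AVOID verdict caps the safety score at 20%

def gated_safety_score(model_safety: float, llm_response: str) -> float:
    """Apply the LLM verdict as an upper bound on a 0-1 safety score.

    Assumes the LLM returns JSON with a "recommendation" field (hypothetical key).
    """
    verdict = json.loads(llm_response)
    recommendation = str(verdict.get("recommendation", "")).strip().upper()
    if recommendation == "AVOID":
        return min(model_safety, AVOID_CAP)
    return model_safety

example = json.dumps({
    "scam_probability": 0.9,
    "red_flags": ["Price far below comparable listings", "Seller has no ratings"],
    "reasoning": "A new iPhone at this price from an unrated seller is unlikely to be genuine.",
    "recommendation": "AVOID",
})
print(gated_safety_score(0.74, example))  # 0.2
```

Using `min` rather than overwriting the score means an AVOID verdict can only lower the safety score, never raise an already low one.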

Stage 4 — Semantic Duplicate Detection

Listings are vectorized with TF-IDF and compared pairwise; pairs that pass a cosine similarity threshold of 0.82 are merged into clusters with Union-Find. This flags when the same item appears across multiple sellers or platforms so users can compare before buying.
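A dependency-free sketch of the same technique: smoothed TF-IDF over listing titles, cosine similarity on L2-normalized vectors, and Union-Find to merge matching pairs into clusters. Treating the threshold as inclusive (`>=`), tokenizing on lowercase alphanumerics, and the function names are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed IDF, ln((1 + n) / (1 + df)) + 1,
    L2-normalized so that a dot product equals cosine similarity."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def duplicate_clusters(titles: list[str], threshold: float = 0.82) -> list[list[int]]:
    """Group listing indices whose pairwise cosine similarity meets `threshold`."""
    vectors = tfidf_vectors(titles)
    parent = list(range(len(titles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)  # union the two clusters

    clusters: dict[int, list[int]] = {}
    for i in range(len(titles)):
        clusters.setdefault(find(i), []).append(i)
    return [members for members in clusters.values() if len(members) > 1]

titles = [
    "Apple iPhone 15 Pro 128GB Black",           # e.g. from Google Shopping
    "Apple iPhone 15 Pro 128GB Black Unlocked",  # e.g. from eBay
    "Sony WH-1000XM5 Wireless Headphones",
]
print(duplicate_clusters(titles))  # [[0, 1]]
```

Union-Find makes matches transitive: if A matches B and B matches C, all three land in one cluster even when A and C fall just below the threshold. In production, scikit-learn's `TfidfVectorizer` (whose default `smooth_idf=True` uses the same IDF formula, though its default tokenizer differs) and `sklearn.metrics.pairwise.cosine_similarity` could replace the hand-rolled parts.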

Local Development

Prerequisites

Python 3.11+
libomp (macOS only): brew install libomp
SerpAPI key · Groq API key

Setup

git clone https://github.com/Msundara19/Trustcart.git
cd Trustcart

brew install libomp          # macOS only

python -m venv venv
source venv/bin/activate     # Windows: venv\Scripts\activate
pip install -r requirements.txt

cp .env.example .env         # Add SERPAPI_KEY and GROQ_API_KEY

uvicorn main:app --reload    # → http://localhost:8000

Roadmap

Security & Privacy

No user data collected — stateless API, no tracking or storage
API keys stored in environment variables, never committed
HTTPS enforced in production (Railway)

Author

Meenakshi Sridharan

📧 msridharansundaram@hawk.illinoistech.edu

Acknowledgments

Tool	Role
Groq	Ultra-fast LLM inference (LPU hardware)
SerpAPI	Google Shopping + eBay scraping
XGBoost	Gradient boosting classifier
FastAPI	Async Python web framework
Railway	Zero-config cloud deployment
