TrustCart — AI-Powered E-Commerce Fraud Detection

TrustCart is a real-time fraud detection system for e-commerce listings. It combines a trained XGBoost classifier, statistical anomaly scoring, Groq LLM reasoning, and semantic duplicate detection to surface risky listings from Google Shopping and eBay — before you buy.

Why I Built This

Online shopping fraud isn't a website problem — it's a listing problem. A legitimate site like eBay or Google Shopping can host thousands of fraudulent listings side by side with genuine ones. Existing tools don't solve this.

In July 2025, Mozilla shut down Fakespot — the closest tool to what I wanted. What remained were tools that either check whether a website is trustworthy (ScamAdviser, F-Secure) or analyse reviews for fakery (ReviewMeta) — neither of which tells you whether the specific iPhone listing you're about to click is a scam.

I wanted a tool that answers one question: is this particular listing safe to buy from? TrustCart is that tool.


What Already Exists — and Where It Falls Short

| Tool | What it checks | Platform coverage | Still active? |
|---|---|---|---|
| Fakespot | Fake reviews | Amazon, eBay, Walmart | ❌ Shut down Jul 2025 |
| ReviewMeta | Fake reviews | Amazon only | |
| Camelizer | Price history / fake discounts | Amazon only | |
| Honey | Price comparison, coupons | Multi-platform | |
| ScamAdviser | Website reputation | Site-level only | |
| F-Secure | Website safety | Site-level only | |
| Counterfake | Counterfeit sellers | Enterprise SaaS | |

Every buyer-facing tool on this list has at least one of these fundamental limitations:

  • Site-level, not listing-level — ScamAdviser tells you eBay.com is safe. It says nothing about the seller charging $49 for an "iPhone 15 Pro" in the listings.
  • Single platform — ReviewMeta and Camelizer only cover Amazon, where most complaints actually originate, but leave eBay and Google Shopping completely unchecked.
  • Reviews only — A listing can have zero reviews and still be fraudulent. Review-based tools are blind to brand-new scam listings.
  • No ML risk scoring — Price history trackers and coupon finders are savings tools, not fraud detectors. None of them run a trained classifier against listing features.

Why TrustCart Is Different

TrustCart is the only consumer-facing tool that operates at the individual listing level with a full ML pipeline — not review sentiment, not site reputation, not price history alone.

Three signal types, combined:

| Signal | How TrustCart uses it | What others do |
|---|---|---|
| Statistical | Price percentile rank, outlier scoring, seller trust weights | Camelizer does price history (Amazon only) |
| ML Classification | XGBoost on 17 features: seller rating, sales volume, price percentile, condition, platform | Nobody else applies a trained classifier to individual listings |
| LLM Reasoning | Groq/LLaMA generates plain-English red flags and a buy recommendation | Fakespot had NLP for reviews; no tool explains listing-level fraud in plain English |

What no competitor does at all:

  • Cross-platform duplicate detection — TF-IDF cosine similarity flags when the same listing appears across Google Shopping and eBay at different prices, exposing arbitrage scams and price manipulation.
  • Calibrated trust scoring — The LLM verdict gates the safety score. An AVOID recommendation hard-caps the score at 20%, preventing the model from contradicting itself.
  • Multi-platform scraping in one query — One search returns ranked, risk-scored results from both Google Shopping and eBay simultaneously.

The gap Fakespot's shutdown left is real. TrustCart fills it — not for review analysis, but for the harder problem of per-listing fraud risk at the point of search.


How It Works

A search query triggers a four-stage pipeline:

  1. Scraping — Listings fetched live from Google Shopping and eBay via SerpAPI
  2. Statistical Scoring — Percentile-based price analysis, seller trust signals, and a weighted 4-factor rule model
  3. XGBoost Classification — 17-feature gradient-boosted classifier assigns a fraud probability to each listing
  4. Groq LLM Explanation — Plain-English fraud reasoning, red flags, and a buy recommendation for the top risky items

Results are deduplicated using TF-IDF cosine similarity to surface cross-platform price comparisons.


Model Performance

Benchmarked on 714 real scraped listings (Google Shopping + eBay), labeled via Groq LLM and validated against rule-based scores.

XGBoost vs. Rule-Based Baseline

| Metric | XGBoost | Rule-Based | Delta (percentage points) |
|---|---|---|---|
| F1 Score | 91.6% | 84.5% | +7.1 |
| Recall | 98.0% | 77.3% | +20.7 |
| Precision | 86.0% | 93.2% | −7.2 |
| Accuracy | 88.8% | 82.4% | +6.4 |
| AUC (ROC) | 92.4% | 93.6% | −1.2 |

High recall (98%) is the priority — catching fraudulent listings matters more than the occasional false positive.
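
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the F1 figures can be recomputed from the reported precision and recall values. A minimal sketch (the function name is illustrative and not taken from the repository):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the benchmark table.
xgb_f1 = f1_from_precision_recall(0.860, 0.980)
rule_f1 = f1_from_precision_recall(0.932, 0.773)

print(f"XGBoost F1:    {xgb_f1:.1%}")   # 91.6%
print(f"Rule-based F1: {rule_f1:.1%}")  # 84.5%
```

Both results match the reported F1 scores, which shows how the recall gain outweighs the precision loss in the combined metric.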

Per-Category F1 (XGBoost, real data)

| Category | F1 | Category | F1 |
|---|---|---|---|
| Used Cars | 100% | Gaming Laptop | 95.2% |
| Luxury Watch | 100% | Headphones | 96.6% |
| Luxury Handbag | 100% | PS5 / Console | 96.6% |
| iPhone | 91.6% | Books | 85.6% |
| Hair Dryer | 80.0% | Furniture | 69.6% |

Top Predictive Features (XGBoost Importance)

| Rank | Feature | Importance |
|---|---|---|
| 1 | Seller Rating | 60.0% |
| 2 | Quantity Sold (log) | 14.8% |
| 3 | Product Rating | 10.9% |
| 4 | Price Percentile | 3.2% |
| 5 | Review Count (log) | 2.8% |
| 6 | Log Price | 2.4% |
| 7 | Condition (New) | 1.9% |
| 8 | Seller Feedback % | 1.8% |
| 9 | Platform (eBay) | 1.1% |

Architecture

User Query
    │
    ▼
┌─────────────────────────────────┐
│     FastAPI Backend             │
└──────────┬──────────────────────┘
           │
    ┌──────┴──────┐
    ▼             ▼
SerpAPI       Fraud Detection Pipeline
Scraping      │
│             ├─ 1. Statistical Scoring
│ Google         Price: 50% | Seller: 25%
│ Shopping       Attributes: 15% | History: 10%
│ eBay        │
│             ├─ 2. XGBoost Classifier (17 features)
│                seller_rating, quantity_sold,
│                price_percentile, rating, reviews,
│                seller_feedback_pct, platform,
│                condition, dynamic_trust flags
│             │
│             ├─ 3. Groq LLM Explanation
│                llama-3.1-8b-instant
│                Structured JSON · Cached
│             │
│             └─ 4. Duplicate Detection
│                TF-IDF cosine similarity (0.82)
│                Cross-platform pair matching
│
└──────────────────────────
    Tailwind CSS / Vanilla JS
    Glass-morphism UI
    Real-time animated stages

Tech Stack

| Layer | Technology |
|---|---|
| Backend | FastAPI 0.128 (Python 3.11+) |
| ML Model | XGBoost 1.7+, scikit-learn |
| LLM | Groq API (LLaMA 3.1-8B Instant) |
| Duplicate Detection | TF-IDF cosine similarity |
| Data Collection | SerpAPI (Google Shopping + eBay) |
| Frontend | HTML5, Tailwind CSS, Vanilla JavaScript |
| Deployment | Railway (CI/CD from GitHub) |

Fraud Detection — Deep Dive

Stage 1 — Statistical Risk Scoring

| Component | Weight | Signals |
|---|---|---|
| Price Analysis | 50% | Percentile rank, outlier removal (>10× median), trusted seller discount |
| Seller Reputation | 25% | Item rating, review count, seller rating, eBay feedback %, dynamic trust |
| Product Attributes | 15% | Condition, title length and quality |
| Historical Patterns | 10% | Platform risk, category-specific baselines |
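
The price component depends on a percentile rank computed after outlier removal. The sketch below is one plausible reading, not the repository's code: it assumes outliers are prices above 10× the median of the comparable listings, and that the percentile rank is the fraction of remaining listings priced strictly below the candidate. All function names are hypothetical.

```python
from statistics import median


def remove_outliers(prices: list[float], factor: float = 10.0) -> list[float]:
    """Drop prices more than `factor` times the median (the >10x median rule)."""
    if not prices:
        return []
    cutoff = factor * median(prices)
    return [p for p in prices if p <= cutoff]


def price_percentile(price: float, prices: list[float]) -> float:
    """Fraction of comparable listings priced strictly below `price`."""
    cleaned = remove_outliers(prices)
    if not cleaned:
        return 0.5  # assumption: with no comparables there is no price signal
    below = sum(1 for p in cleaned if p < price)
    return below / len(cleaned)


# Illustrative prices for one search; 25000.0 is dropped as an outlier.
prices = [49.0, 899.0, 949.0, 999.0, 1049.0, 25000.0]
print(remove_outliers(prices))          # [49.0, 899.0, 949.0, 999.0, 1049.0]
print(price_percentile(49.0, prices))   # 0.0
print(price_percentile(999.0, prices))  # 0.6
```

A listing at the very bottom of the cleaned price range (percentile 0.0) is the kind of suspiciously cheap offer the price component is meant to surface.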

Dynamic trusted seller logic: eBay sellers with ≥1,000 ratings and ≥98% positive feedback are automatically trusted — no hardcoding needed.

Risk thresholds: LOW (< 0.25) · MEDIUM (0.25–0.55) · HIGH (≥ 0.55)
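
The component weights, the trusted-seller rule, and the risk thresholds can be combined in a few lines. This is a minimal sketch under stated assumptions: each component is taken to be a risk value in [0, 1], the function and key names are hypothetical, and how the trusted-seller flag feeds back into the score is not specified here, so it is shown only as a standalone check.

```python
# Weights from the Stage 1 table; they sum to 1.0.
WEIGHTS = {"price": 0.50, "seller": 0.25, "attributes": 0.15, "history": 0.10}


def is_trusted_ebay_seller(rating_count: int, positive_feedback_pct: float) -> bool:
    """Dynamic trust rule: at least 1,000 ratings and at least 98% positive feedback."""
    return rating_count >= 1000 and positive_feedback_pct >= 98.0


def combined_risk(components: dict[str, float]) -> float:
    """Weighted sum of per-component risk values (each assumed to lie in [0, 1])."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def risk_level(score: float) -> str:
    """Map a score to LOW (< 0.25), MEDIUM (0.25 to 0.55), or HIGH (>= 0.55)."""
    if score < 0.25:
        return "LOW"
    if score < 0.55:
        return "MEDIUM"
    return "HIGH"


# Illustrative component values, not real data.
score = combined_risk({"price": 0.9, "seller": 0.6, "attributes": 0.4, "history": 0.2})
print(round(score, 2), risk_level(score))  # 0.68 HIGH
print(is_trusted_ebay_seller(1500, 99.2))  # True
```

Because price carries half the weight, a listing with an extreme price signal lands in HIGH even when the other components are moderate.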

Stage 2 — XGBoost Classifier

Trained on 10,000 synthetic listings (35% fraud / 45% legitimate / 20% edge cases) and validated on 714 real scraped listings labeled via Groq LLM.

17 input features across price, seller trust, platform, condition, and sales volume. quantity_sold emerged as the #2 most important feature (14.8%) — high-volume listings are a strong legitimacy signal.
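
To make the feature set concrete, the sketch below builds the nine features listed in the importance ranking from one raw listing. It is an illustration only: the dictionary keys are hypothetical, the remaining eight features are not reconstructed, and `log1p` is an assumed choice for the "(log)" transforms because it stays defined when a count is zero.

```python
import math


def build_features(listing: dict) -> dict[str, float]:
    """Map one raw listing to a subset (9 of 17) of the model's numeric features."""
    return {
        "seller_rating": float(listing["seller_rating"]),
        # log1p compresses heavy-tailed counts and is defined at zero
        "log_quantity_sold": math.log1p(listing["quantity_sold"]),
        "product_rating": float(listing["product_rating"]),
        "price_percentile": float(listing["price_percentile"]),
        "log_review_count": math.log1p(listing["review_count"]),
        "log_price": math.log1p(listing["price"]),
        "condition_new": 1.0 if listing["condition"].lower() == "new" else 0.0,
        "seller_feedback_pct": float(listing["seller_feedback_pct"]),
        "platform_ebay": 1.0 if listing["platform"].lower() == "ebay" else 0.0,
    }


# Illustrative listing, not real data.
listing = {
    "seller_rating": 4.8, "quantity_sold": 0, "product_rating": 4.5,
    "price_percentile": 0.05, "review_count": 12, "price": 49.0,
    "condition": "New", "seller_feedback_pct": 99.1, "platform": "eBay",
}
features = build_features(listing)
print(features["log_quantity_sold"], features["condition_new"], features["platform_ebay"])
# 0.0 1.0 1.0
```

A vector like this, extended to all 17 features in a fixed column order, is what a trained gradient-boosted classifier would score to produce a fraud probability.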

Stage 3 — Groq LLM Explanation

Generates structured fraud analysis for the top 5 risky items per search: scam probability, specific red flags, plain-English reasoning, and a buy recommendation. Trust score is capped by the LLM output — an AVOID verdict limits the safety score to 20%, preventing contradictory results.
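
The gating rule is simple enough to show directly. In this sketch the function name and the non-AVOID label are hypothetical; only the AVOID verdict and the 20% cap come from the description above.

```python
AVOID_CAP = 0.20  # an AVOID verdict limits the safety score to 20%


def gated_trust_score(raw_trust: float, recommendation: str) -> float:
    """Cap the trust score when the LLM recommends avoiding the listing."""
    if recommendation.strip().upper() == "AVOID":
        return min(raw_trust, AVOID_CAP)
    return raw_trust


print(gated_trust_score(0.85, "AVOID"))  # 0.2
print(gated_trust_score(0.10, "AVOID"))  # 0.1
print(gated_trust_score(0.85, "BUY"))    # 0.85 ("BUY" is a placeholder label)
```

Using `min` rather than overwriting the score means a listing that was already below the cap keeps its lower value.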

Stage 4 — Semantic Duplicate Detection

Uses TF-IDF vectorization with a cosine similarity threshold of 0.82, followed by Union-Find clustering. It flags when the same item appears across multiple sellers or platforms so users can compare before buying.
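
The sketch below shows the technique end to end using only the standard library, so it is not the project's implementation. Assumptions: similarity is computed on whitespace-tokenized, lowercased titles; IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1; and pairs at or above the 0.82 threshold are merged. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(titles: list[str]) -> list[dict[str, float]]:
    """L2-normalized TF-IDF vectors over lowercased, whitespace-split titles."""
    docs = [Counter(t.lower().split()) for t in titles]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    vectors = []
    for doc in docs:
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def find(parent: list[int], i: int) -> int:
    """Union-Find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_duplicates(titles: list[str], threshold: float = 0.82) -> list[list[int]]:
    """Group title indices whose pairwise cosine similarity meets the threshold."""
    vectors = tfidf_vectors(titles)
    parent = list(range(len(titles)))
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(parent, i)] = find(parent, j)  # union the two sets
    groups: dict[int, list[int]] = {}
    for i in range(len(titles)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


titles = [
    "Apple iPhone 15 Pro 256GB Black Unlocked",
    "apple iphone 15 pro 256gb black unlocked new",
    "Sony WH-1000XM5 Wireless Headphones",
]
print(cluster_duplicates(titles))  # [[0, 1], [2]]
```

Union-Find makes the grouping transitive: if A matches B and B matches C, all three end up in one cluster even when A and C fall just under the threshold. The pairwise loop is quadratic in the number of listings, which is acceptable for a single page of search results.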


Local Development

Prerequisites

  • Python 3.11+
  • A SerpAPI key and a Groq API key (set as SERPAPI_KEY and GROQ_API_KEY in .env)
  • macOS only: Homebrew, to install libomp

Setup

git clone https://github.com/Msundara19/Trustcart.git
cd Trustcart

brew install libomp          # macOS only

python -m venv venv
source venv/bin/activate     # Windows: venv\Scripts\activate
pip install -r requirements.txt

cp .env.example .env         # Add SERPAPI_KEY and GROQ_API_KEY

uvicorn main:app --reload    # → http://localhost:8000

Roadmap

  • Multi-platform scraping — Google Shopping + eBay
  • Statistical fraud scoring — weighted 4-factor model
  • XGBoost classifier — 17 features, 91.6% F1 / 98.0% recall
  • Groq LLM explanations with calibrated risk thresholds
  • Semantic duplicate detection — cross-platform pairing
  • Prediction logging for continuous dataset growth
  • Production deployment on Railway (CI/CD from GitHub)
  • Browser extension (Chrome / Firefox)
  • Historical price tracking
  • Amazon + AliExpress integration
  • Image-based counterfeit detection

Security & Privacy

  • No user data collected — stateless API, no tracking or storage
  • API keys stored in environment variables, never committed
  • HTTPS enforced in production (Railway)

Author

Meenakshi Sridharan

📧 msridharansundaram@hawk.illinoistech.edu


Acknowledgments

| Tool | Role |
|---|---|
| Groq | Ultra-fast LLM inference (LPU hardware) |
| SerpAPI | Google Shopping + eBay scraping |
| XGBoost | Gradient boosting classifier |
| FastAPI | Async Python web framework |
| Railway | Zero-config cloud deployment |
