LLM Sports Quant

An educational project exploring how LLMs can be prompt-engineered into quantitative sports betting analysts — backed by Python-native mathematical governance to prevent hallucinated bets.

The system looks for positive expected value (+EV) bets by comparing bookmaker odds against true probabilities that Python computes from real match data (form, H2H, Poisson-modeled expected goals). It sizes stakes with a fractional (quarter) Kelly criterion against a virtual $1,000 bankroll.

Tech Stack

Language & Database: Python 3.13, PostgreSQL, SQLAlchemy ORM
AI Engine: Local or remote LLM inference via OpenAI-compatible API (Ollama, Claude, etc.)
Data Sources:
- The Odds API — Live odds across bookmakers (h2h, totals) for Argentine Primera, Copa Libertadores
- SofaScore — Real-time team form (last 10 matches, W/D/L, goals) and H2H history via cloudscraper
Key Dependencies: cloudscraper, openai, sqlalchemy, python-dotenv

How It Works

1. Market Sweeping

Pulls live odds for fixtures within 72 hours across configured leagues using a bulk API call per league (~3 API tokens per full run).
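
The 72-hour window can be illustrated with a small filter over an already-fetched odds payload. This is a minimal sketch rather than the project's actual code: the function names are hypothetical, and the `commence_time`, `home_team`, and `away_team` fields are assumed to follow the event shape returned by The Odds API.

```python
from datetime import datetime, timedelta, timezone


def parse_commence_time(value: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as '2025-03-01T21:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def fixtures_within_window(events, now, hours=72):
    """Keep events that kick off at or after `now` and within `hours` of it."""
    cutoff = now + timedelta(hours=hours)
    return [
        e for e in events
        if now <= parse_commence_time(e["commence_time"]) <= cutoff
    ]


# Illustrative payload (made-up fixtures, not real data).
now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
events = [
    {"home_team": "A", "away_team": "B", "commence_time": "2025-03-02T21:00:00Z"},  # 33h ahead
    {"home_team": "C", "away_team": "D", "commence_time": "2025-03-05T21:00:00Z"},  # 105h ahead
    {"home_team": "E", "away_team": "F", "commence_time": "2025-03-01T09:00:00Z"},  # already started
]
upcoming = fixtures_within_window(events, now)
print([e["home_team"] for e in upcoming])
```

Filtering after one bulk call per league, instead of requesting each fixture separately, is what keeps the token cost per run low.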

2. Data Gathering (SofaScore)

Python scrapes real match data directly from SofaScore:

- Team Form: Last 10 completed matches with scores, W/D/L record, goals scored/conceded, and standard deviation
- H2H: Cross-references both teams' recent events to find direct meetings
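
The form summary and the H2H cross-reference described above could look roughly like the following. This is a sketch under assumptions: matches are represented as `(goals_for, goals_against)` tuples, the standard deviation is taken over goals scored using the population formula, and each event is assumed to carry an `id` key. None of these details are confirmed by the project.

```python
from statistics import pstdev


def summarize_form(matches):
    """Summarize recent matches given as (goals_for, goals_against) tuples."""
    if not matches:
        raise ValueError("at least one match is required")
    n = len(matches)
    wins = sum(1 for gf, ga in matches if gf > ga)
    draws = sum(1 for gf, ga in matches if gf == ga)
    losses = n - wins - draws
    scored = [gf for gf, _ in matches]
    conceded = [ga for _, ga in matches]
    return {
        "played": n,
        "win": wins / n,
        "draw": draws / n,
        "loss": losses / n,
        "avg_scored": sum(scored) / n,
        "avg_conceded": sum(conceded) / n,
        "scored_std": pstdev(scored),  # assumption: population std of goals scored
    }


def head_to_head(home_events, away_events):
    """Events that appear in both teams' recent lists are direct meetings."""
    away_ids = {e["id"] for e in away_events}  # "id" is an assumed key
    return [e for e in home_events if e["id"] in away_ids]


form = summarize_form([(2, 1), (0, 0), (1, 3), (2, 0), (1, 1)])
meetings = head_to_head([{"id": 1}, {"id": 2}], [{"id": 2}, {"id": 3}])
print(form["win"], form["draw"], form["loss"], meetings)
```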

3. Python-Native True Probability (Authoritative)

All probability calculations happen in Python — the LLM never computes these:

Step 1: Form Cross-Reference
   Home Win = avg(home_win%, away_loss%)
   Draw     = avg(home_draw%, away_draw%)
   Away Win = avg(away_win%, home_loss%)

Step 2: H2H Adjustment (dynamic weight: 2 matches=15%, 3=20%, 5+=30%)
   Blended = form × (1-weight) + h2h × weight

Step 3: Home Advantage (+5% home, -5% away)

Step 4: Normalize to sum to 1.0

Step 5: Poisson Distribution for totals
   λ = expected_total_goals
   P(Under X.5) = Poisson CDF(floor(X.5), λ)
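
The five steps can be sketched in plain Python as follows. Several details are assumptions made for illustration, not confirmed behavior: fewer than 2 H2H meetings are given zero weight, 4 meetings are treated like 3 (20%), the home advantage is applied as additive percentage points with the draw left untouched, and all function names are hypothetical.

```python
import math


def h2h_weight(n_meetings):
    """Stated weights: 2 -> 15%, 3 -> 20%, 5+ -> 30%.

    Assumptions: fewer than 2 meetings -> 0, and 4 meetings -> 20%.
    """
    if n_meetings >= 5:
        return 0.30
    if n_meetings >= 3:
        return 0.20
    if n_meetings == 2:
        return 0.15
    return 0.0


def match_probabilities(home, away, h2h=None, n_h2h=0, home_edge=0.05):
    """home/away/h2h are dicts of fractions; returns normalized 1X2 probabilities."""
    # Step 1: cross-reference each side's form with the opponent's.
    form = {
        "home": (home["win"] + away["loss"]) / 2,
        "draw": (home["draw"] + away["draw"]) / 2,
        "away": (away["win"] + home["loss"]) / 2,
    }
    # Step 2: blend in H2H history with a weight that grows with sample size.
    w = h2h_weight(n_h2h) if h2h else 0.0
    blended = {
        k: form[k] * (1 - w) + (h2h[k] if h2h else 0.0) * w for k in form
    }
    # Step 3: home advantage (assumed additive, in probability points).
    blended["home"] += home_edge
    blended["away"] = max(blended["away"] - home_edge, 0.0)
    # Step 4: normalize so the three outcomes sum to 1.0.
    total = sum(blended.values())
    return {k: v / total for k, v in blended.items()}


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def prob_under(line, lam):
    """Step 5: P(Under X.5) = Poisson CDF(floor(X.5), lam)."""
    return poisson_cdf(math.floor(line), lam)


home = {"win": 0.5, "draw": 0.3, "loss": 0.2}
away = {"win": 0.3, "draw": 0.3, "loss": 0.4}
probs = match_probabilities(home, away)
under_25 = prob_under(2.5, 2.5)
print(probs, round(under_25, 4), round(1 - under_25, 4))
```

P(Over X.5) follows as the complement, since a half-goal line cannot land exactly on the total.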

4. Python EV Scanner

Python scans every market outcome (home/draw/away/over/under) against its computed true probabilities:

- Computes EV = true_prob - implied_prob for each outcome, where implied_prob = 1/odds
- Requires EV ≥ +5% (a gap of at least 5 percentage points) to qualify
- Computes a Quarter-Kelly stake, capped at 5% of bankroll
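
A minimal sketch of the scan, using the EV definition above and the standard Kelly formula f = (p·b − q) / b with b = odds − 1 and q = 1 − p. The function names and the example probabilities are illustrative assumptions, not taken from the project's code.

```python
def implied_prob(odds):
    """Implied probability of decimal odds (bookmaker margin not removed)."""
    return 1.0 / odds


def edge(true_prob, odds):
    """EV as defined in this README: true_prob - implied_prob."""
    return true_prob - implied_prob(odds)


def quarter_kelly_fraction(true_prob, odds, fraction=0.25, cap=0.05):
    """Fraction of bankroll to stake: 0.25 x full Kelly, clamped to [0, cap]."""
    b = odds - 1.0  # net payout per unit staked
    full_kelly = (true_prob * b - (1.0 - true_prob)) / b
    return min(max(full_kelly * fraction, 0.0), cap)


def scan(outcomes, bankroll, min_ev=0.05):
    """Return (name, ev, stake) for every outcome that clears the EV threshold."""
    picks = []
    for name, true_prob, odds in outcomes:
        ev = edge(true_prob, odds)
        if ev >= min_ev:
            stake = round(bankroll * quarter_kelly_fraction(true_prob, odds), 2)
            picks.append((name, round(ev, 4), stake))
    return picks


# Illustrative probabilities and odds for one fixture.
outcomes = [("home", 0.65, 1.73), ("draw", 0.22, 3.60), ("away", 0.13, 5.00)]
picks = scan(outcomes, bankroll=1000.0)
print(picks)
```

Only the home outcome clears the threshold here: its implied probability is 1/1.73 ≈ 0.578, about 7 points below the assumed true probability of 0.65.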

5. LLM Advisory Role

The LLM receives all gathered data along with Python's computed probabilities. Its role is advisory: it provides qualitative reasoning (injury context, motivation, etc.). Python is the authority on:

✅ Which outcome to bet
✅ True probability values
✅ Kelly stake sizing
✅ EV threshold enforcement

6. Mathematical Validation & Logging

Every bet is fully traced:

📋 [BET DETAIL] Market: h2h | Odds: 1.73
✅ [+EV FOUND] Rosario Central (Home Win) @ 1.73 | True: 0.650 | Implied: 0.578 | EV: +7.2%
📊 [PYTHON STAKE] Quarter-Kelly 4.25% = $42.45
🎯 BET PLACED: Rosario Central (Home Win) @ odds 1.73 for $42.45 (EV: +7.2%)
💰 Bankroll: $957.55

7. Results Tracking

After matches finish, run check_results.py to:

- Fetch final scores from SofaScore
- Mark bets as win/loss
- Calculate P&L per bet
- Credit winnings back to bankroll
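
The resolution arithmetic might look like this. It is a sketch assuming the stake was already deducted from the bankroll when the bet was placed, so a win credits stake × odds and a loss credits nothing. Totals are assumed to use half-goal lines, which rules out pushes. All function names are hypothetical.

```python
def settle_h2h(selection, home_goals, away_goals):
    """selection is 'home', 'draw', or 'away'."""
    if home_goals > away_goals:
        winner = "home"
    elif home_goals < away_goals:
        winner = "away"
    else:
        winner = "draw"
    return "win" if selection == winner else "loss"


def settle_total(selection, line, home_goals, away_goals):
    """selection is 'over' or 'under'; line is a half-goal line such as 2.5."""
    went_over = (home_goals + away_goals) > line
    return "win" if (selection == "over") == went_over else "loss"


def pnl(result, stake, odds):
    """Profit on a win is stake x (odds - 1); a loss costs the stake."""
    return round(stake * (odds - 1), 2) if result == "win" else -stake


def credit(bankroll, result, stake, odds):
    """The stake left the bankroll at placement; a win returns stake plus profit."""
    return round(bankroll + stake * odds, 2) if result == "win" else bankroll


result = settle_h2h("home", 2, 0)
profit = pnl(result, 42.45, 1.73)
balance = credit(957.55, result, 42.45, 1.73)
print(result, profit, balance)
```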

Running the Project

Prerequisites

- Python 3.13+ with venv
- PostgreSQL database
- Ollama (or any OpenAI-compatible endpoint)

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Environment Variables (`.env`)

ODDS_API_KEY=your_key_here
LLM_BASE_URL=http://localhost:11434/v1  # Ollama default
LLM_API_KEY=ollama                       # or your API key
LLM_MODEL=qwen2.5:14b                   # or any model
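
One way these variables might be read at startup is sketched below, using only the standard library. The `Settings` class and `load_settings` function are hypothetical. The fallback values simply mirror the example `.env` above, and treating `ODDS_API_KEY` as the only required variable is an assumption. In the real project, python-dotenv's `load_dotenv()` would presumably populate `os.environ` from `.env` before anything like this runs.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    odds_api_key: str
    llm_base_url: str
    llm_api_key: str
    llm_model: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    key = env.get("ODDS_API_KEY")
    if not key:
        # Assumption: the odds key is mandatory, the LLM settings have defaults.
        raise RuntimeError("ODDS_API_KEY is not set")
    return Settings(
        odds_api_key=key,
        llm_base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        llm_api_key=env.get("LLM_API_KEY", "ollama"),
        llm_model=env.get("LLM_MODEL", "qwen2.5:14b"),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"ODDS_API_KEY": "demo", "LLM_MODEL": "llama3"})
print(settings.llm_model, settings.llm_base_url)
```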

Run

# Place bets on upcoming matches
python main.py

# Check results after matches finish
python check_results.py

# Test data sources
python tests/test_sources.py

Swap LLM engine

LLM_MODEL=llama3 python main.py

Architecture Safeguards

| Gate | Rule |
| --- | --- |
| Minimum Form Data | Requires ≥5 recent matches per team |
| EV Threshold | Rejects any bet with EV < +5% |
| Fractional Kelly (0.25×) | Quarter-Kelly sizing to reduce variance |
| Bankroll Cap | Max 5% of bankroll per bet |
| Implied Prob Sanity | Validates implied_prob ≈ 1/odds |
| Python Authority | True probability computed in Python, never by the LLM |
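
Taken together, the numeric gates could be checked in one pass before a bet is written to the database. The sketch below is illustrative only: the dictionary keys, the gate labels, and the 0.005 tolerance used for the implied-probability check are assumptions.

```python
import math


def failed_gates(bet, bankroll, min_form=5, min_ev=0.05,
                 max_stake_pct=0.05, tol=0.005):
    """Return the names of every gate the candidate bet fails (empty = OK)."""
    failures = []
    # Minimum Form Data: both teams need enough recent matches.
    if min(bet["home_form_n"], bet["away_form_n"]) < min_form:
        failures.append("minimum_form_data")
    # Implied Prob Sanity: the stored value must agree with 1/odds.
    if not math.isclose(bet["implied_prob"], 1 / bet["odds"], abs_tol=tol):
        failures.append("implied_prob_sanity")
    # EV Threshold: true_prob - implied_prob must reach +5 points.
    if bet["true_prob"] - bet["implied_prob"] < min_ev:
        failures.append("ev_threshold")
    # Bankroll Cap: never stake more than 5% of the current bankroll.
    if bet["stake"] > max_stake_pct * bankroll:
        failures.append("bankroll_cap")
    return failures


good = {"home_form_n": 10, "away_form_n": 10, "odds": 1.73,
        "implied_prob": 0.578, "true_prob": 0.65, "stake": 42.45}
bad = dict(good, away_form_n=3, stake=80.0)
print(failed_gates(good, 1000.0), failed_gates(bad, 1000.0))
```

Returning every failed gate, instead of stopping at the first, makes rejected bets easier to trace in the logs.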

Database Schema

| Table | Purpose |
| --- | --- |
| `matches` | Fixtures with home/away teams, scores, status |
| `market_odds` | Live bookmaker odds per outcome |
| `bets` | Placed bets with stake, odds, EV%, true prob, result, P&L |
| `bet_logs` | Bet lifecycle events (Placed → Resolved) |
| `bankroll` | Current balance |
| `bankroll_logs` | Transaction history (bets placed, winnings credited) |
| `betting_reasoning` | LLM reasoning text per match |

Project Structure

├── main.py              # Entry point — fetches odds, scrapes form, places bets
├── agent.py             # LLM prompt engineering and advisory analysis
├── search.py            # SofaScore scraping + Python true probability engine
├── sofa_scraper.py      # Low-level SofaScore API via cloudscraper
├── pipeline.py          # Odds API fetching and parsing
├── check_results.py     # Post-match results resolution and P&L tracking
├── models.py            # SQLAlchemy ORM models
├── database.py          # DB connection and session management
├── tests/
│   └── test_sources.py  # Data source validation tests
├── requirements.txt
└── .env                 # API keys (not committed)

Future Improvements

- Backtesting: Simulate the pipeline over previous seasons to validate the +EV strategy's long-term ROI.
- Line Movement Tracking: Store periodic odds snapshots to detect sharp market moves and bet before retail books adjust.
- Multi-Bookmaker Arbitrage: Scan pricing gaps across bookmakers for risk-free arbitrage opportunities.
- Enhanced xG: Integrate shot-quality data (Opta/StatsBomb) for more accurate Poisson modeling instead of goals-based averages.
- Rate Limiting: Add time.sleep() delays or proxy rotation for SofaScore scraping at scale.
