An educational project exploring how LLMs can be prompt-engineered into quantitative sports betting analysts — backed by Python-native mathematical governance to prevent hallucinated bets.
The system finds positive Expected Value (+EV) by comparing bookmaker odds against Python-computed true probabilities derived from real match data (form, H2H, Poisson-modeled expected goals). It sizes stakes with a fractional (quarter) Kelly criterion against a virtual $1,000 bankroll.
- Language & Database: Python 3.13, PostgreSQL, SQLAlchemy ORM
- AI Engine: Local or remote LLM inference via OpenAI-compatible API (Ollama, Claude, etc.)
- Data Sources:
  - The Odds API: Live odds across bookmakers (h2h, totals) for Argentine Primera, Copa Libertadores
  - SofaScore: Real-time team form (last 10 matches, W/D/L, goals) and H2H history via `cloudscraper`
- Key Dependencies: `cloudscraper`, `openai`, `sqlalchemy`, `python-dotenv`
Pulls live odds for fixtures within 72 hours across configured leagues using a bulk API call per league (~3 API tokens per full run).
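A minimal sketch of the 72-hour fixture filter, assuming each event in the bulk response is a dict with an ISO 8601 UTC `commence_time` field. The helper name `upcoming_within` and the sample events are hypothetical and are not taken from the project's `pipeline.py`.

```python
from datetime import datetime, timedelta, timezone


def upcoming_within(events, now, hours=72):
    """Keep events that kick off at or after `now` and within the next `hours`."""
    cutoff = now + timedelta(hours=hours)
    selected = []
    for event in events:
        # Assumed field: ISO 8601 UTC timestamp such as "2025-03-02T20:00:00Z".
        kickoff = datetime.fromisoformat(event["commence_time"].replace("Z", "+00:00"))
        if now <= kickoff <= cutoff:
            selected.append(event)
    return selected


# Illustrative data only; team names other than the first are placeholders.
now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
events = [
    {"home_team": "Rosario Central", "away_team": "Team B",
     "commence_time": "2025-03-02T20:00:00Z"},
    {"home_team": "Team C", "away_team": "Team D",
     "commence_time": "2025-03-06T20:00:00Z"},
]
fixtures = upcoming_within(events, now)
```

Filtering locally after one bulk call per league keeps the request count independent of the number of fixtures.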
Python scrapes real match data directly from SofaScore:
- Team Form: Last 10 completed matches with scores, W/D/L record, goals scored/conceded, and standard deviation
- H2H: Cross-references both teams' recent events to find direct meetings
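The form summary and H2H cross-reference described above could look roughly like the sketch below. The match dict layout (`id`, `home`, `away`, `home_goals`, `away_goals`) is an assumption for illustration and is not SofaScore's actual response schema; the use of the population standard deviation is also an assumption, since the text does not say which variant is used.

```python
from statistics import pstdev


def summarize_form(team, matches):
    """Summarize W/D/L and goals for `team` over its completed matches."""
    if not matches:
        raise ValueError("no completed matches to summarize")
    wins = draws = losses = 0
    scored, conceded = [], []
    for m in matches:
        is_home = m["home"] == team
        goals_for = m["home_goals"] if is_home else m["away_goals"]
        goals_against = m["away_goals"] if is_home else m["home_goals"]
        scored.append(goals_for)
        conceded.append(goals_against)
        if goals_for > goals_against:
            wins += 1
        elif goals_for == goals_against:
            draws += 1
        else:
            losses += 1
    n = len(matches)
    return {
        "wins": wins, "draws": draws, "losses": losses,
        "win_pct": wins / n, "draw_pct": draws / n, "loss_pct": losses / n,
        "goals_for": sum(scored), "goals_against": sum(conceded),
        "goals_for_std": pstdev(scored),  # assumption: population std dev
    }


def head_to_head(events_a, events_b):
    """Direct meetings are the events that appear in both teams' recent lists."""
    ids_b = {e["id"] for e in events_b}
    return [e for e in events_a if e["id"] in ids_b]


# Illustrative data only.
team_a_events = [
    {"id": 1, "home": "A", "away": "B", "home_goals": 2, "away_goals": 1},
    {"id": 2, "home": "C", "away": "A", "home_goals": 0, "away_goals": 0},
    {"id": 3, "home": "A", "away": "D", "home_goals": 0, "away_goals": 1},
    {"id": 4, "home": "B", "away": "A", "home_goals": 1, "away_goals": 3},
]
team_b_events = [
    team_a_events[0],
    team_a_events[3],
    {"id": 5, "home": "B", "away": "C", "home_goals": 1, "away_goals": 1},
]
form = summarize_form("A", team_a_events)
h2h = head_to_head(team_a_events, team_b_events)
```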
All probability calculations happen in Python — the LLM never computes these:
Step 1: Form Cross-Reference
Home Win = avg(home_win%, away_loss%)
Draw = avg(home_draw%, away_draw%)
Away Win = avg(away_win%, home_loss%)
Step 2: H2H Adjustment (dynamic weight: 2 matches=15%, 3=20%, 5+=30%)
Blended = form × (1-weight) + h2h × weight
Step 3: Home Advantage (+5% home, -5% away)
Step 4: Normalize to sum to 1.0
Step 5: Poisson Distribution for totals
λ = expected_total_goals
P(Under X.5) = Poisson CDF(floor(X.5), λ)
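The five steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's `search.py`: the function names are hypothetical, the home advantage is applied as additive percentage points, 4 meetings are treated like 3, and fewer than 2 meetings trigger no H2H adjustment (the text does not specify these cases).

```python
import math


def h2h_weight(n_meetings):
    """Dynamic H2H weight: 2 meetings = 15%, 3 = 20%, 5+ = 30%."""
    if n_meetings >= 5:
        return 0.30
    if n_meetings >= 3:
        return 0.20  # assumption: 4 meetings treated like 3
    if n_meetings == 2:
        return 0.15
    return 0.0  # assumption: too few meetings, no adjustment


def true_probabilities(home_form, away_form, h2h=None, n_meetings=0, home_edge=0.05):
    # Step 1: cross-reference each side's form with the opponent's.
    probs = {
        "home": (home_form["win"] + away_form["loss"]) / 2,
        "draw": (home_form["draw"] + away_form["draw"]) / 2,
        "away": (away_form["win"] + home_form["loss"]) / 2,
    }
    # Step 2: blend in H2H rates with the dynamic weight.
    weight = h2h_weight(n_meetings) if h2h else 0.0
    if weight:
        probs = {k: probs[k] * (1 - weight) + h2h[k] * weight for k in probs}
    # Step 3: home advantage, read here as additive points (assumption).
    probs["home"] += home_edge
    probs["away"] = max(probs["away"] - home_edge, 0.0)
    # Step 4: normalize so the three outcomes sum to 1.0.
    total = sum(probs.values())
    return {k: v / total for k, v in probs.items()}


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


def prob_under(line, lam):
    # Step 5: Under X.5 means at most floor(X.5) total goals.
    return poisson_cdf(math.floor(line), lam)


# Illustrative inputs only.
home_form = {"win": 0.5, "draw": 0.3, "loss": 0.2}
away_form = {"win": 0.3, "draw": 0.3, "loss": 0.4}
probs = true_probabilities(home_form, away_form)
under_2_5 = prob_under(2.5, 2.6)
over_2_5 = 1.0 - under_2_5
```

With these inputs the form cross-reference gives 0.45 / 0.30 / 0.25, the home adjustment moves that to 0.50 / 0.30 / 0.20, and with λ = 2.6 the Under 2.5 probability comes out at roughly 0.518.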
Python scans every market outcome (home/draw/away/over/under) against its computed true probabilities:
- Computes `EV = true_prob - implied_prob` for each outcome
- Requires EV ≥ +5% to qualify
- Computes Quarter-Kelly stake, capped at 5% of bankroll
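As a worked illustration of the scan above, the sketch below computes EV as the probability difference the text defines and sizes the stake with the standard Kelly formula scaled by 0.25 and capped at 5% of bankroll. The function names and the sample probability are assumptions; the project's own implementation may differ in rounding and edge cases.

```python
def implied_probability(odds):
    """Implied probability from decimal odds."""
    return 1.0 / odds


def expected_value(true_prob, odds):
    """EV as defined in this README: true probability minus implied probability."""
    return true_prob - implied_probability(odds)


def quarter_kelly_stake(true_prob, odds, bankroll, fraction=0.25, cap=0.05):
    """Fractional Kelly stake in currency units, capped at `cap` of bankroll."""
    b = odds - 1.0  # net profit per unit staked
    full_kelly = (b * true_prob - (1.0 - true_prob)) / b
    stake_fraction = min(max(full_kelly, 0.0) * fraction, cap)
    return round(bankroll * stake_fraction, 2)


# Hypothetical outcome: true probability 0.65 at decimal odds 1.73.
ev = expected_value(0.65, 1.73)
qualifies = ev >= 0.05
stake = quarter_kelly_stake(0.65, 1.73, bankroll=1000.0)
capped = quarter_kelly_stake(0.90, 2.00, bankroll=1000.0)
```

Here the EV is about +7.2%, full Kelly is about 17.1% of bankroll, and the quarter-Kelly stake is about $42.64. The second call shows the cap: quarter-Kelly would be 20%, so the stake is limited to $50.00.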
The LLM receives all data and Python's computed probabilities. Its role is advisory: it provides qualitative reasoning (injury context, motivation, etc.). Python is the authority on:
- ✅ Which outcome to bet
- ✅ True probability values
- ✅ Kelly stake sizing
- ✅ EV threshold enforcement
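One way to enforce this split is to let the Python decision object be the only source of bet parameters and to attach the LLM output as text only. The sketch below is hypothetical: `PythonDecision`, `FinalBet`, `finalize_bet`, and the shape of the LLM response dict are illustrative names, not the project's actual classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PythonDecision:
    outcome: str
    true_prob: float
    odds: float
    stake: float


@dataclass(frozen=True)
class FinalBet:
    decision: PythonDecision
    reasoning: str


def finalize_bet(decision: Optional[PythonDecision], llm_response: dict) -> Optional[FinalBet]:
    """Combine Python's decision with the LLM's commentary."""
    if decision is None:
        # No outcome passed the EV gate, so the LLM cannot create a bet.
        return None
    # Only the free-text reasoning is read; any outcome, probability, or
    # stake suggested by the LLM is ignored.
    reasoning = str(llm_response.get("reasoning", ""))
    return FinalBet(decision=decision, reasoning=reasoning)


# Illustrative values only.
decision = PythonDecision(outcome="home", true_prob=0.65, odds=1.73, stake=42.64)
llm_response = {"outcome": "away", "stake": 500, "reasoning": "Key striker is doubtful."}
final = finalize_bet(decision, llm_response)
```

Making the dataclasses frozen means later code cannot overwrite the stake or outcome after Python has set them.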
Every bet is fully traced:
📋 [BET DETAIL] Market: h2h | Odds: 1.73
✅ [+EV FOUND] Rosario Central (Home Win) @ 1.73 | True: 0.650 | Implied: 0.578 | EV: +7.2%
📊 [PYTHON STAKE] Quarter-Kelly 4.25% = $42.45
🎯 BET PLACED: Rosario Central (Home Win) @ odds 1.73 for $42.45 (EV: +7.2%)
💰 Bankroll: $957.55
After matches finish, run check_results.py to:
- Fetch final scores from SofaScore
- Mark bets as win/loss
- Calculate P&L per bet
- Credit winnings back to bankroll
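The settlement arithmetic can be sketched as below, assuming the stake was already deducted from the bankroll when the bet was placed, so a win credits the full payout (stake × odds) and a loss credits nothing. `settle_bet` is a hypothetical helper, not a function from `check_results.py`.

```python
def settle_bet(stake, odds, won):
    """Return (pnl, credit) for a resolved bet at decimal odds."""
    if won:
        payout = round(stake * odds, 2)  # stake returned plus profit
        return round(payout - stake, 2), payout
    return -stake, 0.0


# Illustrative: a $42.45 bet at odds 1.73 with $957.55 left after placement.
bankroll = 957.55
pnl, credit = settle_bet(42.45, 1.73, won=True)
bankroll = round(bankroll + credit, 2)

loss_pnl, loss_credit = settle_bet(42.45, 1.73, won=False)
```

In this example a win yields a profit of about $30.99 and lifts the bankroll to about $1,030.99, while a loss records a P&L of -$42.45 with no credit.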
- Python 3.13+ with `venv`
- PostgreSQL database
- Ollama (or any OpenAI-compatible endpoint)
Create a virtual environment and install dependencies:

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Create a `.env` file in the project root:

    ODDS_API_KEY=your_key_here
    LLM_BASE_URL=http://localhost:11434/v1 # Ollama default
    LLM_API_KEY=ollama # or your API key
    LLM_MODEL=qwen2.5:14b # or any model

Run the pipeline:

    # Place bets on upcoming matches
    python main.py
    # Check results after matches finish
    python check_results.py
    # Test data sources
    python tests/test_sources.py

Override the model for a single run:

    LLM_MODEL=llama3 python main.py

| Gate | Rule |
|---|---|
| Minimum Form Data | Requires ≥5 recent matches per team |
| EV Threshold | Rejects any bet with EV < +5% |
| Fractional Kelly (0.25×) | Quarter-Kelly sizing to reduce variance |
| Bankroll Cap | Max 5% of bankroll per bet |
| Implied Prob Sanity | Validates implied_prob ≈ 1/odds |
| Python Authority | True probability computed in Python, never by the LLM |
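The gates in the table can be expressed as one validation function that returns the names of any failed checks. This is a sketch under assumptions: the function name, the gate labels, and the 0.01 tolerance for the implied-probability sanity check are illustrative, since the README does not state the tolerance.

```python
def check_gates(home_matches, away_matches, true_prob, odds, implied_prob,
                stake, bankroll, min_matches=5, min_ev=0.05,
                max_fraction=0.05, tolerance=0.01):
    """Return the names of failed gates; an empty list means the bet may proceed."""
    failures = []
    if min(home_matches, away_matches) < min_matches:
        failures.append("minimum_form_data")
    if abs(implied_prob - 1.0 / odds) > tolerance:  # tolerance is an assumption
        failures.append("implied_prob_sanity")
    if true_prob - implied_prob < min_ev:
        failures.append("ev_threshold")
    if stake > bankroll * max_fraction:
        failures.append("bankroll_cap")
    return failures


# Illustrative calls only.
passed = check_gates(10, 10, 0.65, 1.73, 0.578, 42.64, 1000.0)
thin_data = check_gates(3, 10, 0.65, 1.73, 0.578, 42.64, 1000.0)
weak_bet = check_gates(10, 10, 0.60, 1.73, 0.578, 80.0, 1000.0)
```

Returning every failed gate rather than stopping at the first one makes rejected bets easier to audit in the logs.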

| Table | Purpose |
|---|---|
| `matches` | Fixtures with home/away teams, scores, status |
| `market_odds` | Live bookmaker odds per outcome |
| `bets` | Placed bets with stake, odds, EV%, true prob, result, P&L |
| `bet_logs` | Bet lifecycle events (Placed → Resolved) |
| `bankroll` | Current balance |
| `bankroll_logs` | Transaction history (bets placed, winnings credited) |
| `betting_reasoning` | LLM reasoning text per match |
├── main.py # Entry point — fetches odds, scrapes form, places bets
├── agent.py # LLM prompt engineering and advisory analysis
├── search.py # SofaScore scraping + Python true probability engine
├── sofa_scraper.py # Low-level SofaScore API via cloudscraper
├── pipeline.py # Odds API fetching and parsing
├── check_results.py # Post-match results resolution and P&L tracking
├── models.py # SQLAlchemy ORM models
├── database.py # DB connection and session management
├── tests/
│ └── test_sources.py # Data source validation tests
├── requirements.txt
└── .env # API keys (not committed)
- Backtesting: Simulate the pipeline over previous seasons to validate the +EV strategy's long-term ROI.
- Line Movement Tracking: Store periodic odds snapshots to detect sharp market moves and bet before retail books adjust.
- Multi-Bookmaker Arbitrage: Scan pricing gaps across bookmakers for risk-free arbitrage opportunities.
- Enhanced xG: Integrate shot-quality data (Opta/StatsBomb) for more accurate Poisson modeling instead of goals-based averages.
- Rate Limiting: Add `time.sleep()` delays or proxy rotation for SofaScore scraping at scale.
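For the rate-limiting item, a minimal sketch of a `time.sleep()`-based throttle is shown below. The `Throttle` class is hypothetical and not part of the project; the clock and sleep functions are injectable only so the behavior can be checked without real delays.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Usage: call wait() before each scrape request.
throttle = Throttle(min_interval=0.01)
for _ in range(3):
    throttle.wait()
    # a SofaScore request would go here
```

A fixed minimum interval is the simplest policy; randomized jitter or proxy rotation would be separate additions.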