How It Works
Hyper Leezus runs a fully automated ML pipeline — ingesting real odds and game results every two hours, retraining models nightly, and blending statistical predictions with live market consensus to surface genuine edges.
The Pipeline
Data Collection
- Live pre-game odds from 20+ sportsbooks via The-Odds-API
- Completed game scores stored with matching game UUIDs
- Consensus win probability averaged across all bookmakers
- ESPN historical scores backfilled for rolling team stats
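The consensus step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the site's actual code: it assumes two-way moneyline markets quoted in decimal odds, removes each book's margin (vig) by normalizing the two implied probabilities so they sum to 1, and takes an unweighted mean across books. The function names and sample odds are hypothetical.

```python
from statistics import mean

def devig_home_prob(home_odds: float, away_odds: float) -> float:
    """Home win probability from one book's decimal odds, with the vig removed."""
    raw_home = 1.0 / home_odds
    raw_away = 1.0 / away_odds
    # Raw implied probabilities sum to more than 1; normalizing strips the margin.
    return raw_home / (raw_home + raw_away)

def consensus_home_prob(book_odds: list[tuple[float, float]]) -> float:
    """Unweighted mean of de-vigged home probabilities across books (assumed weighting)."""
    return mean(devig_home_prob(h, a) for h, a in book_odds)

# Hypothetical (home, away) decimal odds from three books.
sample = [(1.80, 2.10), (1.83, 2.05), (1.77, 2.15)]
consensus = consensus_home_prob(sample)
```

Other de-vigging methods exist; proportional normalization is simply the most common starting point.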
Model Training
- 10-game rolling team averages use only games played before the target game, which prevents data leakage
- XGBoost classifier outputs win probability per league
- Gradient Boosting regressors predict home & away scores
- 5 leagues trained independently: NBA, NHL, MLB, NFL, NCAAB
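One way to keep rolling averages leakage-free is to read each team's history before recording the current game's result. The sketch below does this for point differential only, in plain Python. The dictionary keys (`id`, `home`, `away`, `home_pts`, `away_pts`) and the function name are assumptions for illustration; the real pipeline's schema is not shown on this page.

```python
from collections import defaultdict, deque

WINDOW = 10  # rolling window length stated in the text

def rolling_point_diff(games):
    """Return (game_id, home_avg, away_avg) rows built only from earlier games.

    `games` must be sorted chronologically. Averages are None until a team
    has at least one prior game.
    """
    history = defaultdict(lambda: deque(maxlen=WINDOW))
    rows = []
    for g in games:
        home_hist, away_hist = history[g["home"]], history[g["away"]]
        # Features are read BEFORE this game's result is recorded.
        home_avg = sum(home_hist) / len(home_hist) if home_hist else None
        away_avg = sum(away_hist) / len(away_hist) if away_hist else None
        rows.append((g["id"], home_avg, away_avg))
        margin = g["home_pts"] - g["away_pts"]
        home_hist.append(margin)
        away_hist.append(-margin)
    return rows

games = [
    {"id": 1, "home": "A", "away": "B", "home_pts": 100, "away_pts": 90},
    {"id": 2, "home": "B", "away": "A", "home_pts": 95, "away_pts": 105},
    {"id": 3, "home": "A", "away": "B", "home_pts": 98, "away_pts": 102},
]
rows = rolling_point_diff(games)
```

Game 3's result (a 4-point loss for team A) never appears in game 3's own features, which is the property that matters.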
Prediction & Edge
- Final probability is a weighted blend: 60% ML output, 40% market consensus probability
- Edge flagged when blended probability deviates >3.5% from market
- O/U edge detected when projected total deviates >4% from league avg
- Confidence derived from how far probability sits from 50%
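The blend and confidence steps reduce to a few lines of arithmetic. In this sketch the 60/40 weights come from the text, while the confidence formula is an assumption: the page only says confidence depends on distance from 50%, so the probability of the favored side is used as one plausible reading.

```python
ML_WEIGHT = 0.60      # weights stated in the text
MARKET_WEIGHT = 0.40

def blend(ml_prob: float, market_prob: float) -> float:
    """Weighted blend of model and market home win probabilities."""
    return ML_WEIGHT * ml_prob + MARKET_WEIGHT * market_prob

def confidence(prob: float) -> float:
    # Assumption: probability of the favored side, so 0.5 is a coin flip.
    return max(prob, 1.0 - prob)

# Hypothetical inputs: model says 62%, market says 55%.
blended = blend(ml_prob=0.62, market_prob=0.55)   # 0.372 + 0.220 = 0.592
gap = blended - 0.55                              # about 0.042, above the 3.5% threshold
```

Because the blend equals the market probability plus 0.6 times the model-market gap, the raw model output has to differ from the market by roughly 5.8 points (3.5 / 0.6) before the blended gap exceeds 3.5.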
Live Model Status
| League | Status | Training Samples | Accuracy | Log Loss | Calibration (ECE) | Est. ROI | Last Trained |
|---|---|---|---|---|---|---|---|
Accuracy ≥ 55% and positive ROI indicate the model is beating the market. Calibration (ECE) measures how well predicted probabilities match actual outcomes — lower is better.
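For readers unfamiliar with ECE, the sketch below shows the standard binned form: predictions are grouped into equal-width probability bins, and the gap between average predicted probability and observed win rate in each bin is weighted by the bin's share of samples. The choice of 10 bins and the function name are assumptions; the page does not say how the site bins its predictions.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binned ECE over [0, 1] for predicted home-win probabilities and 0/1 outcomes."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_pred = sum(p for p, _ in bucket) / len(bucket)
        win_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_pred - win_rate)
    return ece

# Ten games predicted at 70%, seven of them won: well calibrated.
well_calibrated = expected_calibration_error([0.7] * 10, [1] * 7 + [0] * 3)
# Two games predicted at 90%, only one won: gap of 0.4.
overconfident = expected_calibration_error([0.9, 0.9], [1, 0])
```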
Training Features
13 features across 4 categories feed each league's model. All performance features use rolling 10-game averages computed from games before the target game — no data leakage.
- Rolling avg point differential (home minus away), last 10 games
- Points per possession differential across recent games
- Points allowed per possession differential across recent games
- Possessions per minute difference, which predicts total scoring
- Days of rest between games, home minus away
- Miles traveled divided by rest days, which penalizes cross-country back-to-backs
- Summed injury impact scores (out=1.0, questionable=0.45) per team
- Consensus home win probability averaged across all bookmakers (vig-adjusted)
- Spread change from open, where sharp movement signals informed action
- Percentage of public bets on home team, used as a contrarian signal
- Percentage of sharp (high-limit) bets on home, the strongest market signal
- Reddit post sentiment score for home team minus away team
- Composite of wind speed, precipitation, and temperature deviation (outdoor sports only)
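Two of the features listed above have simple closed forms and can be sketched directly. The injury weights come from the text; everything else here is an assumption, including the function names, the treatment of unlisted statuses as zero, and the one-day floor on rest (added only to avoid dividing by zero).

```python
INJURY_WEIGHTS = {"out": 1.0, "questionable": 0.45}  # weights stated in the text

def injury_impact(statuses):
    """Sum impact weights over one team's injury report; unlisted statuses count as 0."""
    return sum(INJURY_WEIGHTS.get(s.lower(), 0.0) for s in statuses)

def travel_fatigue(miles, rest_days):
    """Miles traveled per rest day, with rest floored at 1 day (assumed)."""
    return miles / max(rest_days, 1)

# Hypothetical team with one player out and two questionable.
impact = injury_impact(["out", "questionable", "questionable"])  # 1.0 + 0.45 + 0.45
# The same 2,400-mile trip weighs three times as much with 1 rest day as with 3.
short_rest = travel_fatigue(2400, 1)
long_rest = travel_fatigue(2400, 3)
```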
Why not use the ML model at 100%?
Sportsbook lines aggregate information from thousands of sharp bettors and professional syndicates. A model trained on weeks of data cannot systematically beat that signal — but it can add value on top of it.
By blending, the system inherits the market's information advantage while letting the model contribute where market odds are slow to adjust: rest differentials, travel schedules, and recent form.
How picks are surfaced
- blended_prob − market_implied_prob > 3.5%: Model sees home team as meaningfully more likely than the market implies.
- |spread_diff| > 1.5 pts when confidence > 62%: Predicted margin deviates from the line and the model is confident.
- |projected_total − league_avg| / league_avg > 4%: Projected total deviates significantly from the league season average, a mean reversion signal.
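The three rules can be combined into a single check per game. The thresholds below are the ones stated on this page; the `GameView` container, its field names, and the pick labels are hypothetical, and confidence is assumed to be expressed on a 0.5 to 1.0 scale.

```python
from dataclasses import dataclass

@dataclass
class GameView:
    blended_prob: float         # blended home win probability
    market_implied_prob: float  # consensus home win probability
    spread_diff: float          # predicted margin minus the book's line, in points
    confidence: float           # assumed scale: 0.5 (coin flip) to 1.0
    projected_total: float
    league_avg_total: float

def surfaced_picks(g: GameView) -> list[str]:
    """Return the labels of every rule this game satisfies."""
    picks = []
    if g.blended_prob - g.market_implied_prob > 0.035:
        picks.append("moneyline")
    if abs(g.spread_diff) > 1.5 and g.confidence > 0.62:
        picks.append("spread")
    if abs(g.projected_total - g.league_avg_total) / g.league_avg_total > 0.04:
        picks.append("total")
    return picks

# Hypothetical games: the first clears all three thresholds, the second none.
strong = surfaced_picks(GameView(0.64, 0.60, 2.0, 0.64, 235.0, 224.0))
quiet = surfaced_picks(GameView(0.56, 0.55, 2.0, 0.56, 226.0, 224.0))
```

Note that the first rule, as written on this page, is one-directional: it only fires when the blend favors the home team more than the market does.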