System Transparency

How It Works

Hyper Leezus runs a fully automated ML pipeline: it ingests real odds and game results every two hours, retrains its models nightly, and blends statistical predictions with live market consensus to surface potential edges.

The Pipeline

Step 1

Every 2 hours

Data Collection

Live pre-game odds from 20+ sportsbooks via The-Odds-API
Completed game scores stored with matching game UUIDs
Consensus win probability averaged across all bookmakers
ESPN historical scores backfilled for rolling team stats
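
The consensus step can be sketched as follows. This is a minimal illustration, not the production code: it assumes two-way decimal odds, removes the vig by proportional normalization, and uses hypothetical function names (devig_home_prob, consensus_home_prob) and made-up example odds.

```python
from statistics import mean

def devig_home_prob(home_odds, away_odds):
    """Convert two-way decimal odds into a vig-free home win probability.

    Assumption: proportional normalization, i.e. each raw implied
    probability is divided by the sum of both sides.
    """
    raw_home = 1.0 / home_odds
    raw_away = 1.0 / away_odds
    return raw_home / (raw_home + raw_away)

def consensus_home_prob(book_odds):
    """Average the vig-free home probability across all bookmakers."""
    return mean(devig_home_prob(home, away) for home, away in book_odds)

# Hypothetical (home, away) decimal odds from three sportsbooks.
example_odds = [(1.80, 2.10), (1.83, 2.05), (1.77, 2.15)]
print(round(consensus_home_prob(example_odds), 3))
```

Other de-vig methods exist; proportional normalization is used here only because it is the simplest to read.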

Step 2

Daily at 4 AM

Model Training

10-game rolling team averages, computed only from games before the target game, to avoid data leakage
XGBoost classifier outputs win probability per league
Gradient Boosting regressors predict home & away scores
5 leagues trained independently: NBA, NHL, MLB, NFL, NCAAB
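
A leakage-free rolling average can be sketched like this. The helper name rolling_prior_average and the sample point differentials are illustrative assumptions; the key idea is that a game's own result is added to the window only after its feature value has been computed.

```python
from collections import deque

def rolling_prior_average(values, window=10):
    """For each game, average up to `window` values from earlier games only.

    Returns None for a team's first game, where no history exists yet.
    """
    history = deque(maxlen=window)
    averages = []
    for value in values:
        # Feature for this game uses only what happened before it.
        averages.append(sum(history) / len(history) if history else None)
        # The current game becomes visible to later games only.
        history.append(value)
    return averages

# Hypothetical point differentials for one team, oldest game first.
point_diffs = [5, -3, 12, 7, -8]
print(rolling_prior_average(point_diffs, window=3))
```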

Step 3

Real-time

Prediction & Edge

Blended probability = 60% ML output + 40% market consensus probability
Moneyline edge flagged when blended probability exceeds market-implied probability by more than 3.5 percentage points
O/U edge detected when projected total deviates more than 4% from the league season average
Confidence derived from how far probability sits from 50%

Live Model Status

Updated after each training run

League	Status	Training Samples	Accuracy	Log Loss	Calibration (ECE)	Est. ROI	Last Trained

Accuracy ≥ 55% together with a positive estimated ROI suggests the model is beating the market. Calibration (ECE, expected calibration error) measures how well predicted probabilities match actual outcomes; lower is better, as it is for log loss.
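
For readers unfamiliar with the two probability metrics in the table, the sketch below shows one common way to compute them. The page does not state how ECE is binned, so the ten equal-width bins here are an assumption, and the function names are illustrative.

```python
import math

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (1 = home win)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between predicted probability and hit rate.

    Assumption: equal-width probability bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_prob = sum(p for p, _ in members) / len(members)
        hit_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / len(probs)) * abs(avg_prob - hit_rate)
    return ece
```

A model that says 90% and is right nine times out of ten has an ECE near zero; one that says 100% and is always wrong has an ECE of 1.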

Training Features

13 features across 4 categories feed each league's model. All performance features use rolling 10-game averages computed from games before the target game — no data leakage.

Performance

Power Rating Diff

Rolling avg point differential (home minus away), last 10 games

Offensive Rating Diff

Points per possession differential across recent games

Defensive Rating Diff

Points allowed per possession differential across recent games

Pace Differential

Possessions per minute difference — predicts total scoring

Situational

Rest Days Diff

Days of rest since each team's previous game, home minus away

Travel Fatigue

Miles traveled divided by rest days — penalizes cross-country back-to-backs

Injury Impact Diff

Summed injury impact scores (out=1.0, questionable=0.45) per team
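
Two of the situational features above can be sketched as follows. The injury weights come from the description; everything else is an assumption made for illustration: the function names, the zero weight for any other status, the home-minus-away direction of the injury difference, and the floor of one rest day that keeps travel fatigue defined when a team has zero days of rest.

```python
# Weights stated above; any other status is assumed to contribute 0.
INJURY_WEIGHTS = {"out": 1.0, "questionable": 0.45}

def injury_impact(statuses):
    """Sum the impact scores for one team's injury report."""
    return sum(INJURY_WEIGHTS.get(status, 0.0) for status in statuses)

def injury_impact_diff(home_statuses, away_statuses):
    """Assumed direction: home team impact minus away team impact."""
    return injury_impact(home_statuses) - injury_impact(away_statuses)

def travel_fatigue(miles_traveled, rest_days, min_rest_days=1.0):
    """Miles traveled divided by rest days.

    Assumption: rest days are floored at `min_rest_days` so that a
    zero-rest back-to-back does not divide by zero.
    """
    return miles_traveled / max(rest_days, min_rest_days)

print(injury_impact_diff(["out", "questionable"], ["questionable"]))
print(travel_fatigue(2400, 1), travel_fatigue(2400, 3))
```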

Market Signals

Market Implied Prob

Consensus home win probability averaged across all bookmakers (vig-adjusted)

Line Movement

Spread change from open — sharp movement signals informed action

Public Betting %

Percentage of public bets on home team — contrarian signal

Sharp Money %

Percentage of sharp (high-limit) bets on home — strongest market signal

Environmental

Sentiment Diff

Reddit post sentiment score for home team minus away team

Weather Severity

Composite of wind speed, precipitation, and temperature deviation (outdoor sports only)

The 60 / 40 Blend

Why not use the ML model at 100%?

60% ML

40% Market

Sportsbook lines aggregate information from thousands of sharp bettors and professional syndicates. A model trained on weeks of data is unlikely to beat that signal systematically, but it can add value on top of it.

By blending, the system inherits the market's information advantage while letting the model contribute where market odds are slow to adjust: rest differentials, travel schedules, and recent form.
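
The blend itself is a weighted average. A minimal sketch, with a hypothetical function name and example probabilities:

```python
def blend_probability(ml_prob, market_prob, ml_weight=0.6):
    """Weighted average of model and market home win probabilities."""
    return ml_weight * ml_prob + (1.0 - ml_weight) * market_prob

# Model says 70%, market consensus says 60%.
blended = blend_probability(0.70, 0.60)
print(round(blended, 4))
```

One consequence of the 60 / 40 weights: the blended probability differs from the market by 0.6 times the gap between model and market, so the blend always sits closer to the market than the raw model does.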

Edge Detection

How picks are surfaced

Moneyline Edge

blended_prob − market_implied_prob > 3.5%

The blended probability rates the home team as meaningfully more likely to win than the market implies.
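
As a sketch, the moneyline rule reduces to a single comparison. The function name and the example inputs are illustrative; the 3.5% threshold is read here as 0.035 on the probability scale.

```python
MONEYLINE_EDGE_THRESHOLD = 0.035  # 3.5 percentage points

def moneyline_edge(blended_prob, market_implied_prob,
                   threshold=MONEYLINE_EDGE_THRESHOLD):
    """Return (edge, flagged) for the home side of a moneyline."""
    edge = blended_prob - market_implied_prob
    return edge, edge > threshold

print(moneyline_edge(0.66, 0.60))
print(moneyline_edge(0.62, 0.60))
```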

Spread Edge

|spread_diff| > 1.5 pts when confidence > 62%

Predicted margin deviates from the line and the model is confident.
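
A sketch of the spread rule follows. The page does not define spread_diff, so this example assumes it is the predicted home margin minus the home margin implied by the line (a home spread of -4.5 implies a market margin of +4.5). The function name and that sign convention are assumptions.

```python
def spread_edge(predicted_home_margin, market_home_margin, confidence,
                min_diff=1.5, min_confidence=0.62):
    """Flag a spread edge when the margin gap and confidence both clear
    their thresholds. Confidence is expected as a fraction (0.62 = 62%).
    """
    spread_diff = predicted_home_margin - market_home_margin
    return abs(spread_diff) > min_diff and confidence > min_confidence

print(spread_edge(7.0, 4.5, confidence=0.65))
print(spread_edge(7.0, 4.5, confidence=0.60))
```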

O/U Edge

|projected_total − league_avg| / league_avg > 4%

Projected total deviates significantly from the league season average — mean reversion signal.
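
The O/U rule is a relative deviation check. In this sketch the function name and the sample totals are hypothetical; the sign of the returned deviation indicates whether the projection sits above or below the league average.

```python
def over_under_edge(projected_total, league_avg_total, threshold=0.04):
    """Return (relative_deviation, flagged) for a projected game total."""
    deviation = (projected_total - league_avg_total) / league_avg_total
    return deviation, abs(deviation) > threshold

print(over_under_edge(230.0, 220.0))
print(over_under_edge(222.0, 220.0))
```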