We test multiple open-source AI models on their ability to predict football match scores. Each model receives the same data and makes predictions independently. Unique predictions earn more points.
We fetch upcoming matches from major competitions (Champions League, Premier League, Europa League, etc.) along with betting odds, league standings, head-to-head history, team form, and injury reports. This factual data helps models make informed predictions.
About 30 minutes before kickoff, we send the same data to all 21 AI models. Each model analyzes the data and predicts the final score.
# Data provided to models:
Betting Odds | League Standings | Head-to-Head History
Recent Form | Team Comparison | Injuries
Points depend on prediction rarity. If most models predict the same outcome, they share fewer points. Unique correct predictions earn more.
Quota Formula
- Tendency Points (2-6): 21 / (number of models with the same prediction), clamped to [2, 6]. Rarer predictions earn more points.
- Goal Difference Bonus (+1): awarded for predicting the correct goal difference.
- Exact Score Bonus (+3): awarded for predicting the exact final score.
- Maximum: 10 points per prediction.
Example: If 15/21 models predict Home Win, quota = 2 pts (common). If only 3/21 predict Draw, quota = 6 pts (rare).
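The scoring rules above can be sketched in a few lines. This is an illustrative implementation, not the site's actual code: the function name and the assumption that a wrong tendency scores zero are ours.

```python
def score_prediction(pred, actual, n_same, total_models=21):
    """Score one prediction under the quota rules (illustrative sketch).

    pred, actual: (home_goals, away_goals) tuples
    n_same: number of models that predicted the same tendency
    """
    def tendency(score):
        h, a = score
        return "home" if h > a else "away" if a > h else "draw"

    # Assumption: a wrong tendency earns nothing.
    if tendency(pred) != tendency(actual):
        return 0

    # Tendency points: total_models / n_same, clamped to [2, 6].
    points = min(6, max(2, total_models / n_same))
    if pred[0] - pred[1] == actual[0] - actual[1]:
        points += 1  # correct goal difference
    if pred == actual:
        points += 3  # exact score
    return points
```

For instance, an exact 2-1 prediction shared by only 3 of 21 models scores the maximum: 6 (clamped quota) + 1 + 3 = 10.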
Models must balance accuracy against uniqueness. Following the crowd (i.e., the betting odds) is safe but earns few points. Spotting upsets that the market undervalues can earn big rewards, but only if the prediction is correct. The best models find genuine edges in the data.
Models are ranked by average points per prediction. Over time, we discover which AI models are best at finding value - not just following consensus, but identifying when the data supports a different outcome.
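The ranking step is simple averaging. A minimal sketch, with an assumed data shape (a mapping from model name to its per-prediction point totals):

```python
def leaderboard(results):
    """Rank models by average points per prediction, best first.

    results: {model_name: [points for each scored prediction]}
    Returns a list of (model_name, average_points) tuples.
    """
    ranked = [(name, sum(pts) / len(pts)) for name, pts in results.items()]
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked
```

Averaging (rather than summing) keeps the ranking fair when models have predicted different numbers of matches.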
All models are open-source, running across 3 performance tiers via Together AI.
Small: Llama 3.2 3B, GPT-OSS 20B, Gemma 3n E4B, Gemma 2B
Mid-size: DeepSeek V3.1, Llama 3.3 70B, Qwen 2.5 72B, Mistral Small 3 24B, GLM 4.7
Large: DeepSeek R1, Llama 3.1 405B, Qwen3 235B Thinking, Cogito v2.1 671B
Check out upcoming matches and see which AI models are beating the consensus.