Premier League Week 24 AI Model Audit
Llama 3.3 70B Turbo led with 2.50 average points per match, followed by Kimi K2 Instruct (2.44) and Gemma 3n E4B (2.44). Overall, 46.73% of model predictions had the correct tendency. Aston Villa vs Brentford (0-1) was a significant upset: no model predicted the correct tendency.
Top 10 Models
| # | Model | Matches | Total Points | Avg Pts/Match | Correct Tendency % | Exact Score % |
|---|---|---|---|---|---|---|
| 1 | Llama 3.3 70B Turbo (Meta) | 10 | 25 | 2.50 | 50.0% | 20.0% |
| 2 | Kimi K2 Instruct (Moonshot) | 9 | 22 | 2.44 | 33.3% | 33.3% |
| 3 | Gemma 3n E4B (Google) | 9 | 22 | 2.44 | 66.7% | 11.1% |
| 4 | DeepSeek V3.1 (DeepSeek) | 10 | 23 | 2.30 | 50.0% | 30.0% |
| 5 | Llama 3 8B Lite (Meta) | 10 | 23 | 2.30 | 70.0% | 10.0% |
| 6 | Rnj-1 Instruct (Essential AI) | 10 | 23 | 2.30 | 70.0% | 10.0% |
| 7 | Llama 4 Maverick (Meta) | 10 | 22 | 2.20 | 60.0% | 20.0% |
| 8 | Llama 3.1 405B Turbo (Meta) | 10 | 22 | 2.20 | 60.0% | 20.0% |
| 9 | Qwen3 235B Instruct (Alibaba) | 4 | 8 | 2.00 | 50.0% | 25.0% |
| 10 | Ministral 3 14B (Mistral) | 10 | 20 | 2.00 | 50.0% | 20.0% |
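The ranking divides each model's total points by the matches it predicted, which is why Qwen3 235B Instruct can place with only 4 matches. The sketch below reproduces that ordering for four rows of the table. The `ModelResult` class and `rank_models` helper are hypothetical names, and because the article does not state a tie-break rule, the sketch simply keeps input order for equal averages (Python's `sorted` is stable, even with `reverse=True`).

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    matches: int
    total_points: int

    @property
    def avg_points(self) -> float:
        # Average points per match = total points / matches predicted
        return self.total_points / self.matches


def rank_models(results: list[ModelResult]) -> list[ModelResult]:
    # Highest average first; equal averages keep their input order
    # (assumed tie-break, not stated in the article)
    return sorted(results, key=lambda r: r.avg_points, reverse=True)


# Four rows from the table, deliberately listed out of rank order
rows = [
    ModelResult("Qwen3 235B Instruct (Alibaba)", 4, 8),
    ModelResult("Llama 3.3 70B Turbo (Meta)", 10, 25),
    ModelResult("Ministral 3 14B (Mistral)", 10, 20),
    ModelResult("Kimi K2 Instruct (Moonshot)", 9, 22),
]

for rank, r in enumerate(rank_models(rows), start=1):
    print(f"{rank}. {r.name}: {r.avg_points:.2f}")
```

Normalizing by matches played keeps models that skipped fixtures comparable, although a 4-match average is a much noisier estimate than a 10-match one.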
Match-by-Match Audit
- Sunderland vs Burnley: 38.5% correct tendency, 0% exact score hits
- Tottenham vs Manchester City: 7.7% correct tendency, 3.8% exact score hits
- Aston Villa vs Brentford: 0% correct tendency, 0% exact score hits
- Manchester United vs Fulham: 33.3% correct tendency, 0% exact score hits
- Nottingham Forest vs Crystal Palace: 80% correct tendency, 80% exact score hits
- Liverpool vs Newcastle: 28% correct tendency, 0% exact score hits
- Chelsea vs West Ham: 64% correct tendency, 0% exact score hits
- Brighton vs Everton: 61.5% correct tendency, 57.7% exact score hits
- Wolves vs Bournemouth: 65.4% correct tendency, 15.4% exact score hits
- Leeds vs Arsenal: 88.9% correct tendency, 0% exact score hits
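Each per-match figure is the share of model predictions whose outcome (home win, draw, or away win) matched the result, alongside the share that hit the exact scoreline. A minimal sketch of that calculation follows. The helper names are hypothetical, and the predictions are made-up scorelines for a 2-2 draw, not the models' real picks.

```python
def tendency(home: int, away: int) -> str:
    """Map a scoreline to H (home win), D (draw) or A (away win)."""
    if home > away:
        return "H"
    if home < away:
        return "A"
    return "D"


def match_rates(
    actual: tuple[int, int], predictions: list[tuple[int, int]]
) -> tuple[float, float]:
    """Return (correct tendency %, exact score %) over all predictions for one match."""
    actual_tendency = tendency(*actual)
    n = len(predictions)
    correct = sum(1 for p in predictions if tendency(*p) == actual_tendency)
    exact = sum(1 for p in predictions if p == actual)
    return 100 * correct / n, 100 * exact / n


# Hypothetical predictions for a match that ended 2-2
preds = [(1, 2), (0, 2), (2, 2), (1, 1), (1, 3)]
tend_pct, exact_pct = match_rates((2, 2), preds)
print(f"{tend_pct:.1f}% correct tendency, {exact_pct:.1f}% exact score hits")
```

An exact hit always implies a correct tendency, so the exact-score rate can never exceed the tendency rate, as in every row of the list.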
Biggest Consensus Misses
- Tottenham vs Manchester City (2-2) | Consensus: A (88.5%)
- Aston Villa vs Brentford (0-1) | Consensus: D (64.0%)
- Manchester United vs Fulham (3-2) | Consensus: D (62.5%)
- Sunderland vs Burnley (3-0) | Consensus: D (57.7%)
- Liverpool vs Newcastle (4-1) | Consensus: D (48.0%)
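A consensus miss is a match where the most common predicted outcome differed from the actual one, and the list appears to be ordered by consensus share, largest first (that ordering rule is our reading, not stated in the article). The sketch below computes a consensus from a list of picks and then ranks misses. The 23/2/1 split is an assumption: it is the only breakdown of 26 predictions consistent with the 88.5% away-win consensus and the 7.7% correct-tendency figure for Tottenham vs Manchester City. The function names are hypothetical.

```python
from collections import Counter


def consensus(picks: list[str]) -> tuple[str, float]:
    """Return the most common predicted outcome and the % of models that picked it.

    Counter.most_common orders equal counts by first occurrence; how the
    article resolves ties is not stated, so this is an assumption.
    """
    outcome, count = Counter(picks).most_common(1)[0]
    return outcome, 100 * count / len(picks)


# Assumed split for Tottenham vs Manchester City with 26 predictions:
# 23 away wins, 2 draws, 1 home win
picks = ["A"] * 23 + ["D"] * 2 + ["H"]
outcome, share = consensus(picks)
print(f"Consensus: {outcome} ({share:.1f}%)")

# (match, actual tendency, consensus outcome, consensus share %) from the list above
results = [
    ("Liverpool vs Newcastle", "H", "D", 48.0),
    ("Tottenham vs Manchester City", "D", "A", 88.5),
    ("Aston Villa vs Brentford", "A", "D", 64.0),
]

# Keep matches where the consensus was wrong; a larger share is a bigger miss
misses = sorted(
    (r for r in results if r[2] != r[1]), key=lambda r: r[3], reverse=True
)
for match, actual, cons, pct in misses:
    print(f"{match} | Consensus: {cons} ({pct:.1f}%), actual: {actual}")
```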
Methodology
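Models are ranked by average points per match (total points divided by matches predicted). Points are awarded as follows: 3 points for a correct match tendency (home win, draw, or away win) and 6 points for an exact score. Consensus is the most common predicted outcome among all models (H = home win, D = draw, A = away win); the percentage shown is the share of models that predicted it.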
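The scoring rule can be sketched as a small function. The methodology does not say whether the 6 exact-score points replace or add to the 3 tendency points; this sketch assumes they replace them, so an exact hit scores 6 in total. The `points` and `tendency` names are hypothetical, and the published totals may reflect rules not described here.

```python
def tendency(home: int, away: int) -> str:
    """Map a scoreline to H (home win), D (draw) or A (away win)."""
    if home > away:
        return "H"
    if home < away:
        return "A"
    return "D"


def points(pred: tuple[int, int], actual: tuple[int, int]) -> int:
    """Score one prediction against the final result.

    Assumption: an exact score earns 6 points in total, not 6 + 3.
    """
    if pred == actual:
        return 6
    if tendency(*pred) == tendency(*actual):
        return 3
    return 0


# Aston Villa vs Brentford finished 0-1 (away win)
actual = (0, 1)
for pred in [(0, 1), (0, 2), (1, 1)]:
    print(pred, points(pred, actual))
```

With this rule, a model's average points per match is the sum of `points` over its predictions divided by the number of matches it predicted.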
Generation cost: $0.0022
Tokens: 4,721 input + 1,103 output