League Roundup

Premier League Week 24 AI Model Audit

February 4, 2026

Generated by: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8

Llama 3.3 70B Turbo led with an average of 2.50 points per match, followed by Kimi K2 Instruct and Gemma 3n E4B (2.44 each). Overall, models predicted the correct tendency 46.73% of the time. Aston Villa vs Brentford (0-1) was a significant upset: no model predicted the correct tendency.

Top 10 Models

#	Model	Matches	Total Points	Avg Pts/Match	Tendency %	Exact %
1	Llama 3.3 70B Turbo (Meta)	10	25	2.50	50.0%	20.0%
2	Kimi K2 Instruct (Moonshot)	9	22	2.44	33.3%	33.3%
3	Gemma 3n E4B (Google)	9	22	2.44	66.7%	11.1%
4	DeepSeek V3.1	10	23	2.30	50.0%	30.0%
5	Llama 3 8B Lite (Meta)	10	23	2.30	70.0%	10.0%
6	Rnj-1 Instruct (Essential AI)	10	23	2.30	70.0%	10.0%
7	Llama 4 Maverick (Meta)	10	22	2.20	60.0%	20.0%
8	Llama 3.1 405B Turbo (Meta)	10	22	2.20	60.0%	20.0%
9	Qwen3 235B Instruct (Alibaba)	4	8	2.00	50.0%	25.0%
10	Ministral 3 14B (Mistral)	10	20	2.00	50.0%	20.0%
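
As a rough illustration of how the Avg Pts/Match column relates to the Matches and Total Points columns, the sketch below recomputes the average for three rows copied from the table and sorts by it. The `ModelRow` class and its field names are hypothetical, introduced only for this example. How ties are broken in the published ranking is not stated, so the sketch simply relies on Python's stable sort.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    name: str
    matches: int
    total_points: int

    @property
    def avg_points(self) -> float:
        # Average points per match = total points / matches predicted.
        return self.total_points / self.matches


# Values copied from the Top 10 table above.
rows = [
    ModelRow("Qwen3 235B Instruct", 4, 8),
    ModelRow("Kimi K2 Instruct", 9, 22),
    ModelRow("Llama 3.3 70B Turbo", 10, 25),
]

# Rank by average points per match, highest first (stable for ties).
ranked = sorted(rows, key=lambda r: r.avg_points, reverse=True)

for position, row in enumerate(ranked, start=1):
    print(f"{position}. {row.name}: {row.avg_points:.2f}")
```

Because the ranking uses the average rather than the total, a model with fewer predicted matches (such as Kimi K2 Instruct with 9) can place above models with a higher point total.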

Match-by-Match Audit

Sunderland vs Burnley: 38.5% correct tendency, 0% exact score hits
Tottenham vs Manchester City: 7.7% correct tendency, 3.8% exact score hits
Aston Villa vs Brentford: 0% correct tendency, 0% exact score hits
Manchester United vs Fulham: 33.3% correct tendency, 0% exact score hits
Nottingham Forest vs Crystal Palace: 80% correct tendency, 80% exact score hits
Liverpool vs Newcastle: 28% correct tendency, 0% exact score hits
Chelsea vs West Ham: 64% correct tendency, 0% exact score hits
Brighton vs Everton: 61.5% correct tendency, 57.7% exact score hits
Wolves vs Bournemouth: 65.4% correct tendency, 15.4% exact score hits
Leeds vs Arsenal: 88.9% correct tendency, 0% exact score hits

Biggest Consensus Misses

Tottenham vs Manchester City (2-2) | Consensus: A (88.5%)
Aston Villa vs Brentford (0-1) | Consensus: D (64.0%)
Manchester United vs Fulham (3-2) | Consensus: D (62.5%)
Sunderland vs Burnley (3-0) | Consensus: D (57.7%)
Liverpool vs Newcastle (4-1) | Consensus: D (48.0%)

Methodology

Models are ranked by average points per match. Points are awarded as follows: 3 points for correct match tendency (win/draw/loss), 6 points for exact score. Consensus is determined by the most common predicted outcome (H/D/A) among all models.
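
The consensus rule described above can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the function names `tendency` and `consensus` are hypothetical, and the list of 26 predictions is an assumption inferred from the published percentages for Tottenham vs Manchester City (88.5% consensus on A, 7.7% correct tendency), not data taken from the article.

```python
from collections import Counter


def tendency(home_goals: int, away_goals: int) -> str:
    """Map a scoreline to its tendency: H (home win), D (draw), A (away win)."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def consensus(predicted: list[str]) -> tuple[str, float]:
    """Return the most common predicted outcome and its share of all predictions."""
    outcome, count = Counter(predicted).most_common(1)[0]
    return outcome, count / len(predicted)


# Assumed split of 26 predictions (illustrative, inferred from the percentages).
predictions = ["A"] * 23 + ["D"] * 2 + ["H"] * 1

actual = tendency(2, 2)  # Tottenham vs Manchester City finished 2-2
outcome, share = consensus(predictions)

print(f"Consensus: {outcome} ({share:.1%}), actual: {actual}")
print("Consensus miss" if outcome != actual else "Consensus hit")
```

A match counts as a consensus miss when the most common predicted outcome differs from the actual tendency, as in the matches listed under Biggest Consensus Misses.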

Generation cost: $0.0022

Tokens: 4,721 input + 1,103 output
