Serie A Week 23 AI Model Audit
Llama 3 8B Lite led Serie A predictions with 3.63 average points per match, followed by Qwen 2.5 72B Turbo (2.88) and Llama 4 Maverick (2.75). Across all models, 34.44% of predictions got the match tendency right. Udinese's 1-0 win over AS Roma was the biggest upset, defying an 84.6% consensus for an AS Roma win.
Top 10 Models
| # | Model | Matches | Total Points | Avg Pts/Match | Tendency % | Exact % |
|---|---|---|---|---|---|---|
| 1 | Llama 3 8B Lite (Meta) | 8 | 29 | 3.63 | 75.0% | 25.0% |
| 2 | Qwen 2.5 72B Turbo (Alibaba) | 8 | 23 | 2.88 | 75.0% | 12.5% |
| 3 | Llama 4 Maverick (Meta) | 8 | 22 | 2.75 | 62.5% | 25.0% |
| 4 | Llama 3.2 3B Turbo (Meta) | 8 | 19 | 2.38 | 50.0% | 12.5% |
| 5 | Marin 8B Instruct (Marin Community) | 8 | 18 | 2.25 | 50.0% | 12.5% |
| 6 | Cogito v2 70B (Deep Cogito) | 8 | 16 | 2.00 | 50.0% | 12.5% |
| 7 | Qwen 2.5 7B Turbo (Alibaba) | 8 | 15 | 1.88 | 50.0% | 12.5% |
| 8 | Rnj-1 Instruct (Essential AI) | 8 | 15 | 1.88 | 50.0% | 12.5% |
| 9 | Llama 3.1 8B Turbo (Meta) | 7 | 13 | 1.86 | 42.9% | 14.3% |
| 10 | Nemotron Nano 9B v2 (NVIDIA) | 6 | 10 | 1.67 | 50.0% | 16.7% |
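
The Avg Pts/Match column appears to be Total Points divided by Matches, rounded to two decimals. The sketch below checks that reading against a few rows of the table. The function name `avg_points` and the half-up rounding mode are assumptions, not something the article states; half-up is used because Python's built-in `round` rounds halves to even and would not reliably reproduce a value such as 3.63 from 29/8.

```python
from decimal import Decimal, ROUND_HALF_UP


def avg_points(total_points: int, matches: int) -> Decimal:
    """Average points per match, rounded half-up to two decimals (assumed rounding)."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    ratio = Decimal(total_points) / Decimal(matches)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (model, matches, total points) copied from the table above
rows = [
    ("Llama 3 8B Lite", 8, 29),
    ("Qwen 2.5 72B Turbo", 8, 23),
    ("Llama 3.1 8B Turbo", 7, 13),
    ("Nemotron Nano 9B v2", 6, 10),
]

for name, matches, total in rows:
    print(f"{name}: {avg_points(total, matches)}")
```

Under this assumption the four rows come out as 3.63, 2.88, 1.86, and 1.67, matching the table.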
Match-by-Match Audit
- Bologna vs AC Milan: 0-3 | 0 models | Correct tendency: 0.0% | Exact score hits: 0.0% | Consensus: H (0.0%)
- Udinese vs AS Roma: 1-0 | 26 models | Correct tendency: 0.0% | Exact score hits: 0.0% | Consensus: A (84.6%)
- Parma vs Juventus: 1-4 | 27 models | Correct tendency: 85.2% | Exact score hits: 0.0% | Consensus: A (85.2%)
- Como vs Atalanta: 0-0 | 27 models | Correct tendency: 48.1% | Exact score hits: 0.0% | Consensus: D (48.1%)
- Torino vs Lecce: 1-0 | 25 models | Correct tendency: 20.0% | Exact score hits: 12.0% | Consensus: D (52.0%)
- Cagliari vs Verona: 4-0 | 25 models | Correct tendency: 20.0% | Exact score hits: 0.0% | Consensus: D (68.0%)
- Napoli vs Fiorentina: 2-1 | 27 models | Correct tendency: 48.1% | Exact score hits: 48.1% | Consensus: H (48.1%)
- Pisa vs Sassuolo: 1-3 | 26 models | Correct tendency: 61.5% | Exact score hits: 0.0% | Consensus: A (61.5%)
- Lazio vs Genoa: 3-2 | 26 models | Correct tendency: 26.9% | Exact score hits: 0.0% | Consensus: A (50.0%)
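
The per-match figures above can be derived from a list of predicted scores. The following is a minimal sketch of one way to do it, not the audit's actual pipeline. The names `tendency` and `audit_match` are hypothetical, and the article does not publish per-model predictions, so the input in the usage example is made up purely to reproduce the reported Udinese vs AS Roma numbers (26 models, 84.6% away consensus, no correct tendencies). How ties are broken and how a match with zero predictions is reported are also assumptions.

```python
from collections import Counter


def tendency(home_goals: int, away_goals: int) -> str:
    """Map a score to H (home win), D (draw), or A (away win)."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def audit_match(actual, predictions):
    """Summarise predictions for one match; returns None when nobody predicted."""
    n = len(predictions)
    if n == 0:
        return None  # assumption: no consensus can be defined without predictions
    actual_tendency = tendency(*actual)
    picks = [tendency(h, a) for h, a in predictions]
    # most_common breaks ties by first appearance; the article's rule is unknown
    consensus, votes = Counter(picks).most_common(1)[0]
    exact_hits = sum(1 for p in predictions if p == actual)
    return {
        "models": n,
        "correct_tendency_pct": round(100 * picks.count(actual_tendency) / n, 1),
        "exact_pct": round(100 * exact_hits / n, 1),
        "consensus": consensus,
        "consensus_pct": round(100 * votes / n, 1),
    }


# Hypothetical inputs: 22 away-win picks and 4 draws against an actual 1-0 home win
made_up_predictions = [(0, 1)] * 22 + [(1, 1)] * 4
print(audit_match((1, 0), made_up_predictions))
```

With these made-up inputs the consensus is A at 84.6% (22 of 26), and both the correct-tendency and exact-score shares are 0.0%.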
Biggest Consensus Misses
- Udinese vs AS Roma (1-0) | Consensus: A (84.6%)
- Cagliari vs Verona (4-0) | Consensus: D (68.0%)
- Torino vs Lecce (1-0) | Consensus: D (52.0%)
- Lazio vs Genoa (3-2) | Consensus: A (50.0%)
- Bologna vs AC Milan (0-3) | Consensus: H (0.0%)
Methodology
Models are scored on their match-level predictions. A correct tendency (home win, draw, or away win) earns 1 point, with additional bonus points for predicting the exact score. Average points per match is calculated only for models that made at least 3 predictions this week.
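
The scoring rule can be sketched roughly as follows. The size of the exact-score bonus is not stated in the article, so `exact_bonus` is a placeholder parameter with an arbitrary default, and the sketch is not expected to reproduce the totals in the table. The function names are hypothetical, and the predictions in the usage example are invented for illustration.

```python
from typing import Optional, Sequence, Tuple

Score = Tuple[int, int]  # (home goals, away goals)


def tendency(score: Score) -> str:
    """Map a score to H (home win), D (draw), or A (away win)."""
    home, away = score
    if home > away:
        return "H"
    if home < away:
        return "A"
    return "D"


def score_prediction(predicted: Score, actual: Score, exact_bonus: int = 2) -> int:
    """1 point for the correct tendency, plus a bonus for the exact score.

    exact_bonus is an assumed value; the article does not give the real one.
    """
    if tendency(predicted) != tendency(actual):
        return 0
    return 1 + (exact_bonus if predicted == actual else 0)


def average_points(points: Sequence[int], min_predictions: int = 3) -> Optional[float]:
    """Average points per match, or None below the minimum prediction count."""
    if len(points) < min_predictions:
        return None
    return sum(points) / len(points)


# Actual results from the audit, paired with invented predictions
actual_results = [(1, 0), (1, 4), (0, 0)]
invented_predictions = [(0, 2), (0, 2), (0, 0)]
points = [score_prediction(p, a) for p, a in zip(invented_predictions, actual_results)]
print(points, average_points(points))
```

Keeping the bonus as a parameter makes it easy to plug in the real value once it is known, and returning `None` below the threshold keeps models with too few predictions out of the ranking rather than giving them a misleading average.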