UEFA Champions League AI Model Audit - Week 8
Llama 4 Maverick led UEFA Champions League predictions with 2.06 avg points/match, followed by Mistral 7B v0.2 (2.00) and Llama 3.1 405B Turbo (1.89). Models achieved 45.86% correct tendency overall. Biggest upset: Atletico Madrid vs Bodo/Glimt (1-2).
Top 10 Models
| # | Model | Matches | Total Points | Avg Pts/Match | Tendency % | Exact % |
|---|---|---|---|---|---|---|
| 1 | Llama 4 Maverick (Meta) | 18 | 37 | 2.06 | 61.1% | 11.1% |
| 2 | Mistral 7B v0.2 (Mistral) | 14 | 28 | 2.00 | 64.3% | 7.1% |
| 3 | Llama 3.1 405B Turbo (Meta) | 18 | 34 | 1.89 | 55.6% | 11.1% |
| 4 | Llama 3.2 3B Turbo (Meta) | 18 | 34 | 1.89 | 50.0% | 11.1% |
| 5 | DeepSeek R1 (Reasoning) | 16 | 30 | 1.88 | 50.0% | 12.5% |
| 6 | Qwen 2.5 72B Turbo (Alibaba) | 18 | 33 | 1.83 | 55.6% | 11.1% |
| 7 | Llama 3 8B Lite (Meta) | 18 | 32 | 1.78 | 55.6% | 11.1% |
| 8 | Mistral 7B v0.3 (Mistral) | 18 | 32 | 1.78 | 61.1% | 5.6% |
| 9 | Cogito v2 405B (Deep Cogito) | 17 | 28 | 1.65 | 47.1% | 11.8% |
| 10 | Llama 4 Scout (Meta) | 18 | 28 | 1.56 | 61.1% | 5.6% |
Match-by-Match Audit
- Napoli vs Chelsea: 79.2% correct tendency (19/24 models)
- Pafos vs Slavia Praha: 19.2% correct tendency (5/26 models)
- Atletico Madrid vs Bodo/Glimt: 0.0% correct tendency (0/25 models)
- Manchester City vs Galatasaray: 48.0% correct tendency (12/25 models)
- PSV Eindhoven vs Bayern München: 83.3% correct tendency (20/24 models)
- Athletic Club vs Sporting CP: 61.5% correct tendency (16/26 models)
- Ajax vs Olympiakos Piraeus: 57.7% correct tendency (15/26 models)
- Union St. Gilloise vs Atalanta: 0.0% correct tendency (0/26 models)
- Arsenal vs Kairat Almaty: 100.0% correct tendency (23/23 models)
- Liverpool vs Qarabag: 76.0% correct tendency (19/25 models)
- Barcelona vs FC Copenhagen: 61.5% correct tendency (16/26 models)
- Benfica vs Real Madrid: 0.0% correct tendency (0/25 models)
- Borussia Dortmund vs Inter: 51.9% correct tendency (14/27 models)
- Monaco vs Juventus: 32.0% correct tendency (8/25 models)
- Paris Saint Germain vs Newcastle: 39.1% correct tendency (9/23 models)
- Eintracht Frankfurt vs Tottenham: 52.0% correct tendency (13/25 models)
- Club Brugge KV vs Marseille: 16.0% correct tendency (4/25 models)
- Bayer Leverkusen vs Villarreal: 48.0% correct tendency (12/25 models)
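The percentages in this list appear to be the number of models with the correct tendency divided by the number of models that submitted a prediction for that match, rounded to one decimal. A minimal sketch of that arithmetic follows; the function name `tendency_rate` is hypothetical and not taken from the audit's own tooling.

```python
def tendency_rate(correct: int, total: int) -> float:
    """Percentage of models with the right H/D/A call, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 1)


# Counts copied from the list above.
print(tendency_rate(19, 24))  # Napoli vs Chelsea -> 79.2
print(tendency_rate(14, 27))  # Borussia Dortmund vs Inter -> 51.9
print(tendency_rate(0, 25))   # Benfica vs Real Madrid -> 0.0
```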
Biggest Consensus Misses
- Benfica vs Real Madrid (4-2) | Consensus: A (92.0%)
- Union St. Gilloise vs Atalanta (1-0) | Consensus: A (88.5%)
- Atletico Madrid vs Bodo/Glimt (1-2) | Consensus: H (76.0%)
- Monaco vs Juventus (0-0) | Consensus: A (68.0%)
- Pafos vs Slavia Praha (4-1) | Consensus: A (50.0%)
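The consensus figure for each match reads as the most common tendency among the models and the share of models that picked it. The sketch below shows one way to compute it, assuming each model's pick is reduced to a single letter (H, D, or A); the function name `consensus` and the input format are assumptions made for illustration.

```python
from collections import Counter


def consensus(picks):
    """Return the most common tendency and its share of all picks, in percent."""
    if not picks:
        raise ValueError("picks must not be empty")
    label, count = Counter(picks).most_common(1)[0]
    return label, round(100 * count / len(picks), 1)


# Benfica vs Real Madrid: 25 models, 92.0% picked the away side (23 picks).
# The other two are shown as draws, since no model picked the home win.
picks = ["A"] * 23 + ["D"] * 2
print(consensus(picks))  # ('A', 92.0)
```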
Methodology
Avg points/match is a model's total points divided by the number of matches it predicted. Correct tendency means predicting the correct match outcome: home win (H), draw (D), or away win (A). Exact score hits count the models that predicted the final scoreline exactly. Points are awarded as follows: 3 points for the correct tendency, plus an additional 3 points for the exact scoreline.
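As an illustration of the scoring described above, the sketch below derives a tendency from a scoreline and awards points for one prediction. The function names are hypothetical, and the point values are defaults taken from the rule as worded here, so they are an assumption and may not match the scoring engine that produced the table.

```python
def tendency(home: int, away: int) -> str:
    """Map a scoreline to H (home win), D (draw), or A (away win)."""
    if home > away:
        return "H"
    if home < away:
        return "A"
    return "D"


def score_prediction(pred, actual, tendency_pts=3, exact_bonus=3):
    """Points for one prediction; pred and actual are (home, away) goal tuples.

    tendency_pts and exact_bonus are assumed values based on the stated rule.
    """
    points = 0
    if tendency(*pred) == tendency(*actual):
        points += tendency_pts
        if tuple(pred) == tuple(actual):
            points += exact_bonus
    return points


# Atletico Madrid vs Bodo/Glimt finished 1-2.
print(score_prediction((2, 0), (1, 2)))  # wrong tendency -> 0
print(score_prediction((0, 1), (1, 2)))  # right tendency only -> 3
print(score_prediction((1, 2), (1, 2)))  # exact scoreline -> 6
```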