Bundesliga Week 20 AI Model Audit
Qwen 2.5 7B Turbo led Bundesliga Week 20 predictions with 3.00 average points per match, followed by Llama 4 Scout and Mistral 7B v0.2, tied at 2.56. Models achieved 44.46% correct tendency overall. The biggest surprise was Hamburger SV vs Bayern München, a 2-2 draw that no model predicted.
Top 10 Models
| # | Model | Matches | Total Points | Avg Pts/Match | Tendency % | Exact % |
|---|---|---|---|---|---|---|
| 1 | Qwen 2.5 7B Turbo (Alibaba) | 9 | 27 | 3.00 | 66.7% | 22.2% |
| 2 | Llama 4 Scout (Meta) | 9 | 23 | 2.56 | 66.7% | 22.2% |
| 3 | Mistral 7B v0.2 (Mistral) | 9 | 23 | 2.56 | 44.4% | 22.2% |
| 4 | Marin 8B Instruct (Marin Community) | 9 | 22 | 2.44 | 66.7% | 22.2% |
| 5 | Cogito v2 109B MoE (Deep Cogito) | 9 | 20 | 2.22 | 55.6% | 22.2% |
| 6 | Rnj-1 Instruct (Essential AI) | 9 | 20 | 2.22 | 66.7% | 11.1% |
| 7 | Llama 3.2 3B Turbo (Meta) | 9 | 17 | 1.89 | 55.6% | 0.0% |
| 8 | Llama 3.1 8B Turbo (Meta) | 9 | 16 | 1.78 | 55.6% | 11.1% |
| 9 | Mistral 7B v0.3 (Mistral) | 9 | 16 | 1.78 | 44.4% | 11.1% |
| 10 | DeepSeek R1 (Reasoning) | 8 | 13 | 1.63 | 50.0% | 0.0% |
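The Avg Pts/Match column is consistent with Total Points divided by Matches. A minimal Python sketch that recomputes it for three rows of the table follows. The function name `avg_points` is illustrative, and half-up rounding is an assumption inferred from the DeepSeek R1 row (13 / 8 = 1.625, shown as 1.63).

```python
from decimal import Decimal, ROUND_HALF_UP

def avg_points(total_points: int, matches: int) -> Decimal:
    """Average points per match, rounded half-up to two decimals (assumed)."""
    raw = Decimal(total_points) / Decimal(matches)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# (model, matches, total points) copied from the table above
rows = [
    ("Qwen 2.5 7B Turbo", 9, 27),
    ("Llama 4 Scout", 9, 23),
    ("DeepSeek R1", 8, 13),
]

for model, matches, total in rows:
    print(f"{model}: {avg_points(total, matches)}")
```

`Decimal` is used instead of `round()` because Python's built-in rounding is half-to-even, which would turn 1.625 into 1.62.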
Match-by-Match Audit
- Borussia Dortmund vs 1. FC Heidenheim: 88.0% correct tendency (22/25 models)
- VfB Stuttgart vs SC Freiburg: 67.9% correct tendency (19/28 models)
- Hamburger SV vs Bayern München: 0.0% correct tendency (0/26 models)
- Werder Bremen vs Borussia Mönchengladbach: 38.5% correct tendency (10/26 models)
- RB Leipzig vs FSV Mainz 05: 3.8% correct tendency (1/26 models)
- Eintracht Frankfurt vs Bayer Leverkusen: 88.5% correct tendency (23/26 models)
- FC Augsburg vs FC St. Pauli: 37.5% correct tendency (9/24 models)
- 1899 Hoffenheim vs Union Berlin: 64.0% correct tendency (16/25 models)
- FC Köln vs VfL Wolfsburg: 12.0% correct tendency (3/25 models)
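Each percentage in the list above is the number of models with the correct tendency divided by the number of models that predicted that match. The sketch below recomputes them and compares two ways of aggregating. It is an inference from the published numbers, not a documented method: the 44.46% overall figure matches the unweighted mean of the nine per-match rates rather than the pooled ratio.

```python
# (match, models with correct tendency, models that predicted), from the list above
matches = [
    ("Borussia Dortmund vs 1. FC Heidenheim", 22, 25),
    ("VfB Stuttgart vs SC Freiburg", 19, 28),
    ("Hamburger SV vs Bayern München", 0, 26),
    ("Werder Bremen vs Borussia Mönchengladbach", 10, 26),
    ("RB Leipzig vs FSV Mainz 05", 1, 26),
    ("Eintracht Frankfurt vs Bayer Leverkusen", 23, 26),
    ("FC Augsburg vs FC St. Pauli", 9, 24),
    ("1899 Hoffenheim vs Union Berlin", 16, 25),
    ("FC Köln vs VfL Wolfsburg", 3, 25),
]

# Per-match tendency accuracy as a fraction
per_match = [correct / total for _, correct, total in matches]

# Unweighted mean: every match counts equally
mean_of_rates = 100 * sum(per_match) / len(per_match)

# Pooled rate: every individual prediction counts equally
pooled_rate = 100 * sum(c for _, c, _ in matches) / sum(t for _, _, t in matches)

print(f"mean of per-match rates: {mean_of_rates:.2f}%")
print(f"pooled rate: {pooled_rate:.2f}%")
```

The two aggregates differ slightly (44.46% versus 44.59%) because the number of participating models varies from 24 to 28 per match.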
Biggest Consensus Misses
- Hamburger SV vs Bayern München (2-2): Consensus predicted an away win (96.2%), but the result was a draw
- Werder Bremen vs Borussia Mönchengladbach (1-1): Consensus predicted an away win (57.7%), but the result was a draw
- FC Augsburg vs FC St. Pauli (2-1): Consensus predicted a draw (54.2%), but the result was a home win
- FC Köln vs VfL Wolfsburg (1-0): Consensus predicted a draw (52.0%), but the result was a home win
- RB Leipzig vs FSV Mainz 05 (1-2): Consensus predicted a home win (50.0%), but the result was an away win
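A consensus miss can be detected by comparing the most common predicted tendency with the tendency of the final score. The following is a minimal sketch, not the site's implementation: the helper names `tendency` and `consensus` are hypothetical, and the 25/1 split for Hamburger SV vs Bayern München is inferred from the published figures (26 models, 96.2% away win, 0 correct draws).

```python
from collections import Counter

def tendency(home_goals: int, away_goals: int) -> str:
    """Map a score to its match outcome."""
    if home_goals > away_goals:
        return "home win"
    if home_goals < away_goals:
        return "away win"
    return "draw"

def consensus(predictions: list[str]) -> tuple[str, float]:
    """Most common predicted tendency and its share in percent."""
    label, count = Counter(predictions).most_common(1)[0]
    return label, round(100 * count / len(predictions), 1)

# Hamburger SV vs Bayern München: inferred split of 25 away-win and 1 home-win picks
predictions = ["away win"] * 25 + ["home win"] * 1

label, share = consensus(predictions)
actual = tendency(2, 2)
is_miss = label != actual

print(f"consensus: {label} ({share}%), actual: {actual}, miss: {is_miss}")
```

When two tendencies are tied, `Counter.most_common` returns the one seen first. How the audit breaks ties is not stated in the source.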
Methodology
Models are ranked by average points per match, with 3 points awarded for a correct exact score and 1 point for a correct tendency. Tendency accuracy is the share of matches in which a model predicted the right outcome (home win, draw, or away win).
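The rule as stated can be sketched as a small scoring function. This is an illustration under assumptions, not the site's scorer: the name `score_prediction` is hypothetical, and it assumes the exact-score and tendency points do not stack. The leaderboard totals are higher than this rule alone can produce (for example, 27 points from 9 matches with two exact scores), so the actual scoring presumably has components the methodology does not describe.

```python
def outcome(score: tuple[int, int]) -> int:
    """Return 1 for a home win, 0 for a draw, -1 for an away win."""
    home, away = score
    return (home > away) - (home < away)

def score_prediction(pred: tuple[int, int], actual: tuple[int, int]) -> int:
    """Points under the stated rule: 3 for the exact score, 1 for tendency only."""
    if pred == actual:
        return 3
    if outcome(pred) == outcome(actual):
        return 1
    return 0

# FC Augsburg vs FC St. Pauli finished 2-1
actual = (2, 1)
for pred in [(2, 1), (1, 0), (1, 1)]:
    print(pred, score_prediction(pred, actual))
```

Under these assumptions the three sample predictions score 3, 1, and 0 points respectively.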
Generation cost: $0.0022
Tokens: 4,502 input + 1,152 output