Bundesliga Week 20 AI Model Audit
Qwen 2.5 7B Turbo led the Bundesliga predictions with an average of 3.00 points per match, followed by Llama 4 Scout and Mistral 7B v0.2, tied at 2.56. Overall, models predicted the correct tendency 44.46% of the time. Hamburger SV vs Bayern München (2-2) was the biggest upset: none of the 26 models predicted the draw.
Top 10 Models
| # | Model | Matches | Total Points | Avg Pts/Match | Tendency % | Exact % |
|---|---|---|---|---|---|---|
| 1 | Qwen 2.5 7B Turbo (Alibaba) | 9 | 27 | 3.00 | 66.7% | 22.2% |
| 2 | Llama 4 Scout (Meta) | 9 | 23 | 2.56 | 66.7% | 22.2% |
| 3 | Mistral 7B v0.2 (Mistral) | 9 | 23 | 2.56 | 44.4% | 22.2% |
| 4 | Marin 8B Instruct (Marin Community) | 9 | 22 | 2.44 | 66.7% | 22.2% |
| 5 | Cogito v2 109B MoE (Deep Cogito) | 9 | 20 | 2.22 | 55.6% | 22.2% |
| 6 | Rnj-1 Instruct (Essential AI) | 9 | 20 | 2.22 | 66.7% | 11.1% |
| 7 | Llama 3.2 3B Turbo (Meta) | 9 | 17 | 1.89 | 55.6% | 0.0% |
| 8 | Llama 3.1 8B Turbo (Meta) | 9 | 16 | 1.78 | 55.6% | 11.1% |
| 9 | Mistral 7B v0.3 (Mistral) | 9 | 16 | 1.78 | 44.4% | 11.1% |
| 10 | DeepSeek R1 (Reasoning) | 8 | 13 | 1.63 | 50.0% | 0.0% |
Match-by-Match Audit
- Borussia Dortmund vs 1. FC Heidenheim: 88.0% correct tendency (22/25 models)
- VfB Stuttgart vs SC Freiburg: 67.9% correct tendency (19/28 models)
- Hamburger SV vs Bayern München: 0.0% correct tendency (0/26 models)
- Werder Bremen vs Borussia Mönchengladbach: 38.5% correct tendency (10/26 models)
- RB Leipzig vs FSV Mainz 05: 3.8% correct tendency (1/26 models)
- Eintracht Frankfurt vs Bayer Leverkusen: 88.5% correct tendency (23/26 models)
- FC Augsburg vs FC St. Pauli: 37.5% correct tendency (9/24 models)
- 1899 Hoffenheim vs Union Berlin: 64.0% correct tendency (16/25 models)
- FC Köln vs VfL Wolfsburg: 12.0% correct tendency (3/25 models)
Biggest Consensus Misses
- Hamburger SV vs Bayern München (2-2): Consensus predicted away win (96.2%) but result was draw
- Werder Bremen vs Borussia Mönchengladbach (1-1): Consensus predicted away win (57.7%) but result was draw
- FC Augsburg vs FC St. Pauli (2-1): Consensus predicted draw (54.2%) but result was home win
- FC Köln vs VfL Wolfsburg (1-0): Consensus predicted draw (52.0%) but result was home win
- RB Leipzig vs FSV Mainz 05 (1-2): Consensus predicted home win (50.0%) but result was away win
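A consensus miss like those listed above can be detected by taking the most common predicted tendency for a match and comparing it with the actual outcome. The following is a minimal sketch, not the audit's own code: the function names `consensus` and `is_consensus_miss` are hypothetical, and the sample input is reconstructed from the Hamburger SV vs Bayern München figures (26 models, none predicting a draw, 96.2% predicting an away win).

```python
from collections import Counter


def consensus(tendencies):
    """Return the most common predicted tendency and its share in percent.

    `tendencies` is a list of labels such as "home", "draw", or "away".
    Ties are broken by first appearance, which is an arbitrary choice here.
    """
    counts = Counter(tendencies)
    label, n = counts.most_common(1)[0]
    return label, round(100 * n / len(tendencies), 1)


def is_consensus_miss(tendencies, actual):
    """True when the consensus tendency differs from the actual outcome."""
    label, _ = consensus(tendencies)
    return label != actual


# Reconstructed from the published figures: 25 away-win picks and 1 home-win pick.
hsv_bayern = ["away"] * 25 + ["home"]
print(consensus(hsv_bayern))                  # ('away', 96.2)
print(is_consensus_miss(hsv_bayern, "draw"))  # True
```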
Methodology
Models were evaluated by average points per match, with 3 points awarded for a correct exact score and 1 point for a correct tendency. Tendency accuracy is the share of predictions that got the match outcome right (home win, draw, or away win).
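The scoring rule can be sketched in a few lines. This is an illustration under stated assumptions, not the audit's implementation: it assumes the 3 points for an exact score replace the tendency point instead of adding to it, and the names `tendency`, `points`, and `average_points` are hypothetical. The leaderboard totals appear to reflect weighting beyond this rule, so the sketch is not expected to reproduce them.

```python
from typing import Tuple

Score = Tuple[int, int]  # (home goals, away goals)


def tendency(score: Score) -> str:
    """Classify a scoreline as 'home', 'draw', or 'away'."""
    home, away = score
    if home > away:
        return "home"
    if home < away:
        return "away"
    return "draw"


def points(predicted: Score, actual: Score) -> int:
    """Assumed rule: 3 for the exact score, 1 for tendency only, else 0."""
    if predicted == actual:
        return 3
    if tendency(predicted) == tendency(actual):
        return 1
    return 0


def average_points(predictions, results) -> float:
    """Mean points per match over paired predictions and results."""
    total = sum(points(p, r) for p, r in zip(predictions, results))
    return total / len(results)
```

For a 2-2 result, `points((2, 2), (2, 2))` gives 3, `points((1, 1), (2, 2))` gives 1 for the correct draw tendency, and `points((0, 3), (2, 2))` gives 0.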