League Roundup

Bundesliga Week 20 AI Model Audit

February 4, 2026


Generated by: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8

Qwen 2.5 7B Turbo led Bundesliga predictions with 3.00 average points per match, followed by Llama 4 Scout and Mistral 7B v0.2 (2.56 each). Overall tendency accuracy was 44.46%. Hamburger SV's 2-2 draw with Bayern München was the biggest upset: none of the 26 models predicted a draw.

Top 10 Models

#	Model	Matches	Total Points	Avg Pts/Match	Correct Tendency %	Exact Score %
1	Qwen 2.5 7B Turbo (Alibaba)	9	27	3.00	66.7%	22.2%
2	Llama 4 Scout (Meta)	9	23	2.56	66.7%	22.2%
3	Mistral 7B v0.2 (Mistral)	9	23	2.56	44.4%	22.2%
4	Marin 8B Instruct (Marin Community)	9	22	2.44	66.7%	22.2%
5	Cogito v2 109B MoE (Deep Cogito)	9	20	2.22	55.6%	22.2%
6	Rnj-1 Instruct (Essential AI)	9	20	2.22	66.7%	11.1%
7	Llama 3.2 3B Turbo (Meta)	9	17	1.89	55.6%	0.0%
8	Llama 3.1 8B Turbo (Meta)	9	16	1.78	55.6%	11.1%
9	Mistral 7B v0.3 (Mistral)	9	16	1.78	44.4%	11.1%
10	DeepSeek R1 (Reasoning)	8	13	1.63	50.0%	0.0%
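The derived columns follow from the raw counts: Avg Pts/Match is Total Points divided by Matches, and the percentage columns are correct predictions divided by Matches. A minimal Python sketch, assuming half-up rounding (which matches the 1.63 shown for DeepSeek R1's 13 points over 8 matches). The tendency counts in `rows` (6, 6 and 4) are inferred from the table's percentages, and the helper names are hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP

def avg_points(total_points: int, matches: int) -> Decimal:
    """Average points per match, rounded half-up to two decimals."""
    raw = Decimal(total_points) / Decimal(matches)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def pct(count: int, matches: int) -> Decimal:
    """count / matches as a percentage, rounded half-up to one decimal."""
    raw = Decimal(count) * 100 / Decimal(matches)
    return raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# (model, matches, total points, correct tendencies); the tendency counts are
# a hypothetical reconstruction from the table's percentages.
rows = [
    ("Qwen 2.5 7B Turbo", 9, 27, 6),
    ("Llama 4 Scout", 9, 23, 6),
    ("DeepSeek R1", 8, 13, 4),
]

for name, matches, points, tendencies in rows:
    print(f"{name}: {avg_points(points, matches)} pts/match, "
          f"{pct(tendencies, matches)}% tendency")
```

`Decimal` is used because Python's built-in `round(13 / 8, 2)` rounds the exact half 1.625 to even and returns 1.62, not the 1.63 in the table.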

Match-by-Match Audit

Borussia Dortmund vs 1. FC Heidenheim: 88.0% correct tendency (22/25 models)
VfB Stuttgart vs SC Freiburg: 67.9% correct tendency (19/28 models)
Hamburger SV vs Bayern München: 0.0% correct tendency (0/26 models)
Werder Bremen vs Borussia Mönchengladbach: 38.5% correct tendency (10/26 models)
RB Leipzig vs FSV Mainz 05: 3.8% correct tendency (1/26 models)
Eintracht Frankfurt vs Bayer Leverkusen: 88.5% correct tendency (23/26 models)
FC Augsburg vs FC St. Pauli: 37.5% correct tendency (9/24 models)
1899 Hoffenheim vs Union Berlin: 64.0% correct tendency (16/25 models)
1. FC Köln vs VfL Wolfsburg: 12.0% correct tendency (3/25 models)
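The article does not define how the 44.46% overall tendency figure is aggregated. A quick Python check against the nine counts listed above suggests it is the unweighted mean of the per-match rates; pooling all 231 predictions instead gives 44.59%. The helper names are hypothetical, and exact `Fraction` arithmetic avoids rounding drift before the final formatting:

```python
from fractions import Fraction

# (match, models with correct tendency, models that predicted), from the list above
matches = [
    ("Borussia Dortmund vs 1. FC Heidenheim", 22, 25),
    ("VfB Stuttgart vs SC Freiburg", 19, 28),
    ("Hamburger SV vs Bayern München", 0, 26),
    ("Werder Bremen vs Borussia Mönchengladbach", 10, 26),
    ("RB Leipzig vs FSV Mainz 05", 1, 26),
    ("Eintracht Frankfurt vs Bayer Leverkusen", 23, 26),
    ("FC Augsburg vs FC St. Pauli", 9, 24),
    ("1899 Hoffenheim vs Union Berlin", 16, 25),
    ("1. FC Köln vs VfL Wolfsburg", 3, 25),
]

def mean_of_rates(rows) -> float:
    """Unweighted mean of the per-match accuracy rates, in percent."""
    rates = [Fraction(correct, total) for _, correct, total in rows]
    return float(sum(rates) / len(rates) * 100)

def pooled_rate(rows) -> float:
    """All correct predictions over all predictions, in percent."""
    correct = sum(c for _, c, _ in rows)
    total = sum(t for _, _, t in rows)
    return float(Fraction(correct, total) * 100)

print(f"mean of per-match rates: {mean_of_rates(matches):.2f}%")    # 44.46%
print(f"pooled over all predictions: {pooled_rate(matches):.2f}%")  # 44.59%
```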

Biggest Consensus Misses

Hamburger SV vs Bayern München (2-2): the consensus (96.2% of models) predicted an away win, but the result was a draw.
Werder Bremen vs Borussia Mönchengladbach (1-1): the consensus (57.7% of models) predicted an away win, but the result was a draw.
FC Augsburg vs FC St. Pauli (2-1): the consensus (54.2% of models) predicted a draw, but the result was a home win.
1. FC Köln vs VfL Wolfsburg (1-0): the consensus (52.0% of models) predicted a draw, but the result was a home win.
RB Leipzig vs FSV Mainz 05 (1-2): the consensus (50.0% of models) predicted a home win, but the result was an away win.
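The consensus misses above can be reproduced by taking the most common predicted tendency for a match and its share of the models that predicted it. A minimal Python sketch, assuming that definition of consensus. The per-model prediction list is reconstructed from the published percentages (25 of 26 models picked an away win and none picked a draw, leaving one home-win pick), not taken from raw data, and `consensus` is a hypothetical helper:

```python
from collections import Counter

def consensus(predictions: list[str]) -> tuple[str, float]:
    """Most common predicted tendency and its share of all predictions, in percent.

    Ties go to the outcome seen first in the list (Counter.most_common order);
    the article does not say how ties are broken, so this is an assumption.
    """
    counts = Counter(predictions)
    outcome, n = counts.most_common(1)[0]
    return outcome, 100 * n / len(predictions)

# Hamburger SV vs Bayern München, reconstructed from the percentages above.
hsv_bayern = ["away"] * 25 + ["home"]
actual = "draw"  # final score 2-2

outcome, share = consensus(hsv_bayern)
print(f"consensus: {outcome} ({share:.1f}%), result: {actual}, miss: {outcome != actual}")
# consensus: away (96.2%), result: draw, miss: True
```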

Methodology

Models were ranked by average points per match, with 3 points awarded for a correct score and 1 point for a correct tendency. Tendency accuracy measures how often a model predicted the correct match outcome (home win, away win, or draw).
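The scoring rule can be sketched in Python as follows. Function names are hypothetical, and the sketch assumes an exact score earns 3 points in total rather than 3 plus the 1-point tendency bonus, which the description leaves open; it illustrates the rule as written, not the site's actual scoring code:

```python
def tendency(home_goals: int, away_goals: int) -> str:
    """Classify a scoreline as a home win, draw, or away win."""
    if home_goals > away_goals:
        return "home"
    if home_goals < away_goals:
        return "away"
    return "draw"

def points(predicted: tuple[int, int], actual: tuple[int, int]) -> int:
    """Points for one prediction; exact score assumed non-cumulative (3, not 3 + 1)."""
    if predicted == actual:
        return 3  # exact score
    if tendency(*predicted) == tendency(*actual):
        return 1  # right outcome, wrong score
    return 0

# Hamburger SV vs Bayern München finished 2-2.
print(points((2, 2), (2, 2)))  # 3: exact score
print(points((1, 1), (2, 2)))  # 1: correct draw tendency
print(points((1, 3), (2, 2)))  # 0: predicted an away win
```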

Generation cost: $0.0022

Tokens: 4,502 input + 1,152 output
