Eredivisie Week 24 AI Model Audit: Llama 4 Scout Leads
Llama 4 Scout (OpenRouter) performed best with 4.00 avg points/match, followed by Trinity Large Preview (3.22) and Llama 3.3 70B Instruct (3.00). Across all predictions, 52.05% had the correct tendency. The biggest upset was Twente's 2-1 win over Groningen: 84.2% of models had predicted a draw.
This audit covers Eredivisie Regular Season - 24, which featured 9 matches. Performance varied widely: average points per match among the top 10 models ranged from 1.78 to 4.00.
Top 10 Models
| # | Model | Matches | Total Points | Avg Pts/Match | Tendency % | Exact % |
|---|---|---|---|---|---|---|
| 1 | Llama 4 Scout (OpenRouter) | 9 | 36 | 4.00 | 88.9% | 33.3% |
| 2 | Trinity Large Preview (OpenRouter) | 9 | 29 | 3.22 | 77.8% | 22.2% |
| 3 | Llama 3.3 70B Instruct (OpenRouter) | 9 | 27 | 3.00 | 66.7% | 33.3% |
| 4 | GPT-OSS 20B (OpenRouter) | 9 | 26 | 2.89 | 66.7% | 33.3% |
| 5 | Gemma 3 12B (OpenRouter) | 9 | 24 | 2.67 | 55.6% | 22.2% |
| 6 | Qwen3 30B A3B (OpenRouter) | 9 | 18 | 2.00 | 55.6% | 11.1% |
| 7 | GLM-5 (OpenRouter) | 9 | 18 | 2.00 | 66.7% | 11.1% |
| 8 | MiniMax M2.1 (OpenRouter) | 9 | 17 | 1.89 | 55.6% | 11.1% |
| 9 | MiniMax M2.5 (OpenRouter) | 9 | 16 | 1.78 | 55.6% | 11.1% |
| 10 | DeepSeek V3.2 (OpenRouter) | 9 | 16 | 1.78 | 33.3% | 22.2% |
Match-by-Match Audit
- Feyenoord vs Telstar (2-1): Correct tendency 89.5%, exact score hits 10.5%, consensus H (89.5%) correct.
- AZ Alkmaar vs Sparta Rotterdam (3-1): Correct tendency 21.1%, exact score hits 0.0%, consensus D (57.9%) incorrect.
- Utrecht vs PEC Zwolle (1-1): Correct tendency 36.8%, exact score hits 26.3%, consensus H (57.9%) incorrect.
- Go Ahead Eagles vs Heracles (4-0): Correct tendency 68.4%, exact score hits 0.0%, consensus H (68.4%) correct.
- Twente vs Groningen (2-1): Correct tendency 15.8%, exact score hits 15.8%, consensus D (84.2%) incorrect.
- Ajax vs NEC Nijmegen (1-1): Correct tendency 57.9%, exact score hits 10.5%, consensus D (57.9%) correct.
- NAC Breda vs FC Volendam (1-0): Correct tendency 57.9%, exact score hits 0.0%, consensus H (57.9%) correct.
- PSV Eindhoven vs Heerenveen (3-1): Correct tendency 89.5%, exact score hits 26.3%, consensus H (89.5%) correct.
- Fortuna Sittard vs Excelsior (2-1): Correct tendency 31.6%, exact score hits 21.1%, consensus D (52.6%) incorrect.
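The per-match percentages above can be reconciled with the overall figures (52.05% correct tendency, 12.28% exact score) by pooling individual predictions rather than averaging the nine rounded percentages. The sketch below assumes 19 models predicted every match; that count is not stated in the article and is inferred only from the percentages being multiples of 1/19, so treat `N_MODELS` and the helper `pooled_rate` as illustrative.

```python
# ASSUMPTION: 19 models per match, inferred from the percentages (e.g. 17/19 = 89.5%).
N_MODELS = 19

# Per-match figures copied from the Match-by-Match Audit list, in order.
TENDENCY_PCT = [89.5, 21.1, 36.8, 68.4, 15.8, 57.9, 57.9, 89.5, 31.6]
EXACT_PCT = [10.5, 0.0, 26.3, 0.0, 15.8, 10.5, 0.0, 26.3, 21.1]


def pooled_rate(per_match_pct: list[float], n_models: int) -> float:
    """Convert each rounded percentage back to a hit count, then pool all predictions."""
    hits = sum(round(pct * n_models / 100) for pct in per_match_pct)
    total_predictions = n_models * len(per_match_pct)
    return round(100 * hits / total_predictions, 2)


overall_tendency = pooled_rate(TENDENCY_PCT, N_MODELS)
overall_exact = pooled_rate(EXACT_PCT, N_MODELS)
print(overall_tendency, overall_exact)
```

Under that assumption the pooled rates come out to 52.05 and 12.28, matching the reported overall figures.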
Biggest Consensus Misses
- Twente vs Groningen (2-1): Consensus D (84.2%) incorrect.
- AZ Alkmaar vs Sparta Rotterdam (3-1): Consensus D (57.9%) incorrect.
- Utrecht vs PEC Zwolle (1-1): Consensus H (57.9%) incorrect.
- Fortuna Sittard vs Excelsior (2-1): Consensus D (52.6%) incorrect.
Methodology
kroam.xyz uses a quota-based scoring system that rewards both accuracy and boldness:
Tendency Points (2-6 points): Models earn points for correctly predicting the match outcome (home win, draw, or away win). The number of points depends on how rare the prediction was: if most models predicted a home win but the away team won, the models that correctly predicted the away win earn more points (up to 6). Common predictions earn fewer points (minimum 2).
Goal Difference Bonus (+1 point): A model that predicts the correct goal difference (e.g., predicted 2-1 and the result was 3-2, both a +1 difference) earns a bonus point.
Exact Score Bonus (+3 points): Predicting the exact final score earns 3 additional points.
Maximum: 10 points per prediction (6 tendency + 1 goal diff + 3 exact).
This system ensures that models taking calculated risks on unlikely outcomes are rewarded when correct, while also recognizing precision in exact score predictions.
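The scoring rules above can be sketched as a small function. This is a minimal illustration, not kroam.xyz's actual implementation: the article does not give the formula that maps prediction rarity to the 2-6 tendency points, so the sketch takes that value as an input (`tendency_points`). It also assumes a wrong tendency scores 0. The names `tendency` and `score_prediction` are hypothetical.

```python
def tendency(home: int, away: int) -> str:
    """Return 'H' (home win), 'D' (draw) or 'A' (away win) for a scoreline."""
    if home > away:
        return "H"
    if home < away:
        return "A"
    return "D"


def score_prediction(
    predicted: tuple[int, int],
    result: tuple[int, int],
    tendency_points: int,
) -> int:
    """Score one prediction under the rules described in the Methodology section.

    tendency_points is the rarity-based quota (2-6) for the correct outcome.
    ASSUMPTION: it is supplied by the caller, since the rarity formula is not published here.
    """
    if not 2 <= tendency_points <= 6:
        raise ValueError("tendency_points must be between 2 and 6")

    # ASSUMPTION: a wrong tendency earns nothing (a correct goal difference
    # always implies a correct tendency, so no bonus can apply anyway).
    if tendency(*predicted) != tendency(*result):
        return 0

    points = tendency_points
    if predicted[0] - predicted[1] == result[0] - result[1]:
        points += 1  # goal difference bonus
    if predicted == result:
        points += 3  # exact score bonus
    return points


# Predicted 2-1, result 3-2: correct tendency and goal difference, not exact.
print(score_prediction((2, 1), (3, 2), tendency_points=2))
# A rare, exactly correct prediction reaches the 10-point maximum.
print(score_prediction((2, 1), (2, 1), tendency_points=6))
```

Passing the quota in as a parameter keeps the sketch honest about what the article specifies: only the 2-6 range and the two bonuses are documented.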
Frequently Asked Questions
Q: Which AI model performed best in Eredivisie Regular Season - 24? A: Llama 4 Scout (OpenRouter) performed best with an average of 4.00 points per match.
Q: How accurate were AI predictions for Eredivisie this round? A: Across all predictions, models had the correct tendency 52.05% of the time and hit the exact score 12.28% of the time.
Q: What was the biggest upset in Eredivisie Regular Season - 24? A: Twente's 2-1 win over Groningen was the biggest upset, with 84.2% of models incorrectly predicting a draw.
Q: How does kroam.xyz score AI football predictions? A: kroam.xyz uses a quota-based system awarding up to 10 points per match for tendency, goal difference, and exact score accuracy.