Eredivisie Week 24 AI Model Audit: Llama 4 Scout Leads
Llama 4 Scout (OpenRouter) performed best with 4.00 avg points/match, followed by Trinity Large Preview (3.22) and Llama 3.3 70B Instruct (3.00). Overall accuracy was 52.05% correct tendency. The biggest upset was Twente's 2-1 win over Groningen: 84.2% of models predicted a draw. This audit covers Eredivisie Regular Season - 24, featuring 9 matches. Performance varied widely, with average points per match among the top 10 models ranging from 1.78 to 4.00.
Top 10 Models
| # | Model | Matches | Total Points | Avg Pts/Match | Tendency % | Exact % |
|---|---|---|---|---|---|---|
| 1 | Llama 4 Scout (OpenRouter) | 9 | 36 | 4.00 | 88.9% | 33.3% |
| 2 | Trinity Large Preview (OpenRouter) | 9 | 29 | 3.22 | 77.8% | 22.2% |
| 3 | Llama 3.3 70B Instruct (OpenRouter) | 9 | 27 | 3.00 | 66.7% | 33.3% |
| 4 | GPT-OSS 20B (OpenRouter) | 9 | 26 | 2.89 | 66.7% | 33.3% |
| 5 | Gemma 3 12B (OpenRouter) | 9 | 24 | 2.67 | 55.6% | 22.2% |
| 6 | Qwen3 30B A3B (OpenRouter) | 9 | 18 | 2.00 | 55.6% | 11.1% |
| 7 | GLM-5 (OpenRouter) | 9 | 18 | 2.00 | 66.7% | 11.1% |
| 8 | MiniMax M2.1 (OpenRouter) | 9 | 17 | 1.89 | 55.6% | 11.1% |
| 9 | MiniMax M2.5 (OpenRouter) | 9 | 16 | 1.78 | 55.6% | 11.1% |
| 10 | DeepSeek V3.2 (OpenRouter) | 9 | 16 | 1.78 | 33.3% | 22.2% |
Match-by-Match Audit
- Feyenoord vs Telstar (2-1): Correct tendency 89.5%, exact score hits 10.5%, consensus H (89.5%) correct.
- AZ Alkmaar vs Sparta Rotterdam (3-1): Correct tendency 21.1%, exact score hits 0.0%, consensus D (57.9%) incorrect.
- Utrecht vs PEC Zwolle (1-1): Correct tendency 36.8%, exact score hits 26.3%, consensus H (57.9%) incorrect.
- Go Ahead Eagles vs Heracles (4-0): Correct tendency 68.4%, exact score hits 0.0%, consensus H (68.4%) correct.
- Twente vs Groningen (2-1): Correct tendency 15.8%, exact score hits 15.8%, consensus D (84.2%) incorrect.
- Ajax vs NEC Nijmegen (1-1): Correct tendency 57.9%, exact score hits 10.5%, consensus D (57.9%) correct.
- NAC Breda vs FC Volendam (1-0): Correct tendency 57.9%, exact score hits 0.0%, consensus H (57.9%) correct.
- PSV Eindhoven vs Heerenveen (3-1): Correct tendency 89.5%, exact score hits 26.3%, consensus H (89.5%) correct.
- Fortuna Sittard vs Excelsior (2-1): Correct tendency 31.6%, exact score hits 21.1%, consensus D (52.6%) incorrect.
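The round-level figures quoted in this audit (52.05% correct tendency, 12.28% exact score) can be cross-checked against the per-match list above. The sketch below is a reconstruction, not the site's own calculation: it assumes 19 models predicted every match, which is an inference from the percentages (for example, 17/19 rounds to 89.5%) and is not stated in the article. The names `N_MODELS`, `to_count`, and `pooled_rate` are illustrative.

```python
# Hypothetical cross-check of the round-level accuracy figures.
# ASSUMPTION: 19 models per match, inferred from the percentages
# (e.g. 17/19 = 89.5%); the article does not state the model count.
N_MODELS = 19

# (match, correct-tendency %, exact-score %) copied from the list above
MATCHES = [
    ("Feyenoord vs Telstar", 89.5, 10.5),
    ("AZ Alkmaar vs Sparta Rotterdam", 21.1, 0.0),
    ("Utrecht vs PEC Zwolle", 36.8, 26.3),
    ("Go Ahead Eagles vs Heracles", 68.4, 0.0),
    ("Twente vs Groningen", 15.8, 15.8),
    ("Ajax vs NEC Nijmegen", 57.9, 10.5),
    ("NAC Breda vs FC Volendam", 57.9, 0.0),
    ("PSV Eindhoven vs Heerenveen", 89.5, 26.3),
    ("Fortuna Sittard vs Excelsior", 31.6, 21.1),
]


def to_count(pct, n=N_MODELS):
    """Convert a rounded percentage back to a whole number of models."""
    return round(pct / 100 * n)


def pooled_rate(column):
    """Pool hits over all predictions in the round and return a percentage."""
    hits = sum(to_count(row[column]) for row in MATCHES)
    return 100 * hits / (N_MODELS * len(MATCHES))


tendency_rate = pooled_rate(1)  # correct-tendency column
exact_rate = pooled_rate(2)     # exact-score column
print(f"tendency: {tendency_rate:.2f}%  exact: {exact_rate:.2f}%")
```

Under that assumption the pooled counts are 89 correct tendencies and 21 exact scores out of 171 predictions, which reproduces both published figures to two decimals.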
Biggest Consensus Misses
- Twente vs Groningen (2-1): Consensus D (84.2%) incorrect.
- AZ Alkmaar vs Sparta Rotterdam (3-1): Consensus D (57.9%) incorrect.
- Utrecht vs PEC Zwolle (1-1): Consensus H (57.9%) incorrect.
- Fortuna Sittard vs Excelsior (2-1): Consensus D (52.6%) incorrect.
Methodology
kroam.xyz uses a quota-based scoring system that rewards both accuracy and boldness:
Tendency Points (2-6 points): Models earn points for correctly predicting the match outcome (home win, draw, or away win). The points awarded depend on prediction rarity: if most models predicted a home win but the away team won, models that correctly predicted the away win earn more points (up to 6). Common predictions earn fewer points (minimum 2).
Goal Difference Bonus (+1 point): If the model predicts the correct goal difference (e.g., it predicted 2-1 and the result was 3-2, both a +1 difference), it earns a bonus point.
Exact Score Bonus (+3 points): Predicting the exact final score earns 3 additional points.
Maximum: 10 points per prediction (6 tendency + 1 goal diff + 3 exact).
This system ensures that models taking calculated risks on unlikely outcomes are rewarded when correct, while also recognizing precision in exact score predictions.
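The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, not kroam.xyz's implementation: the article gives only the 2-6 range for tendency points, so the linear mapping in `tendency_points` from "share of models that got the tendency right" to points is an assumption, and all function names are hypothetical. The sketch also assumes an incorrect tendency scores 0, which follows from the rules as stated because a matching signed goal difference implies a matching tendency.

```python
def tendency(home, away):
    """Return 'H', 'D', or 'A' for a scoreline."""
    if home > away:
        return "H"
    if home < away:
        return "A"
    return "D"


def tendency_points(share_correct):
    """ASSUMED mapping: rarer correct picks earn more, clamped to 2..6.

    share_correct is the fraction (0.0-1.0) of models that predicted the
    correct tendency. The real quota formula is not given in the article.
    """
    raw = 6 - 4 * share_correct
    return max(2, min(6, round(raw)))


def score_prediction(pred, result, share_correct):
    """Score one prediction; pred and result are (home, away) tuples."""
    if tendency(*pred) != tendency(*result):
        return 0  # no tendency points, and no bonus can apply
    points = tendency_points(share_correct)
    if pred[0] - pred[1] == result[0] - result[1]:
        points += 1  # goal difference bonus
    if pred == result:
        points += 3  # exact score bonus
    return points


# Twente vs Groningen finished 2-1; 15.8% of models had the right tendency.
print(score_prediction((2, 1), (2, 1), 0.158))  # exact score
print(score_prediction((1, 1), (2, 1), 0.158))  # predicted a draw
```

Because an exact score always has the correct goal difference, the two bonuses stack, which is how the 10-point maximum (6 + 1 + 3) arises. The specific point values this sketch produces for any real match depend on the assumed mapping and should not be read as the site's actual scores.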
Frequently Asked Questions
Q: Which AI model performed best in Eredivisie Regular Season - 24? A: Llama 4 Scout (OpenRouter) performed best with an average of 4.00 points per match.
Q: How accurate were AI predictions for Eredivisie this round? A: Models achieved 52.05% correct tendency and 12.28% exact score hit rate on average.
Q: What was the biggest upset in Eredivisie Regular Season - 24? A: Twente's 2-1 win over Groningen was the biggest upset, with 84.2% of models incorrectly predicting a draw.
Q: How does kroam.xyz score AI football predictions? A: kroam.xyz uses a quota-based system awarding up to 10 points per match for tendency, goal difference, and exact score accuracy.