Serie A Round 25 AI Model Performance Audit
Ministral 3 14B led Serie A predictions with 4.75 points per match, followed by Devstral 2 (3.20) and Phi-4 (3.00). Across the round, models predicted the correct tendency 41.75% of the time. Udinese vs Sassuolo was the biggest consensus miss: 81.8% of models predicted a draw, and the match ended in a 1-2 away win.
Round 25 of the Serie A regular season featured 9 matches, including high-profile fixtures such as Inter vs Juventus. Comparing predictions against actual results shows how reliable each model was over the round. This audit examines the statistical performance across all 9 matches.
Top 10 Models
| # | Model | Matches | Total Points | Avg Pts/Match | Tendency % | Exact % |
|---|---|---|---|---|---|---|
| 1 | Ministral 3 14B (OpenRouter) | 4 | 19 | 4.75 | 75.0% | 50.0% |
| 2 | Devstral 2 (OpenRouter) | 5 | 16 | 3.20 | 60.0% | 20.0% |
| 3 | Phi-4 (OpenRouter) | 8 | 24 | 3.00 | 62.5% | 12.5% |
| 4 | Llama 4 Maverick (OpenRouter) | 9 | 25 | 2.78 | 55.6% | 22.2% |
| 5 | DeepSeek R1-0528 (OpenRouter) | 5 | 13 | 2.60 | 60.0% | 20.0% |
| 6 | GLM-4.7 (OpenRouter) | 4 | 10 | 2.50 | 25.0% | 25.0% |
| 7 | Qwen 2.5 7B (OpenRouter) | 4 | 9 | 2.25 | 75.0% | 0.0% |
| 8 | Kimi K2.5 (OpenRouter) | 5 | 11 | 2.20 | 60.0% | 0.0% |
| 9 | Trinity Large Preview (OpenRouter) | 8 | 17 | 2.13 | 50.0% | 12.5% |
| 10 | Devstral Small (OpenRouter) | 8 | 17 | 2.13 | 50.0% | 12.5% |
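The Avg Pts/Match column can be reproduced from the Total Points and Matches columns. The sketch below is a minimal illustration, not kroam.xyz's actual code: the function name `avg_points` is hypothetical, and half-up rounding to two decimals is an assumption inferred from the table (17 points over 8 matches is 2.125, shown as 2.13).

```python
from decimal import Decimal, ROUND_HALF_UP


def avg_points(total_points: int, matches: int) -> Decimal:
    """Average points per match, rounded half-up to two decimals (assumed rounding rule)."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    raw = Decimal(total_points) / Decimal(matches)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (model, matches, total points) taken from three rows of the table above
rows = [
    ("Ministral 3 14B", 4, 19),
    ("Llama 4 Maverick", 9, 25),
    ("Devstral Small", 8, 17),
]

for model, matches, total in rows:
    print(f"{model}: {avg_points(total, matches)}")
```

`Decimal` is used instead of `round()` because Python's built-in rounding of floats would turn 2.125 into 2.12 rather than the 2.13 shown in the table.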
Match-by-Match Audit
- Napoli vs AS Roma: Result 2-2. Correct tendency 75.0%, exact score hits 0.0%. Consensus: D (75.0%) correct.
- Torino vs Bologna: Result 1-2. Correct tendency 34.8%, exact score hits 26.1%. Consensus: D (65.2%) incorrect.
- Cremonese vs Genoa: Result 0-0. Correct tendency 59.1%, exact score hits 0.0%. Consensus: D (59.1%) correct.
- Parma vs Verona: Result 2-1. Correct tendency 8.7%, exact score hits 8.7%. Consensus: D (69.6%) incorrect.
- Udinese vs Sassuolo: Result 1-2. Correct tendency 13.6%, exact score hits 9.1%. Consensus: D (81.8%) incorrect.
- Inter vs Juventus: Result 3-2. Correct tendency 46.2%, exact score hits 0.0%. Consensus: H (46.2%) correct.
- Lazio vs Atalanta: Result 0-2. Correct tendency 40.0%, exact score hits 0.0%. Consensus: D (56.0%) incorrect.
- Como vs Fiorentina: Result 1-2. Correct tendency 8.7%, exact score hits 8.7%. Consensus: H (73.9%) incorrect.
- Pisa vs AC Milan: Result 1-2. Correct tendency 89.7%, exact score hits 20.7%. Consensus: A (89.7%) correct.
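The overall figures quoted in this audit (41.75% correct tendency, 8.14% exact score) appear consistent with an unweighted mean of the nine per-match rates listed above. That is an inference from the numbers, not a documented formula. A minimal sketch, using only the rounded percentages from the list:

```python
from statistics import mean

# (correct tendency %, exact score %) per match, copied from the list above
match_rates = {
    "Napoli vs AS Roma": (75.0, 0.0),
    "Torino vs Bologna": (34.8, 26.1),
    "Cremonese vs Genoa": (59.1, 0.0),
    "Parma vs Verona": (8.7, 8.7),
    "Udinese vs Sassuolo": (13.6, 9.1),
    "Inter vs Juventus": (46.2, 0.0),
    "Lazio vs Atalanta": (40.0, 0.0),
    "Como vs Fiorentina": (8.7, 8.7),
    "Pisa vs AC Milan": (89.7, 20.7),
}

# Unweighted mean: every match counts equally, regardless of how many
# models submitted a prediction for it (assumed aggregation method).
tendency_mean = mean(t for t, _ in match_rates.values())
exact_mean = mean(e for _, e in match_rates.values())

print(f"mean tendency rate: {tendency_mean:.2f}%")
print(f"mean exact rate:    {exact_mean:.2f}%")
```

Because the inputs are already rounded to one decimal, the result can differ from the published figures by a few hundredths of a percentage point.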
Biggest Consensus Misses
- Udinese vs Sassuolo (1-2) | Consensus: D (81.8%) | Counts H/D/A: 1/18/3
- Como vs Fiorentina (1-2) | Consensus: H (73.9%) | Counts H/D/A: 17/4/2
- Parma vs Verona (2-1) | Consensus: D (69.6%) | Counts H/D/A: 2/16/5
- Torino vs Bologna (1-2) | Consensus: D (65.2%) | Counts H/D/A: 0/15/8
- Lazio vs Atalanta (0-2) | Consensus: D (56.0%) | Counts H/D/A: 1/14/10
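The consensus label and share in each line follow directly from the H/D/A counts: the consensus is the most-predicted outcome, and the share is its count divided by the total number of predictions. A minimal sketch (the function name `consensus` and the tie-breaking order H, D, A are illustrative assumptions):

```python
def consensus(home: int, draw: int, away: int) -> tuple[str, float]:
    """Return the most-predicted outcome and its share in percent (one decimal)."""
    counts = {"H": home, "D": draw, "A": away}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions")
    # max() keeps the first key on ties, so ties resolve in H, D, A order (assumption)
    label = max(counts, key=counts.get)
    return label, round(100 * counts[label] / total, 1)


# Counts H/D/A copied from the list above
misses = {
    "Udinese vs Sassuolo": (1, 18, 3),
    "Como vs Fiorentina": (17, 4, 2),
    "Parma vs Verona": (2, 16, 5),
    "Torino vs Bologna": (0, 15, 8),
    "Lazio vs Atalanta": (1, 14, 10),
}

for match, (h, d, a) in misses.items():
    label, share = consensus(h, d, a)
    print(f"{match}: {label} ({share}%)")
```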
Methodology
kroam.xyz uses a quota-based scoring system that rewards both accuracy and boldness:
Tendency Points (2-6 points): Models earn points for correctly predicting the match outcome (home win, draw, or away win). The points awarded depend on prediction rarity: if most models predicted a home win but the away team won, the models that correctly predicted the away win earn more points (up to 6). Common predictions earn fewer points (minimum 2).
Goal Difference Bonus (+1 point): If a model predicts the correct goal difference (e.g., it predicted 2-1 and the result was 3-2, both a +1 difference), it earns a bonus point.
Exact Score Bonus (+3 points): Predicting the exact final score earns 3 additional points.
Maximum: 10 points per prediction (6 tendency + 1 goal diff + 3 exact).
This system ensures that models taking calculated risks on unlikely outcomes are rewarded when correct, while also recognizing precision in exact score predictions.
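The point rules above can be sketched as a small function. This is an illustration under stated assumptions, not kroam.xyz's implementation: the names `tendency` and `score_prediction` are hypothetical, and because the exact mapping from prediction rarity to the 2-6 tendency points is not given here, the tendency value is passed in as a parameter rather than computed.

```python
Score = tuple[int, int]  # (home goals, away goals)


def tendency(score: Score) -> str:
    """Classify a score as home win (H), draw (D), or away win (A)."""
    home, away = score
    if home > away:
        return "H"
    if away > home:
        return "A"
    return "D"


def score_prediction(predicted: Score, result: Score, tendency_points: float) -> float:
    """Points for one prediction.

    tendency_points is the rarity-based value (2 to 6) for the correct
    outcome; how it is derived from the prediction counts is NOT modeled here.
    """
    if not 2 <= tendency_points <= 6:
        raise ValueError("tendency_points must be between 2 and 6")
    # A wrong tendency also rules out both bonuses, so the score is 0.
    if tendency(predicted) != tendency(result):
        return 0
    points = tendency_points
    # +1 for matching goal difference (e.g. 2-1 predicted, 3-2 result)
    if predicted[0] - predicted[1] == result[0] - result[1]:
        points += 1
    # +3 for the exact final score
    if predicted == result:
        points += 3
    return points


# Common pick, right goal difference, wrong score: 2 + 1
print(score_prediction((2, 1), (3, 2), tendency_points=2))
# Rare pick with the exact score: 6 + 1 + 3
print(score_prediction((1, 2), (1, 2), tendency_points=6))
```

An exact score always matches the goal difference as well, which is why the stated maximum is 6 + 1 + 3 = 10.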
Frequently Asked Questions
Q: Which AI model performed best in Serie A Regular Season - 25? A: Ministral 3 14B performed best with 4.75 average points per match across 4 matches.
Q: How accurate were AI predictions for Serie A this round? A: Models achieved 41.75% correct tendency and 8.14% exact score hit rate across 217 total predictions.
Q: What was the biggest upset in Serie A Regular Season - 25? A: Udinese vs Sassuolo was the biggest consensus miss, with 81.8% of models incorrectly predicting a draw instead of the 1-2 away win.
Q: How does kroam.xyz score AI football predictions? A: kroam.xyz uses a quota-based system awarding 2-6 points for correct tendency, +1 for correct goal difference, and +3 for exact score, with a maximum of 10 points per prediction.