
World Cup 2026 Goal Prediction Challenge

$1 000 USD
Reveal coming soon!
Prediction
Feature Engineering
742 joined
393 active
Start
Jun 12, 26
Close
Jun 19, 26
Reveal
Jul 19, 26
MediumChungus
Inclusion of 2026 match line-ups players etc.
13 Jun 2026, 17:31 · 8

Hey Folks :)

I wanted some clarity on the rules. I noted that the rules state that we may only use the datasets provided. These datasets only go up to 2022; however, are we able to include the match line-ups, squads, etc. from the 2026 World Cup?

I wanted to check, because technically we already know part of this information at this point. Or is it a hard rule that we can only use the datasets provided and no other info?

Discussion 8 answers
  1. "You may use only the datasets provided for this challenge." Full stop. The Fjelstul DB ends at 2022; 2026 squads aren't in it.
  2. "Any other information generated during FIFA World Cup 2026 is strictly prohibited." Squad announcements, lineups, even the group draw are 2026-tournament information by any reasonable reading.
  3. "Solutions found to incorporate information from the 2026 tournament will be disqualified." — and the top 10 face mandatory code review where Zindi verifies "only the permitted historical data was used."
13 Jun 2026, 17:38
Upvotes 3
MediumChungus

Well that clarifies things 😅 thank you for the TL;DR

Kamenialexnea
Ecole nationale superieure polytechnique yaounde

It makes absolutely no sense

Constraining this will just make every extra dataset useless!

Kamenialexnea
Ecole nationale superieure polytechnique yaounde

We need the group stage information at least, if we want our predictions to be anything other than a random shit model!

The group stage draw should be part of the dataset; otherwise the final models wouldn't even be predicting what they are supposed to predict. This is akin to being told to escape from a maze in an open field: there is no maze to escape from in the first place.

I think including information from the 2026 World Cup would lead to data leakage, which gives you what look like good results initially, but the model then performs poorly when tested against truly unseen data. So it is best to avoid it.
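One way to guard against that kind of leakage when validating locally is to hold out the latest tournament in the permitted data and train only on earlier ones. Below is a minimal sketch of that idea, not anything taken from the challenge itself: the rows, team names, goal counts, and the `temporal_split` helper are all made up for illustration, and the baseline is just a mean-goals predictor.

```python
# Hypothetical rows: (tournament_year, home_team, away_team, total_goals).
# All values are invented for illustration; they are not from the challenge data.
matches = [
    (2014, "Team A", "Team B", 3),
    (2018, "Team A", "Team C", 1),
    (2018, "Team B", "Team C", 2),
    (2022, "Team A", "Team B", 4),
    (2022, "Team C", "Team A", 0),
]


def temporal_split(rows, holdout_year):
    """Train on tournaments strictly before holdout_year, validate on that year."""
    train = [r for r in rows if r[0] < holdout_year]
    valid = [r for r in rows if r[0] == holdout_year]
    return train, valid


train, valid = temporal_split(matches, holdout_year=2022)

# Naive baseline: predict the training-set mean goals for every validation match.
mean_goals = sum(r[3] for r in train) / len(train)

# Mean absolute error on the held-out tournament, which the model never saw.
mae = sum(abs(r[3] - mean_goals) for r in valid) / len(valid)
```

Because nothing from the held-out year reaches the training step, the validation error is a fairer estimate of how a model would do on a tournament it has not seen.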

14 Jun 2026, 07:14
Upvotes 1

What I think the author is trying to do is set up a more complex external dataset that takes into account the lineups, how well the players in each lineup perform for their regular club teams, and several other pieces of information.

Think of it this way: we could set up an external dataset where each player has a value derived from their performance in their respective league, then combine those values into a line-up total per country team. Seems like a cool concept :-)
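Purely to illustrate the concept (and keeping in mind that, per the rules quoted above, 2026 line-ups could not be used in a submission), a line-up total could be sketched roughly like this. The player names, ratings, country names, and the `lineup_totals` helper are all hypothetical.

```python
# Hypothetical per-player values, e.g. derived from club-league performance.
player_ratings = {
    "Player 1": 7.5,
    "Player 2": 6.0,
    "Player 3": 8.0,
    "Player 4": 6.5,
}

# Hypothetical line-ups: country team -> list of player names.
lineups = {
    "Country X": ["Player 1", "Player 2"],
    "Country Y": ["Player 3", "Player 4"],
}


def lineup_totals(lineups, ratings, default=0.0):
    """Sum player values per country; players without a rating count as `default`."""
    return {
        country: sum(ratings.get(player, default) for player in players)
        for country, players in lineups.items()
    }


totals = lineup_totals(lineups, player_ratings)
```

A plain sum is only one possible choice; a mean or a position-weighted sum would be just as easy to swap in.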

MediumChungus

100%. I wanted to use the data at a per-player-match level to predict goals per player per match, but oh well :'D