Hey folks :)
I wanted some clarity on the rules. I noticed that the rules state we may only use the datasets provided. These datasets only go up to 2022, so are we able to include the match line-ups, squads, etc. from the 2026 World Cup?
I wanted to check, because technically we already know part of this information at this point. Or is it a hard rule that we can only use the datasets provided and no other info?
Well, that clarifies things 😅 Thank you for the TL;DR.
It makes absolutely no sense.
Constraining this just makes all extra datasets useless!
We need at least the group stage information if we want our predictions to be anything other than a random shit model!
The group stage draw should be part of the dataset; otherwise the final models wouldn't even be predicting what they are supposed to predict... It's like being told to escape from a maze in an open field: there is no maze to escape from in the first place.
I think including information from the 2026 World Cup would lead to data leakage, which gives you what look like good results at first, but then the model performs poorly when tested against truly unseen data. So it's best to avoid it.
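A minimal sketch of the usual guard against that kind of leakage: a time-based split, so that nothing dated on or after a cutoff ends up in training. The match record layout (a dict with a `"date"` key), the function name `split_before_cutoff`, and the cutoff date are all hypothetical, chosen for illustration; check the official schedule for the real tournament start.

```python
from datetime import date

def split_before_cutoff(matches, cutoff):
    """Hypothetical helper: train on matches strictly before `cutoff`, hold out the rest."""
    train = [m for m in matches if m["date"] < cutoff]
    held_out = [m for m in matches if m["date"] >= cutoff]
    return train, held_out

# Assumed cutoff for illustration only (not taken from the competition rules).
TOURNAMENT_START = date(2026, 6, 11)

matches = [
    {"date": date(2022, 12, 18), "home": "Team X", "away": "Team Y"},
    {"date": date(2026, 6, 20), "home": "Team X", "away": "Team Z"},
]
train, held_out = split_before_cutoff(matches, TOURNAMENT_START)
```

Anything in `held_out` is only used for evaluation, never as a feature source, which is what keeps the validation score honest.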
What I think the author is trying to do is set up a more complex external dataset that takes into account the line-ups, how well those players perform for their club teams, and several other pieces of information.
Think of it: if we could set up an external dataset where each player has a value derived from their performance in their respective league, we could combine those values into a line-up total for each national team. Seems like a cool concept :-)
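A rough sketch of that aggregation idea, assuming each player already has a single numeric rating derived from club-league performance (how that rating is computed is left open). The function `lineup_totals`, the team/player names, and the ratings are made-up placeholders; summing the top 11 ratings per country is just one simple way to form a "line-up total".

```python
from collections import defaultdict

def lineup_totals(player_ratings, lineup_size=11):
    """Hypothetical helper: sum the best `lineup_size` player ratings for each national team.

    `player_ratings` is an iterable of (country, player, rating) tuples.
    """
    by_country = defaultdict(list)
    for country, _player, rating in player_ratings:
        by_country[country].append(rating)
    totals = {}
    for country, ratings in by_country.items():
        # Keep only the strongest players, as a crude stand-in for the starting line-up.
        best = sorted(ratings, reverse=True)[:lineup_size]
        totals[country] = sum(best)
    return totals

# Placeholder data; a tiny line-up size keeps the example readable.
ratings = [
    ("Team X", "p1", 7.0), ("Team X", "p2", 6.5), ("Team X", "p3", 5.0),
    ("Team Y", "q1", 8.0), ("Team Y", "q2", 4.0),
]
totals = lineup_totals(ratings, lineup_size=2)
```

With `lineup_size=2`, Team X scores 7.0 + 6.5 and Team Y scores 8.0 + 4.0; the weakest Team X player is dropped. A weighted sum by position would be a natural refinement.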
100%. I wanted to use the data at a per-player, per-match level to predict goals per player per match, but oh well :'D