
I'd like to clarify what submission predictions should be based on. We have game-wise stats for both the training and test sets. However, I have not figured out how the test set is meant to be used. I am torn between two options.
Should we aggregate features for each team over the 2 seasons, with those aggregates being the only data we are allowed to feed into an algorithm for the test set? Or can we calculate features for each game in the test set and feed those into the algorithm? The latter approach is team-invariant and is based on game stats only.
Just download the starter notebook from the competition and you will see how to use the test data. But note that you can do it however you want. Try different approaches.
Thanks for your reply.
It was the first thing I did before creating my own notebook, and it is one of the reasons this question came up. The approach in the starter notebook doesn't make sense to me: it predicts the outcome of a game from that game's own stats rather than from previous games' stats, which isn't a real-world use case. Starter notebooks are usually more about EDA than anything else.
Moreover, I interpret the sentence "The objective of this challenge is to predict the outcome of a football match, based on historical match and player data" as "you should predict the outcome of game n based on data from games up to and including n-1".
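To make that reading concrete, here is a minimal sketch of what I mean by "games up to n-1". The toy data, the `goals` stat and the `prior_mean_goals` helper are all made up for illustration and are not taken from the competition files. It is plain Python to stay dependency-free; in pandas the same idea would be a per-team `groupby` with `shift(1)` followed by `expanding().mean()`.

```python
from collections import defaultdict

# Hypothetical toy data: (team, game_no, goals). Not from the competition files.
games = [
    ("A", 1, 2), ("A", 2, 0), ("A", 3, 4),
    ("B", 1, 1), ("B", 2, 1), ("B", 3, 3),
]

def prior_mean_goals(games):
    """For each (team, game_no), return mean goals over that team's EARLIER games only."""
    totals = defaultdict(int)  # running goal sum per team
    counts = defaultdict(int)  # number of games already seen per team
    features = {}
    for team, game_no, goals in sorted(games, key=lambda g: (g[0], g[1])):
        # Compute the feature before adding the current game,
        # so game n never sees its own stats.
        features[(team, game_no)] = (
            totals[team] / counts[team] if counts[team] else None
        )
        totals[team] += goals
        counts[team] += 1
    return features

features = prior_mean_goals(games)
for key in sorted(features):
    print(key, features[key])
```

The first game of each team gets `None` because there is no history yet. Whether the test set is supposed to be handled this way is exactly what I am asking about.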