Laduma Analytics Football League Winners Prediction Challenge
Can you predict the outcome of a football match based on historical data?
Prize
$2 000 USD
Time
Ended 5 months ago
Participants
154 active · 700 enrolled
Good for beginners
Prediction
Sport
How can you get a score below 1? What was your approach?
Help · 3 Aug 2022, 07:50 · 4

First I tried to do some data pre-processing and feature engineering.

Then I trained an RF model on a subset of my data to calculate feature importances, and filtered out all the unimportant features.

In the end, I trained XGB on my pre-processed data and got a train score of 0.17231 and a test score of 0.17228.

But after submission, I got a score of 1.383, which is very different from my local train and test scores.

Did I miss something?

What was your approach (pre-processing, algorithm, etc.)?

Thanks

Discussion 4 answers

To get a CV score that corresponds to the LB, use the two seasons for validation, i.e. train your model on season 1 and validate it on season 2, and vice versa.

3 Aug 2022, 07:57
Upvotes 4

@JONNY what do you mean by vice versa, please?

You can have two splits.

1. training on season 1, validation on season 2
2. training on season 2, validation on season 1
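
A minimal sketch of those two splits, in case it helps. The data here is synthetic, the `season` array is a hypothetical per-match season label, and the random forest and log loss are only stand-ins (I am assuming a probability-based metric, so swap in whatever model and metric you actually use). sklearn's `LeaveOneGroupOut` holds out one season at a time, which gives exactly the two folds listed above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic stand-in data (assumption): 200 matches, 5 features, 3 outcomes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 3, size=200)
season = np.repeat([1, 2], 100)  # hypothetical season label for each match

scores = []
# Each fold holds out one whole season and trains on the other.
for train_idx, val_idx in LeaveOneGroupOut().split(X, y, groups=season):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    proba = model.predict_proba(X[val_idx])
    scores.append(log_loss(y[val_idx], proba, labels=[0, 1, 2]))

print("per-season scores:", scores)
print("mean CV score:", np.mean(scores))
```

The mean of the two fold scores is the number to compare against the leaderboard, rather than the score from a random split.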

I haven't looked too much into the competition yet, but this is probably partly a time series problem. If that's the case, the standard random train-test split will perform extremely poorly as an estimate of the leaderboard score. J0NNY gave a good solution to the problem in the comment above.

There's also an implementation of a time series split in sklearn that might be useful: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html . Anyway, if you google "time series train test split", you can find various explanations of how to deal with this kind of problem.
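
For illustration, a small sketch of how `TimeSeriesSplit` behaves. The twelve rows are a made-up stand-in for matches, and I am assuming they are already sorted by date (oldest first), since the splitter relies on row order and does not look at any date column itself.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve hypothetical matches, assumed to be sorted by date (oldest first).
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))

# In every fold the training rows come strictly before the validation rows,
# so the model never sees matches from the "future".
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

The training window grows with each fold, while the validation block always sits right after it in time.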

2 Sep 2022, 05:24
Upvotes 0