Laduma Analytics Football League Winners Prediction Challenge

Helping Africa
$2 000 USD
Challenge completed ~3 years ago
Prediction
732 joined
154 active
Start: Jun 06, 22
Close: Sep 04, 22
Reveal: Sep 04, 22
How can you get a score below 1? What was your approach?
Help · 3 Aug 2022, 07:50 · 4

First I tried to do some data pre-processing and feature engineering.

Then I trained an RF model on a subset of my data to compute feature importances, and filtered out all the unimportant features.
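
For readers following along, here is a minimal sketch of what importance-based filtering with a random forest might look like. It is not the poster's actual code: the synthetic data, the mean-importance threshold, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the competition data (assumption: 10 features, 3 informative).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, random_state=0
)

# Fit a random forest and read its impurity-based feature importances.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = rf.feature_importances_

# Keep features whose importance is at least the mean importance
# (the threshold is an arbitrary choice for this sketch).
keep = importances >= importances.mean()
X_reduced = X[:, keep]
```

Note that importances computed on one subset of rows can look different on another, so the selection step should ideally be validated the same way as the final model.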

In the end, I trained XGB on my pre-processed data and got a train score of 0.17231 and a test score of 0.17228.

But after submission, I got a score of 1.383, which is very different from my local train and test scores.

Did I miss something?

What was your approach (pre-proc, algorithm, etc)?

thanks

Discussion 4 answers
J0NNY
Adama science and technology university

To get a CV score that corresponds to the LB, use the two seasons as validation folds, i.e. train your model on season 1 and validate it on season 2, and vice versa.

3 Aug 2022, 07:57
Upvotes 3
MichaelOmosebi
Explore data science academy

@JONNY what do you mean by vice versa, please?

J0NNY
Adama science and technology university

You can have two splits.

1. training on season1 validation on season2
2. training on season2 validation on season1
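
The two splits above can be sketched roughly as follows. This is only an illustration: the helper name `season_swap_splits` and the toy season labels are hypothetical, and it assumes each row of your data carries a season label.

```python
import numpy as np

def season_swap_splits(seasons):
    """Yield (train_idx, valid_idx) pairs, holding out one season at a time."""
    seasons = np.asarray(seasons)
    for held_out in np.unique(seasons):
        valid_idx = np.flatnonzero(seasons == held_out)
        train_idx = np.flatnonzero(seasons != held_out)
        yield train_idx, valid_idx

# Toy season labels (hypothetical): three season-1 rows, then two season-2 rows.
seasons = [1, 1, 1, 2, 2]
folds = [(tr.tolist(), va.tolist()) for tr, va in season_swap_splits(seasons)]
```

Each fold's index arrays can then be used to slice your features and target, fit the model on the training season, and score it on the held-out one. Averaging the two scores gives the CV estimate.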

I haven't looked too much into the competition yet, but this is probably partly a time series problem. If that's the case, the standard train-test-split method will perform extremely poorly as a validation scheme. J0NNY gave a good solution to the problem in the comment above.

There's also an implementation of time series split in sklearn that might be useful: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html. Anyway, if you search for "time series train test split", you can find various explanations of how to deal with this kind of problem.
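
As a rough sketch of how `TimeSeriesSplit` behaves, the snippet below splits six toy rows into three folds. The array `X` is made-up data, and it assumes your rows are already sorted chronologically (the splitter works on row order, not on a date column).

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Six toy rows, assumed to be in chronological order.
X = np.arange(12).reshape(6, 2)

tscv = TimeSeriesSplit(n_splits=3)
splits = [(tr.tolist(), te.tolist()) for tr, te in tscv.split(X)]

for train_idx, test_idx in splits:
    # Every training row comes before every test row in each fold.
    print("train:", train_idx, "test:", test_idx)
```

The training window grows with each fold while the test rows always come later, so the model is never validated on matches that precede its training data.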

2 Sep 2022, 05:24
Upvotes 0