Did anyone experience huge discrepancy between local cross-validation score and leaderboard score?
I'm doing 5-fold k-fold cross-validation. Here's what I've been getting in terms of CV vs LB scores:
0.029 -> 0.081
0.027 -> 0.067
0.049 -> 0.058
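For reference, this is roughly how I compute the local numbers: a minimal sketch with placeholder data, model, and metric (Ridge + MSE here), not my actual pipeline.

```python
# Hedged sketch: local 5-fold CV score. The data, model, and metric
# below are placeholders, not the actual competition setup.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in kf.split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[val_idx])
    scores.append(mean_squared_error(y[val_idx], preds))

# mean out-of-fold error is the "local CV score" I compare against the LB
print(round(float(np.mean(scores)), 3))
```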
I think you mean 0.019, not 0.049?
Short message: it's hard to trust your local CV, since Zindi competitions always use a 50/50 split for the public leaderboard. If it were, say, 30% on the public leaderboard, then you could put much more trust in your local CV. Take this message for competition purposes only. Thanks.
No typo there: 0.049-something. Correctly building the target variable is (though IMO it shouldn't be) the first step in solving the problem.
Finding a proper cross-validation process is the next part of the problem. Hint: K-fold cross-validation might not be the fittest here due to the temporal dimension in the dataset.
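To illustrate the hint: something like scikit-learn's TimeSeriesSplit keeps every validation fold strictly after its training rows, unlike plain shuffled k-fold. A sketch with synthetic data, assuming the rows are already sorted by time:

```python
# Sketch of a time-aware split instead of plain KFold.
# Assumes rows are ordered by time; data here is synthetic.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(20, 1)  # 20 time-ordered rows
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, val_idx in tscv.split(X):
    # every validation row comes strictly after all training rows,
    # so the model never "sees the future" during validation
    assert train_idx.max() < val_idx.min()
    print(f"train rows 0..{train_idx.max()} -> validate rows "
          f"{val_idx.min()}..{val_idx.max()}")
```

This tends to make the local score a more honest estimate when the test set is later in time than the training set.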
One thing I forgot: there might be no correlation between the CV and LB scores, which can make it difficult to build a model that generalises well.