I don't know if I'm doing something wrong, but... has anyone managed to get a CV score in line with their result in the standings?
For example: f1_CV = 0.65 -> f1_public = 0.66
In my case (24th) I got f1_public = 0.66 despite heavily overfitting the classifier (f1_CV = 0.32), and I oversampled only the training folds (so as not to leak information into the validation folds).
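For reference, here is a minimal sketch of the fold-safe oversampling described above, assuming scikit-learn and a synthetic stand-in dataset (the data, model, and balancing scheme are illustrative, not the poster's actual pipeline). The key point is that the minority class is duplicated only inside the training split of each fold, so the validation fold stays untouched:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Synthetic imbalanced data standing in for the competition set.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

def oversample(X_tr, y_tr, rng):
    # Duplicate minority-class rows until classes are balanced --
    # applied to the training split only, never the validation fold.
    classes, counts = np.unique(y_tr, return_counts=True)
    n_max = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        rows = np.where(y_tr == c)[0]
        idx.extend(rows)
        idx.extend(rng.choice(rows, n_max - n, replace=True))
    idx = np.asarray(idx)
    return X_tr[idx], y_tr[idx]

rng = np.random.default_rng(0)
scores = []
for tr, va in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    X_bal, y_bal = oversample(X[tr], y[tr], rng)     # balance train split only
    clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    scores.append(f1_score(y[va], clf.predict(X[va])))  # score on untouched fold

print(f"mean CV F1: {np.mean(scores):.3f}")
```

Oversampling before splitting would put copies of the same row in both train and validation folds and inflate f1_CV, which is the bias the poster is avoiding.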
How did you split your data? A 70/30 holdout, or 5-fold CV?
Stratified 10-fold CV for parameter optimization, but for me there is no relationship between f1_CV and f1_public.
Which loss/metric do you use? I use logloss/F1 and observe a correlation (f1_CV = 0.41 -> f1_public = 0.44).
Right now I use F1 (metric) and ECB (loss), but I don't know whether to trust my CV score or my public score.
Perhaps there is some leakage in the public test portion and the high scores are too optimistic.
If we trust CV, will the final ranking get shaken up?
It is strange. If there were leakage hiding in some features, your local CV score should be high too. Maybe the public/private test data is quite different from the training data, e.g. in the percentage of positive samples, which would make the score vary.