AutoInland Vehicle Insurance Claim Challenge
$1,000 USD
Can you predict if a client will submit a vehicle insurance claim in the next 3 months?
583 data scientists enrolled, 233 on the leaderboard
Insurance · Financial Services · Prediction · Structured
26 March—27 June
CV score vs Public score
published 7 Apr 2021, 12:49

I don't know if I'm doing something wrong, but... has anyone managed to get a CV score in line with their result in the standings?

For example: f1_CV = 0.65 -> f1_public = 0.66

In my case (24th) I got f1_public = 0.66 with a heavily overfit classifier (f1_CV = 0.32), oversampling only the training folds (so as not to leak bias into the validation folds).
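The "oversample only the training folds" idea above can be sketched as follows. This is a minimal NumPy illustration with a made-up 10%-positive toy target (not the actual challenge data): each class is spread evenly across folds, the minority class is duplicated only inside the training portion, and the validation fold keeps the original class ratio so f1_CV is measured on the true distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy imbalanced target: 90 negatives, 10 positives (hypothetical stand-in
# for the "claim in next 3 months" label)
y = np.array([0] * 90 + [1] * 10)

# simple stratified 5-fold assignment: spread each class evenly over folds
n_folds = 5
pos_idx = rng.permutation(np.where(y == 1)[0])
neg_idx = rng.permutation(np.where(y == 0)[0])
folds = [np.concatenate([pos_idx[k::n_folds], neg_idx[k::n_folds]])
         for k in range(n_folds)]

train_pos_frac, val_pos_frac = [], []
for k in range(n_folds):
    val_idx = folds[k]
    tr_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])

    # oversample the minority class in the TRAINING part only
    tr_pos = tr_idx[y[tr_idx] == 1]
    tr_neg = tr_idx[y[tr_idx] == 0]
    extra = rng.choice(tr_pos, size=len(tr_neg) - len(tr_pos), replace=True)
    tr_bal = np.concatenate([tr_idx, extra])

    # the validation fold keeps the original 10% positive rate,
    # so the CV metric reflects the true class distribution
    train_pos_frac.append(y[tr_bal].mean())
    val_pos_frac.append(y[val_idx].mean())
```

After the loop, each training fold is 50/50 balanced while each validation fold still has the original 10% positives.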

How did you split your data? Something like 70% vs 30%, or 5-fold CV?

Stratified 10-fold CV for parameter optimization, but for me there is no relationship between f1_CV and f1_public.
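For reference, a stratified 10-fold CV scored on F1 is straightforward with scikit-learn. This is only a sketch on synthetic imbalanced data (the `make_classification` setup and logistic regression are placeholders, not the poster's actual pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# synthetic imbalanced data standing in for the challenge features/target
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.85], random_state=0)

# stratification keeps the positive rate consistent across the 10 folds
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="f1", cv=cv)
f1_cv = scores.mean()
```

Hyperparameter search would then pass the same `cv` object to `GridSearchCV` or similar, so the optimization target is the cross-validated F1.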

Which loss/metric do you use? I use logloss/F1 and observe a correlation (f1_CV 0.41 -> f1_public 0.44).

Right now I use F1 as the metric and BCE as the loss, but I don't know whether to trust my CV score or my public score.

Perhaps there is some leakage in the public test portion and the high scores are too optimistic.

If we trust CV, will the final ranking be shaken up?

It is strange. If there were leakage hiding in some features, your local CV score should be high too. Maybe the public/private test data is quite different from the training data, e.g. the percentage of positive samples, which would make the scores vary.
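The point about the positive-sample percentage is easy to verify numerically: F1 depends on the class prevalence even when the classifier itself is unchanged. A small sketch with illustrative (made-up) TPR/FPR numbers, computing the expected F1 from the expected confusion counts:

```python
def f1_at_prevalence(prev, tpr=0.6, fpr=0.1, n=10_000):
    """Expected F1 of a fixed classifier (given TPR/FPR) when the
    positive-class prevalence is `prev`. Numbers are illustrative only."""
    tp = prev * n * tpr          # true positives
    fp = (1 - prev) * n * fpr    # false positives
    fn = prev * n * (1 - tpr)    # false negatives
    return 2 * tp / (2 * tp + fp + fn)

f1_low = f1_at_prevalence(0.10)   # e.g. training-set prevalence
f1_high = f1_at_prevalence(0.25)  # hypothetical shifted test prevalence
# same classifier, higher prevalence -> noticeably higher F1
```

So if the public test split simply has more positives than the training data, public F1 can sit well above local CV F1 without any leakage at all.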