
GeoAI Ground-level NO2 Estimation Challenge by ITU

Helping Italy
1 000 CHF
Challenge completed 12 months ago
Prediction
804 joined
372 active
Start
May 22, 24
Close
Nov 15, 24
Reveal
Nov 15, 24
Satti_Tareq
LB-CV Correlation?
Help · 16 Oct 2024, 08:52 · 4

Are your LB and CV scores correlated? I think my CV split is similar to the train/test split, but my best LB score actually comes from my worst CV score: a plain baseline with no added features, no datetime features, and missing values left for the GB models to handle. Most of the time, a better CV score results in a worse LB score.

Discussion 4 answers
ahmedo42

How are you doing CV? You should be using different groups for train/val; look up `GroupKFold`.
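As a rough illustration of the suggestion above, here is a minimal sketch of grouping rows by monitoring location with scikit-learn's `GroupKFold`, so that no location appears in both the training and validation parts of a fold. The data is synthetic, and the names `lat`, `lon`, the station count, and `n_splits=4` are assumptions made for the example, not the competition's actual columns or anyone's actual setup.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Synthetic stand-in data: 12 hypothetical stations with 10 rows each.
n_stations, rows_per_station = 12, 10
lat = np.repeat(rng.uniform(44.0, 46.0, n_stations), rows_per_station)
lon = np.repeat(rng.uniform(8.0, 12.0, n_stations), rows_per_station)
X = rng.normal(size=(n_stations * rows_per_station, 3))
y = rng.normal(size=n_stations * rows_per_station)

# Build one integer group label per unique (lat, lon) pair.
coords = np.column_stack([lat, lon])
_, groups = np.unique(coords, axis=0, return_inverse=True)
groups = groups.ravel()

gkf = GroupKFold(n_splits=4)
overlaps = []
for fold, (train_idx, val_idx) in enumerate(gkf.split(X, y, groups=groups)):
    # Locations present on both sides of the split; should be empty.
    shared = set(groups[train_idx]) & set(groups[val_idx])
    overlaps.append(len(shared))
    print(f"fold {fold}: {len(np.unique(groups[val_idx]))} validation locations, "
          f"{len(shared)} shared with train")
```

If the test set contains locations that never appear in train, a split like this is usually closer to the leaderboard setting than a plain row-wise split, though that is something to verify against the actual data.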

17 Oct 2024, 15:14
Upvotes 1
Satti_Tareq

I am not using sklearn's `GroupKFold`, but I think I am using the same idea. I validate on 15 folds; in each fold I hold out the data from six locations (identified by their coordinates) as the validation set and train on the rest. I use the same splits for all my work, the six locations are different in each fold, and I tried to make the validation set coordinates resemble the test set coordinates. The initial choice of locations was random.
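One possible reading of the scheme described above, sketched with NumPy only: shuffle the unique locations once, then hold out a different block of six in each of 15 folds. The function name `location_holdout_folds`, the integer location IDs, and the assumption that the held-out sets never overlap between folds are all illustrative; the post does not say how the actual splits were built.

```python
import numpy as np

def location_holdout_folds(location_ids, n_folds=15, per_fold=6, seed=0):
    """Yield (train_idx, val_idx) pairs, holding out `per_fold` locations per fold.

    Hypothetical helper: held-out locations are disjoint across folds.
    """
    location_ids = np.asarray(location_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(location_ids))
    if n_folds * per_fold > len(shuffled):
        raise ValueError("not enough distinct locations for disjoint folds")
    for k in range(n_folds):
        held_out = shuffled[k * per_fold:(k + 1) * per_fold]
        val_mask = np.isin(location_ids, held_out)
        yield np.flatnonzero(~val_mask), np.flatnonzero(val_mask)

# Toy data: 100 hypothetical locations with 5 rows each.
location_ids = np.repeat(np.arange(100), 5)
folds = list(location_holdout_folds(location_ids))
for k, (train_idx, val_idx) in enumerate(folds[:3]):
    print(k, len(train_idx), len(val_idx))
```

Fixing `seed` keeps the splits identical across experiments, which matches the stated practice of reusing the same divisions for all work.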

For me, models scoring around 9 show high correlation: CV scores in the low 9s (9.2x) also get low LB scores (9.3x), and the same holds in the high 9s (say 9.5 vs 9.6). But most of the models with local CV scores below 9 are not stable at all.

So I have models with CV around 8.8 whose LB scores are all over the place, some in the low 9s, others in the mid 10s, with no clear trend. I'm using simple KFold; I tried `GroupKFold` but it significantly worsened both CV and LB.

18 Oct 2024, 10:42
Upvotes 1
Satti_Tareq

Sometimes a better CV means a worse LB. I reached 7.xx in CV but got 1x.xx on the LB, which is very weird. I thought the reason might be the way I fill missing data, but I guess I was wrong. Also, dropping the training instances where the target is null has a significant effect on both CV and LB.

I am monitoring both MSE and MAE, and the standard deviation of the scores is reasonable, but I have no clue what to do.
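For reference, a minimal sketch of how per-fold MSE and MAE and their spread across folds might be computed. The helper names `fold_metrics` and `summarize` are hypothetical, and the numbers are toy values chosen only to make the arithmetic easy to follow, not scores from this competition.

```python
import numpy as np

def fold_metrics(y_true, y_pred):
    """Return MSE and MAE for one validation fold."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return {"mse": float(np.mean(err ** 2)), "mae": float(np.mean(np.abs(err)))}

def summarize(per_fold):
    """Return (mean, sample standard deviation) of each metric across folds."""
    summary = {}
    for name in ("mse", "mae"):
        values = np.array([m[name] for m in per_fold])
        summary[name] = (float(values.mean()), float(values.std(ddof=1)))
    return summary

# Toy folds: errors of (0, 0, 2) and (1, 1).
fold_a = fold_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
fold_b = fold_metrics([1.0, 2.0], [2.0, 3.0])
summary = summarize([fold_a, fold_b])
print(summary)
```

A small spread across folds only says the folds agree with each other; it does not by itself show that the folds resemble the test set.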