
Turtle Recall: Conservation Challenge

Helping Kenya
$10 000 USD
Completed (almost 4 years ago)
Classification
Computer Vision
753 joined
247 active
Start: Nov 19, 21
Close: Apr 21, 22
Reveal: Apr 21, 22
marching_learning
Nostalgic Mathematics
Let's share CV vs LB
Help · 15 Feb 2022, 10:37 · 13

I notice that there is a great gap between CV & LB. I have seen some great discussions on this topic, and the gap can be explained by the fact that less than 20% of the test dataset is actually scored. But anyway, would you like to share your CV against your LB?

So I'll dive in first:

CV = 0.2632 & LB = 0.0158

CV = 0.3602 & LB = 0.0438

Thanks

Discussion · 13 answers

Please explain further what you mean by "So this gap can be explained by the fact that less than 20% of the test dataset is actually scored".

15 Feb 2022, 17:33
Upvotes 0

Somebody tried submitting `new_turtle` for every image and it scored 0.01360544217687075, from which he deduced that there are 2 / 0.01360544217687075 = 147 images scored on the current leaderboard. That represents 20% of the test data. Hope it helps.
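The arithmetic behind that deduction can be checked in a few lines. The assumption (not confirmed anywhere official) is that exactly 2 of the scored images are truly `new_turtle`, each contributing an average precision of 1.0, so the constant-prediction score equals 2 / N:

```python
constant_score = 0.01360544217687075  # reported LB score for all-new_turtle

# Assumption: 2 scored images are truly new_turtle, each with AP = 1.0,
# so constant_score = 2 / N  =>  N = 2 / constant_score
n_scored = 2 / constant_score
print(round(n_scored))  # 147
```

The figure 147 only follows if the "2 true new_turtle images" assumption holds; a different count of true `new_turtle` images would scale the estimate accordingly.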

I don't know if the 20% thing alone can fully explain it. Holding out around 30% of the images, I have seen validation scores >0.8 which, when scored on the LB, got 0.03. Random samples of the images don't get such low scores locally. I have not found many ways to get a consistently higher score on the LB.

17 Feb 2022, 02:44
Upvotes 0

You're right. I also think the 20% estimate of the public LB is probably wrong and that the true figure is lower. For my part, I validate on roughly 600 images per fold and the train & val scores are close. That's why I suspect the percentage of images involved in the public LB is even lower, maybe less than 5%. But if that's not the case, we should seriously worry, because the private LB could be a lottery.

For sure there is something strange in the LB scores (I tried to tell Zindi, but they say it is OK). I think the best option is to trust CV (making submissions doesn't make sense).

CV: 0.79 LB: 0.028

Don't know what's going on here. Can anybody help me with some suggestions? Thanks.

17 Feb 2022, 04:01
Upvotes 0

I think you're doing well. All you have to do is trust your CV, as long as your cross-validation setup is sound.

One thing that might help is to track other, "harder" metrics in conjunction with the APK, such as macro accuracy and the like. I've seen the APK go up in validation while these other metrics fail, and models that also do better on the other metrics tend to do better on the leaderboard.
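A minimal sketch of one such "harder" metric: macro top-1 accuracy, which averages accuracy per class so rare turtle IDs count as much as frequent ones (the function name and toy labels here are illustrative, not from the tutorial code):

```python
from collections import defaultdict

def macro_accuracy(y_true, y_pred):
    """Top-1 accuracy averaged per class, so rare classes
    weigh as much as frequent ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Toy example: class 'a' is common, 'b' is rare.
# Plain accuracy is 0.75, but macro accuracy exposes the miss on 'b'.
print(macro_accuracy(['a', 'a', 'a', 'b'], ['a', 'a', 'a', 'a']))  # 0.5
```

A model that inflates APK by nailing only the frequent IDs will score poorly on a metric like this, which matches the observation that models strong on both tend to do better on the leaderboard.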


Hi @astenuz, what is APK?

It's the average precision at k; I think that's the one being used to score entries, right?


Okay, thanks.

Are there any classes in the test data that are not present in the training dataset? I guess new-class images are assigned to `new_turtle`, but there might be more such classes. I hope I'm wrong.

MAPK on validation set (~430 images): 0.5587412587412587

Submission score: 0.03129251700680272

I copied and pasted the MAPK code from the tutorial. Looking at the test images next to the predicted turtles' images, I can believe an accuracy closer to 50% than to 5%, and yet that leaderboard score implies very bad performance.

One hypothesis: they take the sum of the APK scores over the public test set (assuming that is 147 images, plausible for other reasons noted elsewhere in this thread), but divide it by the total number of images, 2635, instead of by 147. 0.5587 * 147 / 2635 = 0.0312. This could just be a numerical coincidence, though.
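That hypothesis is easy to sanity-check numerically (the 147 and 2635 figures are the thread's estimates, not confirmed values):

```python
val_mapk = 0.5587412587412587   # local validation MAP@K (~430 images)
lb_score = 0.03129251700680272  # leaderboard score for the same model
n_public = 147                  # estimated scored images (new_turtle probe)
n_total = 2635                  # total test images

# Hypothesis: sum of per-image APK over the public set,
# divided by the total test size instead of the public size.
implied = val_mapk * n_public / n_total
print(implied)  # ~0.0312, close to the observed LB score
```

The implied value lands within about 0.0001 of the observed leaderboard score, which is suggestive but, as noted, could still be coincidence since it also assumes the validation MAP@K transfers exactly to the public images.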

Whatever the case, let's hope the issue gets cleared up by the team; in the meantime, local validation seems the way to go.

19 Feb 2022, 14:38
Upvotes 0