I notice that there is a large gap between CV and LB scores. I have seen some good discussions on the topic; the gap can be partly explained by the fact that less than 20% of the test dataset is actually scored. Anyway, would you like to share your CV against your LB?
I'll dive in first:
CV = 0.2632 & LB = 0.0158
CV = 0.3602 & LB = 0.0438
Thanks
Please explain further what you mean by "this gap can be explained by the fact that less than 20% of the test dataset is actually scored".
Somebody tried a submission predicting `new_turtle` all the time and it scored 0.01360544217687075, and they deduced that 2 / 0.01360544217687075 = 147 images are scored on the current leaderboard. This represents about 20% of the test data. Hope it helps.
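The deduction above can be sketched in a few lines. The assumption (not confirmed anywhere in the thread) is that exactly 2 of the scored images really are `new_turtle`, each contributing an average precision of 1 when `new_turtle` is ranked first, so the score is simply 2 / n:

```python
# A constant "new_turtle" submission scored s on the public LB.
# If k of the n scored images truly are new_turtle, each contributes
# an average precision of 1 (correct ID ranked first), so s = k / n.
s = 0.01360544217687075  # reported score of the constant submission
k = 2                    # assumed count of true new_turtle images (a guess)

n = k / s
print(round(n))  # -> 147 images estimated in the public LB subset
```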
I don't know if the 20% thing alone can fully explain it. Holding out around 30% of the images, I have seen validation scores above 0.8 that get 0.03 when scored on the LB. Random samples of the images don't produce such low scores locally. I have not found many ways to get a consistently higher score on the LB.
You're right. I also suspect the 20% estimate for the public LB is wrong and that the real fraction is lower. I validate on roughly 600 images per fold, and train and validation scores are close. That's why I suspect the percentage of images involved in the public LB is even lower, maybe less than 5%. If that's not the case, we should seriously worry, because the private LB could be a lottery.
For sure there is something strange in the LB scores (I tried to tell Zindi, but they say it's OK). I think the best option is to trust CV (making submissions doesn't make sense).
CV: 0.79 LB: 0.028
Don't know what's going on here. Can anybody help me with some suggestions? Thanks.
I think you're doing fine. All you have to do is trust your CV, as long as your cross-validation is set up correctly.
One thing that might help is to track other, "harder" metrics alongside APK, such as macro accuracy. I've seen the APK go up in validation while these other metrics drop, and models that also do better on the other metrics tend to do better on the leaderboard.
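For anyone who wants to try this, here is a minimal sketch of macro accuracy (per-class top-1 accuracy averaged over classes, so rare turtles weigh as much as frequent ones). The function name and the toy IDs are my own, not from any tutorial:

```python
from collections import defaultdict

def macro_accuracy(actual, predicted):
    """Per-class top-1 accuracy, averaged over classes.
    `actual` holds the true IDs, `predicted` the top-1 predictions."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        hits[a] += int(a == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# "t1" is predicted perfectly, "t2" is always missed:
print(macro_accuracy(["t1", "t1", "t2"], ["t1", "t1", "t1"]))  # -> 0.5
```

A plain (micro) accuracy would report 2/3 here; the macro version exposes that one class is never recovered.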
Hi @astenuz, what is APK?
It's the average precision at k. I think that's the metric we're using to score entries, right?
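For reference, here is my understanding of average precision at k for this kind of re-identification task, assuming each image has exactly one true ID (in that case AP@k reduces to 1/rank of the true ID if it appears in the top k, else 0). This is a sketch, not the organisers' exact scoring code:

```python
def apk(actual, predicted, k=5):
    """Average precision at k for one image with a single true ID:
    1 / (rank of the true ID) if it is in the top k, else 0."""
    for rank, p in enumerate(predicted[:k], start=1):
        if p == actual:
            return 1.0 / rank
    return 0.0

def mapk(actuals, predictions, k=5):
    """Mean of apk over all images."""
    return sum(apk(a, p, k) for a, p in zip(actuals, predictions)) / len(actuals)

# true ID ranked second among the 5 predictions:
print(apk("t42", ["t07", "t42", "t99"], k=5))  # -> 0.5
```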
Okay, thanks.
Is there any class in the test data that is not present in the training dataset? I guess images of new classes are assigned to `new_turtle`, but there might be more such classes. I hope I'm wrong.
MAPK on validation set (~430 images): 0.5587412587412587
Submission score: 0.03129251700680272
I copied and pasted the MAPK code from the tutorial. Looking at the test images next to the predicted turtles' images, I can believe an accuracy closer to 50% than to 5%, and yet that leaderboard score implies very bad performance.
One hypothesis: they take the sum of the APK scores over the public test set (147 images, the estimate derived elsewhere in this thread), but divide it by the total number of images, 2635, instead of 147. 0.5587 * 147 / 2635 ≈ 0.0312. This could just be a numerical coincidence, though.
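The hypothesis above is just arithmetic, and it reproduces the observed submission score to four decimal places (the 147 and 2635 figures are the thread's estimates, not confirmed numbers):

```python
# If the scorer sums APK over only the public images but divides by the
# full test-set size, the reported score shrinks by n_public / n_total.
local_mapk = 0.5587  # validation MAP@5 reported above
n_public = 147       # estimated images actually scored (from this thread)
n_total = 2635       # total test images

buggy_score = local_mapk * n_public / n_total
print(round(buggy_score, 4))  # -> 0.0312, matching the observed LB score
```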
Whatever the case, let's hope the issue is cleared up by the team. In the meantime, local validation seems the way to go.