I did oversampling with SKF and checked my validation set twice; there are no data leaks in it. The validation set uses only data that weren't oversampled, so no leak is introduced there. However, the gap between my CV and LB is big:
CV = 0.45
LB = 0.64
Is it possible that the way the public LB and the private LB were split resulted in unequal representation of classes? For example, maybe there are classes not present in the public LB but present in the private LB. Or is there another explanation for my gap between CV and LB?
Did you oversample only on the train set, i.e. after splitting into train and validation sets? If not, you have a leak.
Yes, that's exactly what I did: the oversampling was done only on the train set of the SKF, the validation set contained only original data, and there's no overlap between the two sets.
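For anyone following along, the pattern described above (oversample inside each fold, after the split, so validation folds stay untouched) can be sketched roughly like this. The toy data, class counts, and the naive random-oversampling helper here are all illustrative assumptions, not the poster's actual pipeline:

```python
# Sketch (assumed setup): oversample ONLY the training fold of each
# StratifiedKFold split; the validation fold keeps original data only.
import numpy as np
from collections import Counter
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(-1, 1)       # toy features (placeholder)
y = np.array([0] * 15 + [1] * 5)       # imbalanced toy labels (placeholder)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]  # untouched, original data

    # Naive random oversampling: duplicate minority-class rows in the
    # train fold until every class matches the majority-class count.
    counts = Counter(y_tr)
    max_n = max(counts.values())
    parts_X, parts_y = [X_tr], [y_tr]
    for cls, n in counts.items():
        if n < max_n:
            extra = np.random.choice(np.where(y_tr == cls)[0], max_n - n)
            parts_X.append(X_tr[extra])
            parts_y.append(y_tr[extra])
    X_bal = np.concatenate(parts_X)
    y_bal = np.concatenate(parts_y)
    # train on (X_bal, y_bal); evaluate on (X_val, y_val)
```

The key point is that oversampled (duplicated) rows can never land in a validation fold, which is what would otherwise inflate CV. Libraries like imbalanced-learn offer ready-made oversamplers, but the fold-then-oversample order is the same.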
I even mapped the IDs of the submission file to the test CSV file to see if my predictions made sense to the eye.
Most of my predictions looked correct; at least, the results I've seen shouldn't give me a 0.64.
What is your CV / LB without oversampling?
CV 0.62 → LB 0.61
I think curating the dataset (hand-checking the images in each class and removing the ambiguous ones) might help; I think that's what the 0.25 guys were saying. There is a huge overlap in the data, and 0.64 is close to the baseline. I'm not sure what everyone is calling a "data leak".
Yes, that may help, but I don't think that alone will jump you up the LB to a score around 0.2.
You haven't done it yet, I guess? Because he said it in the initial discussion.
Not yet, but I will.
Yes. You should.