CGIAR Eyes on the Ground Challenge

Helping Africa

$10 000 USD

Completed (over 2 years ago)

TAUIL_Abdelilah

university abdelmalek essaadi

In your opinion, what is the best public score achievable without using data leakage?

Help · 25 Oct 2023, 10:03 · 9

Discussion 9 answers

Bartek

I think ~9 is achievable. That's based on my estimate: using the leak I can get ~6.8, and the leader is 0.7 better than me. Without the leak, my score is ~10.5.

My test with the label leak also shows that predicting where the extent is `0` matters more than predicting the extent itself for DR. I'm trying to classify whether the extent is 0 or not (using only the extent, not the damage column), and 90% accuracy is not enough; I need ~95%.
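A toy illustration of why zero-versus-non-zero accuracy matters so much under RMSE. This is a minimal sketch with made-up numbers, not competition data: `rmse`, `gate`, and the values below are hypothetical names and figures chosen only for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def gate(preds, is_nonzero):
    """Keep a prediction where the mask says non-zero, otherwise force it to 0."""
    return [p if keep else 0.0 for p, keep in zip(preds, is_nonzero)]

# Made-up extents: mostly zeros plus a few large values (a zero-inflated target).
y_true = [0, 0, 0, 0, 0, 0, 0, 30, 50, 80]
# Hypothetical regressor: exact on the non-zero rows, but predicts 10 on the true zeros.
raw_pred = [10, 10, 10, 10, 10, 10, 10, 30, 50, 80]

perfect_mask = [t > 0 for t in y_true]
one_miss_mask = list(perfect_mask)
one_miss_mask[9] = False  # 90% mask accuracy: the extent-80 row is wrongly called zero

print(f"no gate:        {rmse(y_true, raw_pred):.2f}")                       # 8.37
print(f"perfect gate:   {rmse(y_true, gate(raw_pred, perfect_mask)):.2f}")   # 0.00
print(f"one wrong zero: {rmse(y_true, gate(raw_pred, one_miss_mask)):.2f}")  # 25.30
```

In this toy case a single high-extent row misclassified as zero costs more than the gate gains, which is consistent with 90% accuracy not being enough.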

25 Oct 2023, 13:14

Upvotes 2

doItLikeThis99

Thank you for your insight!

1. Are you doing anything other than setting predictions to 0 for damage type != DR? I'm not seeing as big a decrease in RMSE from using the leak.

2. Can you actually get to 95%, or is that just a theoretical bound for where you think it becomes useful?

replied to Bartek · 27 Oct 2023, 08:33

Upvotes 0

Bartek

1. That is exactly what I'm doing. Non-DR rows always get 0 as the extent.

2. It's just my hypothesis; I'm getting 90%. Looking at the data, a lot of it is simply mislabeled (by my own evaluation). I'm thinking about removing that data from training to see if it would

replied to doItLikeThis99 · 27 Oct 2023, 13:21

Upvotes 0

doItLikeThis99

So you get almost a 4-point decrease in RMSE? Do you see the same 4-point decrease in CV RMSE?

Interesting. Incorrectly labeled beyond the zero-inflated portion? That's fair, some of the labels are quite bad. Some sort of pseudo-labeling might be interesting.

There are some other approaches I'm considering as well to deal with low-quality labels.

replied to Bartek · 27 Oct 2023, 20:17

Upvotes 0

Bartek

About local CV: yes, I see the same thing there.

replied to doItLikeThis99 · 28 Oct 2023, 09:52

Upvotes 0

lyumax

Hey!

Thank you for sharing your experience!

I'm a little confused about using the damage type during training. Are you using it?

The official message doesn't make clear whether it is allowed for training.

replied to Bartek · 28 Oct 2023, 18:53

Upvotes 0

Bartek

No, I don't use the Damage column for training. I used it only at inference to check the leak, and I'm not going to select that submission.

I'm trying to re-create the damage column by training a classifier to predict whether the extent is 0 or non-0.
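The zero-versus-non-zero idea can be sketched as a two-stage prediction. This is only an illustration under assumptions: the function names, the 0.5 threshold, and the example probabilities are hypothetical, and the classifier and regressor are assumed to have been trained elsewhere. The soft variant is a common alternative under RMSE, not something described in this thread.

```python
def hard_gate(p_nonzero, extent_if_nonzero, threshold=0.5):
    """Use the regressor's extent only where the classifier says 'non-zero'."""
    return [e if p >= threshold else 0.0
            for p, e in zip(p_nonzero, extent_if_nonzero)]

def soft_gate(p_nonzero, extent_if_nonzero):
    """Alternative: scale each extent by the predicted probability of non-zero."""
    return [p * e for p, e in zip(p_nonzero, extent_if_nonzero)]

# Hypothetical outputs of a zero/non-zero classifier and an extent regressor.
p_nonzero = [0.9, 0.2, 0.6]
extent_if_nonzero = [40.0, 30.0, 10.0]

print(hard_gate(p_nonzero, extent_if_nonzero))  # [40.0, 0.0, 10.0]
print(soft_gate(p_nonzero, extent_if_nonzero))
```

The hard gate inherits every classifier mistake in full, while the soft gate shrinks uncertain predictions toward 0; which one scores better depends on how well calibrated the classifier is.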

replied to lyumax · 29 Oct 2023, 10:40

Upvotes 0

Bartek

Sorry for the confusion. I double-checked my 6.x score and it was obtained differently from what I described.

To get the 6.x score, I trained a model using only the DR (drought) entries. At prediction time I only predict for DR rows; all others are set to 0.

If I train a model on the entire dataset and then zero out non-DR rows, I get ~8.5, down from ~10.5 (the no-leak prediction).
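The two variants can be compared on a toy example. This is a minimal sketch under loud assumptions: the rows are made up, the keys `damage` and `extent` and the label `"OTHER"` are placeholders, and a mean predictor stands in for the real model purely to make the difference visible.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_mean(rows):
    """Stand-in 'model': predicts the mean training extent for every row."""
    mean = sum(r["extent"] for r in rows) / len(rows)
    return lambda row: mean

def predict_with_leak(model, rows):
    """Use the damage column at inference: non-DR rows are forced to 0."""
    return [model(r) if r["damage"] == "DR" else 0.0 for r in rows]

# Made-up data in which only DR rows have a non-zero extent.
train = [
    {"damage": "DR", "extent": 20.0},
    {"damage": "DR", "extent": 60.0},
    {"damage": "OTHER", "extent": 0.0},
    {"damage": "OTHER", "extent": 0.0},
]
test = [
    {"damage": "DR", "extent": 40.0},
    {"damage": "OTHER", "extent": 0.0},
]
y_test = [r["extent"] for r in test]

model_all = fit_mean(train)                                     # trained on every row
model_dr = fit_mean([r for r in train if r["damage"] == "DR"])  # trained on DR rows only

no_leak = rmse(y_test, [model_all(r) for r in test])        # no damage column used
level_1 = rmse(y_test, predict_with_leak(model_all, test))  # train on all, zero out non-DR
level_2 = rmse(y_test, predict_with_leak(model_dr, test))   # train on DR only, zero out non-DR
print(no_leak, level_1, level_2)
```

In this toy setup the model trained on all rows is pulled toward 0 by the non-DR zeros, so it underestimates DR extents even after post-processing; training on DR rows alone removes that bias, which mirrors the ordering of the scores reported above.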

replied to doItLikeThis99 · 1 Nov 2023, 08:27

Upvotes 1

doItLikeThis99

Oh, thank you for the update! I had suspected there were two "levels" to the leak, as I was only seeing 8.x scores when post-processing non-DR predictions to 0.

replied to Bartek · 1 Nov 2023, 15:36

Upvotes 0
