Digital Green Crop Yield Estimate Challenge

Helping India

€9 400 EUR

Completed (over 2 years ago)

1370 joined

677 active

Start

Sep 04, 23

Dec 03, 23

Reveal

Dec 03, 23

kamelyamani

Why not take these outliers into account?

Help · 4 Dec 2023, 11:56 · 4

Hello again,

As @tomy4reel and @Koleshjr have pointed out, some IDs were not taken into account in the private leaderboard. Even if you didn’t detect those IDs as outliers, it’s frustrating to learn that all our efforts were pointed in the wrong direction. I also believe that a solution that is robust to outliers is better than one that is not. So I think this is a problem that needs to be resolved.

Discussion 4 answers

Koleshjr

Multimedia University of Kenya

seconded

4 Dec 2023, 12:02

Upvotes 2

IshankAgarwal

I think competitors could have built much better models if the public leaderboard had not pointed us in the wrong direction. Getting a bad score on the public LB while CV was improving, because of misleading test data, kept us from going in the right direction.

One more thing I want to understand: bad public LB scores of 400+ became some of the best private LB scores, since the errors were removed and those models also had good CV.

Some people also got a very good public LB score, around 120, and a similar score on the private LB. How? I'd like to know what you did in your code to achieve that, and whose models should be considered better. 🤷‍♂️

4 Dec 2023, 12:08

Upvotes 0

Koleshjr

Multimedia University of Kenya

I will try to answer this, in my opinion: the people with a high public RMSE and a very good private LB score never post-processed the outliers in the public leaderboard. A good example is ID_PMSOXFT4FYDW: multiply its prediction by 10 so that it is near 8000 and you land in the 140 to 150 range. That was the easiest one of all, which is why you see so many people in the 140s. So the trick was to find the other outlier IDs in the public set to get a good public score, and some people were just really good at it, hats off to them. Since the outlier IDs were not used to calculate the private metric, you don't see any significant changes there. But if @Zindi decides to include them, trust me, you will see a very different top 20.

Models were generally good at predicting the non-outlier rows, but predicting the outliers was the whole point of this competition, and it's sad that they decided not to take them into account when calculating the metric.
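
To make the effect concrete, here is a rough sketch of how a single outlier row can dominate RMSE, and how excluding it or multiplying its prediction by 10 changes the score. Everything in it is an assumption for illustration: the data is randomly generated toy data (not the competition data), the `is_outlier` mask is hypothetical, and I don't know how the actual metric was implemented on the organisers' side.

```python
import numpy as np


def rmse(y_true, y_pred, mask=None):
    """Root mean squared error, optionally restricted to rows where mask is True."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if mask is not None:
        y_true, y_pred = y_true[mask], y_pred[mask]
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy data (made up): 100 rows, predictions off by roughly 100 on average.
rng = np.random.default_rng(0)
y_true = rng.uniform(300, 1200, size=100)
y_pred = y_true + rng.normal(0, 100, size=100)

# Hypothetical outlier: the label is about 10x a "normal" value,
# while the model predicts a normal-looking value.
y_true[0] = 8000.0
y_pred[0] = 800.0

is_outlier = np.zeros(100, dtype=bool)
is_outlier[0] = True

# Metric with the outlier row counted vs. excluded.
score_all = rmse(y_true, y_pred)
score_clean = rmse(y_true, y_pred, mask=~is_outlier)

# Post-processing: multiply the flagged prediction by 10, then score all rows.
y_post = y_pred.copy()
y_post[is_outlier] *= 10
score_post = rmse(y_true, y_post)

print(f"all rows, raw predictions:     {score_all:.1f}")
print(f"outlier excluded from metric:  {score_clean:.1f}")
print(f"all rows, outlier post-fixed:  {score_post:.1f}")
```

In this toy setup the same model looks terrible when the outlier row is scored and fine when it is dropped, which matches what people saw between the public and private LB.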

replied to IshankAgarwal · 4 Dec 2023, 12:27

Upvotes 1

IshankAgarwal

Thanks. Yeah, searching for such IDs was brilliant. I saw the discussions about this and tried it myself too, but post-processing the data like that didn't feel right to me. So I chose the model that scored well on my train/test split, around 130, but it was 450+ on the public LB and got 149 on the private LB. Great learning, by the way.

replied to Koleshjr · 4 Dec 2023, 12:59

Upvotes 0
