Digital Green Crop Yield Estimate Challenge

Helping India

€9 400 EUR

Completed (over 2 years ago)

Skills you will learn

Prediction

1370 joined

677 active

Start

Sep 04, 23

Dec 03, 23

Reveal

Dec 03, 23

mchahhou

Private dataset has been changed

Data · 4 Dec 2023, 09:21 · 10

It's obvious that Zindi changed the private dataset by removing all the outliers without announcing it to the participants. I stopped working on this competition long ago because of its setup (bad data and a bad metric), so I don't feel very concerned, but for all the other participants: what a waste of time!

Discussion 10 answers

testgorilla

That's a very brave statement. Do you have any evidence to back it up?

4 Dec 2023, 09:23

Upvotes 0

mchahhou

The private score (~100) is the score you get if you remove all outliers from your training dataset and run cross-validation (CV) on what remains.
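To make that check concrete, here is a minimal sketch on synthetic data, not the competition data. It assumes an RMSE-style metric, a toy no-intercept model (yield = slope * acre), and a hypothetical yield-per-acre cutoff for flagging data entry errors; none of these come from the thread. It only illustrates how a handful of outliers can dominate a CV score.

```python
import math
import random


def fit_slope(acres, yields):
    """Least-squares slope for the no-intercept model yield = slope * acre."""
    num = sum(a * y for a, y in zip(acres, yields))
    den = sum(a * a for a in acres)
    return num / den


def cv_rmse(rows, k=5, seed=0):
    """Plain k-fold cross-validated RMSE over (acre, yield) rows."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    sq_err = 0.0
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        slope = fit_slope([a for a, _ in train], [y for _, y in train])
        sq_err += sum((y - slope * a) ** 2 for a, y in test)
    return math.sqrt(sq_err / len(rows))


def make_rows(n=500, outlier_rate=0.02, seed=42):
    """Synthetic (acre, yield) rows; all constants here are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        acre = rng.uniform(0.1, 1.0)
        yield_kg = acre * (1500 + rng.gauss(0, 150))
        if rng.random() < outlier_rate:
            yield_kg *= 10  # simulated data entry error
        rows.append((acre, yield_kg))
    return rows


def drop_outliers(rows, max_ratio=3000):
    """Drop rows whose yield per acre exceeds a hypothetical cutoff."""
    return [(a, y) for a, y in rows if y / a <= max_ratio]


if __name__ == "__main__":
    rows = make_rows()
    clean = drop_outliers(rows)
    print("CV RMSE with outliers:   ", round(cv_rmse(rows), 1))
    print("CV RMSE without outliers:", round(cv_rmse(clean), 1))
```

If a CV score computed on cleaned data lands close to the private score while the score on the raw data does not, that is consistent with a private set that contains no outliers, though it does not by itself show how the private set came to be that way.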

replied to testgorilla · 4 Dec 2023, 09:27

Upvotes 0

hark99

Self-employed

Yes, removing outliers gives contrasting results on the two leaderboards. They should have balanced the test data; otherwise, you cannot draw sound conclusions.

4 Dec 2023, 09:23

Upvotes 0

yanteixeira

The villain of the competition is the 'Acre' feature. Without it, we would never have discovered that the outliers were actually data entry errors, and Zindi wouldn't have had to change the private test set. Also, I still strongly believe that this feature has the target leaked into it.

4 Dec 2023, 09:42

Upvotes 0

Professor

Carnegie Mellon University Africa

While I agree that the Acre feature is strongly correlated with the target, I think that's simply how it is in real life. Crop yield is strongly a function of land size, so the feature naturally explains what the target may look like, but it isn't a target leak.

replied to yanteixeira · 4 Dec 2023, 10:22

Upvotes 0

yanteixeira

I strongly disagree with your statement. If there were a single feature with a 1:1 correlation to Yield, and we had access to this feature before predicting Yield, then there would be no need for any competition. In fact, this single feature could solve many world problems. We wouldn't need to know about the weather or the soil type for planting; we would just need to know about 'Acre'.

replied to Professor · 4 Dec 2023, 10:29

Upvotes 0

Professor

Carnegie Mellon University Africa

No, no, no. That's why the correlation wasn't 1:1. It would be high in real life, of course, maybe even up to 90%. Plus, this is a feature we have access to in real life before predicting the yield, which is why I think it isn't a leak.
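As a rough illustration of "high but not 1:1", the sketch below computes the Pearson correlation between a synthetic Acre column and a synthetic Yield column. The data-generating constants are assumptions chosen for the example, not values from the competition, and a correlation figure alone cannot settle whether a feature is a leak; that depends on whether the feature is known before the yield is.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def synthetic_acre_yield(n=1000, seed=7):
    """Yield scales with land size, with per-acre productivity varying by plot."""
    rng = random.Random(seed)
    acres = [rng.uniform(0.1, 1.0) for _ in range(n)]
    # Assumed productivity: about 1500 units per acre, with plot-to-plot noise
    # standing in for weather, soil, and farming practice.
    yields = [a * (1500 + rng.gauss(0, 300)) for a in acres]
    return acres, yields


if __name__ == "__main__":
    acres, yields = synthetic_acre_yield()
    print("Pearson r between Acre and Yield:", round(pearson(acres, yields), 3))
```

In this toy setup, land size explains most of the variance in yield, yet the remaining per-acre variation is exactly what the other features would have to model.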

replied to yanteixeira · 4 Dec 2023, 10:35

Upvotes 0

Professor

Carnegie Mellon University Africa

Hi mchahhou, I understand your perspective. However, the public leaderboard is different from the private one, and the two do not intersect. The Zindi team most likely did not change anything. The public LB contained outliers, but the private one did not. The public LB deceived us all; CV was the way.

4 Dec 2023, 10:30

Upvotes 0

mchahhou

Evidence has been shown in other threads that they did indeed change the private data on purpose. This changes the whole purpose of the competition: instead of an outlier detection problem, we now have a standard regression problem.

replied to Professor · 4 Dec 2023, 11:25

Upvotes 1

Sourabh

I strongly agree with your statement! This challenge wasted too much time...

This is just like a lucky draw 🎉

4 Dec 2023, 12:09

Upvotes 0
