
Digital Green Crop Yield Estimate Challenge

Helping India
€9 400 EUR
Completed (over 2 years ago)
Prediction
1368 joined
678 active
Start: Sep 04, 23
Close: Dec 03, 23
Reveal: Dec 03, 23
Private dataset has been changed
Data · 4 Dec 2023, 09:21 · 10

It's obvious that Zindi changed the private dataset by removing all the outliers without announcing it to the participants. I stopped working on this competition long ago because of its setup (bad data and a bad metric), so I don't feel very concerned; but for all the other participants: what a waste of time!

Discussion 10 answers

That's a very brave statement. Do you have any evidence to back it up?

4 Dec 2023, 09:23
Upvotes 0

The private score (~100) is the score you get if you remove all outliers from your training dataset and run cross-validation (CV).
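To make that comparison concrete, here is a minimal sketch of how one might check it. It runs k-fold CV twice, once on all rows and once with outliers dropped, and compares the two scores. Everything in it is an assumption for illustration: the data is synthetic (not the competition data), the metric is taken to be RMSE, the model is a one-feature line through the origin, and the 5000 kg-per-acre cutoff is a hypothetical threshold.

```python
import math
import random

def rmse(y_true, y_pred):
    # Root mean squared error (assumed here to be the competition metric).
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_slope(x, y):
    # Least-squares slope for a line through the origin: y ~ slope * x.
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def cv_rmse(x, y, k=5):
    # Plain k-fold CV; fold i holds out every k-th row starting at row i.
    scores = []
    for i in range(k):
        train = [j for j in range(len(x)) if j % k != i]
        test = [j for j in range(len(x)) if j % k == i]
        slope = fit_slope([x[j] for j in train], [y[j] for j in train])
        scores.append(rmse([y[j] for j in test], [slope * x[j] for j in test]))
    return sum(scores) / k

# Synthetic stand-in data: yield roughly proportional to acreage.
rng = random.Random(0)
acre = [rng.uniform(0.2, 1.0) for _ in range(500)]
yield_kg = [1500 * a + rng.gauss(0, 30) for a in acre]

# Inject a few data-entry-style outliers (yield recorded 10x too large).
for j in range(0, 500, 49):
    yield_kg[j] *= 10

# Drop rows whose yield per acre exceeds a hypothetical cutoff.
keep = [j for j in range(500) if yield_kg[j] / acre[j] < 5000]

cv_all = cv_rmse(acre, yield_kg)
cv_clean = cv_rmse([acre[j] for j in keep], [yield_kg[j] for j in keep])
print(f"CV RMSE with outliers:    {cv_all:.1f}")
print(f"CV RMSE without outliers: {cv_clean:.1f}")
```

If the private leaderboard score sits close to the "without outliers" number and far from the "with outliers" number, that is consistent with a private test set that contains no outliers. It does not by itself show how the set came to be that way.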

hark99
Self-employed

Yes, removing outliers gives contrasting results on the two leaderboards. They should have balanced the test data; otherwise, you cannot draw good conclusions.

4 Dec 2023, 09:23
Upvotes 0
yanteixeira

The villain of the competition is the 'Acre' feature. Without it, we would never have discovered that the outliers were actually data entry errors, and Zindi wouldn't have had to change the private test set. Also, I still strongly believe that this feature has the target leaked into it.
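As a rough illustration of how 'Acre' can expose data entry errors, the sketch below divides yield by acreage and flags rows whose yield per acre sits far from the median. The function name, the toy numbers, and the cutoff of 5 median absolute deviations are all hypothetical choices, not something taken from the competition.

```python
import statistics

def flag_yield_outliers(acre, yield_kg, n_mads=5.0):
    """Return indices whose yield per acre is far from the median."""
    per_acre = [y / a for a, y in zip(acre, yield_kg)]
    med = statistics.median(per_acre)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median([abs(v - med) for v in per_acre])
    return [i for i, v in enumerate(per_acre) if abs(v - med) > n_mads * mad]

# Toy data: the fourth row has an implausibly high yield for its acreage.
acre = [0.25, 0.5, 0.3, 0.4, 0.2, 0.35]
yield_kg = [400, 780, 460, 6500, 310, 540]
print(flag_yield_outliers(acre, yield_kg))  # → [3]
```

A median-based rule is used here because a mean and standard deviation would themselves be pulled toward the bad rows.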

4 Dec 2023, 09:42
Upvotes 0
Professor
Carnegie Mellon University Africa

While I agree that the Acre feature is strongly correlated with the target, I think that's how it is in real life. Crop yield is strongly a function of land size. So the feature naturally explains what the target may look like, but it isn't a target leak.

yanteixeira

I strongly disagree with your statement. If there were a single feature with a 1:1 correlation to Yield, and we had access to this feature before predicting Yield, then there would be no need for any competition. In fact, this single feature could solve many world problems. We wouldn't need to know about the weather or the soil type for planting; we would just need to know about 'Acre'.

Professor
Carnegie Mellon University Africa

No, no, no. That's why the correlation wasn't 1:1; in real life it would be high, of course, maybe even up to 90% correlation. Plus, this is a feature we have access to in real life before predicting the yield, which is why I think it isn't a leak.
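The "high but not 1:1" point can be illustrated with a small simulation. In this sketch, total yield is acreage times a per-acre productivity that varies independently (standing in for weather, soil, and practice). The ranges and variable names are assumptions chosen for illustration, and the data is synthetic.

```python
import math
import random

def pearson(x, y):
    # Pearson correlation coefficient, computed from its definition.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rng = random.Random(42)
acre = [rng.uniform(0.1, 1.0) for _ in range(1000)]
# Hypothetical per-acre productivity, independent of plot size.
productivity = [rng.uniform(1000, 2000) for _ in range(1000)]
yield_kg = [a * p for a, p in zip(acre, productivity)]

r = pearson(acre, yield_kg)
print(f"Pearson r between Acre and Yield: {r:.2f}")
```

Under these assumptions the correlation comes out high without being perfect, because plot size sets the scale of the harvest while productivity still varies from farm to farm. Whether the real 'Acre' column behaves this way or encodes the target is exactly what the two posters disagree on.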

Professor
Carnegie Mellon University Africa

Hi mchahhou, I understand your perspective; however, the public leaderboard uses different data from the private one, and the two do not intersect. The Zindi team most likely did not change anything. The public LB contained outliers, but the private one did not. The public LB deceived us all; CV was the way.

4 Dec 2023, 10:30
Upvotes 0

Evidence has been shown in other threads that they did change the private data on purpose. This changes the whole purpose of the competition: instead of an outlier detection problem, we now have a standard regression problem.

Sourabh

I strongly agree with your statement! This challenge wasted too much time...

This is just like a lucky draw 🎉

4 Dec 2023, 12:09
Upvotes 0