Digital Green Crop Yield Estimate Challenge

Helping India
€9 400 EUR
Completed (over 2 years ago)
Prediction
1368 joined
678 active
Start: Sep 04, 23
Close: Dec 03, 23
Reveal: Dec 03, 23
kamelyamani
No outliers in the private set!
Help · 4 Dec 2023, 09:51 · 17

Hello everyone,

In the discussion titled “Digging Deeper: Investigating Potential Data Entry Errors”, @amyflorida626 stated, “To guide you in your approach to this problem, we will reveal that the private leaderboard will show a distribution that will be useful to the client, where potential outliers are taken into consideration. It is up to you to determine the best way to deal with outliers in the datasets, both in your modeling and your predictions.” I have put considerable effort into detecting and treating outliers. It’s frustrating to learn that outliers have been removed after being told that they are taken into consideration.

I hope that @Zindi can look into this issue and resolve it, please.

Discussion 17 answers

Nice confirmation of how Zindi misled all the competitors

4 Dec 2023, 09:52
Upvotes 5
kamelyamani

Yes, why say ‘deal with outliers’ if they will not be considered as outliers in the test set?

Koleshjr
Multimedia University of Kenya

Zindi maintains a public and a private leaderboard. The ID you mention was in the public LB, and the public LB is not used in calculating your final private LB score. The other IDs, which are not present in the public LB, are the ones used to calculate your private score, and I can confidently say that the private LB had no outliers.
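To make the public/private distinction concrete, here is a minimal sketch of scoring two disjoint sets of test IDs separately. It assumes an RMSE metric, and the IDs, yields, and split labels are hypothetical toy values rather than anything taken from the competition data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over paired values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical rows: (ID, true yield, predicted yield, split).
# ID_B mimics an outlier-like target that sits only in the public split.
rows = [
    ("ID_A", 500.0, 520.0, "public"),
    ("ID_B", 16800.0, 600.0, "public"),
    ("ID_C", 450.0, 430.0, "private"),
    ("ID_D", 610.0, 640.0, "private"),
]

def split_score(rows, split):
    """Score only the rows that belong to the given split."""
    subset = [(t, p) for _, t, p, s in rows if s == split]
    return rmse([t for t, _ in subset], [p for _, p in subset])

public_score = split_score(rows, "public")    # dominated by ID_B
private_score = split_score(rows, "private")  # unaffected by ID_B

print(f"public RMSE:  {public_score:.1f}")
print(f"private RMSE: {private_score:.1f}")
```

Under these assumptions, a single extreme target in the public split inflates the public score, while the private score depends only on the IDs in the private split.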

4 Dec 2023, 09:55
Upvotes 1
kamelyamani

Thank you for this clarification @Koleshjr. However, this doesn’t change the fact that there was a misleading statement.

Koleshjr
Multimedia University of Kenya

Well, inasmuch as I also had not read the discussion by Amy in depth, I don't see any misleading statement, since she said "in the private leaderboard, not the public leaderboard". So the public LB still had the outliers, but since the private LB is what matters, they said they were going to take outliers into consideration there, which they have. I wish I had read that discussion in detail; too late now.

kamelyamani

Yes, you could see it that way. However, for me, the statement ‘It is up to you to determine the best way to deal with outliers in the datasets, both in your modeling and your predictions’ was quite misleading.

Koleshjr
Multimedia University of Kenya

Btw, great work detecting the outliers. How did you go about it? I would really love to know, because that was impressive.

yanteixeira

It was misleading because, for some reason, they didn't want to directly tell us that the data contained errors which would render any model useless.

The way I interpreted this statement was: 'We can't remove outliers from the public dataset; it's too late for that. However, they will be removed from the private test.'

To be honest, the proper way to handle the situation would have been to cancel the competition, fix the data, and start over.

Regardless, the best strategy to win this competition was to probe the entire public test and, for the private test, make predictions as if the outliers didn’t exist.

kamelyamani

Thank you @Koleshjr. I will organize my work and share it in another thread!

If they cared about the valuable time of all competitors, they could simply have stated explicitly in the forum that outliers would be removed from the private data.

kamelyamani

Your interpretation was correct, @yanteixeira, but I believe that in such a competition, clarity is essential to avoid any misunderstanding.

Koleshjr
Multimedia University of Kenya

Thank you @kamelyamani, will be waiting for it.

Koleshjr
Multimedia University of Kenya

@yanteixeira yeah sure, I agree with you. But the problem was one of misinterpretation. Now that @kamelyamani has clarified how he interpreted it, it's true that the statement could have been read in different ways. A more straightforward answer like "The private test set does not contain outliers" would have helped, and at least with that the best model could have won (a fair ground for all). I feel bad for all of us who assumed that the private test also had outliers and decided to post-process. I am pretty sure some solutions without post-processing on the private test would have ranked way higher.

yanteixeira

Yes, the statement should be clear.

I think Zindi and other platforms underestimate the time we spend in competitions. We pour our souls into the problem and suddenly find ourselves affected by a miscommunication issue. It's not fair at all.

Juliuss
Freelance

This is a very accurate and honest position 😅

100i
Ghana Health Service

Rather unfortunate that all the effort put into handling outliers yielded nothing substantial. Nevertheless, it has been a wonderful learning experience designing clever ways to detect and deal with outliers. Going forward, it would be really helpful if @Zindi made it a priority to inform participants accurately and keep them updated as competitions progress, to give participants a fair chance at seeing their best models win.

4 Dec 2023, 10:13
Upvotes 2
kamelyamani

I agree with you. I believe all serious competitors here now know all the techniques to deal with outliers x)