Hello everyone,
In the discussion titled “Digging Deeper: Investigating Potential Data Entry Errors”, @amyflorida626 stated, “To guide you in your approach to this problem, we will reveal that the private leaderboard will show a distribution that will be useful to the client, where potential outliers are taken into consideration. It is up to you to determine the best way to deal with outliers in the datasets, both in your modeling and your predictions.” I have put considerable effort into detecting and treating outliers. It’s frustrating to learn that outliers have been removed after being told that they are taken into consideration.
I hope that @Zindi can look into this issue and resolve it, please.
Nice confirmation of how Zindi misled all the competitors.
Yes, why say ‘deal with outliers’ if they would not be present in the private test set?
Zindi maintains a public and a private leaderboard. The ID you mention was in the public LB, and the public LB is not used in calculating your final private score. The IDs not present in the public LB are the ones used to calculate your private score, and I can confidently say that the private LB had no outliers.
Thank you for this clarification @Koleshjr. However, this doesn’t change the fact that there was a misleading statement.
Well, even though I also hadn't read the discussion by amy in depth, I don't see a misleading statement, since she said “in the private leaderboard, not the public leaderboard”. So the public LB still had the outliers, but since the private LB is what matters, they said they would take them into consideration, which they have. I wish I had read that discussion in detail; too late now.
Yes, you could see it that way. However, for me, the statement ‘It is up to you to determine the best way to deal with outliers in the datasets, both in your modeling and your predictions’ was quite misleading.
By the way, great work detecting the outliers. How did you go about it? I would really love to know, because that was impressive.
It was misleading because, for some reason, they didn't want to directly tell us that the data contained errors which would render any model useless.
The way I interpreted the statement was: ‘We can't remove outliers from the public dataset; it's too late for that. However, they will be removed from the private test.’
To be honest, the proper way to handle the situation would have been to cancel the competition, fix the data, and start over.
Regardless, the best strategy to win this competition was to probe the entire public test and, for the private test, make predictions as if the outliers didn’t exist.
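To make the “predict as if the outliers didn’t exist” idea concrete, here is a minimal sketch of one common approach: dropping training rows whose target falls outside the interquartile range before fitting a model. The column name `target`, the toy data, and the 1.5× multiplier are all illustrative assumptions, not the actual competition setup or anyone’s winning solution.

```python
# Minimal, hypothetical sketch: IQR-based outlier filtering before training.
# "target", the toy values, and k=1.5 are illustrative assumptions only.
import pandas as pd

def drop_iqr_outliers(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose `col` value lies within k*IQR of the quartiles."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return df[df[col].between(lo, hi)]

train = pd.DataFrame({"target": [10, 12, 11, 13, 9, 500]})  # 500 is a planted outlier
clean = drop_iqr_outliers(train, "target")
print(len(clean))  # → 5 (the planted outlier row is dropped)
```

Whether filtering like this helps obviously depends on whether the hidden test set actually contains such values, which is exactly why the ambiguity in the organizers’ statement mattered.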
Thank you @Koleshjr. I will organize my work and share it in another thread!
If they cared about the valuable time of all competitors, they could simply have stated explicitly in the forum that outliers would be removed from the private data.
Your interpretation was correct @yanteixeira, but I believe that in such a competition, clarity is essential to avoid any misunderstanding.
Thank you @kamelyamani, I will be waiting for it.
@yanteixeira yeah sure, I agree with you. But the problem was the misinterpretation. And after @kamelyamani clarified how he interpreted it, it's true that the statement could have been read in different ways. A more straightforward answer like “The private test set does not contain outliers” would have helped, and at least then the best model could have won (a fair ground for all). I feel bad for all of us who assumed that the private test also had outliers and decided to post-process. I am pretty sure some solutions without post-processing of the private test would have ranked much higher.
Yes, the statement should be clear.
I think Zindi and other platforms underestimate the time we spend in competitions. We pour our souls into the problem and suddenly find ourselves affected by a miscommunication issue. It's not fair at all.
This is a very accurate and honest position 😅
Rather unfortunate that all the effort put into handling outliers yielded nothing substantial. Nevertheless, it has been a wonderful learning experience designing clever ways to detect and deal with outliers. Going forward, it would be really helpful if @Zindi made it a priority to accurately inform and keep participants updated as competitions progress, giving participants a fair chance of seeing their best models win.
I agree with you. I believe all serious competitors here now know all the techniques to deal with outliers x)