What a competition!!!
I thought post-processing was going to be the winning trick.
Note: it turns out the private test set had no outliers at all. (This turned out to be misleading; see the update below.)
Update: there were outliers, but their IDs were not used in calculating the private score.
Proof: change the prediction for this ID ["ID_ECWVAC40SNWB"] to 0 and notice that neither the private nor the public score changes. Very concerning!
With the above in mind, it's evident that I post-processed an ID that was not an outlier and it hurt my score, while others correctly post-processed the right IDs; they surely deserve their placement.
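The probe described above boils down to a one-line post-processing step on the submission file before resubmitting. A minimal sketch follows; the column names ("ID", "Yield") and the example rows are assumptions for illustration, since the actual submission format isn't shown in this thread:

```python
# Hedged sketch of the leaderboard probe: zero out the prediction for a single
# ID and resubmit. If neither score changes, that ID is not being scored.
PROBE_ID = "ID_ECWVAC40SNWB"

def zero_out_prediction(rows, target_id):
    """Return a copy of the submission with target_id's prediction set to 0."""
    return [
        {**row, "Yield": 0.0} if row["ID"] == target_id else dict(row)
        for row in rows
    ]

# Hypothetical submission rows (the second ID is made up for illustration).
submission = [
    {"ID": "ID_ECWVAC40SNWB", "Yield": 512.3},
    {"ID": "ID_FAKE0001", "Yield": 731.8},
]
probed = zero_out_prediction(submission, PROBE_ID)
```

Resubmitting `probed` and comparing scores against the original submission is exactly the test described above.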
Anyway, my solution without post-processing would have ranked top 10, and my CV and the private LB agreed almost perfectly. Funny, huh?
Here is the link to the GitHub repo with the solution. If you find it helpful, please star it; I would really appreciate it:
https://github.com/koleshjr/Digital-Green-Crop-Yield-Estimate-Challenge/tree/main
My theory is that there were outliers in the private test set, but @VIRADUS's post and all the comments made Zindi realize that the host would otherwise end up with a useless model.
If you look at Amy's response to that post, he said: '...we will reveal that the private leaderboard will show a distribution that will be useful to the client, where potential outliers are taken into consideration...'
This statement made it clear to me that they would remove the outliers from the private test.
@yanteixeira I wish I had drawn that conclusion too. Anyway, it was a great learning experience, and I would also love for you to post all the insights you uncovered; you started some very insightful discussions. Summing it all up would be great!
Great one as always.
Yeah, great comp. The outliers in the public LB confused most people. This is my first time seeing outliers intentionally injected into the (public) test set. But still, it's really interesting and models a real-world scenario pretty well.
#keeplearning
Yes, we keep learning!
Boosting was the best method. In my CV, bagging did much better, but when I saw the private score, my first CatBoost benchmark (105), trained on only the numeric columns, turned out to be the best. I don't know how to test how the train and test data correlate. Useless effort!
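Comparing "my CV" against the private score, as described above, amounts to computing an out-of-fold RMSE and checking how it tracks the leaderboard. Here is a minimal sketch; the data, the feature/target construction, and the use of scikit-learn's `GradientBoostingRegressor` (as a stand-in for the CatBoost model mentioned above) are all illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for CatBoost
from sklearn.metrics import mean_squared_error

# Synthetic placeholder data standing in for the numeric columns and yield target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(size=200)

# Out-of-fold predictions: each sample is predicted by a model that never saw it.
oof = np.zeros_like(y)
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    oof[val_idx] = model.predict(X[val_idx])

cv_rmse = np.sqrt(mean_squared_error(y, oof))
print(f"out-of-fold RMSE: {cv_rmse:.3f}")
```

If this out-of-fold RMSE moves in step with the private score across candidate models, the CV is trustworthy; the divergence people report in this thread is exactly what injected outliers can cause.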
Not a useless effort; we were just unlucky. We read the outliers in the public set the wrong way, assuming they would also be present in the private set. Others did too, but they were clever about it. Come to think of it, I should have chosen one final submission without post-processing and one with it. Anyway, no regrets, we learned!
For me, I was so focused on handling outliers at the pre-processing level that I forgot the main thing: having a solid model that generalizes across the entire dataset. An error of judgment on my part, especially since Zindi told us "where potential outliers are taken into consideration...".
So disappointed to learn that my first submission, a simple boosting model that generalized well, would have led to a private score of 125.
But I keep learning from this competition. Great experience.
Yes great learning experience!
Everyone in this thread: please see the updated discussion with the new findings, and feel free to comment. @yanteixeira, what do you think about the updated findings?
This is getting more interesting 😅. I don't even know what to say.
I'm actually speechless.
This is disturbing! To be fair to both the host and the participants, changing the datasets was fine, but they should have communicated it to us.
@JuliusFx Exactly, I agree with that. In fact, I am also speechless, like @yanteixeira said earlier. 😵😕