
Digital Green Crop Yield Estimate Challenge

Helping India
€9 400 EUR
Completed (over 2 years ago)
Prediction
1368 joined
678 active
Start
Sep 04, 23
Close
Dec 03, 23
Reveal
Dec 03, 23
Koleshjr
Multimedia university of kenya
Post Processing Killed me😔
Platform · 4 Dec 2023, 08:35 · 14

What a competition!!!

I thought post-processing was going to be the winning trick.

Note (misleading, see the update below): it turns out the private test set had no outliers at all.

Update: There were outliers; their IDs just were not used in calculating the private score.

Proof: Change the prediction for ID ["ID_ECWVAC40SNWB"] to 0 and notice that neither the private nor the public score changes. Very concerning!
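For anyone who wants to reproduce this kind of probe, here is a minimal sketch that copies a submission CSV and sets the prediction for selected IDs to 0. It assumes the submission has an `ID` column and a `Yield` prediction column (hypothetical names; check the actual sample submission), and the yield values below are made-up placeholders, not real predictions.

```python
import csv
import io

# Hypothetical column names; the real submission format may differ.
ID_COL = "ID"
TARGET_COL = "Yield"


def zero_out_ids(csv_text: str, ids_to_zero: set[str]) -> str:
    """Return a copy of a submission CSV with the prediction for each listed ID set to 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only the probed rows are modified; every other prediction is copied unchanged.
        if row[ID_COL] in ids_to_zero:
            row[TARGET_COL] = "0"
        writer.writerow(row)
    return out.getvalue()


# Placeholder submission with made-up values, for illustration only.
submission = "ID,Yield\nID_ECWVAC40SNWB,1234.5\nID_OTHER,500.0\n"
probe = zero_out_ids(submission, {"ID_ECWVAC40SNWB"})
print(probe)
```

The idea is to submit the probe file next to the original one. A large change to a single prediction would normally move an error metric computed over that row, so identical scores suggest (though do not prove) that the row is not being scored.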

Given the above, it's evident that I post-processed an ID that was not an outlier, which hurt my score, while others post-processed the right IDs. They surely deserve their better scores.

Anyway, my solution without post-processing would have ranked in the top 10, and the CV and private LB scores matched almost perfectly. Funny, huh?

Here is the link to the GitHub repo with the solution. If you find it helpful, please star it; I would really appreciate it.

https://github.com/koleshjr/Digital-Green-Crop-Yield-Estimate-Challenge/tree/main

Discussion 14 answers
yanteixeira

My theory is that there were outliers in the private test set, but @VIRADUS's post and all the comments made Zindi realize that the host would end up with a useless model.

If you look at Amy's response to that post, it says: '...we will reveal that the private leaderboard will show a distribution that will be useful to the client, where potential outliers are taken into consideration...'

This statement made it clear to me that they would remove the outliers from the private test set.

4 Dec 2023, 08:48
Upvotes 0
Koleshjr
Multimedia university of kenya

@yanteixeira I wish I had reached that conclusion too. Anyway, it was a great learning experience. I would also love for you to post all the insights you uncovered; you started some very insightful discussions, and a summary of them would be great.

Raheem_Nasirudeen
The polytechnic ibadan

Great one as always.

4 Dec 2023, 08:59
Upvotes 1
Professor
Carnegie Mellon University Africa

Yeah, great comp. The outliers in the public LB confused most people. This is the first time I've seen outliers intentionally injected into the (public) test set. But again, it was really interesting, and it models a real-world scenario pretty well.

#keeplearning

4 Dec 2023, 09:19
Upvotes 1
Koleshjr
Multimedia university of kenya

Yes, we keep learning!

Yisakberhanu
wachemo university

The boosting method was the best. In my CV, bagging was much better, but when I saw the private score, my first benchmark (CatBoost with only num_cols) turned out to be the best, at 105. I don't know how to check correlations between the train and test data. Useless effort!

4 Dec 2023, 09:22
Upvotes 0
Koleshjr
Multimedia university of kenya

Not useless effort; we were just unlucky. We treated the outliers in the public set differently by assuming they would also be present in the private set. Others did too, but they were clever about it. Come to think of it, I should have chosen one submission without post-processing and one with it. Anyway, no regrets; we learnt!

For me, I was so focused on managing outliers at the pre-processing level that I forgot the main thing is to have a solid model that generalizes across the entire dataset. An error of judgment on my part, especially since Zindi told us "...where potential outliers are taken into consideration..."

So disappointed knowing that my first submissions, with a simple boosting model that generalized well, would have led to a private score of 125.

But we keep learning from this competition. Great experience.

4 Dec 2023, 09:50
Upvotes 2
Koleshjr
Multimedia university of kenya

Yes great learning experience!

Koleshjr
Multimedia university of kenya

Everyone in this post: please see the updated discussion with new findings. @yanteixeira, feel free to comment: what do you think about the updated findings?

4 Dec 2023, 11:04
Upvotes 0
Professor
Carnegie Mellon University Africa

This is getting more interesting 😅. I don't even know what to say.

yanteixeira

I'm actually speechless.

Juliuss
Freelance

This is disturbing! To be fair to the host and participants, they should have changed the datasets, yes, but communicated that to us.

4 Dec 2023, 11:59
Upvotes 1
MICADEE
LAHASCOM

@JuliusFx Exactly... I agree. In fact, I am also speechless, like @yanteixeira said earlier. 😵😕