I would like to report the persistent lack of communication that Zindi has shown towards competitors since the first day of this competition. Many problems arose, but almost nothing was done to clarify them. On top of that, it's clear that the test set was changed. The lack of transparent communication led several competitors to focus on the wrong direction, which would have been the right direction had the data not been changed. This is not fair.
Let's review some problems that arose during the competition:
1) Ensemble rule
From the early days of the competition, participants complained about unclear resource restrictions. I myself made two posts about it:
https://zindi.africa/competitions/digital-green-crop-yield-estimate-challenge/discussions/18493
https://zindi.africa/competitions/digital-green-crop-yield-estimate-challenge/discussions/18558
The competition has just ended, and this remains an open question. We are still uncertain about what is permissible and what is not.
2) Metric
The metric was also a topic of discussion here:
https://zindi.africa/competitions/digital-green-crop-yield-estimate-challenge/discussions/18645
Although RMSE makes sense from a business perspective, it did not suit the characteristics of the data. The high correlation (near 100%) between Acre and Yield creates a situation where it is impossible to predict the extreme values. Suddenly, the problem was no longer about regression but about anomaly detection.
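To illustrate the point, here is a minimal sketch on synthetic data (not the competition data). Everything in it is an assumption made for illustration: the slope of 600, the noise level, and the choice to corrupt 1% of rows by multiplying Yield by 10, as a stand-in for extreme values. It compares the RMSE of the same Acre-based predictions with and without those rows, and measures how much of the total squared error comes from them.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


random.seed(0)

# Hypothetical data: Yield is almost perfectly proportional to Acre.
n_rows = 1000
acres = [random.uniform(0.1, 1.0) for _ in range(n_rows)]
yields = [600.0 * a + random.gauss(0.0, 20.0) for a in acres]

# Assumption: 1% of rows carry an extreme value (Yield multiplied by 10).
outlier_idx = set(random.sample(range(n_rows), 10))
noisy_yields = [y * 10 if i in outlier_idx else y for i, y in enumerate(yields)]

# A model that captures the Acre/Yield relationship exactly.
preds = [600.0 * a for a in acres]

rmse_clean = rmse(yields, preds)
rmse_noisy = rmse(noisy_yields, preds)

# Share of the total squared error that comes from the extreme rows alone.
sq_err = [(t - p) ** 2 for t, p in zip(noisy_yields, preds)]
outlier_share = sum(sq_err[i] for i in outlier_idx) / sum(sq_err)

print(f"RMSE without extreme rows: {rmse_clean:.1f}")
print(f"RMSE with extreme rows:    {rmse_noisy:.1f}")
print(f"Share of squared error from extreme rows: {outlier_share:.1%}")
```

In this toy setup the few extreme rows account for most of the squared error, so the score depends mainly on whether those rows can be identified, not on how well the Acre/Yield relationship is modeled.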
3) Data entry errors
The following post marked a turning point:
https://zindi.africa/competitions/digital-green-crop-yield-estimate-challenge/discussions/19492
Kudos to @VIRADUS for highlighting what many competitors were experiencing. We discovered that the outliers were actually data entry errors, which implies that a model scoring well could turn out to be ineffective in real-world applications. Zindi's response added to the confusion. It turns out we needed to decipher the enigmatic answer to realize that the private test could change.
---
Although I'm glad to have learned a lot in this competition, everyone's experience here would have been much better with clearer communication.
True
Thank you @yanteixeira, for highlighting all those valid points! You've captured all of the important feedback that many of us have been sharing throughout this competition.
You've been impressively communicative and have consistently shared your thoughts since the beginning of the competition. Should @Zindi be in need of extra data science / communication personnel, I would highly vouch for @yanteixeira!
@Zindi I'm definitely available hahaha