Dear Zindi Team,
Best wishes for the New Year! Let me share some thoughts that you may find useful when organizing new competitions in 2019.
1) My most important point is how the test set is organized and used by participants. For now, submissions are evaluated on 80% of the whole test set (the remaining 20% is used for the final evaluation). If you allow, say, 10 submissions per day and the competition lasts a couple of months, participants can submit hundreds of times, and solutions clearly become overfitted to the test set. My point is that the proportion should be reversed, or at least 50-50. Kaggle competitions are great examples: only a small fraction of the test set is used for the public leaderboard, and sometimes the final evaluation is even done on a completely different set.
Participants should be encouraged to build a good, representative validation set on their own. Unfortunately, right now the incentive is the opposite. Even if someone terribly overfits to the test set (sending hundreds of submissions), with 80% of the examples included in the final evaluation, he or she will still do pretty well on the final leaderboard. I am sure you get my point. Yes, sometimes we deal with very small datasets, and it would be difficult to release only 20% of the examples in the "first-phase" test set. But very often it is easy (e.g., the Nairobi traffic or SDG text classification competitions).
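To illustrate the point about building your own validation set (a minimal sketch with made-up data, not tied to any Zindi API): carve a fixed, deterministic holdout out of the training data and trust its score more than the public leaderboard.

```python
# Minimal sketch (hypothetical data): split off a local validation set
# once, with a fixed seed, and evaluate against it instead of burning
# leaderboard submissions.
import random

def train_validation_split(rows, valid_fraction=0.2, seed=42):
    """Shuffle rows deterministically and split off a validation set."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]  # (train, validation)

data = list(range(100))  # stand-in for real training examples
train, valid = train_validation_split(data)
print(len(train), len(valid))  # 80 20
```

Because the seed is fixed, the same split is reproduced on every run, so scores on `valid` are comparable across experiments; for imbalanced classification tasks a stratified split would be the better choice.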
2) Before you launch a competition, you could ask the company behind it what level of solution they would find useful, and then publish this information. For example, in the Nairobi traffic competition they might say (hypothetically) that predictions with an error of 4 or lower would be very useful, particularly for buses. This gives nice feedback to people who finish the competition in, say, 40th place and may feel unsure whether their model is any good (well, 39 people were better) or not.
3) As Zindi gets more popular and some competitions offer very attractive prizes, "professional" Kaggle teams will appear and grab all the prizes. From a competition standpoint, there is nothing wrong with that (the level is higher, the models and solutions are better). However, for the African data science community it might not be ideal. So maybe consider a separate prize for African participants, or for someone from the city connected to the competition.
Anyway, you're doing a great job! Good luck in 2019!
Great suggestion @pawel.
That's right, all of it.
Very sensible feedback.
Well said @pawel. We also need big datasets so that we can leverage the power of deep networks.
Great feedback indeed. Agree 100% on the overfitting.
The 5-submissions-per-day limit has been implemented.
Thank you for the considered and constructive suggestions. Feedback from the Zindi community is extremely valuable to us as we continue to evolve Zindi. We will definitely consider these points going forward.
The Zindi Team