"Note that there is Public and Private Leaderboards. The Public Leaderboard excludes approximately 50% of the test dataset. While the competition is open, the Public Leaderboard will rank the submitted solutions by the accuracy score they achieve. Upon close of the competition, the Private Leaderboard, which covers 100% of the test dataset, will be made public and will constitute the final ranking for the competition."
Just to be clear, does this mean that the final standings are based on the score over the entire test set, i.e. Public AND Private? If yes, why is it done this way? What's to stop people from probing the Public LB for the answers, or otherwise exploiting the Public LB? Other Zindi competitions seem to have the same paragraph in their rules. Every other competitive data science platform that I know of bases the final scores only on the Private LB.
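To make the scheme in the quoted rule concrete, here is a minimal sketch. It assumes a random 50% public subset and plain accuracy as the metric; the function names, the seed, and the way the split is drawn are illustrative assumptions, not Zindi's actual scoring code.

```python
import random

def accuracy(y_true, y_pred, rows):
    """Fraction of correct predictions over the given row indices."""
    correct = sum(1 for i in rows if y_true[i] == y_pred[i])
    return correct / len(rows)

def leaderboard_scores(y_true, y_pred, public_fraction=0.5, seed=0):
    """Return (public_score, private_score, public_rows).

    Assumption: the public score uses a random subset of the test rows,
    while the private score uses ALL test rows (public rows included),
    as the quoted rule describes.
    """
    n = len(y_true)
    all_rows = list(range(n))
    rng = random.Random(seed)  # hypothetical fixed split
    public_rows = rng.sample(all_rows, int(n * public_fraction))
    public_score = accuracy(y_true, y_pred, public_rows)
    private_score = accuracy(y_true, y_pred, all_rows)
    return public_score, private_score, public_rows

# Toy test set: 20 labels, with 4 predictions deliberately wrong.
y_true = [0, 1] * 10
y_pred = list(y_true)
for i in (0, 5, 10, 15):
    y_pred[i] = 1 - y_pred[i]

public, private, rows = leaderboard_scores(y_true, y_pred)
print(f"public rows used: {len(rows)}, public={public:.2f}, private={private:.2f}")
```

Because the public rows are also counted in the final score under this scheme, anything learned by probing the Public LB carries over to roughly half of the final score, which is the concern raised above.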
Ohh, that is interesting and slightly peculiar. Thanks for the heads-up.
I guess this approach makes it less limiting than "Your highest-scoring solution will be the one by which you are judged." (taken from the paragraph just before the one you were referencing). Theoretically, if the public and private sets were mutually exclusive, that could've been annoying: you might be afraid that you are overfitting the LB relative to your best CV model, but be unable to change your preferred submission. Less of a risk now.
Also likely to lead to less of a shake-up when the competition ends. Whether that is preferable probably depends on your position on the LB. :P
After the competition ends, they will release the scores on the full test data, and according to @cobusburger, Zindi will be the one to choose the best private score from all your submissions, because the submission with the best Public Leaderboard score may be overfit and cause a leaderboard shake-up. A shake-up will still happen 100%, though.
While we endeavor to do a complete Public/Private Leaderboard split of the test data, many of our competition datasets have been limited in size. This is the reality of working on real challenges for various organizations. For this reason, you are right, in many competitions, we’ve used the full test set for the Private Leaderboard. How we do the split is noted in the rules of each competition. In the future, we will aim to do more complete Public/Private splits.