Ever wondered what the public and private leaderboard are on Zindi?
Today we’ll be looking at why the test set labels are divided in two and what these two sets mean for you as you take part in competitions.
Along with a train set used to train your model, most competitions on Zindi will come with a test set that resembles the training data but from which labels have been removed. It is this set that is used to evaluate the performance of your model and ultimately rank it based on how it scores against the metric in a particular competition.
So why split test set labels?
Well, the models you develop are often deployed in real-life scenarios that are constantly changing. It's therefore important not only that your model performs well on the data it has seen, but that it keeps performing under a new set of conditions, that is, that it generalizes well.
Throughout the course of the competition, your model is evaluated on a portion of the test set and its score is posted on the Public Leaderboard, giving you, as well as other competitors, a chance to tweak your model and improve its performance.
Concurrently, your submissions are run against a holdout set of labels whose score is only made public at the close of the competition through the Private Leaderboard. This is one of several measures employed at Zindi to minimize overfitting of models on the data.
Neat! So what can I do to ensure I’m not overfitting the Public Leaderboard set?
Just as Zindi splits the test set labels, you too can split your training data into two or more sets and create holdout sets to evaluate your model's performance, a technique known as cross-validation. Using a holdout (validation) set, you can check that your model isn't overfitting the training data, and you can fine-tune your model's parameters without having to continuously score it on the leaderboard.
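As a rough illustration, here is a minimal sketch of both ideas in plain Python: a single holdout split (mirroring how Zindi holds back part of the test labels) and simple k-fold cross-validation. The function names and data are made up for this example; in practice you would likely reach for a library such as scikit-learn (`train_test_split`, `KFold`) instead.

```python
import random

def train_validation_split(rows, labels, val_fraction=0.2, seed=42):
    """Shuffle the data and carve out a holdout (validation) set."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in val_idx], [labels[i] for i in val_idx])

def k_fold_indices(n, k=5, seed=42):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Each fold serves as the validation set exactly once, so every example
    gets scored out-of-sample without ever touching the leaderboard.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_size = n // k
    for f in range(k):
        # The last fold absorbs any remainder when n is not divisible by k.
        end = (f + 1) * fold_size if f < k - 1 else n
        val = idx[f * fold_size:end]
        val_set = set(val)
        train = [i for i in idx if i not in val_set]
        yield train, val

# Example: evaluate locally instead of burning leaderboard submissions.
X = [[i] for i in range(100)]   # toy features
y = [i % 2 for i in range(100)] # toy labels
X_train, y_train, X_val, y_val = train_validation_split(X, y)
folds = list(k_fold_indices(len(X), k=5))
```

Averaging your metric across the k folds gives a more stable local estimate of performance than a single split, which is exactly what helps you avoid chasing noise on the Public Leaderboard.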
Let us know if you have any questions!