Dear Zindians,
As some of you have noted, duplicate images appear in both the train and test sets. We acknowledge that this is not an ideal situation, so please accept our apologies to the whole Zindi community. Zindi is still a young platform, and we are learning and trying to do better with every challenge we release.
We have updated the reference file to exclude the "data leak" images. Over the next 48 hours your scores may change as we rescore your submissions against the new reference file.
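For anyone who wants to check their own data for this kind of leak, a minimal sketch of duplicate detection by hashing raw file bytes (directory and file names here are made up for illustration; Zindi has not published how the duplicates were found):

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """MD5 hex digest of a file's raw bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def find_duplicates(train_dir: str, test_dir: str) -> list:
    """Return (train_file, test_file) pairs whose contents are identical."""
    train_hashes = {}
    for p in Path(train_dir).glob("*"):
        train_hashes.setdefault(file_hash(p), p.name)
    dupes = []
    for p in Path(test_dir).glob("*"):
        h = file_hash(p)
        if h in train_hashes:
            dupes.append((train_hashes[h], p.name))
    return dupes
```

Note this only catches byte-identical files; re-encoded or resized copies would need perceptual hashing instead.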
You can still use the Sample Submission as a template; the scoring system will only score the IDs in the updated reference file.
Please note that you cannot use file metadata such as EXIF data. This is a computer vision / machine learning challenge. If you finish in the top 3 of this challenge, you will be required to submit your code for validation.
Thank you for keeping us accountable, and for helping to make Zindi better. We couldn’t do it without you.
The Zindi team
Hi,
Could you please let us know once rescoring is done?
Are new submissions evaluated without the "leak"?
Thx
Can you explain the "submit code for validation" part? What code will the winners be expected to upload? Inference code and model checkpoints?
I hope they will ask for a complete script that fully reproduces the training. Otherwise someone could just hand-label the test set and include it in their training data, no?
Yeah, it's really hard to tell in a competition with a small test set like this. Someone with expertise could hand-label the data and then submit a 0.000000-loss CSV. But if full reproducibility is required, I might have to redo everything, since I can't remember all the experiments, and even if I could, randomness would beat the crap out of me :D
Given the quality of the labels, I doubt anybody could hand-label it and get 0 loss :D
Will Zindi update the leaderboard manually, or should we just wait?
I agree with Val_An. Fully reproducible training and inference code should be provided by top teams. This is a standard requirement in challenges; it builds trust so that people will enter the competition at all. Moderators understand that there is an element of randomness across runs when performing the final evaluation.
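On the randomness point, most of it can be pinned down by fixing seeds up front. A minimal stdlib-only sketch (frameworks like NumPy or PyTorch have their own seeds, e.g. `np.random.seed` and `torch.manual_seed`, which would need pinning too; the seed value 42 is an arbitrary choice):

```python
import os
import random

def set_seed(seed: int = 42) -> None:
    """Pin the stdlib sources of randomness for a reproducible run."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # hash randomization
    random.seed(seed)

# Re-seeding restarts the generator, so the same draws come out again.
set_seed()
first = [random.random() for _ in range(3)]
set_seed()
second = [random.random() for _ in range(3)]
assert first == second
```

Even with all seeds fixed, some GPU kernels are non-deterministic, so evaluators typically accept small score deviations.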
In my code I currently check each image's EXIF data for an orientation tag to ensure images are input the right way up. Is this now not allowed?
You may use EXIF data for orientation correction and regularization. However, you cannot use it as a modelling feature or for prediction.