Congrats to the winners, and thanks to DeepMind and @Zindi for hosting. I had a great time in this competition. I noticed, however, that many folks were unable to cross over to 0.9+, so I made a simple notebook that takes you to 0.94+ on the private LB, with ideas for improvement using simple classification. Feel free to access it here: https://github.com/osinkolu/Turtle-Recall-Conservation-Challenge
Great post. Thanks
Welcome aninda_bitm
Thanks for the share @Professor ! Can I ask how long this model takes to train?
Hi @DanielBruintjies, it depends on your hardware infrastructure, the whole notebook took about 200 minutes to run on Colab pro with a Tesla P100.
Ah okay, thanks, and congrats on your strong finish!
Congrats, and thanks for sharing! I can see that dropping the low-count classes had a net positive effect.
Yes, it did. There were too many classes in the extra data without enough images to learn from properly, so it was best to leave them out.
Thanks for sharing. Did you train your model over all 2000+ classes?
Nah, the model for this sample notebook only saw about 405 unique classes, since I cut out classes with fewer than 7 images.
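For anyone curious how that filtering step might look, here is a minimal sketch in pandas. The column names (`image_id`, `turtle_id`) and the toy data are assumptions for illustration, not the notebook's actual schema:

```python
import pandas as pd

# Hypothetical training metadata: one row per image, labelled with its turtle ID.
labels = pd.DataFrame({
    "image_id": [f"img_{i}" for i in range(12)],
    "turtle_id": ["t_a"] * 8 + ["t_b"] * 3 + ["t_c"] * 1,
})

MIN_IMAGES = 7  # classes with fewer images than this get dropped

# Count images per class and keep only the well-represented IDs.
counts = labels["turtle_id"].value_counts()
keep_ids = counts[counts >= MIN_IMAGES].index
filtered = labels[labels["turtle_id"].isin(keep_ids)].reset_index(drop=True)

print(len(filtered))                     # 8 images survive the filter
print(filtered["turtle_id"].nunique())   # 1 class survives the filter
```

With the real extra data, the same cut reduces 2000+ classes to roughly the 405 mentioned above.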
Wait, what?! The data I used only had 100 unique classes??
My bad, got what you mean. The external data had 2000+ classes. Interesting, that never crossed my mind.
Thanks for sharing @Professor. Elegant approach.
I'm wondering what happens at Cell Output 51: Prediction 5 seems to have a None rather than a turtle ID.
Did you fix this in your best submission, or might there be other instances of this?
@flamethrower, congratulations once again. Yes, in fact the best single submission had many NaN cells, because of the strategy I implemented: my code makes it impossible to have the same turtle twice on a row.
I took care of the NaN cells during ensembling, using other submissions to fill them in, besides taking the mode.
However, one thing I noticed is that filling the cells alone made almost no change in score. In fact, deleting prediction 5 entirely, and probably prediction 4, may not change the score at all. This is because the model's best guess is most likely in column 1, or in the worst case column 2; the rest are most likely wrong.
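The NaN-filling step above can be sketched with pandas alignment. The submission layout (an `image_id` column plus `prediction1`..`prediction5`) and the toy values are assumptions; only two prediction columns are shown to keep it short:

```python
import pandas as pd

# Hypothetical main submission with a missing cell in prediction5.
sub_main = pd.DataFrame({
    "image_id": ["a", "b"],
    "prediction1": ["t_1", "t_9"],
    "prediction5": [None, "t_4"],
})

# Another submission from the ensemble, with the same rows and columns.
sub_other = pd.DataFrame({
    "image_id": ["a", "b"],
    "prediction1": ["t_1", "t_9"],
    "prediction5": ["t_7", "t_4"],
})

# fillna with a DataFrame aligns on index and columns, so each NaN cell
# is filled from the matching cell of the other submission.
filled = (
    sub_main.set_index("image_id")
    .fillna(sub_other.set_index("image_id"))
    .reset_index()
)
print(filled.loc[0, "prediction5"])  # the gap is filled with "t_7"
```

Taking the mode across several submissions instead of a single donor works the same way, just with `pd.concat` and `DataFrame.mode` per cell.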
Thank you bro. Yes, from validation on my end, accuracy against ground truth for prediction 1 is around 0.9+, prediction 2 is between 0.2 and 0.5, and predictions 3-5 are below 0.1. However, you could get a different score on private if predictions 3-5 contain a correct turtle ID; public would probably stay the same.
Additionally, did you explore using thresholds to detect entirely new turtles not in the extra images/train database? Also, since you dropped turtle IDs with fewer than 7 images, the model can't take those into account at test time, even though they should be classified as new turtles.
Yes, true @flamethrower. I made a kind of tradeoff by only using classes with 7 samples or more. For the initial question: yes, I had notebooks where I used thresholds. One in particular scored up to 0.92 on private, where I used diminishing thresholds as the model predicts from prediction 1 to 5 for each row. The amazing thing is that no extra data was used in that notebook.
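A minimal sketch of what "diminishing thresholds" could mean, assuming softmax scores for the top-5 IDs of one test image; the IDs, scores, threshold values, and the `new_turtle` label are all illustrative, not the actual notebook's values:

```python
import numpy as np

# Hypothetical top-5 predicted IDs and their softmax scores for one image.
top5_ids = ["t_3", "t_8", "t_1", "t_5", "t_2"]
top5_scores = np.array([0.62, 0.14, 0.08, 0.05, 0.03])

# Diminishing thresholds: further down the ranking, we require less
# confidence before trusting the predicted ID over "new_turtle".
thresholds = [0.50, 0.20, 0.10, 0.05, 0.02]

final = [
    tid if score >= thr else "new_turtle"
    for tid, score, thr in zip(top5_ids, top5_scores, thresholds)
]
print(final)  # ['t_3', 'new_turtle', 'new_turtle', 't_5', 't_2']
```

The idea is that a low score even at rank 1 suggests the turtle is not in the training set at all, so the slot falls back to the new-turtle label instead of a shaky ID.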