
Turtle Recall: Conservation Challenge

Helping Kenya
$10 000 USD
Completed (almost 4 years ago)
Classification · Computer Vision
753 joined · 247 active
Start: Nov 19, 2021 · Close: Apr 21, 2022 · Reveal: Apr 21, 2022
Professor
Carnegie Mellon University Africa
A simple approach to 0.94+
Notebooks · 22 Apr 2022, 13:26 (edited less than a minute later)

Congrats to the winners, and thanks to DeepMind and @Zindi for hosting. I had a great time in this competition. I noticed, however, that many folks were unable to cross over to 0.9+, so I made a simple notebook that takes you to 0.94+ on the private LB, with ideas for improvement, using simple classification. Feel free to access it here: https://github.com/osinkolu/Turtle-Recall-Conservation-Challenge
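For context, the competition asks for five candidate turtle IDs per image, so the "simple classification" idea boils down to training an ordinary image classifier and taking the five highest-probability classes per image. A minimal sketch of that last step (NumPy only; the class names and probabilities here are made up for illustration, not taken from the notebook):

```python
import numpy as np

def top5_ids(probs, class_names):
    """Return the 5 class names with the highest probability for each row."""
    # argsort ascending, reverse for descending, keep the first 5 columns
    order = np.argsort(probs, axis=1)[:, ::-1][:, :5]
    return [[class_names[j] for j in row] for row in order]

# Toy softmax-like output for 2 images over 6 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(6), size=2)
names = [f"t_id_{i}" for i in range(6)]
preds = top5_ids(probs, names)
print(preds[0])  # 5 candidate IDs for the first image, best first
```

The five columns of the submission file are then just these ranked candidates, one row per test image.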

Discussion · 15 answers

aninda_bitm

Great post. Thanks

22 Apr 2022, 14:08
Professor
Carnegie Mellon University Africa

You're welcome, aninda_bitm.

21db

Thanks for the share, @Professor! Can I ask how long this model takes to train?

22 Apr 2022, 18:23
Professor
Carnegie Mellon University Africa

Hi @DanielBruintjies, it depends on your hardware. The whole notebook took about 200 minutes to run on Colab Pro with a Tesla P100.

21db

Ah okay, thanks, and congrats on your strong finish!

100i
Ghana Health Service

Congrats, and thanks for sharing! I can see that dropping the low-count classes had a net positive effect.

22 Apr 2022, 20:12
Professor
Carnegie Mellon University Africa

Yes, it did. There were too many classes in the extra data without enough images to learn from properly; it was best to leave them out.

Thanks for sharing. Did you train your model over all 2000+ classes?

22 Apr 2022, 21:19
Professor
Carnegie Mellon University Africa

Nah, the model for this sample notebook only saw about 405 unique classes, since I cut out classes with fewer than 7 images.
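The filtering step described here can be sketched in a few lines of pandas. This is a hypothetical reconstruction, not the notebook's actual code: the column names (`turtle_id`, `image_path`) and the helper name are assumptions, and the cutoff is parameterised so you can experiment with values other than 7.

```python
import pandas as pd

def prepare_labels(df, id_col="turtle_id", min_images=7):
    """Drop IDs with fewer than `min_images` rows, then encode the rest
    as integer labels for a classifier."""
    counts = df[id_col].value_counts()
    keep = counts[counts >= min_images].index
    df = df[df[id_col].isin(keep)].copy()
    classes = sorted(df[id_col].unique())
    df["label"] = df[id_col].map({c: i for i, c in enumerate(classes)})
    return df, classes

# Toy example: three IDs, only "a" has >= 3 images.
toy = pd.DataFrame({"turtle_id": ["a"] * 3 + ["b"] * 2 + ["c"],
                    "image_path": [f"img{i}.jpg" for i in range(6)]})
filtered, classes = prepare_labels(toy, min_images=3)
print(classes)  # only "a" survives the cutoff
```

With the real data, the same cutoff of 7 would leave the ~405 classes mentioned above.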

100i
Ghana Health Service

Wait, what?! The data I used only had 100 unique classes??

100i
Ghana Health Service

My bad, I got what you mean. The external data had 2000+ classes. Interesting, that never crossed my mind.

flamethrower

Thanks for sharing @Professor. Elegant approach.

I'm wondering what happens at Cell Output 51: Prediction 5 seems to have a None rather than a turtle ID.

Did you fix this in your best submission, or might there still be instances of it?

26 Apr 2022, 09:16
Professor
Carnegie Mellon University Africa

@flamethrower, congratulations once again. Yes, in fact the best single submission had many NaN cells. That's a result of the strategy I implemented: my code makes it impossible for the same turtle to appear twice in a row.

I took care of the NaN cells during ensembling, using other submissions to fill them in, aside from taking the mode.

However, one thing I noticed is that filling the cells alone made almost no change in score. In fact, deleting the whole of prediction 5, and probably prediction 4, may not change the score at all. This is because the model's best prediction is most likely in column 1, or in the worst case column 2; the rest are most likely wrong.
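One way the NaN-filling step could look in code, as a sketch only: fill empty cells in a base submission from a second submission, but skip any candidate already used in that row so the row stays duplicate-free. The function name, column names, and two-submission setup are assumptions for illustration, not the author's actual ensembling code.

```python
import pandas as pd

def fill_nans_row_unique(base, backup, pred_cols):
    """Fill NaN cells in `base` from `backup`, skipping IDs already
    present in the same row so each row stays duplicate-free."""
    base = base.copy()
    for i in base.index:
        used = set(base.loc[i, pred_cols].dropna())
        for col in pred_cols:
            if pd.isna(base.at[i, col]):
                cand = backup.at[i, col]
                if pd.notna(cand) and cand not in used:
                    base.at[i, col] = cand
                    used.add(cand)
    return base

cols = ["prediction1", "prediction2", "prediction3"]
a = pd.DataFrame([["t1", None, None]], columns=cols)
b = pd.DataFrame([["t1", "t2", "t1"]], columns=cols)
filled = fill_nans_row_unique(a, b, cols)
print(filled.iloc[0].tolist())  # "t2" is filled in; duplicate "t1" is skipped
```

A mode across several submissions could be taken column-by-column first, with this function as the final pass over whatever cells remain empty.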

flamethrower

Thank you, bro. Yes, from validation on my end, accuracy against the ground truth for prediction 1 is around 0.9+, prediction 2 is between 0.2 and 0.5, and predictions 3-5 are below 0.1. However, I think you could get a different score on private if predictions 3-5 contain the correct turtle ID; public would probably stay the same.
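The per-column validation described here is cheap to compute: for each ranked prediction column, measure the fraction of rows where it equals the true ID. A small sketch, assuming the predictions and ground truth share a row order (column names are placeholders):

```python
import pandas as pd

def per_column_accuracy(preds, truth, pred_cols):
    """Fraction of rows where each prediction column matches the true ID."""
    return {c: float((preds[c] == truth).mean()) for c in pred_cols}

cols = ["prediction1", "prediction2"]
preds = pd.DataFrame({"prediction1": ["t1", "t2", "t3"],
                      "prediction2": ["t9", "t1", "t3"]})
truth = pd.Series(["t1", "t2", "t3"])
acc = per_column_accuracy(preds, truth, cols)
print(acc)  # prediction1 matches every row, prediction2 one of three
```

Numbers like the 0.9+/0.2-0.5/<0.1 split above would fall straight out of this kind of table.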

Additionally, did you explore using thresholds to detect entirely new turtles not in the extra images/train database? Also, since you dropped turtle IDs with fewer than 7 images, the model can't take them into account at test time, even though they should be classified as new turtles.

Professor
Carnegie Mellon University Africa

Yes, true @flamethrower. I made a kind of tradeoff by only using IDs with 7 samples upward. For the initial question: yes, I had notebooks where I used thresholds. One in particular scored up to 0.92 on private, where I used diminishing thresholds as the model predicts from prediction 1 to 5 for each row. The amazing thing is that no extra data was used in that notebook.
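A diminishing-threshold scheme like the one described could look roughly like this: each ranked prediction must clear a probability threshold that shrinks with rank, and anything below it becomes a "new turtle" marker. The threshold values, the `new_turtle` label, and the function name are all illustrative assumptions; the actual notebook may differ.

```python
import numpy as np

def threshold_top5(probs, class_names,
                   thresholds=(0.5, 0.3, 0.2, 0.1, 0.05)):
    """Replace low-confidence ranked predictions with 'new_turtle',
    applying a smaller threshold at each successive rank."""
    order = np.argsort(probs)[::-1][:5]  # indices of the top-5 classes
    out = []
    for rank, j in enumerate(order):
        if probs[j] >= thresholds[rank]:
            out.append(class_names[j])
        else:
            out.append("new_turtle")
    return out

names = ["t1", "t2", "t3", "t4", "t5", "t6"]
probs = np.array([0.55, 0.25, 0.10, 0.05, 0.03, 0.02])
print(threshold_top5(probs, names))  # only t1 clears its threshold here
```

Looser thresholds keep more known-ID guesses; tighter ones push uncertain slots toward the new-turtle class, which is exactly the tradeoff being tuned.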