The Turtle Recall: Conservation Challenge was a collaboration between DeepMind and Zindi, where competitors were tasked with building a machine learning model to identify individual sea turtles from photos of their faces. Sea turtles are an indicator species, meaning that their presence and abundance reflect the health of the wider ecosystem.
The competition attracted 700+ participants and 8,000+ submissions from 49 countries, all vying for a $10,000 prize pool. We talked with the winners, Stella Kimani (Plato, 1st place) and Team FlameTurbo (ZFTurbo and flamethrower, 2nd place), who shared their solutions and what set them apart.
Please introduce yourself.
I am Stella Njeri Kimani from Nairobi, Kenya, and I work as a consultant at PwC.
Tell us a bit about your solution and the approach you took.
I focused on building a simple approach. The main idea was to use all of the training data to learn feature representations (embeddings) of the images and then compare those embeddings against the unknown turtles in the test data. This approach worked better than a traditional image classification approach.
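To make the retrieval idea concrete, here is a minimal sketch in Python. It assumes L2-normalised embeddings have already been extracted from some CNN backbone; the function and variable names are illustrative, not Stella's actual code.

```python
# A minimal sketch of the retrieval idea described above (not the winner's exact code).
# Assumes precomputed L2-normalised embeddings for the labelled training images and
# for the test images, e.g. taken from the penultimate layer of a CNN backbone.
import numpy as np

def predict_ids(train_emb, train_ids, test_emb, top_k=5):
    """Return the top_k most likely turtle IDs for each test embedding."""
    # Cosine similarity reduces to a dot product for L2-normalised vectors.
    sims = test_emb @ train_emb.T                      # shape (n_test, n_train)
    order = np.argsort(-sims, axis=1)                  # most similar first
    predictions = []
    for row in order:
        ranked_ids = []
        for idx in row:                                # walk neighbours in order
            tid = train_ids[idx]
            if tid not in ranked_ids:                  # keep unique IDs only
                ranked_ids.append(tid)
            if len(ranked_ids) == top_k:
                break
        predictions.append(ranked_ids)
    return predictions
```

Because the embeddings are normalised, each test image simply inherits the IDs of its nearest training images in embedding space.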
I used the latest developments in deep learning to build a diverse array of convolutional neural networks, training 10 different models for generalisation, including DenseNet 121 and 202, EfficientNet B3, B4 and B5, NFNet 10 and 12, ECA-ResNet 50d and 101d, and SE-ResNeXt-101. I also trained with different image sizes (384, 400, 512 and 640) and batch sizes (16, 25, 32 and 64).
The learning rate was the same across all models: 1e-3. This is because I used the Sub-center ArcFace adaptive loss function, which works best with a relatively high learning rate.
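For readers unfamiliar with the loss, below is a hedged PyTorch sketch of a Sub-center ArcFace head. The scale, margin and number of sub-centres are illustrative defaults rather than the winning settings, and the adaptive-margin variant Stella used may differ in detail.

```python
# A hedged sketch of a Sub-center ArcFace head, as referenced above.
# Scale, margin and sub-centre count are illustrative defaults, not the winner's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubCenterArcFace(nn.Module):
    def __init__(self, emb_dim, num_classes, k=3, scale=30.0, margin=0.3):
        super().__init__()
        self.k, self.scale, self.margin = k, scale, margin
        self.weight = nn.Parameter(torch.randn(num_classes * k, emb_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings, labels):
        # Cosine similarity between each embedding and every sub-centre.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Keep the best sub-centre per class.
        cos = cos.view(-1, cos.size(1) // self.k, self.k).max(dim=2).values
        # Add the angular margin to the target class only.
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, num_classes=theta.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        return F.cross_entropy(self.scale * logits, labels)
```

The multiple sub-centres per identity make the loss more tolerant of noisy or hard examples, which is one reason it is popular for animal re-identification.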
In terms of hardware, I used 8 A100-SXM4 40 GB GPUs, 2 RTX A6000 48 GB GPUs, a 32-core/64-thread CPU and 256 GB of RAM. I also used all of the provided images for training and used the extra images to mimic new_turtle cases when validating the models. This gave me a huge boost in generalisation without losing any training data.
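Here is a small, hypothetical sketch of how the extra images can stand in for new_turtle cases during validation. The file names and column names are assumptions about the data layout, not the actual competition schema.

```python
# A hedged sketch of using the extra images to simulate "new_turtle" cases at
# validation time, as described above. File and column names ("image_id",
# "turtle_id") are assumptions about the data layout, not a verified schema.
import pandas as pd

train = pd.read_csv("train.csv")             # labelled turtle identities
extra = pd.read_csv("extra_images.csv")      # images of turtles outside the training IDs

gallery_ids = train["turtle_id"].unique()    # identities the model can predict directly

# The extra images become validation queries whose correct answer is "new_turtle".
val_queries = extra[["image_id"]].copy()
val_queries["turtle_id"] = "new_turtle"

def top5_accuracy(pred_top5, truth):
    """A prediction counts as correct if the true label appears in the top-5 IDs."""
    hits = [t in p for p, t in zip(pred_top5, truth)]
    return sum(hits) / len(hits)
```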
What set your winning solution apart from others?
The use of the Sub-center ArcFace adaptive loss, which was perfect for this challenge.
How do you prepare for a challenge?
I checked out the Beluga whale classification challenge on Kaggle and other similar projects.
What do you like about Zindi?
I love the diversity of the challenges, which can turn a competitor into a full-stack data scientist.
Please introduce yourself.
I am Roman Solovyev from Moscow, Russia. I've been competing in computer vision tasks for several years now, and my previous experience was quite helpful in this challenge. I am a chief researcher at the Institute for Design Problems in Microelectronics RAS, where I deal with various problems, including the use of neural networks at the hardware level.
Tell us a bit about your solution and the approach you took.
Since there was relatively little data, my solution was based on Siamese networks and a special algorithm for finding the most difficult pairs to match. My solution was slightly inferior in quality to solutions based on metric learning approaches, so it did not make it into the final submission with the best result.
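As a rough illustration of this setup (not Roman's actual code), the sketch below pairs a shared backbone with a simple hard-pair mining step that keeps the closest-looking negatives and the most dissimilar positives. The backbone name and embedding size are arbitrary choices.

```python
# A hedged sketch of the Siamese idea described above: one shared backbone embeds
# each image, pairs are scored by cosine similarity, and the hardest pairs are kept
# for further training. This is a generic illustration, not the author's code.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

class SiameseNet(nn.Module):
    def __init__(self, backbone="resnet34", emb_dim=256):
        super().__init__()
        self.backbone = timm.create_model(backbone, pretrained=True, num_classes=emb_dim)

    def forward(self, a, b):
        za = F.normalize(self.backbone(a))
        zb = F.normalize(self.backbone(b))
        return (za * zb).sum(dim=1)            # cosine similarity per pair

def mine_hard_pairs(similarities, is_same, keep_frac=0.25):
    """Keep the hardest pairs: similar-looking negatives and dissimilar-looking positives."""
    # High similarity but different turtles -> hard negative;
    # low similarity but the same turtle -> hard positive.
    hardness = torch.where(is_same, -similarities, similarities)
    k = max(1, int(keep_frac * len(hardness)))
    return torch.topk(hardness, k).indices
```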
What set your winning solution apart from others?
In this task, in my opinion, the following was important:
1) I think quite a few participants tried simple classification because of the relatively small number of classes, but in this challenge it was better to use the approaches developed for re-identification tasks.
2) Due to the small amount of data, the solution had to be stable. To achieve this, it is better to use a large set of diverse models in an ensemble (a minimal sketch follows this list).
3) Participants should rely on a more trustworthy local validation.
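As a minimal sketch of point 2, the snippet below averages the similarity matrices produced by several diverse models to obtain a more stable ensemble score. The optional per-model weights are an illustrative detail, not something Roman specified.

```python
# A hedged sketch of point 2 above: stabilise predictions by averaging the
# similarity matrices produced by several diverse models.
import numpy as np

def ensemble_similarity(sim_matrices, weights=None):
    """Weighted average of similarity matrices, each of shape (n_test, n_gallery)."""
    weights = weights or [1.0] * len(sim_matrices)
    total = sum(w * s for w, s in zip(weights, sim_matrices))
    return total / sum(weights)
```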
What are the biggest areas of opportunity you see in AI in Africa over the next few years?
AI will help automate most of the routine work and also help accelerate economic growth.
What are you looking forward to most about the Zindi community?
I hope Zindi will help in training a large number of Data Science engineers. Data Science is growing rapidly and experienced engineers are needed to put AI solutions into production in Africa.
Please introduce yourself.
I am Damola Oriola (flamethrower of Team FlameTurbo) from Nigeria. I'm a Petroleum Engineer at Baker Hughes, working on building energy technology solutions. During my undergraduate internship I developed an interest in data science, and I have spent time understanding the field through a lot of self-learning. In my free time, I take part in data science competitions to build a skill set through firsthand problem-solving and to keep up with the latest developments in the field. I am very passionate about data science and its application in enabling value creation across industries.
Tell us a bit about your solution, and the approach you took.
Our approach is based on Sub-center ArcFace with dynamic margins, a metric learning method that is well suited to few-shot learning. It allows us to use the full extent of the dataset to force the model to learn embeddings such that similarity scores are high for images of the same turtle ID and low for images of different turtle IDs. Dynamic margins improve model convergence by adjusting each class's margin according to the imbalance of identities in the database, rather than using the constant margin of regular ArcFace.
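A hedged sketch of the dynamic-margin idea follows: each identity gets its own ArcFace margin based on how many images it has, so rare turtles get a larger margin. The exponent and margin range below are illustrative values, not the team's exact settings.

```python
# A hedged sketch of the dynamic-margin idea described above: rarer identities get a
# larger margin. The power and the margin range are illustrative, not the team's values.
import numpy as np

def dynamic_margins(class_counts, m_min=0.2, m_max=0.5, power=-0.25):
    """Per-class ArcFace margins that shrink as a class gains more examples."""
    counts = np.asarray(class_counts, dtype=np.float64)
    raw = counts ** power                      # rarer classes -> larger raw value
    scaled = (raw - raw.min()) / (raw.max() - raw.min() + 1e-12)
    return m_min + scaled * (m_max - m_min)    # margins mapped into [m_min, m_max]
```

The returned per-class margins can then replace the single constant margin in an ArcFace-style head such as the one sketched earlier.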
We used image augmentations during model training that exposed the model to the kinds of orientations seen in the dataset of turtle identities, so that it could become location/orientation invariant.
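Below is a hypothetical example of such an augmentation pipeline using albumentations. The specific transforms and probabilities are assumptions chosen to mimic orientation and lighting variety, not the team's published configuration.

```python
# A hedged sketch of an augmentation pipeline in the spirit described above.
# Transforms and probabilities are assumptions, not the team's exact settings.
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transforms = A.Compose([
    A.Resize(420, 420),                       # matches the image size mentioned below
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=30, p=0.5),                # turtle faces photographed at varied angles
    A.RandomBrightnessContrast(p=0.3),        # underwater / surface lighting changes
    A.Normalize(),                            # ImageNet mean/std by default
    ToTensorV2(),
])
```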
We trained 8 ConvNet models of different variants for ensembling (NFNets, ResNeXt, DenseNet, ResNet and EfficientNets) and concatenated the embeddings from each trained backbone for more diversity, training with an image size of 420, the AdamW optimiser and a OneCycleLR scheduler.
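To illustrate the ensembling step, here is a hedged sketch that L2-normalises and concatenates the embeddings from several timm backbones and sets up the AdamW + OneCycleLR combination mentioned above. The backbone names and hyperparameters are illustrative, not the team's exact choices.

```python
# A hedged sketch of the ensembling step described above: embeddings from several
# trained backbones are L2-normalised and concatenated into one longer descriptor.
# Backbone names are illustrative timm identifiers, not the team's exact list.
import torch
import torch.nn.functional as F
import timm

backbones = [
    timm.create_model(name, pretrained=True, num_classes=0)   # num_classes=0 -> pooled features
    for name in ("eca_nfnet_l0", "seresnext50_32x4d", "densenet121")
]

@torch.no_grad()
def concat_embeddings(batch):
    """Concatenate L2-normalised embeddings from each backbone for a batch of images."""
    parts = [F.normalize(model.eval()(batch)) for model in backbones]
    return torch.cat(parts, dim=1)

# The AdamW + OneCycleLR combination mentioned above, shown for a single backbone;
# the learning rate, weight decay and step count are placeholder values.
model = backbones[0]
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-3, total_steps=10_000)
```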
We designed our validation in phases to match the test-time expectations, where we also had to detect new turtles not currently registered in the database.
We applied the same routine for test-time prediction after validating the model's performance both in identity recognition and in detecting when an image shows a new turtle not currently in the database.
Overall, the validation approach held up on both the public and private LB. With the best threshold of 0.55, the validation score was 0.9811, with 0.9796 on the public LB and 0.976 on the private LB.
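The new-turtle detection step can be sketched as a simple threshold on the best similarity to any registered turtle, using the 0.55 value quoted above. The function below illustrates the idea; it is not the team's submission code.

```python
# A hedged sketch of the new-turtle detection step described above: when the best
# cosine similarity to any registered turtle falls below the chosen threshold
# (0.55 in the write-up), "new_turtle" is promoted to the top of the prediction list.
import numpy as np

def apply_new_turtle_threshold(sims, gallery_ids, threshold=0.55, top_k=5):
    """sims: (n_test, n_gallery) cosine similarities to registered turtle images."""
    predictions = []
    for row in sims:
        order = np.argsort(-row)                # most similar gallery images first
        ranked = []
        for idx in order:
            tid = gallery_ids[idx]
            if tid not in ranked:
                ranked.append(tid)
            if len(ranked) == top_k:
                break
        if row.max() < threshold:               # no confident match -> likely a new turtle
            ranked = ["new_turtle"] + ranked[: top_k - 1]
        predictions.append(ranked)
    return predictions
```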
What set your winning solution apart from others?
The most critical component of any successful machine learning solution is problem understanding. When you can embody this mindset, you can think of the pros and cons of every implementation, and ensure the correct approach is taken.
In this challenge, throughout development we kept checking how well our solution matched what the problem required. This pushed us to spend time researching state-of-the-art approaches and to think through what we were not taking into account at every step. We strived for generalisability, which made us consider how best to develop a facial recognition system that works and also meets the sponsor's requirement for a solution that can handle new identities without frequent retraining. Working at this level of detail enabled us to build the 2nd-place model on the LB and win the generalisability prize after the sponsor review.
I believe in treating every competition like a project bestowed upon you by a company that requires a great solution. This will help you think through your solution development and pay attention to every necessary detail.
What are the biggest areas of opportunity you see in AI in Africa over the next few years?
There are vast opportunities across several industries, but a critical area where AI can have a significant impact is infrastructure in Africa. I'm looking forward to connecting with professionals on the platform and hopefully one day collaborating to build products and companies that create value with AI. I believe that by taking on diverse problems, learning from diverse thought processes in solution development, and sharing knowledge, we are all building the future of AI in Africa.
Learn more and explore their solutions on Zindi’s GitHub page.