30 Apr 2020, 07:25

Meet the winners of the Sea Turtle Rescue: Forecast Challenge

Zindi is excited to announce the winners of the Sea Turtle Rescue: Forecast Challenge. The challenge attracted 259 data scientists from across the continent and around the world, with 95 placing on the leaderboard.

The objective of this competition was to create a machine learning model to help Kenyan non-profit organization Local Ocean Conservation anticipate the number of turtles they will rescue from each of their rescue sites as part of their By-Catch Release Programme.

The models were trained on historic data on the number of turtles rescued from each site from 1998 until 2018. To date, Local Ocean Conservation has released over 10,000 sea turtles.

The winners of this challenge are: mlandry from the United States in 1st place, Team Sala7ef Enninja (Blenz, FADHLOUN) from Tunisia in 2nd place and witold in 3rd place.

A big thank you to Local Ocean Conservation for sponsoring the competition, all the participants, and especially to the winners for their generous feedback. Here are their insights.

Mark Landry (1st place)

Zindi handle: mlandry

Where are you from? United States

Tell us about the approach you took:

My solution is extremely basic: heavily smoothed historical averages by site. No time series or machine learning, just a few ways of averaging by site and different cuts of time.

The different sites had a lot of variation, so all models used the site averages in various ways. Year over year variance is also fairly high, so though I tried to use the full length of historical data, all methods to integrate data beyond four years yielded worse results.

There was a slight seasonal component, so the averages per "quarter" were used to further segment, but I did not find a reliable signal any deeper in the time series than quarters. I also explored attempts at predicting increasing or decreasing populations year-over-year, but again did not find a reliable signal in the history or any attempts to use slight extrapolations.

The main approach was to look at the data via exploratory data analysis (EDA). I produced many tables and charts to understand the data, think about potential factors, and create and validate potential hypotheses. Most of these did not yield meaningful insights, but were worthwhile to investigate nonetheless.

The actual components of the model were the 2018 average, the 2017+ "period" (week_num/11) average, and the 2015+ period average, all computed per capture site and combined using three overall weights.
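The blend described above can be sketched roughly as follows. This is a minimal illustration, not the winning code: the column names, the toy data, and the three blend weights are all assumptions; only the structure (three per-site averages over different time windows, bucketed into ~quarterly periods via `week_num // 11`, combined with fixed weights) comes from the description.

```python
import pandas as pd

# Hypothetical rescue records (column names and values are invented).
df = pd.DataFrame({
    "capture_site": ["A", "A", "A", "B", "B", "A", "B", "A"],
    "year":         [2015, 2016, 2017, 2017, 2018, 2018, 2018, 2018],
    "week_num":     [3, 14, 25, 40, 5, 18, 30, 44],
    "rescues":      [4, 6, 5, 2, 1, 7, 3, 6],
})

# Bucket week numbers into coarse "periods" (~quarters): week_num // 11.
df["period"] = df["week_num"] // 11

# Three smoothed historical averages per capture site, over different windows.
avg_2018  = df[df.year >= 2018].groupby("capture_site")["rescues"].mean().to_dict()
avg_2017p = df[df.year >= 2017].groupby(["capture_site", "period"])["rescues"].mean().to_dict()
avg_2015p = df[df.year >= 2015].groupby(["capture_site", "period"])["rescues"].mean().to_dict()

# Combine with fixed weights (the actual weights are not given in the write-up).
w1, w2, w3 = 0.5, 0.3, 0.2

def predict(site, period):
    return (w1 * avg_2018.get(site, 0)
            + w2 * avg_2017p.get((site, period), 0)
            + w3 * avg_2015p.get((site, period), 0))

print(predict("A", 1))  # weighted blend of the three site averages
```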

What were the things that made the difference for you that you think others can learn from?

A simple benchmark is a good place to start, and it's helpful to understand how various machine learning models improve upon that benchmark or fail to do so. I usually prefer to build simple non-machine-learning models to start as the transparency helps me understand the impact of the most important features beyond seeing feature importance values. Usually, those models are quickly replaced with machine learning models, but sometimes that doesn't happen.

With the metric of RMSE, the predictions driving accuracy would be those with high capture volumes. There were not many of them, and they possessed a fairly noisy signal. Going too far beyond the capture site average proved very difficult. I used a similar no-ML approach to the recent SANRAL roads competition.
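The point about RMSE can be seen with a quick toy calculation (the numbers below are invented for illustration): because errors are squared before averaging, a single large miss at a high-volume site outweighs several small misses at quiet sites.

```python
import numpy as np

actual = np.array([120.0, 3.0, 2.0, 1.0])   # one busy site, three quiet ones
pred_a = np.array([100.0, 3.0, 2.0, 1.0])   # off by 20 at the busy site only
pred_b = np.array([120.0, 8.0, 7.0, 6.0])   # off by 5 at every quiet site

def rmse(pred):
    return np.sqrt(np.mean((pred - actual) ** 2))

print(rmse(pred_a))  # 10.0 -- the one large-site miss dominates
print(rmse(pred_b))  # ~4.33 -- three small-site misses cost less
```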

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

People; data scientists. Admittedly a newcomer and outsider, I like the fluency in machine learning and artificial intelligence that Zindi users possess, and am sure that will continue. When I started on Kaggle years ago, seeing people from across the globe able to compete at a high level felt wonderfully meritocratic. With the continued prevalence of open source tools, accessibility of data science education, and emergence of cloud platforms, I am encouraged to see that Zindi competitors are fluent and confident in their use of models to approach problems.

What are you looking forward to most about the Zindi community?

Seeing it grow further. There are an impressive number of open competitions, several with hundreds on the leaderboard. I have long found that the best learning happens when achieving something difficult, and also failing to overcome a hurdle after significant effort and then learning from others how they cleared it. Having this level of involvement in Zindi competitions creates a sense of achievement, making it fun and educational to compete.

This competition was hosted by Local Ocean Conservation, edgeryders, and IEEE Sup’com

What are your thoughts on our winners' feedback? Engage via the Discussions page or leave a comment on social media.