10 Oct 2019, 08:37

Meet the Winners of the Farm Pin Crop Detection Challenge

Get insights from challenge winners! A special thank you to the winners for their generous feedback.

Zindi is excited to announce the winners of the Farm Pin Crop Detection Challenge. The objective of the competition was to create a machine learning model to classify fields in South Africa by crop type using Sentinel-2 satellite imagery. The challenge attracted 719 data scientists from across the continent and around the world, of whom over 42 were on the leaderboard. We are happy to introduce the winners of the competition: Olayinka Fadahunsi of Nigeria, Andrey Ponikar of the United Kingdom and Matthew Baas of South Africa!

Name: Olayinka Fadahunsi (1st prize)

Zindi handle: DrFad

Where are you from? Nigeria

Tell us a bit about yourself.

I work as a Data Scientist with Stanbic IBTC Bank (a member of the Standard Bank Group). I enjoy creating Data Science solutions that help solve the world's pressing issues.

Tell us about the approach you took.

First, I would like to congratulate all “Zindians” who made it onto the leaderboard. This competition presented a difficult and unique problem. It took me months to understand the problem and make my first submission.
Convolutional Neural Networks are the gold standard for image classification. However, I dared to challenge the norm by using a traditional tree-based ML solution, largely because I wanted a low-cost solution that is fit for use in Africa. Deep learning models require huge computing power. I am a huge fan of the R programming language, so I used R to solve this problem.
All images across the 11 time slices were extracted using the raster package in R. Using the spectral bands, I created different vegetation indices such as NDVI, NDRE, WDRVI and MTCI, and used 10 of the most significant ones. Vegetation indices were quite helpful because they give information about the vegetation on the ground, such as greenness, water content and height. I alternated between the median and the mean of the image pixel values, based on whichever gave me a better cross-validation (CV) score.
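As an illustration of the index step, here is a minimal Python sketch (the winning solution was written in R, and its code is not reproduced here). It uses the standard definitions NDVI = (NIR - Red) / (NIR + Red) and NDRE = (NIR - RedEdge) / (NIR + RedEdge), then reduces the per-pixel values of one field with either the mean or the median. The function names and the reflectance values are hypothetical, and the Sentinel-2 band choices in the comments are an assumption about which bands would be used.

```python
from statistics import mean, median


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def field_index_summary(nir, other, reducer=median):
    """Compute a normalized-difference index per pixel, then reduce it to one value per field."""
    values = [normalized_difference(n, o) for n, o in zip(nir, other)]
    return reducer(values)


# Hypothetical reflectances for the pixels of one field at one time slice.
nir = [0.45, 0.50, 0.48, 0.52]       # near-infrared (assumed: Sentinel-2 band B08)
red = [0.08, 0.10, 0.09, 0.30]       # red (assumed: band B04); the last pixel is an outlier
red_edge = [0.20, 0.22, 0.21, 0.25]  # red edge (assumed: band B05)

ndvi_median = field_index_summary(nir, red, median)
ndvi_mean = field_index_summary(nir, red, mean)
ndre_median = field_index_summary(nir, red_edge, median)
```

With an outlier pixel like the last one, the mean is pulled down while the median is not, which is one reason the choice between the two can change a CV score.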
Next, because tree crops like pecans and dates remain relatively the same year-round while crops like maize are harvested every few months, I created features around the standard deviation of the calculated vegetation indices. For example, the greenness of tree crops will remain relatively stable, while that of crops like maize will decline during planting periods.
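A small Python sketch of this idea, under the assumption that each field has already been reduced to one NDVI value per time slice. The two series below are invented for illustration and are not taken from the competition data.

```python
from statistics import pstdev


def temporal_variability(index_series):
    """Population standard deviation of one field's index values over time."""
    return pstdev(index_series)


# Hypothetical per-field NDVI series across six time slices.
orchard_ndvi = [0.70, 0.72, 0.71, 0.69, 0.70, 0.71]  # tree crop: stable greenness
maize_ndvi = [0.20, 0.35, 0.65, 0.80, 0.55, 0.25]    # annual crop: strong seasonal swing

orchard_std = temporal_variability(orchard_ndvi)
maize_std = temporal_variability(maize_ndvi)
```

A low value suggests a perennial crop and a high value suggests a crop with a planting and harvesting cycle, which is what makes the feature useful to a tree-based model.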
Furthermore, the varying greenness was used to create features around the period from planting to harvesting. For instance, I subtracted the date when NDVI was lowest from the date when it was highest to obtain a duration. Interactions amongst the important features were also created.
Once all the feature extraction and engineering were completed, I was faced with another problem: high dimensionality of the data. I had in excess of 15,000 features. Failing to address this would have led to overfitting and poor model performance. Using a feature selection algorithm, I reduced the features to a total of 500.
Finally, I trained 3 different models on 3 different samples and combined them into an ensemble, with the XGBoost algorithm as the base.

What were the things that made the difference for you that you think others can learn from?

A few things were instrumental to my success in this competition.
Dedication and passion. A few of the data scientists I know gave up on the challenge. When you torture data long enough, it will confess to you.
Domain knowledge. I had to learn the domains of satellite imagery and agriculture. Prior to this challenge, I had no experience with these domains. Domain expertise is often overlooked as an important part of being a Data Scientist; the focus is usually on statistics, maths and programming. I would not have been able to create useful features without the domain expertise.
Discipline. I ensured I did not overfit to the leaderboard. This was achieved by trusting my CV strategy. I was 3rd on the leaderboard and then became 1st on the private board with an improved score.

What are the biggest areas of opportunity you see for AI in Africa over the next few years, and what are you looking forward to for the Zindi community?

In my opinion, the biggest opportunities lie in computer vision and advanced AI algorithms that solve Africa's biggest problems, such as poverty, access to clean water and disease detection.
I would also like to see more Zindians sharing ideas on the discussion board.

Name: Andrey Ponikar (2nd prize)

Zindi handle: PermanentPon

Where are you from? Bristol, UK

Tell us a bit about yourself.

I'm a Machine Learning Researcher at Cookpad (UK). Before that I worked as a data scientist, developer and consultant in different companies in Russia.

Tell us about the approach you took.

The solution consists of four base models and one 2nd layer stacking model.
All base models predict crop probabilities per pixel. More specifically, I split imagery data into samples that represent a pixel and its neighbourhood and use this data as features. After that, I took the mean of pixel predictions to calculate field level predictions and used them on the second level.
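The pixel-to-field aggregation described above can be sketched in a few lines of Python. This is only an illustration under the assumption that every pixel prediction carries the id of the field it belongs to; the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def field_level_probabilities(pixel_field_ids, pixel_probs):
    """Average per-pixel class probabilities into one probability vector per field."""
    sums = {}
    counts = defaultdict(int)
    for field_id, probs in zip(pixel_field_ids, pixel_probs):
        if field_id not in sums:
            sums[field_id] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            sums[field_id][k] += p
        counts[field_id] += 1
    return {fid: [s / counts[fid] for s in total] for fid, total in sums.items()}


# Hypothetical predictions: three pixels, two crop classes, two fields.
pixel_field_ids = [1, 1, 2]
pixel_probs = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]

field_probs = field_level_probabilities(pixel_field_ids, pixel_probs)
```

The resulting per-field vectors are the kind of input a second-level model would consume alongside other field-level features.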
Two of the base models are 3D-CNN models with different architectures. These models are the best performers. As input to these models I used per-pixel data, so I classified each pixel. I applied convolutions across both the spatial and temporal dimensions.
Two of the base models are Random Forest models. These models have a different bias, so although they perform worse than the CNN models, they improve the final result of the ensemble by ~1%. The input for these models is flattened imagery data from the pixel being classified and its 8 neighbouring pixels, across all timestamps.
2nd layer model
A LightGBM model was used to classify fields based on the 1st-level models' predictions and other field-level features extracted from the shapefiles.
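To show what a flattened pixel-plus-neighbours input might look like, here is a minimal Python sketch. It assumes a single band stored as nested lists indexed as cube[timestamp][row][column] and an interior pixel (no edge handling); the function name and the data are hypothetical.

```python
def neighbourhood_features(cube, row, col):
    """Flatten the 3x3 window around (row, col) for every timestamp into one feature list."""
    features = []
    for frame in cube:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                features.append(frame[row + dr][col + dc])
    return features


# Hypothetical single-band imagery: 2 timestamps of a 3x3 tile.
cube = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
]

features = neighbourhood_features(cube, 1, 1)
```

With several spectral bands, the same loop would run per band, so the feature count would be 9 times the number of bands times the number of timestamps.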

What were the things that made the difference for you that you think others can learn from?

Build a good local cross-validation pipeline from the start.
Read research papers on the topic before implementing anything.
Improve your performance every day bit by bit.

Name: Matthew Baas (3rd prize)

Zindi handle: Baas

Where are you from? South Africa

Tell us a bit about yourself.

I am an electrical engineering student at Stellenbosch University and I enjoy deep learning, particularly generative models and reinforcement learning.

Tell us about the approach you took.

I used PyTorch with fastai to build a simple 3-layer feed-forward neural network, trained on normalized tabular features extracted from the shapefile geometries and pixels, to predict the classes. The prediction score was improved by ensembling, by averaging predictions across multiple k-fold cross-validation seeds, and by deriving additional features (cloud cover, NDVI, labels of the closest known crop) for each crop.
The network was trained several times with different layer sizes (from 200 to 10000 activations in each layer) to find the best CV and submission scores.
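Two of the steps mentioned here, normalizing features and averaging predictions across runs, can be sketched in plain Python. This is an illustration only, not the fastai pipeline itself: it assumes z-score normalization (the text does not say which scheme was used), and the function names and values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score one feature column; a constant column maps to all zeros."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def average_predictions(runs):
    """Average class probabilities over runs, where runs[r][i][k] is run r, sample i, class k."""
    n_runs = len(runs)
    n_samples = len(runs[0])
    n_classes = len(runs[0][0])
    return [
        [sum(run[i][k] for run in runs) / n_runs for k in range(n_classes)]
        for i in range(n_samples)
    ]


# Hypothetical feature column and predictions from two differently seeded runs.
scaled = standardize([1.0, 2.0, 3.0])
averaged = average_predictions([[[0.6, 0.4]], [[0.8, 0.2]]])
```

Averaging over seeds tends to reduce the variance that comes from random initialization and fold assignment, at the cost of training the model several times.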

What were the things that made the difference for you that you think others can learn from?

Ensure that your processing pipeline is free of errors at each step. I had some weird bugs in my processing and feature extraction procedure that caused quite a few problems and cost me a lot of time, which could have been avoided by being more thorough in checking the output of each stage of my preprocessing pipeline.
Switching away from CNN architectures to a simple feed-forward perceptron improved my score a lot, and adding features improved it even more. The features for the closest known label and for cloud/shade cover were important as well. And as always, normalizing everything going into the network was critical.
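One way a "closest known label" feature could be built is a nearest-neighbour lookup on field centroids. The Python sketch below assumes centroids in a projected coordinate system where Euclidean distance is meaningful; the function name, coordinates and labels are hypothetical, and the original feature may have been computed differently.

```python
import math


def closest_known_label(query_xy, labelled_fields):
    """Return the label of the labelled field whose centroid is nearest to query_xy."""
    _, label = min(labelled_fields, key=lambda item: math.dist(query_xy, item[0]))
    return label


# Hypothetical labelled training fields as ((x, y) centroid, crop label).
labelled_fields = [
    ((0.0, 0.0), "Maize"),
    ((10.0, 10.0), "Pecan"),
    ((20.0, 0.0), "Dates"),
]

nearest = closest_known_label((1.0, 1.0), labelled_fields)
```

When building this feature for training fields, the field itself would need to be excluded from the lookup, otherwise the feature simply leaks the target.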
Google Colab is really all one needs to take part in the competition. The basic free instance Google Colab gives you is more than enough to train the model in under a minute, which really helped since I do not have the hardware to train large models.

What are the biggest areas of opportunity you see for AI in Africa over the next few years, and what are you looking forward to for the Zindi community?

The availability of compute for African data scientists. I think the main hurdle to most data science applications for small and medium businesses in Africa is the availability and cost of compute, which is currently far more constrained in Africa than on other continents.
I think this is a great area of opportunity since things like Google Colab and cloud platform providers are seemingly becoming more willing to give GPU/compute credits to African data scientists/companies, which really democratizes the compute needed to apply and scale a lot of the aspects of modern data science in the field.

This competition was hosted by Farm Pin (farmpin.com) and sponsored by Microsoft (microsoft.com), Liquid Telecom (liquidtelecom.com) and Cortex Logic (cortexlogic.com).

What are your thoughts on our winners' feedback? Engage via the Discussions page or leave a comment on social media.