Meet the winners of the Radiant Earth Spot the Crop Challenges
Meet the winners · 20 Jan 2022, 13:16

Meet Raphael Kiminya, MG Ferreira, and Tien-Dung LE, our 1st place winners in the Radiant Earth Spot the Crop and Spot the Crop XL Challenges, where they competed to predict crop types in the Western Cape, South Africa, using satellite image time-series data gathered from Sentinel-1 (radar) and Sentinel-2 (multispectral). 🚀🌾

The competition was organised in partnership with the Western Cape Department of Agriculture in South Africa, and with support from the convening sponsor GIZ FAIR Forward program, platinum sponsor Computer Vision for Global Challenges (CV4GC), and gold sponsor Descartes Labs. The competition attracted 831 data scientists who all worked hard to build machine learning models that identify crop type classes for both challenges. Radiant Earth generated the training data based on ground reference data collected and provided by the Western Cape Department of Agriculture.

A special thank you to our 1st place winners for sharing some insights into how they succeeded in this challenge, so we can #keeplearning and #keepwinning. 🔥

Raphael Kiminya

Meet Raphael Kiminya (kiminya) from Kenya. He won the Spot the Crop Data Challenge that used Sentinel-2 multispectral data as input to the model.

“Competitions are easily the best way to jumpstart your data science journey. You have the chance to solve a diverse range of real-world challenges.” — Raphael Kiminya
“This was my first time working with this type of data.” - Raphael Kiminya

Congratulations on winning the Radiant Earth Spot the Crop Challenge! What inspired you to get involved in this field? How did you become interested in machine learning? Tell us about your machine learning journey.

Thank you so much. It’s an honour.

At my previous job, I was part of a data analytics team tasked with implementing a new business intelligence solution. I worked in various roles to support the solution:

  • Modeling ETL processes
  • Designing the data warehouse
  • Developing reports and dashboards
  • Maintaining the databases and operating systems

This exposed me to the end-to-end data engineering process. There was always something new to learn, a bug to patch, a feature to implement. I got comfortable with the idea that learning is a lifelong process — a mindset that has proved invaluable on this journey.

I often came across the terms predictive analytics, machine learning, AI — and wondered about the next step of data analysis. So when I left my old job a couple of years ago, I figured I might see what the AI hype was all about.

The world of machine learning was intimidating at first glance. The sheer scope of it all — ML applications, algorithms, research papers, blogs and tutorials, frameworks, and libraries — was overwhelming. Within all the chaos, I came across data competitions. They are well-defined projects with a fixed scope and timeline. Competitions helped me narrow my focus and learn one thing at a time.

The primary appeal of AI is its potential to solve a broad range of challenges. Over the past two years, I have completed projects across domains that I had never given a second thought before — weather forecasting, manufacturing, construction, medicine, particle physics, space exploration, and more. Machine learning is the lens through which I glimpse how the world works. I know that this is just the beginning. There is still a long way to go, and I can’t wait to see what wonderful things await.

Where did you learn about the Spot the Crop Challenge, and what made you decide to participate?

I’m a member of the Zindi community and a regular competitor, so I found out about the competition once it was published.

Using a satellite orbiting Earth from hundreds of kilometers in space to identify the type of crop growing on a farm sounded like a clever idea. I was excited to challenge myself to build such a solution.

Your winning algorithm outperformed 2045 solutions submitted by 509 participants from 65 countries. How did you approach the problem, and what do you think set you apart?

Once I understood the problem and the data structure, sequential image classification seemed like the way to go. Since the fields were of different sizes, the images needed to be resized, padded, or cropped to a uniform size. However, the variance in the field sizes was too large — from one to tens of thousands of pixels — so the approach felt wasteful and inefficient to implement. After experimenting with CNN-LSTM models, I quickly realized that the limits of my environment wouldn’t let me comfortably explore the idea.

The data could be easily represented in a tabular format, so I tried that next. I summarized the images by taking the mean of the pixels representing a particular field for each time-step and reframed the problem as time-series signal classification. I came across the fastai-based tsai library, which implements state-of-the-art time-series modeling techniques. The tool was well suited to the task. This approach was computationally much lighter and yielded promising results from the start.
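
To make the field-level summary concrete, here is a minimal sketch of averaging pixels per field and per time step. It is an illustration only, not Raphael's actual code: the array layout, the use of 0 for "no field", and the function name `field_mean_series` are assumptions.

```python
import numpy as np

def field_mean_series(images, field_ids):
    """Average the pixels of each field at every time step.

    images:    array of shape (T, B, H, W), T time steps and B bands
    field_ids: array of shape (H, W), integer field id per pixel (0 = no field, assumed)
    returns:   (ids, series), where series has shape (n_fields, B, T)
    """
    ids = np.unique(field_ids)
    ids = ids[ids != 0]                      # drop background pixels
    series = np.empty((len(ids), images.shape[1], images.shape[0]))
    for i, fid in enumerate(ids):
        mask = field_ids == fid              # (H, W) boolean mask of the field
        pixels = images[:, :, mask]          # (T, B, n_pixels)
        series[i] = pixels.mean(axis=2).T    # (B, T) mean signal of the field
    return ids, series

# Toy tile: 5 time steps, 3 bands, 4x4 pixels, two 2x2 fields.
images = np.ones((5, 3, 4, 4))
images[:, :, 2:, 2:] = 3.0
field_ids = np.array([[1, 1, 0, 0],
                      [1, 1, 0, 0],
                      [0, 0, 2, 2],
                      [0, 0, 2, 2]])
ids, series = field_mean_series(images, field_ids)
```

Each field becomes one multichannel time series regardless of its size, which is what removes the need to resize, pad, or crop the images.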

The ability to prototype ideas quickly was an enormous advantage. Initially, I focused my experiments on feature engineering, testing out the various band combinations (indices) useful for crop monitoring tasks. There are many vegetation indices in literature, and some are more useful than others in differentiating various crops. I ended up using 34 indices in the final solution.
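
As an illustration of what a band combination looks like, the sketch below computes two widely used normalized-difference indices from Sentinel-2 bands. The interview does not list the 34 indices in the final solution, so NDVI and NDWI are chosen here only as familiar examples, and the reflectance values are made up.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-9):
    """Generic normalized-difference index (a - b) / (a + b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b + eps)           # eps guards against division by zero

# Hypothetical mean reflectances of one field over four time steps.
green = np.array([0.09, 0.10, 0.11, 0.10])  # Sentinel-2 band B03
red = np.array([0.10, 0.08, 0.06, 0.09])    # Sentinel-2 band B04
nir = np.array([0.30, 0.40, 0.50, 0.35])    # Sentinel-2 band B08

ndvi = normalized_difference(nir, red)      # NDVI: vegetation vigour
ndwi = normalized_difference(green, nir)    # NDWI (McFeeters): water content

# Indices are appended as extra channels next to the raw bands.
features = np.stack([green, red, nir, ndvi, ndwi])  # (5 channels, 4 time steps)
```

Each index adds one more channel to the time series of a field, so trying a new index is cheap once the tabular representation exists.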

Data augmentation was important to regularize and improve the robustness of the model. I tried out the time series augmentation techniques implemented in tsai, but most didn’t significantly improve the results. In the final solution, I only used the CutMix augmentation. Another effective augmentation technique was to divide large fields into smaller subsets, thereby increasing the total number of samples. I was careful to group the subsets into the same fold during cross-validation to prevent data leakage.
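
The leakage concern can be sketched as follows. This is a hedged illustration under assumptions: the interview does not say how large fields were divided, so random pixel chunks are used here, and `split_field`, `grouped_folds`, and the field sizes are hypothetical.

```python
import numpy as np

def split_field(pixel_index, max_pixels, rng):
    """Shuffle a field's pixel indices and cut them into chunks of at most max_pixels."""
    shuffled = rng.permutation(pixel_index)
    n_chunks = int(np.ceil(len(shuffled) / max_pixels))
    return np.array_split(shuffled, n_chunks)

def grouped_folds(groups, n_folds, rng):
    """Assign every group (original field) to one fold, so its chunks never straddle folds."""
    unique = rng.permutation(np.unique(groups))
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    return np.array([fold_of_group[g] for g in groups])

rng = np.random.default_rng(0)
field_sizes = {101: 10, 102: 2500, 103: 40}   # hypothetical field id -> pixel count

groups = []                                    # original field id of each sample
for fid, size in field_sizes.items():
    chunks = split_field(np.arange(size), max_pixels=1000, rng=rng)
    groups.extend([fid] * len(chunks))         # each chunk becomes one training sample
groups = np.array(groups)

folds = grouped_folds(groups, n_folds=2, rng=rng)
```

Here field 102 yields three samples, and all three share a fold. scikit-learn's `GroupKFold` gives the same guarantee if that library is available.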

Finally, I tested the various architectures implemented in tsai. I ensembled XceptionTime and InceptionTime models in the final solution. Most of the models produced similar results, and it was really a matter of balancing speed versus accuracy.
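
The interview does not describe how the two models were combined. One common way to ensemble classifiers, shown here purely as an assumed sketch, is to average their predicted class probabilities; the probability values below are invented.

```python
import numpy as np

def average_ensemble(prob_list, weights=None):
    """Weighted average of per-model class probabilities, each of shape (n_samples, n_classes)."""
    probs = np.stack(prob_list)                       # (n_models, n_samples, n_classes)
    avg = np.average(probs, axis=0, weights=weights)  # plain mean when weights is None
    return avg / avg.sum(axis=1, keepdims=True)       # renormalize rows to sum to 1

# Hypothetical outputs of two models for two fields and three crop classes.
p_a = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
p_b = np.array([[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]])

blend = average_ensemble([p_a, p_b])
pred = blend.argmax(axis=1)                           # most likely class per field
```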

Were you familiar with using machine learning on satellite imagery before this competition? How does this differ from common problems in computer vision?

No, this was my first time working with this type of data.

The complexity of this data sets it apart from common computer vision tasks. Normal color images have only 3 dimensions (the red, green, and blue channels). In contrast, this dataset has 13 dimensions (bands) of sequential data. Hundreds of other interactions (indices) can be derived by combining these bands using various formulae.

The unique structure of the data allows for multiple approaches to the problem. Solutions based on sequential image classification or segmentation algorithms may be the most powerful since they can take advantage of the spatial and temporal features. If speed is more important than accuracy, the data can be compressed across the space and time dimensions into a tabular format suitable for classical machine learning algorithms.

What unexpected insights into the data have you discovered?

I suspected that there was some noise in the dataset. While preprocessing the data, I noticed a few fields labeled with multiple crop types. Additionally, labels such as planted pastures, fallow, weeds, and small grain grazing sound vague and might encompass multiple crop types.

It’s also expected that some farmers may plant more than one type of crop in one season. Some of the crops with long growth cycles may have been planted alongside short-term crops with faster returns. Intercropping is a popular practice used to maximize resource utilization and reduce the risk of crop failure.

These observations may suggest why label mixing regularisation techniques like MixUp and CutMix worked really well.
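
For readers unfamiliar with label mixing, here is a minimal sketch of CutMix adapted to time series: a contiguous time segment of each sample is replaced with the same segment from another sample, and the labels are blended in the same proportion. This is not tsai's implementation, and the function name and defaults are assumptions.

```python
import numpy as np

def cutmix_1d(x, y, alpha=1.0, rng=None):
    """CutMix along the time axis.

    x: (n, channels, steps) batch of series, y: (n, classes) one-hot labels.
    Returns mixed copies of x and y; the inputs are left unchanged.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, _, steps = x.shape
    lam = rng.beta(alpha, alpha)                 # target share kept from the original
    cut = int(round(steps * (1.0 - lam)))        # length of the pasted segment
    start = rng.integers(0, steps - cut + 1)     # keeps the segment inside the series
    perm = rng.permutation(n)                    # partner sample for each row
    x_mix = x.copy()
    x_mix[:, :, start:start + cut] = x[perm][:, :, start:start + cut]
    keep = 1.0 - cut / steps                     # actual share kept after rounding
    y_mix = keep * y + (1.0 - keep) * y[perm]    # soft labels in the same proportion
    return x_mix, y_mix
```

Because the result is a soft label, a field that genuinely contains two crops is no longer forced into a single hard class during training, which fits the explanation offered above.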

Any challenges you would like to share?

I struggled with the CNN-LSTM approach, mostly because of the memory and processing constraints of my environment.

Machine learning is a fast-growing field. How do you stay up-to-date with the latest technological developments?

I mostly rely on competitions for hands-on experience. Subscriptions to relevant blogs and news feeds (e.g., Towards Data Science, MIT News) keep me immersed in the AI world. Following popular ML repositories on GitHub and tracking new research through sites like Papers With Code helps me stay updated with state-of-the-art techniques.

Any words of advice for beginner data scientists who would like to participate in data science competitions?

Competitions are easily the best way to jumpstart your data science journey. You have the chance to solve a diverse range of real-world challenges. Don’t be overwhelmed. Find a project you care about and jump in. Aim at completing it, not winning. Break down complex problems into a single thing you can accomplish in a day.

You will make mistakes along the way. Don’t beat yourself up. All hell won’t break loose if you fail. Celebrate your small victories and try again tomorrow. Accept that your journey to mastery is never-ending, and learn to enjoy the process.

See you on the leaderboard! 😁

MG Ferreira and Tien-Dung LE

Meet MG Ferreira (skaak) from South Africa, and Tien-Dung LE (Moto) from Belgium. They teamed up and won the Spot the Crop XL Challenge that used Sentinel-1 radar and Sentinel-2 multispectral data as input to the model.

“I compete actively on platforms such as Zindi and Kaggle to gain practical implementation experience.” — MG Ferreira

Congratulations on winning the Radiant Earth Spot the Crop XL Data Competition. What inspired you to get involved in this field? How did you become interested in machine learning? Tell us about your machine learning journey.

MG: I trained as an econometrician but loved its technical aspects more. When I entered the labour market, derivative instruments and risk management became huge, allowing me to pursue a more technical career. I subsequently changed my course to mathematics. Along this path, I started using neural networks to construct automated trading models. This really was in desperation, as all the other techniques I tried failed to trade profitably. I am still fascinated by the field and started competing in order to stay abreast of the latest developments.

Where did you learn about the Spot the Crop XL Data Competition, and what made you decide to participate?

I learned about Zindi at Nvidia’s GTC conference in April 2021 and immediately competed in challenges. There, I met Tien-Dung, who I believe to be one of the best in the world. After competing against him, I jokingly told him to select one of the two Radiant Earth competitions so that I could stand a chance in the other one. He then invited me to form a team, and I grabbed this opportunity with both hands.

Your winning algorithm outperformed 960 solutions submitted by 322 participants from 57 countries. How did you approach the problem, and what do you think set you apart?

I think Tien-Dung’s experience helped us gain an edge. From my side, I tested numerous ideas, and funnily enough, the simpler ideas often worked better, so we were very careful not to overfit. Something else that set us apart is that we started working on the solution immediately and finished our model very early in the competition. So we were done by the time the others started competing seriously. In the end, with such a playing field, there is no winning recipe, and I am grateful that we managed to get first prize.

Were you familiar with using machine learning on satellite imagery before this competition? How does this differ from common problems in computer vision?

I have competed in similar competitions, and image quality is often a problem. Another difference here is the multiple channels in an image. This makes it more of a challenge as you first have to tweak any boilerplate code you find on the Internet to deal with the multiple channels.

What unexpected insights into the data have you discovered?

A satellite photo is taken every few days, and as is, it is simply too granular to model. So you need to find the best way to aggregate. Is it, for example, better to use the average of all images in a week or a month? Is it better to use the average or the median of each channel? These questions present quite a challenge. Without going into the details, it was surprising that the simpler approaches often worked better.
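
The aggregation question can be explored with a small helper like the one below. It is a sketch under assumptions, not the team's code: the function name, the day-offset representation, and the sample values are hypothetical.

```python
import numpy as np

def aggregate_by_window(values, days, window_days, reducer=np.nanmean):
    """Reduce observations into fixed-length time windows.

    values: (T, C) observations per acquisition and channel
    days:   (T,) integer day offsets from the start of the season
    Returns an array of shape (n_windows, C); empty windows stay NaN.
    """
    values = np.asarray(values, dtype=float)
    bins = np.asarray(days) // window_days       # window index of each acquisition
    out = np.full((bins.max() + 1, values.shape[1]), np.nan)
    for b in np.unique(bins):
        out[b] = reducer(values[bins == b], axis=0)
    return out

# One channel observed every three days; 9.0 stands in for an outlier acquisition.
days = np.array([0, 3, 6, 9, 12, 15])
values = np.array([[1.0], [2.0], [9.0], [4.0], [5.0], [6.0]])

weekly_mean = aggregate_by_window(values, days, 7)
weekly_median = aggregate_by_window(values, days, 7, reducer=np.nanmedian)
```

In the first week the mean is pulled up to 4.0 by the outlier while the median stays at 2.0, which is the kind of difference the mean-versus-median question is about. Passing `window_days=30` gives a roughly monthly aggregation instead.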

Any challenges you would like to share?

When you work with satellite images, you need to be able to handle a lot of data. The data sets are huge and you must be comfortable managing them and writing efficient code to traverse such a data set quickly.

Machine learning is a fast-growing field. How do you stay up-to-date with the latest technological developments?

MG: The Internet allows you to access material at the click of a button that in earlier days would require weeks of library time to dig out. To prevent this from being just academic, I compete actively on platforms such as Zindi and Kaggle to gain practical implementation experience.

Any words of advice for beginner data scientists who would like to participate in data competitions?

MG: I would certainly encourage you to compete. The benefit is that you gain invaluable practical experience and the competitive environment forces you to use the best techniques. At the same time, the score does not mean much and, while you should play to win, realise that the benefit is the journey and not the destination. However, be careful, as it is easy to be exploited. Read the rules carefully before you sign up. Whenever the rules change midway through a competition, reevaluate your involvement and ensure there is a legitimate reason for the change.

If you would like to know more about the Radiant Earth Foundation, or read the original articles, see here: