📸 Must-Read: 2nd placed solution

ICLR Workshop Challenge #2: Radiant Earth Computer Vision for Crop Detection from Satellite Imagery

2nd placed solution

Help · 17 Apr 2020, 19:43 · 3

Thanks to the competition host Radiant Earth, and Zindi. It was a really challenging problem.

APPROACH

I used two different approaches. The first involved training with three sets of features: a) image pixel values, b) about 10 vegetation/spectral indices (e.g. NDVI, AVI) and their relevant statistics, and c) spatial features (e.g. area of the farm). The second involved training with only pixel values and their relevant statistics. My solution is a weighted-average ensemble of the two approaches.
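A minimal sketch of what index and per-field statistic features could look like, using NDVI as the example. The function names, the toy arrays, and the convention that a field id of 0 marks pixels outside any field are assumptions for illustration, not the code from the actual solution.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    # eps avoids division by zero on pixels where both bands are 0
    return (nir - red) / (nir + red + eps)

def field_features(nir, red, field_ids):
    """Per-field NDVI statistics plus pixel count as a rough area feature."""
    index = ndvi(nir, red)
    features = {}
    for fid in np.unique(field_ids):
        if fid == 0:  # assumption: 0 marks pixels that belong to no field
            continue
        values = index[field_ids == fid]
        features[int(fid)] = {
            "ndvi_mean": float(values.mean()),
            "ndvi_std": float(values.std()),
            "ndvi_min": float(values.min()),
            "ndvi_max": float(values.max()),
            "area_px": int(values.size),  # spatial feature: field size in pixels
        }
    return features

# Toy 2x3 tile for a single date: field 1 has four pixels, field 2 has one
nir = np.array([[0.6, 0.6, 0.2], [0.6, 0.6, 0.2]])
red = np.array([[0.2, 0.2, 0.2], [0.2, 0.2, 0.2]])
field_ids = np.array([[1, 1, 2], [1, 1, 0]])
feats = field_features(nir, red, field_ids)
```

With imagery from several dates, the same function could be applied per date and the results concatenated into one feature row per field.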

MODELLING

Both approaches went through the same modelling process: a CatBoostClassifier without class_weights, a second CatBoostClassifier with class_weights to handle class imbalance, and linear discriminant analysis (LinearDiscriminantAnalysis, or LDA, in sklearn). LDA is a weak learner, so to improve its performance I bagged it with sklearn's BaggingClassifier. Because the dataset is highly imbalanced, the weighted CatBoost and the bagged LDA added some diversity to the ensemble. With just the single CatBoost without class_weights, I scored about 1.18 on the Public Leaderboard. Adding the other two models one after the other gradually improved my score to about 1.14 on the Public Leaderboard.
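A rough sketch of the modelling pieces described above: balanced class weights, a bagged LDA, an optional class-weighted CatBoost model, and a weighted average of the predicted probabilities. The synthetic data, the hyperparameters, the "balanced" weighting scheme, and the equal blend weights are all assumptions for illustration; the actual values used in the solution are not stated in this post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced 3-class data standing in for the real field features
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, stratify=y, random_state=0)

# One weight per class, inversely proportional to class frequency
classes = np.unique(y_train)
class_weights = compute_class_weight("balanced", classes=classes, y=y_train)

# LDA on its own is a weak learner, so bag it over bootstrap samples
bagged_lda = BaggingClassifier(LinearDiscriminantAnalysis(),
                               n_estimators=25, random_state=0)
bagged_lda.fit(X_train, y_train)
probas = [bagged_lda.predict_proba(X_valid)]

try:
    # Optional: class-weighted CatBoost, only if the package is installed
    from catboost import CatBoostClassifier
    weighted_cb = CatBoostClassifier(iterations=50, verbose=0, random_seed=0,
                                     class_weights=list(class_weights),
                                     allow_writing_files=False)
    weighted_cb.fit(X_train, y_train)
    probas.append(weighted_cb.predict_proba(X_valid))
except ImportError:
    pass  # fall back to the bagged LDA alone

def blend(probas, weights):
    """Weighted average of per-model class probabilities."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalise so rows still sum to 1
    stacked = np.stack(probas)         # (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)

# Assumption: equal weights; in practice they would be tuned on validation data
blended = blend(probas, weights=[1.0] * len(probas))
```

An unweighted CatBoostClassifier would be fitted the same way without the class_weights argument and added to the list before blending.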

Didn't work for me:

Some other things I tried without much success, whether because of a wrong implementation, a false hypothesis, or inherent reasons:

First predicting whether a field contains a single crop or multiple crops (inter-crop) - my local classification validation score wasn't impressive.
Using a 1D Convolutional Neural Network to take the temporal nature of the data into account - I'm not so good with Deep Learning.
Multi-label/MultiClassVsOne Classification.
Spectral band combinations (RGB, False Color, etc.).
Removing samples with high cloud probability.
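For reference, the cloud-probability filter from the last item could be sketched roughly as follows. The 0 to 100 probability scale, the threshold of 50, and the helper name are assumptions for illustration; the post does not say how the filter was actually implemented.

```python
import numpy as np

def keep_clear_samples(features, cloud_prob, max_mean_cloud=50.0):
    """Drop samples whose mean cloud probability over all dates is too high.

    features:   array of shape (n_samples, n_features)
    cloud_prob: array of shape (n_samples, n_dates), values assumed in 0..100
    """
    cloud_prob = np.asarray(cloud_prob, dtype=np.float64)
    keep = cloud_prob.mean(axis=1) <= max_mean_cloud
    return np.asarray(features)[keep], keep

# Three samples observed on two dates; the second one is mostly cloudy
features = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
cloud_prob = np.array([[0, 10], [90, 80], [40, 60]])
clear_features, keep_mask = keep_clear_samples(features, cloud_prob)
```

The same mask would also have to be applied to the training labels so that features and targets stay aligned.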

Special appreciation to first-placed KarimAmer; that's an impressive score. Your brief insight into your Deep Learning solution wowed me. I hope to read more about your approach.

Link to my code with some notebook documentation.
