Farm Pin Crop Detection Challenge

Helping South Africa
$11 000 USD
Completed (over 6 years ago)
Classification
Earth Observation
1469 joined
42 active
Start: Mar 04, 19
Close: Sep 15, 19
Reveal: Sep 16, 19
2nd place solution
Help · 30 Sep 2019, 13:05 · edited ~1 hour later · 11

Congratulations to the winners and thanks to everybody for participating in this very interesting competition!

Below I explain my solution. Unfortunately, according to the rules, the winners can't share code. However, I will gladly answer any questions.

The solution consists of four base models and one 2nd layer stacking model. I use an identical 5-fold data split across all models.
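For illustration only (this is not the competition code), one way to keep the 5-fold split identical across models is to assign folds once, with a fixed seed, and reuse the mapping everywhere. Splitting by field ID, the function name `assign_folds`, and the seed are assumptions of this sketch.

```python
import numpy as np

def assign_folds(field_ids, n_folds=5, seed=0):
    """Map each unique field ID to a fold index in [0, n_folds)."""
    rng = np.random.default_rng(seed)
    unique_ids = np.unique(field_ids)
    # Shuffle once with a fixed seed so every model sees the same split.
    shuffled = rng.permutation(unique_ids)
    return {int(fid): i % n_folds for i, fid in enumerate(shuffled)}

# Hypothetical field IDs 1..100.
folds = assign_folds(np.arange(1, 101))
```

Splitting at field level rather than pixel level keeps pixels of one field from landing in both the training and validation parts of a fold.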

Data preprocessing

To train the models we first need to prepare the data: crop the imagery, create numpy arrays, normalize, and save the result. Normalization is a critical step for training CNN models. The imagery was z-normalized per channel, i.e. for each channel the mean was subtracted and the result was divided by the standard deviation.
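A minimal sketch of per-channel z-normalization in numpy, not the competition code. The array layout `(N, CH, T, H, W)`, the function name `z_normalize`, and the small `eps` guard are assumptions of this example.

```python
import numpy as np

def z_normalize(images, eps=1e-8):
    """Z-normalize an array of shape (N, CH, T, H, W) per channel."""
    # Statistics are taken over every axis except the channel axis.
    mean = images.mean(axis=(0, 2, 3, 4), keepdims=True)
    std = images.std(axis=(0, 2, 3, 4), keepdims=True)
    return (images - mean) / (std + eps), mean, std

# Synthetic stand-in for the imagery: 32 samples, 10 channels, 11 timestamps, 5x5 pixels.
rng = np.random.default_rng(0)
data = rng.uniform(0, 10000, size=(32, 10, 11, 5, 5))
normalized, mean, std = z_normalize(data)
```

The returned `mean` and `std` would normally be computed on the training data and reused for the test data, so both are scaled the same way.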

Base models

All base models predict crop probabilities per pixel. More specifically, I split the imagery into samples that each represent a pixel and its neighbourhood, and use these samples as features. I then take the mean of the pixel predictions within a field to get field-level predictions, which are used on the second level. All base models use only the 10 channels with 10m and 20m resolution: ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12']

All base models use all 11 timestamps of the data.
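The per-pixel sampling and the field-level averaging can be sketched as follows. This is an illustration under assumptions, not the competition code: the `(CH, T, H, W)` layout, the boolean field mask, the reflect padding at image borders, and the function names are all hypothetical.

```python
import numpy as np

def extract_patches(image, field_mask, size=5):
    """Return one (CH, T, size, size) patch per pixel inside the field mask.

    image: (CH, T, H, W) array; field_mask: (H, W) boolean array.
    """
    half = size // 2
    # Reflect-pad the spatial axes so border pixels also get a full patch.
    pad = ((0, 0), (0, 0), (half, half), (half, half))
    padded = np.pad(image, pad, mode="reflect")
    rows, cols = np.nonzero(field_mask)
    return np.stack([padded[:, :, r:r + size, c:c + size]
                     for r, c in zip(rows, cols)])

def field_prediction(pixel_probs):
    """Average per-pixel class probabilities of shape (n_pixels, n_classes)."""
    return pixel_probs.mean(axis=0)

rng = np.random.default_rng(0)
image = rng.normal(size=(10, 11, 8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:6] = True  # a hypothetical 3x3 field
patches = extract_patches(image, mask)
```

Each patch is centred on the pixel it represents, so `patches[i][:, :, 2, 2]` holds the time series of the classified pixel itself.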

CNNs

Two of the base models are 3D-CNN models with slightly different architectures. These models are the best performers. The input to these models is per-pixel data, so each pixel is classified separately. The input is a 4D array of shape (CH, T, H, W):

  • CH - number of channels (10)
  • T - number of timestamps (11)
  • H, W - height and width of the pixel neighbourhood (5 each)

I apply convolutions across both the spatial and temporal dimensions.
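To make "convolutions across both the spatial and temporal dimensions" concrete, here is a naive numpy sketch of a single 3D convolution layer without padding. It is only an illustration, not the model used in the solution; the filter count (16) and the 3x3x3 kernel are hypothetical.

```python
import numpy as np

def conv3d_valid(x, weights):
    """Naive 'valid' 3D convolution (cross-correlation, as in deep learning libraries).

    x: (CH, T, H, W); weights: (F, CH, kT, kH, kW). Returns (F, T', H', W').
    """
    n_filters, _, kt, kh, kw = weights.shape
    _, t, h, w = x.shape
    out = np.zeros((n_filters, t - kt + 1, h - kh + 1, w - kw + 1))
    for ti in range(out.shape[1]):
        for hi in range(out.shape[2]):
            for wi in range(out.shape[3]):
                window = x[:, ti:ti + kt, hi:hi + kh, wi:wi + kw]
                # Each filter sums over channels, time, and space at once.
                out[:, ti, hi, wi] = np.tensordot(weights, window, axes=4)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 11, 5, 5))       # (CH, T, H, W) as described above
w = rng.normal(size=(16, 10, 3, 3, 3))    # 16 hypothetical filters
out = conv3d_valid(x, w)                  # shape (16, 9, 3, 3)
```

The kernel slides along the time axis as well as the two spatial axes, so each output value mixes neighbouring timestamps and neighbouring pixels.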

RFs

Two of the base models are random forest models. These models have a different bias, so although they perform much worse than the CNNs, they improve the final ensemble result by ~1%. The input for these models is the flattened imagery data from the pixel being classified and its 8 neighbouring pixels across all timestamps: 1188 features (11 timestamps * 12 channels * 9 pixels). The 12 channels consist of the 10 initial 10m and 20m channels plus NDVI and NDWI. One of the models uses only 8 classes (I simply don't use class 2, because it's underrepresented) and balances the training data as `1 / sqrt(number of samples)`.
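A sketch of how such a feature vector and the balancing weights could be built, under stated assumptions. NDVI is computed as (B08 - B04) / (B08 + B04); for NDWI the post does not say which definition was used, so the McFeeters form (B03 - B08) / (B03 + B08) is assumed here. Reading the balancing rule as per-class weights is also an assumption, as are all function names.

```python
import numpy as np

BANDS = ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12']

def add_indices(image, eps=1e-6):
    """Append NDVI and NDWI to a (10, T, H, W) reflectance array -> (12, T, H, W)."""
    green, red, nir = (image[BANDS.index(b)] for b in ('B03', 'B04', 'B08'))
    ndvi = (nir - red) / (nir + red + eps)
    ndwi = (green - nir) / (green + nir + eps)  # assumed McFeeters definition
    return np.concatenate([image, ndvi[None], ndwi[None]], axis=0)

def pixel_features(image12, row, col):
    """Flatten the 3x3 neighbourhood of an interior pixel over all timestamps."""
    return image12[:, :, row - 1:row + 2, col - 1:col + 2].ravel()

def class_weights(labels):
    """Weight each class as 1 / sqrt(number of samples in that class)."""
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): 1.0 / np.sqrt(n) for c, n in zip(classes, counts)}

rng = np.random.default_rng(0)
image = rng.uniform(0.01, 1.0, size=(10, 11, 5, 5))
features = pixel_features(add_indices(image), 2, 2)   # 12 * 11 * 9 = 1188 values
weights = class_weights(np.array([1] * 4 + [3] * 9))  # {1: 0.5, 3: 0.333...}
```

A dictionary like `weights` could, for example, be passed as the `class_weight` argument of scikit-learn's `RandomForestClassifier`; whether the solution did it that way is not stated.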

2nd layer model

A LightGBM model is used to classify fields based on the 1st level models' predictions and other field-level features. The other field-level features include:

  • Area (1 feature)
  • Subregion (1 feature)
  • Latitude and longitude of the field centroids (2 features)
  • Number of closest neighbour fields (max 8) per class, weighted by a distance coefficient (100.0 / dist) ** k, where dist is the distance between the centroid of the field in question and the centroid of a neighbour field, and k is a manually chosen coefficient (9 features)
  • Area of intersection between the field padded with a buffer of width buff and the neighbour fields, per field class (9 features); buff is a manually chosen coefficient
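The distance-weighted neighbour feature from the list above can be sketched like this. It is an illustration rather than the competition code: class labels are assumed to be 1..9, distances are assumed to be in the units of a projected coordinate system, the neighbour list is assumed to exclude the field itself, and `k=1.0` is a placeholder for the manually chosen coefficient.

```python
import numpy as np

def neighbour_class_features(centroid, nb_centroids, nb_classes,
                             n_classes=9, max_neighbours=8, k=1.0):
    """Sum (100.0 / dist) ** k per class over the closest neighbour fields."""
    dist = np.linalg.norm(nb_centroids - centroid, axis=1)
    nearest = np.argsort(dist)[:max_neighbours]
    features = np.zeros(n_classes)
    for i in nearest:
        # Class labels are assumed to be 1..n_classes.
        features[nb_classes[i] - 1] += (100.0 / dist[i]) ** k
    return features

centroid = np.array([0.0, 0.0])
nb_centroids = np.array([[100.0, 0.0], [0.0, 50.0], [200.0, 0.0]])
nb_classes = np.array([1, 1, 3])
nb_features = neighbour_class_features(centroid, nb_centroids, nb_classes)
```

In this toy case the two class-1 neighbours contribute 1.0 and 2.0, and the class-3 neighbour contributes 0.5, so closer neighbours count for more.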

Hardware used

  • Intel Core i7 - 6850K
  • 32GB RAM
  • 11GB GTX1080 Ti

Timing

Training one CNN model or one RF model takes about 10 minutes per fold. Basically, I used a tiny CNN, which is very easy to train.

Please ask if you have any questions.

Discussion 11 answers

All I can say: Impressive!!

30 Sep 2019, 13:19
Raheem_Nasirudeen
The polytechnic ibadan

Great, wonderful work.

30 Sep 2019, 13:25

Congratulations and thanks for the write up!

If I understand correctly your 3D convolutional models mapped from a rank 4 input with size 10 x 11 x 5 x 5 to a rank 1 output with size 10, each number being the probability of each of the classes?

Can you give any more detail about the sizes of your convolutions?

30 Sep 2019, 13:43

Thank you!

> 3D convolutional models mapped from a rank 4 input with size 10 x 11 x 5 x 5 to a rank 1 output with size 10

Yes, almost like this, but the output size is 9.

> Can you give any more detail about the sizes of your convolutions?

I hope I'm not violating any rules here. I use 3-4 layers (blocks) of convolution with small filters: 2 or 3 in the spatial dimensions and 3 or 4 in the temporal dimension. The number of filters grows from 10 to 128.
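To see how kernels of these sizes shrink a 10 x 11 x 5 x 5 input, here is a small shape calculation. The specific stack below (three blocks, no padding, stride 1, and these filter counts) is a hypothetical choice within the ranges mentioned, not the actual architecture.

```python
def conv_output_shape(shape, kernel, padding=(0, 0, 0)):
    """Output (T, H, W) of a stride-1 convolution with the given kernel and padding."""
    return tuple(s + 2 * p - k + 1 for s, k, p in zip(shape, kernel, padding))

# Hypothetical stack: (filters, (kT, kH, kW)) per block, not the exact layers.
blocks = [(32, (3, 2, 2)), (64, (3, 2, 2)), (128, (4, 3, 3))]

shape, channels = (11, 5, 5), 10  # (T, H, W) and input channels
for filters, kernel in blocks:
    shape = conv_output_shape(shape, kernel)
    channels = filters
    print(channels, shape)
```

With this stack the spatial dimensions collapse to 1 x 1 after three blocks, leaving a 128 x 4 x 1 x 1 tensor that a final layer could map to the 9 class probabilities.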

Thanks! Very interesting, I will definitely give this a go if I deal with temporal image data again.

Here are the papers that inspired my approach: https://www.mdpi.com/2072-4292/11/8/907, https://arxiv.org/pdf/1811.10166.pdf. Each paper presents a different approach, and I combined them.

James cook university, townsville

Well done on the challenge! Could you recommend tools (preferably in Python) for extracting the image patches? I am new to satellite imagery, and the training farm images seemed very tiny to me, which is why I did not even try CNNs. I followed: https://zindi.africa/competitions/farm-pin-crop-detection-challenge/discussions/201

1 Oct 2019, 02:53

I also used rasterio for this part of the task, as in the discussion you mentioned.

I have posted my code for extracting bands and cropping them to fields on GitHub, in case anyone finds it useful: https://gist.github.com/akatasonov/cb682ff5a064e7b3cbd4223c8fbcaeeb