Farm Pin Crop Detection Challenge
Classify fields in South Africa by crop type using Sentinel-2 satellite imagery
$11 000 USD
South Africa
Earth Observation
Winning solution (1st place on the Private Leaderboard)
Connect · 30 Sep 2019, 12:29 · 14

First, I would like to congratulate all “Zindians” who earned a place on the leaderboard. This competition presented a difficult and unique problem. It took me months to understand the problem and make my first submission.

Convolutional neural networks are the gold standard for image classification. However, I dared to challenge the norm by using a traditional tree-based ML solution, largely because I wanted a low-cost solution that is fit for use in Africa, and deep learning models require huge computing power. I am also a huge fan of the R programming language, so I used R to solve this problem.

All images across the 11 time slices were extracted using the raster package in R. Using the spectral bands, I created different vegetation indices such as NDVI, NDRE, WDRVI, MTCI etc. and kept 10 of the most significant ones. Vegetation indices were quite helpful because they carry information about the vegetation on the ground, such as greenness, water content, height etc. I alternated between the median and the mean of the image pixel values, based on whichever gave a better cross-validation (CV) score.
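To make the index step concrete, here is a minimal base R sketch of computing a few vegetation indices per pixel and then aggregating them per field. It is not the original pipeline: the pixel values are made up, the function names are hypothetical, the Sentinel-2 band choice (B4 red, B5 red edge, B8 near infrared) is an assumption, and the WDRVI weight `alpha = 0.2` is just a commonly used value.

```r
# Hypothetical reflectance values for the pixels of one field on one date.
# Assumed Sentinel-2 bands: B4 = red, B5 = red edge, B8 = near infrared.
pixels <- data.frame(
  B4 = c(0.05, 0.06, 0.04, 0.30),  # last pixel is an outlier (e.g. bare soil)
  B5 = c(0.12, 0.13, 0.11, 0.32),
  B8 = c(0.45, 0.48, 0.42, 0.35)
)

# Normalised difference of two bands, the building block of NDVI and NDRE.
norm_diff <- function(a, b) (a - b) / (a + b)

# Per-pixel vegetation indices; alpha is the WDRVI weighting coefficient.
vegetation_indices <- function(px, alpha = 0.2) {
  data.frame(
    NDVI  = norm_diff(px$B8, px$B4),
    NDRE  = norm_diff(px$B8, px$B5),
    WDRVI = (alpha * px$B8 - px$B4) / (alpha * px$B8 + px$B4)
  )
}

vi <- vegetation_indices(pixels)

# One value per index for the field; both aggregations can be compared by CV score.
field_median <- sapply(vi, median)
field_mean   <- sapply(vi, mean)
```

With an outlier pixel like the one above, the median stays close to the typical value while the mean is pulled down, which is one reason the two aggregations can score differently in cross-validation.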

Next, because tree crops like pecan and dates remain relatively the same year-round while crops like maize are harvested every few months, I created features around the standard deviation of the calculated vegetation indices. For example, the greenness of tree crops will remain relatively stable, while that of crops like maize will decline during planting periods.
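The variability idea can be sketched as follows, assuming a table of field-level NDVI with one column per time slice. The numbers and the column names `ndvi_sd` and `ndvi_range` are invented for illustration and are not taken from the competition data.

```r
# Hypothetical field-level NDVI: one row per field, one column per time slice (11 slices).
ndvi <- rbind(
  tree_crop = c(0.70, 0.72, 0.71, 0.69, 0.70, 0.73, 0.71, 0.70, 0.72, 0.71, 0.70),
  maize     = c(0.20, 0.35, 0.60, 0.80, 0.85, 0.75, 0.40, 0.20, 0.25, 0.50, 0.78)
)

# Temporal variability features: a stable tree crop gets a small standard
# deviation, a seasonal crop a large one.
temporal_features <- data.frame(
  field      = rownames(ndvi),
  ndvi_sd    = apply(ndvi, 1, sd),
  ndvi_range = apply(ndvi, 1, max) - apply(ndvi, 1, min)
)

print(temporal_features)
```

The same computation can be repeated for each vegetation index, which is one way the feature count grows quickly.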

Furthermore, the varying greenness was used to create features around the period from planting to harvesting. For instance, I took the difference between the date when NDVI was lowest and the date when it was highest to obtain a duration. Interactions among the important features were also created.
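A duration feature of this kind might look like the sketch below. The acquisition dates and NDVI values are hypothetical, and taking the absolute number of days between the NDVI peak and trough is my assumption about how the duration was defined.

```r
# Hypothetical acquisition dates and field-level NDVI for one field.
dates <- as.Date(c("2017-01-01", "2017-02-10", "2017-03-22", "2017-05-01",
                   "2017-06-10", "2017-07-20", "2017-08-29"))
ndvi  <- c(0.25, 0.40, 0.70, 0.85, 0.60, 0.30, 0.20)

# Days between the date of highest NDVI and the date of lowest NDVI.
peak_trough_days <- function(values, dates) {
  peak   <- dates[which.max(values)]
  trough <- dates[which.min(values)]
  abs(as.numeric(peak - trough))
}

duration <- peak_trough_days(ndvi, dates)
print(duration)  # 120 days between 2017-05-01 and 2017-08-29
```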

Once all the feature extraction and engineering was completed, I faced another problem: high dimensionality of the data. I had in excess of 15,000 features. Failing to address this would lead to overfitting and poor model performance. Using a feature selection algorithm, I reduced the set to a total of 500 features.

Finally, 3 different models trained on 3 different samples were combined into an ensemble, with the XGBoost algorithm as the base.
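The post does not say how the three models were combined. One simple possibility, shown here purely as an assumption, is to average the class probabilities predicted by each model. The probability matrices below are made up and stand in for the output of three separately trained XGBoost models.

```r
classes <- c("maize", "pecan", "dates")

# Hypothetical predicted class probabilities for two fields from three models.
p1 <- matrix(c(0.6, 0.3, 0.1,
               0.2, 0.5, 0.3), nrow = 2, byrow = TRUE, dimnames = list(NULL, classes))
p2 <- matrix(c(0.5, 0.4, 0.1,
               0.3, 0.3, 0.4), nrow = 2, byrow = TRUE, dimnames = list(NULL, classes))
p3 <- matrix(c(0.7, 0.2, 0.1,
               0.1, 0.6, 0.3), nrow = 2, byrow = TRUE, dimnames = list(NULL, classes))

# Simple average of the three probability matrices (assumed combination rule).
models   <- list(p1, p2, p3)
ensemble <- Reduce(`+`, models) / length(models)

# Most likely class per field under the averaged probabilities.
predicted <- classes[max.col(ensemble, ties.method = "first")]
print(predicted)
```

Averaging keeps each row a valid probability distribution, so the result can be submitted directly where probabilities are required.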

Discussion 14 answers

Congratulations on winning the competition!

Nice job constructing all these features!

30 Sep 2019, 12:46

Thanks PermanentPon. Congratulations as well.

Awesome job, DrFad! I also used NDVI in my submissions (with the mean values) with decision trees but since I'm new to measuring plants with satellite images I didn't know about the existence of NDRE, WDRVI, MTCI, etc. So much to learn! But my gut feeling was that a tree-based model would do well in this competition since the fields are too small for a CNN to be applied easily.

Great job on feature engineering too. The prize is well deserved!

What algorithm or package did you use to pick 500 features out of 15,000?

30 Sep 2019, 13:03

Thanks. The Boruta package in R works. It utilises a variant of random forest.

Good job!

30 Sep 2019, 13:05

What were your hardware specs, please? Congrats!

30 Sep 2019, 13:24

@Chuka19952 Thank you. Because it wasn't a deep learning model, 16 GB of RAM and a 3.5 GHz Core i7 processor did the job.

@DrFad Congratulations!

If possible, could you kindly share the code you used for feature engineering and feature selection?

19 Feb 2020, 12:05

Thanks. Please see the code below for feature selection.


library(Boruta) # provides Boruta()

# library(doParallel) # Optional for parallel running
# registerDoParallel(cores = 4) # Optional for parallel running

# Run Boruta on the training data, dropping the first column
Farm_Boruta_Out <- Boruta(Crop_Id_Ne ~ . , data=Farm_train_med_all[,-c(1)], doTrace=2)

Farm_boruta_signif <- names(Farm_Boruta_Out$finalDecision[Farm_Boruta_Out$finalDecision %in% c("Confirmed","Tentative")]) # collect Confirmed and Tentative variables
print(Farm_boruta_signif) # significant variables

# The next two lines were garbled in the original post; this is a likely reconstruction
Farm_Imp <- as.data.frame(Farm_Boruta_Out$ImpHistory) # importance history, one column per feature
Farm_Imp[Farm_Imp == -Inf] <- 0 # assumed: zero out the -Inf entries recorded for rejected features
Farm_Imp <- sapply(Farm_Imp, FUN=mean) # named vector of mean importance per feature

plot(Farm_Boruta_Out, cex.axis=0.6, las=2, xlab="", main="Variable Importance") # plot variable importance
grid(ny = 100, lty = "dotted", lwd = 2)

save(Farm_Boruta_Out, Farm_boruta_signif, file = "Farm_Featue_Selection5.rda") # Save feature selection as rda for later use