Thanks to the competition host Radiant Earth, and Zindi. It was a really challenging problem.
I used two different approaches. The first approach involved training with three sets of features: a) image pixel values, b) about 10 vegetation/spectral indices (e.g. NDVI, AVI) and their relevant statistics, and c) spatial features (e.g. area of the farm). The second approach involved training with only pixel values and their relevant statistics. My solution is a weighted average ensemble of the two approaches.
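To make the feature sets and the final blend a bit more concrete, here is a minimal sketch rather than my exact code. It computes NDVI with the standard formula (NIR - Red) / (NIR + Red), summarises it per field, and blends two sets of class probabilities with a weighted average. The function names, the choice of statistics, the pixel count as a stand-in for farm area, and the 0.5 default weight are all assumptions for illustration.

```python
import numpy as np


def ndvi(nir, red, eps=1e-9):
    """Standard NDVI; eps guards against division by zero on empty pixels."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)


def field_features(nir, red):
    """Summarise the pixels of one field into a flat feature dict.

    The statistics chosen here (mean, std, min, max) are illustrative only.
    """
    index = ndvi(nir, red)
    return {
        "ndvi_mean": float(index.mean()),
        "ndvi_std": float(index.std()),
        "ndvi_min": float(index.min()),
        "ndvi_max": float(index.max()),
        # Pixel count used as a rough proxy for the area of the farm.
        "area_px": int(index.size),
    }


def weighted_blend(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two (n_samples, n_classes) probability arrays."""
    probs_a = np.asarray(probs_a, dtype=float)
    probs_b = np.asarray(probs_b, dtype=float)
    blended = weight_a * probs_a + (1.0 - weight_a) * probs_b
    # Renormalise so every row still sums to 1.
    return blended / blended.sum(axis=1, keepdims=True)


# Toy usage with made-up reflectance values for a 4-pixel field.
features = field_features(nir=[0.8, 0.7, 0.6, 0.9], red=[0.2, 0.3, 0.2, 0.1])
blend = weighted_blend([[0.6, 0.4], [0.1, 0.9]], [[0.8, 0.2], [0.3, 0.7]], weight_a=0.7)
```

In practice the blend weight would be tuned on validation data instead of being fixed by hand.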
The two approaches each went through the same modelling process: a CatBoostClassifier (without class_weights), another CatBoostClassifier (with class_weights to take care of class imbalance), and a linear discriminant algorithm (known in sklearn as LinearDiscriminantAnalysis, or LDA). LDA is a weak learner, so to improve its performance I bagged it (built an ensemble of LDA models) using sklearn's BaggingClassifier. The weighted CatBoost and the bagged LDA added some diversity to the modelling, which helped given the highly imbalanced dataset. Using just the single CatBoost with no class_weights, I scored about 1.18 on the Public Leaderboard. By adding the two other algorithms one after the other, my score gradually improved to about 1.14 on the Public Leaderboard.
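For anyone who wants to try the bagged LDA idea, here is a rough sketch on synthetic data, not my competition code. It assumes scikit-learn is installed; the dataset, the number of estimators, the 0.8 sample fraction and the "balanced" weighting scheme are all assumptions. The weights dict at the end is the kind of thing that could be passed as class_weights to a CatBoostClassifier, which is left out here to keep the example light.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, deliberately imbalanced 3-class problem (stand-in for the real data).
X, y = make_classification(
    n_samples=600,
    n_features=12,
    n_informative=6,
    n_classes=3,
    weights=[0.7, 0.2, 0.1],
    random_state=0,
)

# Bag several LDA models, each fitted on a random 80% sample of the rows.
# The base estimator is passed positionally so this works across sklearn versions.
bagged_lda = BaggingClassifier(
    LinearDiscriminantAnalysis(),
    n_estimators=25,
    max_samples=0.8,
    random_state=0,
)
bagged_lda.fit(X, y)
lda_probs = bagged_lda.predict_proba(X)  # shape: (n_samples, n_classes)

# "Balanced" weights give rarer classes a larger weight.
# A dict like this could be handed to CatBoostClassifier(class_weights=...).
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weights = dict(zip(classes.tolist(), weights.tolist()))
```

The probabilities from the bagged LDA can then be combined with the CatBoost probabilities; how much each model should count is something to check on a validation split.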
Some other things I tried but didn't really find success with either due to wrong implementation, false hypothesis, or inherent reasons include:
Special appreciation to first-placed KarimAmer; that's an impressive score. Your brief insight into your Deep Learning solution wowed me. I hope to read more about your approach.
Link to my code with some notebook documentation.
Congratulations on your impressive work.
I will share more details with the code hopefully in the next couple of days.
Congratulations and nice
cool! Keep it up man