AgriFieldNet India Challenge
Can you detect crop types in a class-imbalanced satellite image dataset?
Prize
$10 000 USD
Time
Ended 25 days ago
Participants
179 active · 626 enrolled
Helping
India
Advanced
Classification
Agriculture
My approach
Help · 1 Nov 2022, 11:58 · 3

Well, I am honoured that @aninda_bitm asked me to share. Yes, I will gladly share my approach, but not my entire solution. It is a bit early to give too many details, but I will start sharing some of the important aspects of my solution here. It will be non-technical on purpose; I hope you like it that way. Over the coming weeks, especially as the leaderboard settles, I will share more bits and pieces.

Note that there was a similar competition a year ago (spot-the-crop), where our solution won both tracks (one using S1 and S2 imagery, the other just S2). That solution was the starting point, so I have to go back to it first. Normally, for something like this, you start by calculating some statistics for each field. So you would calculate e.g. the mean, standard deviation, skewness and kurtosis of each band for each field, and then use something like a gradient booster to try and classify based on those.
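To make that baseline concrete, here is a minimal sketch of the idea: aggregate pixel values to field level (mean, standard deviation, skewness, kurtosis per band) and train a gradient booster on the result. The toy data, column names and the scikit-learn model are my assumptions for illustration, not the author's actual pipeline.

```python
# Hypothetical sketch: per-field band statistics -> gradient-boosted classifier.
# The toy data, band/column names and model choice are illustrative assumptions.
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy pixel-level table: one row per pixel, with a field id and
# reflectance values for two spectral bands.
pixels = pd.DataFrame({
    "field_id": rng.integers(0, 50, size=2000),
    "B04": rng.normal(0.3, 0.05, size=2000),
    "B08": rng.normal(0.5, 0.08, size=2000),
})
labels = pd.Series(rng.integers(0, 3, size=50), name="crop")  # one crop label per field

# Aggregate each band to field level: mean, std, skewness and kurtosis.
stats = pixels.groupby("field_id")[["B04", "B08"]].agg(["mean", "std", skew, kurtosis])
stats.columns = ["_".join(c) for c in stats.columns]  # flatten e.g. ("B04", "mean")

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(stats, labels.loc[stats.index])
print(stats.shape[1])  # 8 features: 2 bands x 4 statistics
```

The same skeleton extends directly to more bands and more statistics; only the aggregation list changes.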

In the previous competition, after a lot of testing, I realised that, for that one, the higher moments, specifically skewness and kurtosis, did not really contribute. So that (spot-the-crop) solution used only the mean and standard deviation to classify crop types.

This time around, given the small size of the fields, I focussed on just the measures of locality (mean, mode, median and others). I did not even use the standard deviation or other measures of spread (much). Keep in mind that I did not test this assumption this time; I simply made it, but that saved me a lot of time and let me focus on models and on ways to use measures of locality to classify fields.
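For small fields with few pixels, locality measures are cheap to compute per band. A sketch of what such features might look like (the histogram-based mode here is just one robust choice for continuous data, not necessarily what the author used):

```python
# Hypothetical sketch of "measures of locality" features for one band of one
# small field. The histogram-based mode is an illustrative choice of mine.
import numpy as np

def locality_features(values: np.ndarray) -> dict:
    """Mean, median and a simple histogram-based mode for a 1-D array of pixels."""
    counts, edges = np.histogram(values, bins=10)
    mode_bin = counts.argmax()
    return {
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "mode": float((edges[mode_bin] + edges[mode_bin + 1]) / 2),  # bin centre
    }

rng = np.random.default_rng(1)
feats = locality_features(rng.normal(0.4, 0.05, size=30))  # a 30-pixel field
print(sorted(feats))  # ['mean', 'median', 'mode']
```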

For what it is worth, I really struggled with this one. For a long time I was stuck on a score of 1.6+ and I had to work very hard to break through that. So while I share what I did here, I do think there are better ways to approach this.

I have a nice graph that sort of illustrates my progress which I will share later, but my progression was something like

  • quick model that I believed could work, but stuck at 1.6+
  • start to calibrate model, now stuck at 1.5+
  • add some more nifty features (mostly from my teammate @Moto) ~ 1.5
  • huge effort to improve model 1.4+
  • breakthrough idea 1.25
  • more refinements 1.21
  • finally another breakthrough idea on last day of competition! 1.20

and that was it. Each of these bullets represents a lot of subs and a lot of hard work. I'll share details on some of these in the coming days. This sort of shows you how much you get from perspiration and how much from inspiration, but of course you need both in a competition like this.

Discussion · 3 answers

Great job @skaak

A well-detailed approach. If I may ask, which measure of locality did you settle on? And regarding the bands, which were your preferred bands?

I will be waiting for more details on the bullets you specified. Thanks

Thanks for sharing!!

Well, it seems the leaderboard has settled, so let me share a bit more.

Which bands to use?

In the previous competition we used just a few, but important, bands. This time around I started off using the same ones, but later, in an attempt to improve, I added a lot more. I remembered from the solution presentations that @Kiminya used a lot of bands, so I used all of his 30+ bands. Then I reduced the bands based on feature importance metrics from our model, and later still I added them all back again.

For reference, here I refer to @Kiminya's spot-the-crop solution, where he used 30+ bands.

It is much easier to mention what actually happened than to specify which bands to use!

This is not a binary issue - I think the bands to use depend on the data and the model you use. If you used a gradient booster you probably had to use fewer bands; with a random forest you could use more. Gradient boosters suffer a bit if the features are too correlated. Here the data was perhaps not as good as in spot-the-crop, and using fancy bands would not fix that. As I mentioned, at different times during the competition I used different bands, sometimes very few and sometimes a lot, and to be honest, it did not make that much of a difference.
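The pruning-by-importance step mentioned above can be sketched as follows. The synthetic data, the near-duplicate band simulating correlation, and the 25% threshold are all my illustrative assumptions, not the author's settings.

```python
# Hypothetical sketch of pruning correlated band features by importance.
# Data, column names and the keep-threshold are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(200, 6)),
                 columns=[f"band_{i}_mean" for i in range(6)])
# Simulate a highly correlated band, the kind gradient boosters dislike.
X["band_5_mean"] = X["band_0_mean"] + rng.normal(scale=0.01, size=200)
y = (X["band_0_mean"] + X["band_1_mean"] > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)
importance = pd.Series(model.feature_importances_, index=X.columns)

# Keep only bands whose importance clears a fraction of the strongest one.
keep = importance[importance > 0.25 * importance.max()].index
X_reduced = X[keep]
print(f"kept {len(keep)} of {X.shape[1]} band features")
```

Adding the dropped bands back later, as described above, is then just a matter of refitting on the full feature frame.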