Well, I am honoured that @aninda_bitm requested me to share. Yes, I will gladly share my approach, but not my entire solution. It is a bit early to give too many details, but I will start sharing some of the important aspects of my solution here. It will be non-technical on purpose. I hope you like it like that. Over the coming weeks, especially as the leaderboard settles, I will share more bits and pieces.
Note that there was a similar competition a year ago (spot-the-crop) where our solution won both tracks (one using S1 and S2 imagery, the other just S2). That solution was the starting point, so I have to go back to that one first. Normally, for something like this, you start by calculating some statistics for each field: e.g. the mean, standard deviation, skewness and kurtosis for each band of each field, and then you use something like a gradient booster to try and classify based on that.
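Just to make that baseline concrete, here is a minimal sketch of the idea. The (n_pixels, n_bands) layout and the use of LightGBM are my illustrative assumptions, not the actual competition pipeline.

```python
# Sketch of the classical per-field statistics baseline described above.
import numpy as np
from scipy import stats
from lightgbm import LGBMClassifier

def field_features(pixels):
    """pixels: (n_pixels, n_bands) array of all pixels belonging to one field."""
    feats = []
    for band in pixels.T:
        feats += [band.mean(),            # mean
                  band.std(),             # standard deviation
                  stats.skew(band),       # skewness
                  stats.kurtosis(band)]   # kurtosis
    return np.array(feats)

# Assuming `fields` is a list of (pixels, crop_label) pairs extracted beforehand:
# X = np.stack([field_features(p) for p, _ in fields])
# y = np.array([label for _, label in fields])
# LGBMClassifier(n_estimators=500, learning_rate=0.05).fit(X, y)
```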
In the previous competition, after a lot of testing, I realised that, for that one, the higher moments, specifically skewness and kurtosis, do not really contribute. So that (spot-the-crop) solution used only the mean and standard deviation to classify crop types.
This time around, given the small size of the fields, I focussed on just the measures of locality (mean, mode, median and others). I did not even use the standard deviation or other measures of spread (much). Keep in mind that I did not test this assumption this time, I simply made it, but that allowed me to save a lot of time and just focus on models and ways to use measures of locality to classify fields.
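For illustration only, restricting the features to measures of locality could look something like the sketch below. Approximating the mode of a continuous band via a histogram is my own choice here, not necessarily what was done in the solution.

```python
# Sketch: per-field features using only measures of locality (mean, median, mode).
import numpy as np

def locality_features(pixels, bins=32):
    """pixels: (n_pixels, n_bands) array for a single field."""
    feats = []
    for band in pixels.T:
        counts, edges = np.histogram(band, bins=bins)
        i = counts.argmax()
        mode_approx = 0.5 * (edges[i] + edges[i + 1])  # centre of the fullest bin
        feats += [band.mean(), np.median(band), mode_approx]
    return np.array(feats)
```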
For what it is worth, I really struggled with this one. For a long time I was stuck on a score of 1.6+ and I had to work very hard to break through that. So while I share what I did here, I do think there are better ways to approach this.
I have a nice graph that sort of illustrates my progress which I will share later, but my progression was something like
and that was it. Now each of these bullets represents a lot of subs and a lot of hard work. I'll share details on some of these in the coming days. This sort of shows you how much you get from perspiration and how much from inspiration, but of course you need both in a competition like this.
Great job @skaak
Well detailed approach. If I may ask, what measure of locality did you settle for? And regarding the bands, what were your preferred bands?
I will be waiting for more details on the bullets you specified. Thanks
Thanks for sharing!!
Well, it seems the leaderboard has settled, so let me share a bit more.
Which bands to use?
In the previous competition we used just a few, but important, bands. This time around I started off with the same ones, but later, in an attempt to improve, I added a lot more. I remembered from the solution presentations that @Kiminya used a lot of bands, so I used all of his 30+ bands. Then I reduced the bands based on feature importance metrics from our model, and later still I added them all back again.
For reference, here I am referring to @Kiminya's solution to spot-the-crop, where he used 30+ bands.
It is much easier to mention what actually happened than to specify which bands to use!
This is not a binary issue - I think the bands to use will depend on the data and the model you use. If you used a gradient booster you probably had to use fewer bands, and if a random forest you could use more. Gradient boosters suffer a bit if the features are too correlated. Here the data was perhaps not as good as in spot-the-crop, and using fancy bands would not fix that. As I mentioned, at different times during the competition I used different bands, sometimes very few and sometimes a lot, and to be honest, it did not make that much of a difference.
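As a rough illustration of the pruning step mentioned above (reducing bands by feature importance), something along these lines would do it. The cut-off and the use of LightGBM importances are illustrative choices on my part, not the exact method used.

```python
# Sketch: keep only the top-k features as ranked by a booster's importances.
# Whether this helps depends, as noted, on the data and the model.
import numpy as np
from lightgbm import LGBMClassifier

def prune_features(X, y, keep=20):
    model = LGBMClassifier(n_estimators=300)
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]  # most important first
    keep_idx = np.sort(order[:keep])
    return keep_idx, X[:, keep_idx]
```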
Thanks for sharing, but regarding the 30 bands you are talking about from @Kiminya, is there a write-up about it?
Hi @Koleshjr
You can find those bands on github (Radiant Earth "spot the crop" winning solution repo).
What wins a competition - inspiration or perspiration?
Well, I kept track of my progress in this comp. Here e.g. are the local and public LB scores throughout the competition.
As you can see, at times there is little correlation between the two. When that happens, you are probably going in the wrong direction ... one nice confirmation of your model is when the LB and CV align.
Anyhow, back to the question: hard work or good ideas? Well, here is another graph I prepared during this competition. I've blacked out some stuff for now, and perhaps will share it later, but it does not matter for this discussion.
Here each coloured band represents a model grouping. It shows my progress over time. You can see how, initially, there was very little correlation between CV and LB, and how later they were much better aligned. Also note how the score improved over time. I scaled the effort for each of these, as well as the reward. Quite arbitrary, but you can plot it to get the following.
Those blacked-out blocks represent the idea behind that particular point. As you can see, sometimes a good idea can give you a lot of reward for little effort, and you are hunting for those during a comp. However, I think that if you do not work hard, you either need to have worked hard elsewhere (= experience) or you need supernatural inspiration. Typically, though, you work hard, and that opens your mind to the inspiration you need to get those huge boosts on the reward scale.
I think that, even with just a few very good ideas, you still need to work hard anyway, even if you get little reward for it; to win, you probably need a lot of incremental improvements.
Anyhow, this is my experience. I'd be interested to hear others' as well. If you e.g. had the right idea, subbed a single model and won a big competition, then either you are telling a fish tale, or you entered a small local comp, or you are Einstein. I don't think you can bank on this kind of approach, which is sad in a way, as it requires quite a commitment to a comp if you want to do well.