The description of the target value `extent` says it lies in the range 0-100, in increments of 10%.
> **Ancillary data:** For each growth stage, the damage types and their `extent` are provided, with the extent given as a percentage (%) in 10% increments.
Therefore, from a classification point of view, it seems we have 11 classes (0, 10, 20, ..., 100). It might appear easier for the model to tackle the problem this way, since we are only dealing with eleven discrete labels instead of a continuous range (regression).
However, from what I have tried so far, a simple regression model works far better than a simple classification one. I assume this might be due to the imbalanced nature of the dataset, or to the test data containing values outside the training extent range (as in this example given to us):
| ID | extent |
| --- | --- |
| L1095F00009C01S00200Rp01978 | 56 |
| L1095F00009C01S00200Rp09218 | 48 |
What do you think?
Because MSE is sensitive to the numeric distance between prediction and target, while cross-entropy is not.
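A toy numeric sketch of that point (the labels, probabilities, and true value here are made up for illustration, not taken from the competition):

```python
import numpy as np

# Suppose the true extent is 40, i.e. class index 4 of [0, 10, ..., 100].
labels = np.arange(0, 101, 10)
true_idx = 4

# Regression view: the squared error grows with the numeric distance.
se_pred_50 = (50 - labels[true_idx]) ** 2    # off by 10 -> 100
se_pred_100 = (100 - labels[true_idx]) ** 2  # off by 60 -> 3600

# Classification view: cross-entropy only reads the probability assigned
# to the true class, so confidently predicting "50" or "100" costs the same.
def cross_entropy(probs, idx):
    return -np.log(probs[idx])

probs_say_50 = np.full(11, 0.02)
probs_say_50[5] = 0.8    # 80% confidence on extent 50
probs_say_100 = np.full(11, 0.02)
probs_say_100[10] = 0.8  # 80% confidence on extent 100

print(se_pred_50, se_pred_100)                 # 100 3600
print(cross_entropy(probs_say_50, true_idx))   # identical...
print(cross_entropy(probs_say_100, true_idx))  # ...to this
```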
Hey, it makes sense that the better choice here would be regression. Take a simple example where your model is struggling over whether an image's extent should be 40 or 50, with both equally probable. How do you decide which one to go with? An obvious solution is to take the middle ground (45), and that is natively what a regression approach does.
Someone could also go with a classification approach and then use the probabilities to output a single value, i.e. `np.sum(probabilities * labels)`, with labels being `[0, 10, ..., 100]`.
I haven't tried the latter, but it could be a good compromise.
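The expected-value trick above can be sketched as follows (a minimal example, not anyone's actual competition code; `expected_extent` is a hypothetical helper name):

```python
import numpy as np

labels = np.arange(0, 101, 10)  # the 11 possible extent values

def expected_extent(probabilities):
    """Collapse the 11 class probabilities into one value via the expectation."""
    return float(np.sum(probabilities * labels))

# The model torn between 40 and 50 with equal probability:
probs = np.zeros(11)
probs[4] = probs[5] = 0.5
print(expected_extent(probs))  # -> 45.0, the "middle ground"
```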
Hi, thanks for your input on this. Looking at the `extent` column in the training dataset, it only contains percentages in 10% increments, so every instance has exactly one of the labels `[0, 10, ..., 100]`.
So, regarding the model struggling between 40 and 50, may I ask in which cases you think this phenomenon might happen?
Yeah, I agree that every instance has one specific label. But the metric being RMSE (not log loss or any other classification metric), combined with the gradual 10% increments, makes it even more punishing when you predict the wrong extent.
I have seen a few instances where the image shows obvious drought damage, but the labelled extent is 0. In such cases, predicting anything higher than 0 results in a relatively high penalty.
I have also seen a few cases (a lot, actually) where the extent is very low (say 30) but the model's estimate of the damage is about 70 (and frankly, in some of those cases I believed the model to be right, for the simple reason that there wasn't a single healthy plant in those images).
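To put rough numbers on how punishing that is, here is a hypothetical mini-batch where a single "obvious damage but extent 0" image ruins an otherwise perfect score (the values are invented for illustration):

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Nine perfect predictions plus one image labelled 0 that the model,
# seeing visible damage, scores as 40.
y_true = [0] + [50] * 9
y_pred = [40] + [50] * 9
print(rmse(y_true, y_pred))  # that one 40-point miss alone gives RMSE ~ 12.65
```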
Oh, interesting! You are correct.
Are you using all train images for training, or are you leaving some out based on your analysis?
I'm using all images for now, but removing some images *might* help.
Did you get the RMSE score of 10 without any post-processing using the metadata given in the dataset?
I think all submissions with scores >= 9 can be achieved without any use of the "damage type" column. I doubt that the higher-ranked solutions didn't use the leak, though.
Take into consideration that you can use the leak to get a high score while hiding your real score.
Hmm, I too got an RMSE of 10 without the damage type, and I will most probably reach a score of 9 too, but I don't think there is much scope for pushing it further. It's a humble request to all top-10 participants to at least tell us whether they used the damage type in their solution in any way.
Well, the final accepted top-10 solutions are based on the scores on the full test set (currently the score is computed on only 20-30% of the test set, so a shake-up is expected).
If all of these top-10 solutions select submissions which use the leak, they will all be disqualified. Given the time everyone invested in this competition, I prefer not to believe that they would risk it all for nothing.
Anyway, I think it's better if the organizers check the top 25-30 solutions instead of the top 10.
I agree with you on every point, but I am not expecting much of a shake-up, as CV and LB seem to be correlated and the test set seems to be randomly sampled from the full dataset.
@Nayal_17 Yeah, my current score is without any post-processing. As @hasan_n says, you can achieve a score of 9.x without any post-processing or use of the leak.
I wouldn't trust any score lower than that :)