Thanks to the organizers of the competition and to all the participants who shared their experiences in the discussions. I've decided to write a brief report on my work, even though I didn't win, to share my experience and, of course, to learn from the experiences of other participants. It doesn't matter whether the solution was good; what's most important is the knowledge gained, understanding what worked well and what didn't. I would love to read about how you approached this challenge!
And now about my solution.
The significant gaps between labels influenced my decision to opt for classification rather than regression. This choice offered a structured approach to handling the discreteness, making the process more straightforward and, hopefully, more deterministic.
I opted for a metric learning approach over traditional classification with classical loss functions. The reason for this choice is that our images had nearly identical scene parameters. Furthermore, dealing with noisy labels added an extra layer of complexity, making traditional training methods challenging to work with. In particular, I used a Triplet loss for this purpose.
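For readers unfamiliar with triplet loss, here is a minimal PyTorch sketch of the idea. The `embedder` is a hypothetical stand-in (not the model from the competition), and the image size is made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical embedder: any backbone that maps images to vectors works here.
embedder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 384))

# Anchor and positive share a label (e.g. same extent class); negative differs.
anchor = embedder(torch.randn(8, 3, 64, 64))
positive = embedder(torch.randn(8, 3, 64, 64))
negative = embedder(torch.randn(8, 3, 64, 64))

# The loss pulls anchor-positive pairs together and pushes anchor-negative
# pairs apart until they are separated by at least `margin`.
criterion = nn.TripletMarginLoss(margin=0.2, p=2)
loss = criterion(anchor, positive, negative)
loss.backward()
```

Because the loss only compares relative distances within triplets, it sidesteps the need for well-calibrated per-class decision boundaries, which is exactly what makes it attractive with noisy labels.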
A few weeks ago, when I began, my initial inclination was to exclude all the "non-drought" samples and train solely on "good" and "drought" data. My subsequent idea was to employ a model that could predict the type of damage (using a one-vs-all approach: drought&good vs. all other types of damage). However, upon reviewing this discussion, I had to abandon this idea, as it remained unclear whether the "damage" data column could even be used for training (and, of course, it could not be utilized during submissions).
I had no choice but to attempt training the model on the entire dataset. Thanks to the triplet loss, miners, and samplers, I didn't need to be concerned about class imbalance. Unfortunately, the local validation results turned out to be quite poor: RMSE ~16.4 and F1 ~0.73. When reviewing the incorrect predictions, it became evident that a significant portion of them were related to the growth stages. For example, visually, it might appear as if the plant had been affected by drought, but in reality, it was simply a "mature" yellowing plant.
My final setup involved using a distinct extent-predicting model for each of the growth stages, and the local results were as follows: RMSE ~15.1 and F1 ~0.79 (public score 14.89431900).
I had been eager to explore the Open Metric Learning library for quite some time, and this competition turned out to be the perfect opportunity to do so. (Did it? :D)
This library is truly exceptional. It enables you to start training right away, offers numerous examples, and, most importantly, is actively maintained!
I used Meta's pretrained vits16_dino model and trained one for each growth stage. All the models shared the same setup: a triplet loss (or soft triplet loss), Adam as the optimizer with a grid-searched learning rate in the range 1e-5...1e-4, and a balanced sampler (which is why I didn't have to worry about class imbalance!). The library lets you choose among different miners, e.g. the hard miner, which selects only the hardest training examples, i.e. triplets where the anchor-negative distance is minimal and the anchor-positive distance is maximal; it wasn't an optimal solution for my use case, though.
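The hard-mining idea can be sketched in plain PyTorch. This is only an illustration of the "batch-hard" variant, not the library's implementation:

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    # Pairwise L2 distances between all embeddings in the batch.
    dists = torch.cdist(embeddings, embeddings)
    same = labels[:, None] == labels[None, :]
    self_mask = torch.eye(len(labels), dtype=torch.bool)

    # Hardest positive per anchor: the farthest sample with the same label.
    pos = dists.clone()
    pos[~same | self_mask] = 0.0
    hardest_pos = pos.max(dim=1).values

    # Hardest negative per anchor: the closest sample with a different label.
    neg = dists.clone()
    neg[same] = float("inf")  # `same` includes the diagonal, so self is excluded too
    hardest_neg = neg.min(dim=1).values

    return F.relu(hardest_pos - hardest_neg + margin).mean()

# Toy batch: 8 embeddings, 4 classes with 2 samples each (as a balanced
# sampler would produce).
emb = F.normalize(torch.randn(8, 384), dim=1)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(emb, labels)
```

With noisy labels, the hardest triplets are often the mislabeled ones, which is one plausible reason hard mining underperformed here.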
Each model outputs a 384-dimensional embedding. After training the models, I compute a mean embedding for each extent class and predict by assigning each image the extent of its nearest class centroid.
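A minimal sketch of nearest-centroid prediction in embedding space (the mean-embedding step is hinted at in the reply below; all data here is synthetic):

```python
import numpy as np

# Synthetic stand-ins: 384-d embeddings with known extent labels (train)
# and embeddings to predict (test).
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(100, 384))
train_extent = np.repeat(np.array([0, 10, 25, 50]), 25)  # hypothetical extent bins
test_emb = rng.normal(size=(5, 384))

# One mean ("centroid") embedding per extent class.
classes = np.unique(train_extent)
centroids = np.stack([train_emb[train_extent == c].mean(axis=0) for c in classes])

# Predict the extent of the closest centroid (L2 distance).
dists = np.linalg.norm(test_emb[:, None, :] - centroids[None, :, :], axis=-1)
preds = classes[dists.argmin(axis=1)]
```

Averaging many embeddings into a centroid also dampens the effect of individual noisy labels.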
Given that we may not use the "damage" column, ideas I could still try:
1. Predict an extent distribution using a Kullback–Leibler divergence loss (soft labels, like here)
2. Train models from scratch, train on the whole dataset, or create folds
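Idea 1 could look roughly like this: bin the extents, smear each noisy label into a soft target distribution, and train with `KLDivLoss`. The bin values and the Gaussian smearing width below are my assumptions, not values from the competition:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical: the model outputs logits over discrete extent bins.
bins = torch.tensor([0.0, 10.0, 25.0, 50.0, 75.0, 100.0])
logits = torch.randn(4, len(bins), requires_grad=True)

# Soft target: smear each noisy label over neighbouring bins with a
# Gaussian instead of a one-hot vector.
labels = torch.tensor([10.0, 25.0, 50.0, 0.0])
sigma = 10.0  # assumed smearing width
target = torch.exp(-((labels[:, None] - bins[None, :]) ** 2) / (2 * sigma**2))
target = target / target.sum(dim=1, keepdim=True)

# KLDivLoss expects log-probabilities as input and probabilities as target.
loss = nn.KLDivLoss(reduction="batchmean")(F.log_softmax(logits, dim=1), target)
loss.backward()

# Final extent = probability-weighted sum over the bin values.
expected = (F.softmax(logits, dim=1) * bins).sum(dim=1)
```

The soft targets encode that confusing extent 25 with 10 is a smaller error than confusing it with 100, which a plain cross-entropy over hard labels ignores.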
Nice writeup!
I was also playing around a bit with metric learning, but I used a regression model and then simply extracted the embedding for each image (i.e. from the layer before the regression head). Just a single model.
But I didn't compute mean embeddings; I just used KNN and averaged the predictions of the top 15 neighbors (the number 15 was found on the validation set). This gave me a slightly better score overall. My intuition was that this approach would take the dataset's noisy labels into consideration.
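That KNN averaging step might look like this with scikit-learn (synthetic stand-ins for the real embeddings and labels):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the embeddings extracted from the layer
# before the regression head.
train_emb = rng.normal(size=(200, 384))
train_extent = rng.choice([0, 10, 25, 50, 75, 100], size=200).astype(float)
test_emb = rng.normal(size=(5, 384))

# Average the labels of the 15 nearest training embeddings; averaging
# over neighbours smooths out individual noisy labels.
knn = KNeighborsRegressor(n_neighbors=15)
knn.fit(train_emb, train_extent)
preds = knn.predict(test_emb)
```

Compared with class centroids, KNN lets each prediction be a local average, so a single mislabeled neighbour only contributes 1/15 of the output.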