Hi all,
There has been a lot of discussion about low public scores despite high validation scores, so here are my two cents on this phenomenon.
I suspect that something went wrong with the image_id column in test.csv. The point is that in the test data there is no correlation between the provided image_location and the real one, whereas the train data is labeled almost perfectly. You can see some examples here: https://ibb.co/album/QjWZvV. My code for generating them is here: https://colab.research.google.com/drive/1T4jHIcHNvIZgDa--FpM9i0SqvaWDkrR6?usp=sharing.
Such a significant difference between the number of wrong location labels in test and in train is definitely not normal and should be fixed somehow.
A wrong match between "image_id" and the labels can explain both the location issue and the low public scores.
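For anyone who wants to put a number on this, here is a minimal sketch of one possible check. It is not the code from the Colab notebook above. It assumes you already have, for a sample of images, the provided location and the location you judged by looking at the image. The function name `mismatch_rate` and the toy labels are made up for illustration.

```python
def mismatch_rate(provided, observed):
    """Fraction of rows where the provided location differs from the observed one."""
    if len(provided) != len(observed):
        raise ValueError("both lists must have the same length")
    if not provided:
        return 0.0
    wrong = sum(p != o for p, o in zip(provided, observed))
    return wrong / len(provided)


# Toy values for illustration only, NOT taken from the competition data.
train_provided = ["A", "B", "C", "A"]
train_observed = ["A", "B", "C", "B"]
test_provided = ["A", "B", "C", "A"]
test_observed = ["C", "A", "B", "A"]

train_rate = mismatch_rate(train_provided, train_observed)
test_rate = mismatch_rate(test_provided, test_observed)
print(f"train mismatch: {train_rate:.2f}, test mismatch: {test_rate:.2f}")
```

A large gap between the two rates on a reasonably sized sample would support the idea that image_id and the labels are misaligned in test.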
Hopefully the issue will be found and resolved, and we will have a great contest!
Good point!
I don't have much hope that they will solve it. They don't even respond.