Hello Zindians,
I joined this challenge early this week and built a baseline. My local validation score was reasonable, but it didn't match what I got on the public leaderboard.
I started investigating this because I had seen many threads about incorrect metric calculation on the Zindi backend, until I found this thread https://zindi.africa/competitions/turtle-recall-conservation-challenge/discussions/9474 where @picekl asked, "would you mind sharing the script that handles the mAP calculation at the backend?" and @stigvp responded, "Hi Lukas, please see the latest version of the tutorial, which now includes such an example."
The problem is that the metric is implemented incorrectly in the starter notebook, and here's a notebook I made explaining the reasons in detail: https://colab.research.google.com/drive/1de_tHzS-rasM1sV0BmBWgjudYxFlKVxv?usp=sharing
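In short, since each image has exactly one true turtle ID, MAP@5 reduces to the mean of 1/rank over samples, where rank is the 1-based position of the true label in the top-5 predictions (a sample contributes 0 if the true label is absent). A minimal sketch of that simplification (the function name and data here are illustrative, not from the starter notebook):

```python
def map5_single_label(true_labels, predictions):
    """MAP@5 when each sample has exactly one ground-truth label:
    the mean of 1/rank, where rank is the 1-based position of the
    true label in the top-5 predictions (0 contribution if absent)."""
    total = 0.0
    for truth, preds in zip(true_labels, predictions):
        for rank, p in enumerate(preds[:5], start=1):
            if p == truth:
                total += 1.0 / rank
                break
    return total / len(true_labels)

print(map5_single_label(
    ["E", "B"],
    [["A", "E", "C", "D", "F"],   # "E" at rank 2 -> 1/2
     ["B", "A", "C", "D", "F"]],  # "B" at rank 1 -> 1/1
))  # 0.75
```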
The question now is: "Is Zindi using the same function to calculate our scores?" According to @stigvp's answer in @picekl's thread, "YES".
Could you confirm this also?
Cordially,
Hello Fadhloun,
Thanks for taking the time to debug this issue, and for the simplified MAP@5 formula.
In some of the unit tests for the starter notebook implementation, I noticed you're passing a list of actual labels to the apk function, e.g. actual = ["E"] * len(predicted).
If we only pass the label e.g. actual = "E", both functions seem to be identical.
Could you confirm?
Yes, I ran the same unit tests and it worked perfectly. I now wonder whether they are passing the true labels as a list or not; maybe that's the issue. Let's wait and see.
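To make the suspected difference concrete, here is a sketch assuming the widely copied Kaggle-benchmarks-style apk() (an assumption about what the starter notebook is based on; the labels and predictions below are made up). Passing the true label as a repeated list inflates the denominator, because the function divides by min(len(actual), k):

```python
def apk(actual, predicted, k=5):
    """Average precision at k for one sample (Kaggle-benchmarks style)."""
    if len(predicted) > k:
        predicted = predicted[:k]
    score = 0.0
    num_hits = 0.0
    for i, p in enumerate(predicted):
        # count a hit only the first time a prediction appears
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)
    if not actual:
        return 0.0
    return score / min(len(actual), k)

predicted = ["A", "E", "B", "C", "D"]   # true label "E" ranked 2nd

print(apk(["E"], predicted))                   # 0.5 -> expected for a single label
print(apk(["E"] * len(predicted), predicted))  # 0.1 -> denominator is min(5, 5) = 5
```

So if the backend passes actual as a repeated list rather than a single-element list (or a one-character string, for which len(actual) is also 1), every non-perfect score would be scaled down, which would explain a gap between local validation and the leaderboard.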
I am afraid the metric calculation in this competition actually works just fine.
I had a look at the most similar images found on the test set (for a model that performs pretty well on train), and to be honest they are far from perfect; the scores on the LB make sense (at least to me).
My feeling is that the images in train (+extra) and test differ in time, which makes sense, and models have a hard time generalizing across time.
This sounds like a pretty hard challenge to me.
https://zindi.africa/competitions/turtle-recall-conservation-challenge/discussions/10167
I'm afraid that if there is a problem, the public leaderboard is not representative: top-scoring participants could have a lot of errors in their predictions. I compared the top-1 predictions of my models with the ground truth, and the turtles are mostly the same.
This is the main reason why this task is difficult. It is less about the metric and more about the differences between the train and test sets.
I noticed this earlier and already asked @Zindi and @Deepmind about the key differences between the two sets. While they responded that there wasn't any major difference, I have reason to believe that isn't the case.
To validate this, I tested a good image-location classification model on a subset of the train set (a val set) and on the test set. The model performed quite well on the val set but very poorly on the test set, just as has been the case in this competition.
While I am not sure what the difference actually is, maybe temporal like you suggested, I believe the difficulty of this task is due to some sort of disparity between the train and test sets and not necessarily the metric.
I don't agree with that, since I checked a lot of my top-1 predictions very carefully, and @Fnoa did the same. The predictions are quite good. But in general it doesn't matter: the organizers and "experts" from Zindi are just ignoring all our discussions here. @amyflorida626
I agree with you. And there are only 6 days left in this competition, but we haven't heard a logical response from the @zindi team yet.
If there's a real issue, this challenge should be extended, and I think Zindi would agree with that, as they are seeking the best solutions for their customers, and during all this period there has been no real competition.