I'm sorry for the inconvenience, but there is something in this competition that is not right, and maybe together we can solve it.
I have spent time creating a good baseline for this competition, and it is frustrating to see that the scoring problem is never resolved.
I will describe the methodology I followed to reach these conclusions (if you agree, I suppose Zindi will do something about it).
Methodology
To confirm there is something wrong with the leaderboard, I manually analyzed a large part of my submission (by the way, I got a local validation score of 0.7).
By manually I mean that I looked at the predictions one by one, checking whether each prediction was actually correct.
I analyzed a total of 150 randomly chosen images from the test set.
Out of those 150, a total of 102 were well classified.
According to @amyflorida626, the public leaderboard contains 30% of the images and the private one 70%. This means that when I pick an image at random from the 490 in the submission, there is a probability of 3/10 that it belongs to the public leaderboard.
Doing the math, around 45 of my 150 manually analyzed images should correspond to the public leaderboard (more or less).
Of those 45, around 30 should be well classified (in line with the 102/150 ratio I checked manually), and with 30 well-classified images the minimum MAP should be about 0.2 (counting only the images I analyzed manually; extrapolating, it should be around 0.7).
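The arithmetic above can be reproduced in a few lines of Python. The only inputs are the 490-image test set, the claimed 30% public split, and my 102/150 manual count; the lower bound assumes a correct prediction earns at least rank-1 credit and every unchecked image scores zero:

```python
# Back-of-the-envelope check of the numbers above.
n_test = 490        # images in the submission
public_frac = 0.30  # claimed share of the public leaderboard
n_checked = 150     # images I verified by hand
n_correct = 102     # of those, correctly predicted

acc = n_correct / n_checked                   # observed accuracy, 0.68
n_public = n_test * public_frac               # ~147 public images
checked_on_public = n_checked * public_frac   # ~45 of my checked images
correct_on_public = checked_on_public * acc   # ~30.6 expected correct

# Pessimistic floor: only my verified-correct images score, at rank 1.
map_floor = correct_on_public / n_public      # ~0.208

# Extrapolating the observed accuracy to the whole public split instead:
map_extrapolated = acc                        # ~0.68

print(f"accuracy            = {acc:.2f}")
print(f"MAP lower bound     = {map_floor:.3f}")
print(f"MAP if extrapolated = {map_extrapolated:.2f}")
```

Either way, the result is far above the 0.0752 on the leaderboard.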
However, the highest MAP achieved so far is 0.0752.
Given all of the above, I think it is obvious that something is wrong.
Even so, to double-check, I assigned an invented name to each of the 150 images and made the submission again to observe any change.
There was no change in the score (another reason to think something is wrong).
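For anyone who wants to reproduce this control experiment, here is a minimal sketch. The column names (`image_id`, `prediction`) and the ID format are my placeholders, not the official submission schema; the idea is that rows with invented IDs cannot match anything, so the public score should drop if scoring works:

```python
import random

random.seed(0)

def corrupt_ids(rows, n=150):
    """Replace the image ID of n random rows with an invented ID.

    If the scorer works, every corrupted row that falls on the
    public split stops matching and the score should change."""
    rows = [dict(r) for r in rows]
    for i in random.sample(range(len(rows)), n):
        rows[i]["image_id"] = f"ID_FAKE{i:04d}"
    return rows

# Toy submission with the assumed two-column layout:
submission = [{"image_id": f"ID_{i:04d}", "prediction": "t_id_x"}
              for i in range(490)]
corrupted = corrupt_ids(submission)
changed = sum(a["image_id"] != b["image_id"]
              for a, b in zip(submission, corrupted))
print(changed)  # 150 rows now carry invented IDs
```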
Here are ten images (from my 150 manually checked) and their correct labels from the test set, as an example:
1) ID_6NEDKOYZ - t_id_4ZfTUmwL
2) ID_57QZ4S9N - t_id_Kf73l69A
3) ID_OCGGJS5X - t_id_YjXYTCGC
4) ID_2E011NB0 - t_id_dVQ4x3wz
5) ID_OY5D7O3A - t_id_hRzOoJ2t
6) ID_2FKXUZ69 - t_id_AMnriNb5
7) ID_DFT8JWF0 - t_id_8b8sprYe
8) ID_8XLIWU92 - t_id_15bo4NKD
9) ID_EDTGD7SN - t_id_IlO9BOKc
10) ID_A9QSMNCB - t_id_VP2NW7aV
@amyflorida recently checked the leaderboard scoring function and it seems to be working fine, so my two hypotheses are as follows:
1) The first and perhaps most likely hypothesis is that there is some kind of bug in the Zindi backend labels, so our predictions are being checked against erroneous data.
2) Another hypothesis is that instead of taking 30% of the images for the public leaderboard, they are taking such a small percentage that it is not representative.
I hope I have explained myself well; if you have any hypotheses, I would be delighted to hear them.
This is my last attempt. I hope someone listens to me and we solve this for the good of the competition.
Thank you for your time!
Hi. I've created many different models and always get a very low score.
I agree with you that something seems to be wrong.
One doubt: how do you know that those turtles have the correct labels (e.g. you said that ID_6NEDKOYZ - t_id_4ZfTUmwL and ID_57QZ4S9N - t_id_Kf73l69A)?
Hi @igorkf, good question!
I have a folder with the 490 images that we have to predict.
On the other hand I have 100 folders (one for each turtle) with all their photos.
Once I have my submission, I just visualize each test-set image next to the photos of the turtle my model predicted.
It's usually pretty straightforward to see.
I don't know about the labels, but the locations (left, right, top) seem to be wrong for several of the test pics...
@amyflorida626, could you also check the scenarios where a submission does and does not have the right answer in the top 5? It seems this backend problem could lead to random results on the private board in the end, which would make the competition just a waste of time for participants.
That's one of my main concerns too; it looks like we're going to be scored randomly...
@Fnoa
It looks like the second of your hypotheses is right. I tried randomly shuffling part of my submission (with 20-60% of the rows affected), and the results can be very different. Sometimes the shuffle changes the score a little, and sometimes it doesn't!
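For reference, this is roughly what I did; a minimal sketch assuming the submission is a list of (`image_id`, `prediction`) rows (the column names are placeholders, not the official schema):

```python
import random

def shuffle_fraction(rows, frac, seed=0):
    """Randomly permute the predictions of a fraction of rows.

    If the leaderboard barely moves after shuffling 20-60% of the
    predictions, the public split is either tiny or mis-scored."""
    rng = random.Random(seed)
    rows = [dict(r) for r in rows]
    idx = rng.sample(range(len(rows)), int(len(rows) * frac))
    preds = [rows[i]["prediction"] for i in idx]
    rng.shuffle(preds)
    for i, p in zip(idx, preds):
        rows[i]["prediction"] = p
    return rows

# Toy submission: 490 rows, each with a distinct predicted turtle.
submission = [{"image_id": f"ID_{i:04d}", "prediction": f"t_id_{i}"}
              for i in range(490)]
shuffled = shuffle_fraction(submission, frac=0.4)
moved = sum(a["prediction"] != b["prediction"]
            for a, b in zip(submission, shuffled))
print(moved)  # number of rows whose prediction changed
```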
Thanks for answering; you got the same results I did when I experimented. My only concern at this point is whether Zindi will do anything about this, since they never respond to these kinds of posts. I even sent direct messages to @amyflorida626 but got no answer either. I feel it doesn't really make sense to keep spending time on this competition if nothing is done about it.