Mean Absolute Error would suit the counting goal better.
Some of the training images are clearly mislabeled (35 vs 0 trees). If there are similarly large labeling errors in the test set, they would ruin the leaderboard with the current metric.
How do you make sure the test set has better labeling quality?
@zindi, please address this. There are label errors in the training set... I hope the test set is free of such errors?
Just to be clear, label errors are expected in every labeling process. Switching the metric to MAE would reduce the impact of such errors.
Another option would be for the organizers to double-check the test set labels, but that could take a few hours of manual work. And as I said, labeling errors are always expected :)
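Quick sketch of why MAE is more forgiving here (made-up numbers, and assuming the current metric is RMSE): because RMSE squares errors before averaging, a single 35-vs-0 mislabel dominates the score, while under MAE the same error only grows linearly.

```python
import numpy as np

# Hypothetical ground-truth counts for a small test set of 20 images,
# with one image badly mislabeled (true count 0 recorded as 35).
y_true = np.zeros(20)
y_true[0] = 35          # the mislabeled image
y_pred = np.zeros(20)   # a model that (correctly) predicts 0 trees everywhere

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
mae = np.mean(np.abs(y_true - y_pred))

print(f"RMSE: {rmse:.2f}")  # ~7.83 -> the single bad label dominates the score
print(f"MAE:  {mae:.2f}")   # 1.75 -> the outlier's impact stays proportional
```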
An example of very bad labelling:
Can we correct these errors manually, or how are we supposed to handle them?
According to this discussion https://zindi.africa/competitions/digital-africa-plantation-counting-challenge/discussions/15368 we're not allowed to do so. But the problem is that if the test data contains such large labeling errors, it will ruin the leaderboard.
PS: Based on my experiments, at least the public LB contains such errors!