
Lacuna Solar Survey Challenge

Helping Madagascar
$5,000 USD
Completed (12 months ago)
Computer Vision
Prediction
729 joined
247 active
Start: Feb 14, 25
Close: Mar 23, 25
Reveal: Mar 24, 25
Mohamed_abdelrazik
Deliberate mislabeling or inaccurate labeling leads to a meaningless competition
Data · 24 Mar 2025, 05:11 · 10

First of all, I want to thank the @Zindi team for this competition. However, after the deadline and checking the private leaderboard, I discovered a serious mislabeling issue. A critical image (Image_ID: 'IDYEfIRfa') is labeled as having 12 pans, while in reality it contains over 280. If the organizers adjusted the labeling by removing large adjacent panel spaces for some images, why wasn't the same done for Image_ID: 'IDjNKiAGj', which has a true count of 70+? These data quality issues completely undermine the competition's goal, its meaningfulness, and the effort put in by participants for over a month.

Discussion · 10 answers
MICADEE
LAHASCOM

@Mohamed_abdelrazik I can relate to this. It's quite unfortunate.

24 Mar 2025, 06:32
Upvotes 1
Muhamed_Tuo
Inveniam

Hey @Mohamed_abdelrazik,

Mislabeling was a big issue in this competition; there were quite a handful of mislabeled images, and it only takes 3 of them to ruin your score. I totally understand the frustration.

But at the same time, no data is 100% clean. These are the constraints we have to work with. The 1st team seems to have done a great job at mitigating the effects.

24 Mar 2025, 11:27
Upvotes 9

Wow, thanks for sharing. I think it comes down to whether it was intentional. I guess not? Labeling is a tedious task, and I have seen cases of poor labeling quality.

24 Mar 2025, 13:04
Upvotes 3
Mohamed_Elnageeb
University of khartoum

I totally agree. I had this concern during the competition, and I assumed/hoped that maybe they split the data so that there was no mislabeling in the test set, since I noticed many mislabeled images in the train set and fixed their labels manually. Examples: IDgpMa0V (wrong label: 152 panels, actual: 1064) and IDmfKSa (wrong label: 235, actual: 461). This is a huge difference and can be the difference between top 50 and top 10.

This is really disappointing.

24 Mar 2025, 19:08
Upvotes 1
3B

Because the metric of this competition is MAE, just one mislabeled test image (for example having 410 panels but only 10 panels are labeled) can change the score by 400/2214 = 0.18. Maybe the difference between the top 1 and top 10 is simply due to 1 mislabeled test image.

25 Mar 2025, 04:07
Upvotes 1
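3B's arithmetic can be checked with a toy script. This is a minimal sketch: the test-set size (2214) and the 410-vs-10 example come from the thread, but the constant predictions are purely illustrative.

```python
# Toy illustration of how one bad test label shifts everyone's MAE.
def mae(y_true, y_pred):
    """Mean absolute error over paired lists of counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

N = 2214                 # test-set size mentioned in the thread
preds = [50] * N         # a model that happens to be exactly right everywhere
truth = [50] * N
preds[0] = 410           # the model correctly counts 410 panels...
truth[0] = 10            # ...but the recorded ground truth says 10
shift = mae(truth, preds)    # 400 / 2214, roughly 0.18
```

Since the same wrong label sits in everyone's scoring set, the shift hits every submission whose model predicts near the true count, which is 3B's point.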

Interesting! But from my point of view this is not a valid argument. I assumed we all trained our models on the same train images, meaning that if your model performs well it will make a good prediction even on the mislabeled images. For the example you gave: say an image has 410 panels but is labeled as having 10. Most well-performing models will predict around 300+, while the recorded ground truth is 10, so this will affect everyone's score quite similarly.

3B

How can you be sure that the competitors are using the same training data? Most of my time was spent cleaning the data. There are many ways to keep only noise-free images in the training set, such as selecting samples where the absolute error between the ground truth and the prediction is less than 5.
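The filtering idea 3B describes can be sketched as below. The row layout, column names, and numbers are hypothetical; in practice the predictions would come from out-of-fold cross-validation so no row is scored by a model that saw its own (possibly wrong) label.

```python
# Sketch: drop training rows whose label disagrees strongly with an
# out-of-fold prediction, keeping only plausibly clean samples.
rows = [
    {"image_id": "a", "label": 12, "oof_pred": 11.0},
    {"image_id": "b", "label": 12, "oof_pred": 280.0},  # suspect: huge gap
    {"image_id": "c", "label": 70, "oof_pred": 68.5},
]

THRESHOLD = 5  # the |error| < 5 cutoff mentioned in the post
clean = [r for r in rows if abs(r["label"] - r["oof_pred"]) < THRESHOLD]
```

Rows like "b" get excluded from the next training round, so different competitors can end up training on quite different subsets of the "same" data.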

How can one know 🧐 that some test images are mislabeled?

@Mohamed_abdelrazik I was also wondering whether it is fair to say there is a serious mislabeling issue because of, I don't know, 6 or 7 mislabeled satellite images, when we have more than 3000 images in the training dataset?

25 Mar 2025, 10:24
Upvotes 1
3B

Choose a submission, modify the pan or boil value of one image by adding 400, resubmit it, and check how the score changes.

Okay, I got it, thanks.
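The probe 3B suggests amounts to a one-line edit of a submission file. The sketch below assumes a CSV with an `ID` column and a `pan` count column; the competition's actual submission format may differ.

```python
# Sketch of the leaderboard probe: bump one image's count by 400, then
# resubmit and see whether the public score moves by roughly 400 / N.
import csv
import io

submission = "ID,pan\nIDaaa,10\nIDbbb,25\n"  # toy submission file contents
rows = list(csv.DictReader(io.StringIO(submission)))
rows[0]["pan"] = str(int(rows[0]["pan"]) + 400)  # perturb a single image

# Write the modified submission back out for upload.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["ID", "pan"])
writer.writeheader()
writer.writerows(rows)
modified = out.getvalue()
```

If the score shifts by about 400 divided by the test-set size, that image is scored against its recorded label; if it shifts by less, the recorded label was already far from your original prediction.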