
Lacuna Solar Survey Challenge

Helping Madagascar
$5,000 USD
Completed (12 months ago)
Computer Vision
Prediction
729 joined
247 active
Start: Feb 14, 25
Close: Mar 23, 25
Reveal: Mar 24, 25
Mohamed_abdelrazik
Deliberate mislabeling or inaccurate labeling leads to a meaningless competition
Data · 24 Mar 2025, 05:11 · 10

First of all, I want to thank the @Zindi team for this competition. However, after the deadline and checking the private leaderboard, I discovered a serious mislabeling issue. A critical image (Image_ID: 'IDYEfIRfa') is labeled as having 12 pans, while in reality it contains over 280. If the organizers adjusted the labeling by removing large adjacent panel spaces for some images, why wasn't the same done for Image_ID: 'IDjNKiAGj', which has a true count of 70+? These data quality issues completely undermine the competition's goal, its meaningfulness, and the effort put in by participants for over a month.

Discussion · 10 answers
MICADEE
LAHASCOM

@Mohamed_abdelrazik I can relate to this. It's quite unfortunate.

24 Mar 2025, 06:32
Upvotes 1
Muhamed_Tuo
Inveniam

Hey @Mohamed_abdelrazik,

Mislabeling was a big issue in this competition; there were quite a handful of mislabeled images, and it only takes 3 of them to ruin your score. I totally understand the frustration.

But at the same time, no data is 100% clean. These are the constraints we have to work with. The 1st team seems to have done a great job at mitigating the effects.

24 Mar 2025, 11:27
Upvotes 9

Wow, thanks for sharing. I think it comes down to whether it was intentional. I guess not? Labeling is a tedious task, and I have seen cases of poor labeling quality.

24 Mar 2025, 13:04
Upvotes 3
Mohamed_Elnageeb
University of khartoum

I totally agree. I had this concern during the competition, and I assumed/hoped that maybe they split the data so that there was no mislabeling in the test set, since I noticed many mislabeled images in the train set and fixed their labels manually. Examples: IDgpMa0V (wrong label: 152 panels, actual: 1064) and IDmfKSa (wrong label: 235, actual: 461). This is a huge difference and can be the difference between top 50 and top 10.

This is really disappointing.

24 Mar 2025, 19:08
Upvotes 1
3B

Because the metric of this competition is MAE, just one mislabeled test image (for example having 410 panels but only 10 panels are labeled) can change the score by 400/2214 = 0.18. Maybe the difference between the top 1 and top 10 is simply due to 1 mislabeled test image.

25 Mar 2025, 04:07
Upvotes 1
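3B's arithmetic can be checked with a toy script. This is a minimal sketch: the test-set size (2214) and the 410-vs-10 example come from the thread, but the constant predictions are purely illustrative.

```python
# Toy illustration of how one bad test label shifts everyone's MAE.
def mae(y_true, y_pred):
    """Mean absolute error over paired lists of counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

N = 2214                 # test-set size mentioned in the thread
preds = [50] * N         # a model that happens to be exactly right everywhere
truth = [50] * N
preds[0] = 410           # the model correctly counts 410 panels...
truth[0] = 10            # ...but the recorded ground truth says 10
shift = mae(truth, preds)    # 400 / 2214, roughly 0.18
```

Since the same wrong label sits in everyone's scoring set, the shift hits every submission whose model predicts near the true count, which is 3B's point.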

Interesting! But from my point of view this is not a valid argument. I assumed we all trained our models on the same train images, meaning that if your model performs well it will make a good prediction even on the mislabeled images. For the example you gave: say an image has 410 panels but is labeled as having 10. Most well-performing models will predict around 300+, while the recorded ground truth is 10, so this will affect everyone's score quite similarly.

3B

How can you be sure that the competitors are using the same training data? Most of my time was spent cleaning the data. There are many ways to keep only noise-free images in the training set, such as selecting samples where the absolute error between the ground truth and the prediction is less than 5.
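The filtering idea 3B describes can be sketched as below. The row layout, column names, and numbers are hypothetical; in practice the predictions would come from out-of-fold cross-validation so no row is scored by a model that saw its own (possibly wrong) label.

```python
# Sketch: drop training rows whose label disagrees strongly with an
# out-of-fold prediction, keeping only plausibly clean samples.
rows = [
    {"image_id": "a", "label": 12, "oof_pred": 11.0},
    {"image_id": "b", "label": 12, "oof_pred": 280.0},  # suspect: huge gap
    {"image_id": "c", "label": 70, "oof_pred": 68.5},
]

THRESHOLD = 5  # the |error| < 5 cutoff mentioned in the post
clean = [r for r in rows if abs(r["label"] - r["oof_pred"]) < THRESHOLD]
```

Rows like "b" get excluded from the next training round, so different competitors can end up training on quite different subsets of the "same" data.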

How can one know 🧐 that some test images are mislabeled?

@Mohamed_abdelrazik I was also wondering whether it is fair to say there is a serious mislabeling issue because of, I don't know, 6 or 7 mislabeled satellite images, when we have more than 3000 images in the training dataset?

25 Mar 2025, 10:24
Upvotes 1
3B

Choose a submission, modify the pan or boil value of one image by adding 400, resubmit it, and check how the score changes.

Okay, I got it, thanks.
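The probe 3B suggests amounts to a one-line edit of a submission file. The sketch below assumes a CSV with an `ID` column and a `pan` count column; the competition's actual submission format may differ.

```python
# Sketch of the leaderboard probe: bump one image's count by 400, then
# resubmit and see whether the public score moves by roughly 400 / N.
import csv
import io

submission = "ID,pan\nIDaaa,10\nIDbbb,25\n"  # toy submission file contents
rows = list(csv.DictReader(io.StringIO(submission)))
rows[0]["pan"] = str(int(rows[0]["pan"]) + 400)  # perturb a single image

# Write the modified submission back out for upload.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["ID", "pan"])
writer.writeheader()
writer.writerows(rows)
modified = out.getvalue()
```

If the score shifts by about 400 divided by the test-set size, that image is scored against its recorded label; if it shifts by less, the recorded label was already far from your original prediction.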