
Amini Cocoa Contamination Challenge

Helping Ghana · $7,000 USD · Completed (11 months ago)
Computer Vision · Object Detection
928 joined · 255 active
Start: Feb 14, 2025 · Close: May 11, 2025 · Reveal: May 12, 2025
stefan027
Possible bounding box conversion errors in Train.csv
Data · 17 Apr 2025, 10:08 · 27

[EDIT TO ORIGINAL POST] See this explanation by @Muhamed_Tuo below.

---

ORIGINAL POST:

The labels (bounding boxes) for this challenge are found in two places:

  1. As .txt files in the dataset/labels/train folder
  2. In the Train.csv file

The starter notebook uses the .txt files directly, so it is fair to assume those are the ground truth labels. The format of the labels in the .txt files is (xcentre, ycentre, width, height) in relative coordinates. It appears as if these labels were then converted to the (xmin, ymin, xmax, ymax) format (in absolute coordinates) found in Train.csv.
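For reference, the expected conversion from the .txt format to the Train.csv format can be sketched like this (`yolo_to_xyxy` is my own helper name, not something from the starter notebook):

```python
def yolo_to_xyxy(xc, yc, w, h, img_w, img_h):
    """Convert relative YOLO (xcentre, ycentre, width, height) to
    absolute (xmin, ymin, xmax, ymax) pixel coordinates."""
    xmin = (xc - w / 2) * img_w
    ymin = (yc - h / 2) * img_h
    xmax = (xc + w / 2) * img_w
    ymax = (yc + h / 2) * img_h
    return xmin, ymin, xmax, ymax

# A centred box covering half the image in each dimension:
print(yolo_to_xyxy(0.5, 0.5, 0.5, 0.5, 640, 480))  # (160.0, 120.0, 480.0, 360.0)
```

Note that if `img_w` and `img_h` are accidentally swapped for an image, every absolute coordinate is scaled by the wrong dimension, which would produce exactly the kind of out-of-bounds boxes discussed in this thread.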

When looking at the bounding boxes in Train.csv, our team noticed a high percentage of boxes that are outside of the image bounds. I then started to compare the bounding boxes from Train.csv to those in the .txt files. I think there is an error (or at least an inconsistency) in how the labels were converted. Specifically, it appears as if the image width and height were 'switched' when converting the relative coordinates to absolute coordinates for some images.
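A quick way to reproduce the bounds check (the column names here are my assumption about Train.csv's layout; adjust to the real ones):

```python
import pandas as pd

def flag_out_of_bounds(df):
    """Return the rows whose boxes extend outside the stated image bounds.
    Assumed columns: xmin, ymin, xmax, ymax, image_width, image_height."""
    bad = (
        (df["xmin"] < 0) | (df["ymin"] < 0)
        | (df["xmax"] > df["image_width"])
        | (df["ymax"] > df["image_height"])
    )
    return df[bad]

# Toy example: the second box runs past the image width.
boxes = pd.DataFrame({
    "xmin": [10, 50], "ymin": [10, 5],
    "xmax": [100, 700], "ymax": [90, 400],
    "image_width": [640, 640], "image_height": [480, 480],
})
print(len(flag_out_of_bounds(boxes)))  # 1
```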

While we can use the .txt files to train our models, the concern is that Test.csv is used to evaluate submissions. If the same conversion error or inconsistency occurred there, then the ground truth for this challenge is incorrect.

My step-by-step analysis can be found in this notebook: https://www.kaggle.com/code/stefan87/amini-cocoa-contamination-bb-error

Discussion · 27 answers
crossentropy
Federal university of Technology, Akure

Good day,

Thank you very much for this.

I noticed this as well while analysing the coordinates in the Train.csv file. Initially it may seem sensible to clip the values of ymax where ymax > image height, and do the same where xmax > image width, but even after that there's still a lot to look out for.

On swapping the width and height: I looked into this too and noticed that it doesn't hold for all of the box coordinates. I'm still thinking about how to approach it properly.

I haven't yet trained on the YOLO-format dataset Zindi prepared for this competition, but as you said, what if there's an issue with those labels as well? I haven't compared them either.

17 Apr 2025, 10:17
Upvotes 0
stefan027

For me the main problem is that we have two sources of labels that don't seem to match. At the very least, we should know which source is correct and that the test data is also based on it.

crossentropy
Federal university of Technology, Akure

Considering this: what if they've also been swapped in the evaluation dataset?

stefan027

Exactly, that's what I'm wondering about too

CodeJoe

And here was me not focusing on the data at all.

17 Apr 2025, 10:21
Upvotes 1
stefan027

lol that's usually me!

crossentropy
Federal university of Technology, Akure

You need to! My initial suspicion was that the x and y coordinates might have been swapped (I still somewhat think so). I'll share an observation of mine later.

CodeJoe

Not only me then XD.

crossentropy
Federal university of Technology, Akure

Take a quick look at this:

Please note that I'm human as well and might have some misconceptions in my analysis, so do correct me if any of my calculations or computations are wrong:

Firstly: Modifying the path and computing image resolution

📷

Secondly: checking some impossible cases, for example xmin > image width, which simply does not make sense!

At the same time, we can see that there is not a single case where xmin > xmax, and the same holds for ymin and ymax.

📷

Thirdly: I have a brief assumption (a theory?), nothing serious; I have only just reasoned it through :)

Now, unless I am wrong, mathematically: if xmax > image_width and, for every case where this happens, we clip the excess value to the width, say:

    if xmax > image_width:
        xmax = image_width
    # otherwise xmax remains unchanged

this is not expected, by itself, to make some values of xmin become greater than xmax, because we checked initially that xmin is not greater than xmax in any case.
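One wrinkle worth spelling out (my own illustration; `clip_box` is a hypothetical helper): that invariant only survives clipping when xmin itself lies within the image. Because some rows have xmin > image width, naively clipping xmax can invert a box even though xmin <= xmax held beforehand:

```python
def clip_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Naively clip a box to the image bounds."""
    return max(0, xmin), max(0, ymin), min(xmax, img_w), min(ymax, img_h)

# xmin (700) already exceeds the image width (640), so after clipping
# xmax down to 640 the box is inverted: xmin > xmax.
xmin, ymin, xmax, ymax = clip_box(700, 20, 900, 400, 640, 480)
print(xmin > xmax)  # True
```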

Lastly: after recomputing the base stats as I did earlier in the second step, we can see a change. Why? I have no hard facts, only assumptions:

My first assumption is that the coordinates were swapped, i.e. xmin for ymin and, similarly, xmax for ymax.

Secondly, and related to the first assumption, this coordinate swapping was only an issue in a few cases.

I also plotted some of these boxes, carefully comparing the box width and height against the image resolution; it seems quite evident that they are swapped, or else I am mistaken and these are just wrong boxes that look deceiving!

I'd very much love to hear your opinions on this, and if there is anything wrong in my calculations or assumptions, I'd love to hear that too!

Thanks!

Jaw22
Zindi africa

Hi, in preprocessing, perhaps change all images to the same size and standardize.

17 Apr 2025, 11:01
Upvotes 0
stefan027

If we resize the images, we also have to resize the bounding boxes proportionally, which means that we will still have the problem where the bounding boxes in Train.csv are different to those in the .txt label files.
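As a sketch of that proportional rescaling (the `resize_box` helper is my own; sizes are (width, height) tuples):

```python
def resize_box(box, orig_size, new_size):
    """Scale an (xmin, ymin, xmax, ymax) box when the image is resized
    from orig_size to new_size, both given as (width, height)."""
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    xmin, ymin, xmax, ymax = box
    return xmin * sx, ymin * sy, xmax * sx, ymax * sy

# Halving a 640x480 image halves every coordinate:
print(resize_box((100, 50, 300, 200), (640, 480), (320, 240)))
# (50.0, 25.0, 150.0, 100.0)
```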

Data cleaning and preprocessing are super important when we have bad or noisy data. By reducing the noise, we can train better models. Even if the test set is also noisy, better models should perform better.

However, if there is a 'systematic error' in the data (e.g., height and width switched during a transformation, as I suspect), and if that systematic error is also present in the test set, then no amount of preprocessing will help to improve test set performance. In fact, data cleaning might actually hurt test set performance, because models that replicate the systematic error will do well.

What I'm suspecting is that there is some sort of systematic labeling error here which is different from just bad/noisy data.

Jaw22
Zindi africa

@Stefan027, thank you for sharing.

17 Apr 2025, 11:02
Upvotes 2
Koleshjr
Multimedia university of kenya
17 Apr 2025, 11:22
Upvotes 0

Thanks @Muhamed_Tuo for the workaround. It appears that there are still some mislabels -- just keen to confirm whether anyone else has seen the same, and that there isn't a bug in my code?

Eg: ID_skBkBf, ID_gTbZrd, ID_FHDhzz, ID_U0JAu1

Muhamed_Tuo
Inveniam

Hey @Stefan027

Actually, both data sources are correct. The issue comes from the EXIF metadata stored in the images, or more specifically the `orientation` tag. PIL, by default, does not take that rotation information into account, unlike OpenCV.

To properly read those images with PIL, we need to:

from PIL import Image, ExifTags

image = Image.open(filepath)

# Find the EXIF tag id for 'Orientation'
for flag in ExifTags.TAGS.keys():
    if ExifTags.TAGS[flag] == 'Orientation':
        break

# flag value will be 274

exif = image._getexif()
orientation_value = exif.get(flag, None)

if orientation_value == 3:
    image = image.rotate(180, expand=True)
elif orientation_value == 6:
    image = image.rotate(270, expand=True)
elif orientation_value == 8:
    image = image.rotate(90, expand=True)

With OpenCV, this is done by default, as I mentioned earlier.

So, if you want to mimic PIL's default behaviour of ignoring the orientation, you can do:

image = cv2.cvtColor(
    cv2.imread(
        fp, 
        cv2.IMREAD_IGNORE_ORIENTATION | cv2.IMREAD_COLOR
    ),
    cv2.COLOR_BGR2RGB
)

# This will basically disregard the orientation flag and incorrectly load the image
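As an aside, recent Pillow versions also ship `ImageOps.exif_transpose`, which applies the stored orientation in one call and covers all eight orientation flags (including the mirrored ones), not just 3, 6, and 8:

```python
from PIL import Image, ImageOps

def load_oriented(filepath):
    """Open an image and apply its EXIF orientation, if present."""
    return ImageOps.exif_transpose(Image.open(filepath))
```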

@ZINDI I don't think there's anything to worry about the test annotations

18 Apr 2025, 08:35
Upvotes 5
crossentropy
Federal university of Technology, Akure

Thanks for this information

Really helpful

Muhamed_Tuo
Inveniam

For those using YOLO, you most likely noticed a fair number of warnings displayed at the start of training. What's happening under the hood is that YOLO is processing these images by loading them, applying the correct orientation, and caching them:

 train: WARNING ⚠️ /kaggle/working/cocoa_diseases/yolo_data/0/train/ID_AJD939.jpg: corrupt JPEG restored and saved
stefan027

Hey @Muhamed_Tuo, this is really great! I just learned a lot about processing images! This completely explains the apparent 'switching' of height and width that I talked about in my post, and why it seemed to apply to some images but not others.

For those using the competition's starter notebook for EDA, you can add this function to the notebook (derived from @Muhamed_Tuo's post above):

from PIL import Image, ExifTags

def load_image(filepath):
    """Open an image and apply its EXIF orientation, if any."""
    image = Image.open(filepath)

    # Find the EXIF tag id for 'Orientation' (274)
    for flag in ExifTags.TAGS.keys():
        if ExifTags.TAGS[flag] == 'Orientation':
            break

    exif = image._getexif() or {}  # some images carry no EXIF data
    orientation_value = exif.get(flag, None)

    if orientation_value == 3:
        image = image.rotate(180, expand=True)
    elif orientation_value == 6:
        image = image.rotate(270, expand=True)
    elif orientation_value == 8:
        image = image.rotate(90, expand=True)
    return image

Then in the plot_image_with_boxes function replace this line:

image = np.array(Image.open(str(image_path)))

with this:

image = np.array(load_image(str(image_path)))

100i
Ghana Health Service

Thanks for the clarification @Muhamed_Tuo. Very helpful feedback

100i
Ghana Health Service

Great work @Stefan027. Grateful to you for bringing clarity to this issue.

nymfree

Thanks for this.

Kamenialexnea
Ecole nationale superieure polytechnique yaounde

Actually,

    cv2.cvtColor(
        cv2.imread(filepath, cv2.IMREAD_IGNORE_ORIENTATION | cv2.IMREAD_COLOR),
        cv2.COLOR_BGR2RGB,
    )

is not working; only handling the orientation with PIL solves this issue.

Koleshjr
Multimedia university of kenya

Of course that does not work, since you are literally ignoring the orientation. Read the comment in that code block :)

Kamenialexnea
Ecole nationale superieure polytechnique yaounde

My badddd, sorry missed the comment

Amy_Bray
Zindi

Thanks for bringing this to our attention. We are evaluating the data and will respond as soon as we can.

22 Apr 2025, 14:18
Upvotes 1
Koleshjr
Multimedia university of kenya

I think the issue was solved in the above thread?

Amy_Bray
Zindi

Yes, agreed! I have gone through the thread and the data, and I think we're good; no changes at this point 👍