Both the public and private test sets have been manually reviewed and corrected to align with the cadastral plans. Minor name discrepancies (such as inclusion or omission of initials) remain, which is why WER was selected as the evaluation metric. The training data may still contain inconsistencies, but the test data have been cleaned to ensure reliable evaluation.
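For clarity, WER (Word Error Rate) is the word-level edit distance between hypothesis and reference, divided by the number of reference words; a minimal sketch of how it could be computed (the competition's actual scoring implementation may differ):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / #reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A missing initial costs one deletion out of three reference words:
print(round(wer("John A Smith", "John Smith"), 3))  # 0.333
```

This is why an omitted initial only adds a fractional penalty rather than failing the whole name.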
First of all, thank you @amy for your response and effort. May I ask when the test data were cleaned, and whether this is reflected in the current LB? Especially for the polygon mapping. Here are some examples of wrong polygons in the train set, and there are more:
BYLS-107,BYLS-099,BYLS-060
@Mohamed_abdelrazik This is reflected in the current leaderboard.
@Amy_Bray
So for the noisy labels in the training set, are we allowed to manually correct them?
There are only 2 days left until the competition ends; could you please kindly give us an answer? This issue has been raised several times before but hasn't received a response, and we don't want to risk submitting a solution that violates the rules.
For example, in the Lacuna Solar Survey Challenge, the top-1 team used a solution that included manual labeling on the training set, achieved a significantly higher score than teams that didn’t use it, and this solution was considered valid.
Thank you very much.
@3B solutions must use the training set provided i.e. no manual corrections
@meganomaly @Amy_Bray I honestly don't understand how we're expected to train models using the provided Zindi polygon annotations when they clearly don't align with the images (they are geo coordinates, not pixel coordinates). If that's the case, how are we supposed to avoid making manual corrections to the training set in order to get correct polygons? Right now, it feels like a classic case of garbage in, garbage out (GIGO). Has the Zindi team actually tried visualizing the polygons over the images in the train set? Doing so would really highlight the issue we're raising here. Kindly guide us.
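To illustrate the geo-vs-pixel mismatch: if each image came with a GDAL-style north-up affine geotransform, the polygons could be mapped into pixel space by inverting it. The transform values below are purely hypothetical; the real per-image transforms would have to come from the imagery metadata, which is exactly what seems to be missing here:

```python
# Hypothetical north-up affine geotransform (GDAL convention):
#   x_geo = c + col * a      (a = pixel width)
#   y_geo = f + row * e      (e = pixel height, negative for north-up)
a, c = 0.5, 100.0   # pixel width, top-left x (made-up values)
e, f = -0.5, 200.0  # pixel height, top-left y (made-up values)

def geo_to_pixel(x: float, y: float) -> tuple:
    """Invert the affine to map a geo coordinate to (col, row)."""
    return ((x - c) / a, (y - f) / e)

def polygon_to_pixels(poly):
    """Map a list of (x, y) geo vertices into pixel space."""
    return [geo_to_pixel(x, y) for x, y in poly]

print(geo_to_pixel(101.0, 199.0))  # (2.0, 2.0)
```

Without the correct transform per image, any such mapping is guesswork, which is the crux of the complaint above.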
Or maybe you meant manual corrections to the other columns?
@Koleshjr This is true, and I believe others can confirm as well. We can forgive the errors present in the OCR part, but for segmentation it's near impossible, and since the start of the competition we have only been given a function for IoU mask evaluation, with no clarification on the polygons.
No transformation I have tested works for me at all.
It seems that the organizers only care about the shape of the land plots rather than their exact coordinates on the images, which explains why they use IoU instead of distance metrics. For example, square or rectangular plots are likely valued higher than trapezoidal or irregularly shaped ones.
And as another participant mentioned, the organizers want to reproduce the solution on a larger dataset, so manually annotating a large amount of data would be very time-consuming. You can absolutely build solutions that achieve IoU = 0.95 without any manual labeling.
"And as another participant mentioned, the organizers want to reproduce the solution on a larger dataset, so manually annotating a large amount of data would be very time-consuming."
That's true, but it doesn't negate the fact that some manual labeling is still necessary. You don't have to label the entire dataset, just a small portion to get the model started. From there, you can leverage semi-supervised learning to scale efficiently.
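The pseudo-labeling loop described above can be sketched with a toy example. A nearest-centroid classifier stands in for a real segmentation model, and all data, thresholds, and cluster positions here are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny labeled seed (one point per class) plus unlabeled data near each class.
labeled_X = np.array([[0.0, 0.0], [5.0, 5.0]])
labeled_y = np.array([0, 1])
unlabeled_X = np.vstack([rng.normal(0.0, 0.2, (20, 2)),   # near class 0
                         rng.normal(5.0, 0.2, (20, 2))])  # near class 1

def centroids(X, y):
    """'Train' the toy model: one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(X, cents):
    """Predicted class and distance to the chosen centroid (confidence proxy)."""
    d = np.linalg.norm(X[:, None] - cents[None], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

# One pseudo-labeling round: predict on unlabeled data, keep only
# confident predictions, and fold them back into the training set.
cents = centroids(labeled_X, labeled_y)
pred, dist = predict(unlabeled_X, cents)
confident = dist < 2.5  # arbitrary confidence cutoff for this toy setup
X2 = np.vstack([labeled_X, unlabeled_X[confident]])
y2 = np.concatenate([labeled_y, pred[confident]])
cents2 = centroids(X2, y2)  # retrain on the enlarged set
```

In practice the same loop applies: annotate a small seed set, train, pseudo-label the rest, filter by model confidence, and retrain.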
Hello @meganomaly / @Amy_Bray,
Regarding the instruction 'must use the training set provided i.e. no manual corrections', could you please specify if 'manual corrections' applies to:
a) Adjustments to the CSV values in the train/test sets, or
b) Manual adjustments to the annotations?
If it's the latter, revising our solutions now (2 days before close) would be extremely challenging given the weeks of significant effort, cost, and resources already dedicated to our current annotation strategy for building a precise segmentation model.
Regards
On point b, this is something I had asked for clarification on a month ago, on 21st Sep, and it's sad that the response is only being shared 2 days before the competition ends. It's not possible to even revise the solution in a day.