Both the public and private test sets have been manually reviewed and corrected to align with the cadastral plans. Minor name discrepancies (such as inclusion or omission of initials) remain, which is why WER was selected as the evaluation metric. The training data may still contain inconsistencies, but the test data have been cleaned to ensure reliable evaluation.
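For clarity, WER (Word Error Rate) is the word-level edit distance between hypothesis and reference, divided by the number of reference words; a minimal sketch of how it could be computed (the competition's actual scoring implementation may differ):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / #reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A missing initial costs one deletion out of three reference words:
print(round(wer("John A Smith", "John Smith"), 3))  # 0.333
```

This is why an omitted initial only adds a fractional penalty rather than failing the whole name.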
First of all, thank you @amy for your response and effort. May I ask when the test data were cleaned, and whether this is reflected in the current LB? Especially for the polygon mapping. Here are some examples of wrong polygons in the train set, and there are more:
BYLS-107,BYLS-099,BYLS-060
@Mohamed_abdelrazik This is reflected in the current leaderboard.
@Amy_Bray
So for the noisy labels in the training set, are we allowed to manually correct them?
There are only 2 days left until the competition ends; could you please kindly give us an answer? This issue has been raised several times before but hasn't received a response, and we don't want to risk submitting a solution that violates the rules.
For example, in the Lacuna Solar Survey Challenge, the top-1 team used a solution that included manual labeling on the training set, achieved a significantly higher score than teams that didn’t use it, and this solution was considered valid.
Thank you very much.
@3B solutions must use the training set provided i.e. no manual corrections
@meganomaly @Amy_Bray I honestly don't understand how we're expected to train models using the provided Zindi polygon annotations when they clearly don't align with the images (they are geo coordinates, not pixel coordinates). If that's the case, how are we supposed to avoid making manual corrections to the training set in order to get correct polygons? Right now, it feels like a classic case of garbage in, garbage out (GIGO). Has the Zindi team actually tried visualizing the polygons over the images in the train set? Doing so would really highlight the issue we're raising here. Kindly guide us.
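To illustrate the geo-vs-pixel mismatch: if each image came with a GDAL-style north-up affine geotransform, the polygons could be mapped into pixel space by inverting it. The transform values below are purely hypothetical; the real per-image transforms would have to come from the imagery metadata, which is exactly what seems to be missing here:

```python
# Hypothetical north-up affine geotransform (GDAL convention):
#   x_geo = c + col * a      (a = pixel width)
#   y_geo = f + row * e      (e = pixel height, negative for north-up)
a, c = 0.5, 100.0   # pixel width, top-left x (made-up values)
e, f = -0.5, 200.0  # pixel height, top-left y (made-up values)

def geo_to_pixel(x: float, y: float) -> tuple:
    """Invert the affine to map a geo coordinate to (col, row)."""
    return ((x - c) / a, (y - f) / e)

def polygon_to_pixels(poly):
    """Map a list of (x, y) geo vertices into pixel space."""
    return [geo_to_pixel(x, y) for x, y in poly]

print(geo_to_pixel(101.0, 199.0))  # (2.0, 2.0)
```

Without the correct transform per image, any such mapping is guesswork, which is the crux of the complaint above.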
Or maybe you meant manual corrections to the other columns?
@Koleshjr This is true, and I believe others can confirm as well. We can forgive the errors present in the OCR part, but for segmentation it's near impossible, and since the start of the competition we have only been given a function for IoU mask evaluation, with no clarification on the polygons.
No transformation I have tested works for me at all.
It seems that the organizers only care about the shape of the land plots rather than their exact coordinates on the images, which explains why they use IoU instead of distance metrics. For example, square or rectangular plots are likely valued higher than trapezoidal or irregularly shaped ones.
And as another participant mentioned, the organizers want to reproduce the solution on a larger dataset, so manually annotating a large amount of data would be very time-consuming. You can absolutely build solutions that achieve IoU = 0.95 without any manual labeling.
"And as another participant mentioned, the organizers want to reproduce the solution on a larger dataset, so manually annotating a large amount of data would be very time-consuming."
That's true, but it doesn't negate the fact that some manual labeling is still necessary. You don't have to label the entire dataset, just a small portion to get the model started. From there, you can leverage semi-supervised learning to scale efficiently.
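The pseudo-labeling loop described above can be sketched with a toy example. A nearest-centroid classifier stands in for a real segmentation model, and all data, thresholds, and cluster positions here are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny labeled seed (one point per class) plus unlabeled data near each class.
labeled_X = np.array([[0.0, 0.0], [5.0, 5.0]])
labeled_y = np.array([0, 1])
unlabeled_X = np.vstack([rng.normal(0.0, 0.2, (20, 2)),   # near class 0
                         rng.normal(5.0, 0.2, (20, 2))])  # near class 1

def centroids(X, y):
    """'Train' the toy model: one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(X, cents):
    """Predicted class and distance to the chosen centroid (confidence proxy)."""
    d = np.linalg.norm(X[:, None] - cents[None], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

# One pseudo-labeling round: predict on unlabeled data, keep only
# confident predictions, and fold them back into the training set.
cents = centroids(labeled_X, labeled_y)
pred, dist = predict(unlabeled_X, cents)
confident = dist < 2.5  # arbitrary confidence cutoff for this toy setup
X2 = np.vstack([labeled_X, unlabeled_X[confident]])
y2 = np.concatenate([labeled_y, pred[confident]])
cents2 = centroids(X2, y2)  # retrain on the enlarged set
```

In practice the same loop applies: annotate a small seed set, train, pseudo-label the rest, filter by model confidence, and retrain.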
Hello @meganomaly / @Amy_Bray,
Regarding the instruction 'must use the training set provided i.e. no manual corrections', could you please specify if 'manual corrections' applies to:
a) Adjustments to the CSV values in the train/test sets, or
b) Manual adjustments to the annotations?
If it's the latter, revising our solutions now (2 days before close) would be extremely challenging given the weeks of significant effort, cost, and resources already dedicated to our current annotation strategy for building a precise segmentation model.
Regards
On point b, this is something I had asked for clarification on a month ago, on 21st Sep, and it's sad that the response is only being shared 2 days before the competition ends. It's not possible to even revise the solution in a day.