Some teams appear to have started using manual labeling. Starting from this competition, there is an additional clause: “All data manipulation must be done in code, manual manipulation via manual labeling or Excel will lead to disqualification.” Unless otherwise specified, this appears to apply to both the train and test sets. One of the top competitors went through a terrible experience in the “CGIAR Root Volume Estimation Challenge” and I don’t want us to repeat that. Since getting a clear yes/no answer from Zindi seems too difficult, does anyone have advice on manual labeling?
Thanks!
The general rule is: “All editing of data must be done in a notebook (i.e. not manually in Excel).”
I believe there is an exception for the training data. Amy_Bray commented: “Both the public and private test sets have been manually reviewed and corrected to align with the cadastral plans. Minor name discrepancies (such as inclusion or omission of initials) remain, which is why WER was selected as the evaluation metric. The training data may still contain inconsistencies, but the test data have been cleaned to ensure reliable evaluation.”
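Since the quote above says WER was chosen precisely because minor name discrepancies (like an omitted initial) remain, here is a minimal sketch of how word error rate scores such a case. The example names are made up; this is just the standard word-level Levenshtein definition, not Zindi’s official scoring code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# A dropped initial costs one word-level edit out of three reference words:
print(wer("John A Smith", "John Smith"))  # → 0.333...
```

So an omitted initial only adds a small, bounded penalty rather than zeroing out an otherwise correct name, which is presumably why the metric tolerates those leftover discrepancies in the test set.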
Without official Zindi confirmation for this specific training-set edit, it remains a wildcard play.
It seems we need a lucky charm 🍀