Dear Zindians,
There has been a miscommunication in the requirements for this challenge, and so we are resetting the leaderboard to ensure a fair contest. Thank you for your contributions to making sure this is addressed, and please accept our apologies for the disruption to the competition. We know that this change disregards many hours of hard work.
You are not allowed to use the "type of damage" feature at all in your solution. We recommend dropping the column when you read the file in. During code review, if a solution uses "type of damage", the team/user will be disqualified.
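For anyone unsure what "dropping the column when you read the file in" could look like, here is a minimal sketch using only the Python standard library. It assumes the banned feature is stored in a column named "damage"; that name, the other column names, the sample values, and the helper `read_rows_without` are illustrative placeholders, not the official file layout.

```python
import csv
import io

# Assumption: the banned "type of damage" feature lives in a column
# called "damage". Adjust the name to match the actual file.
BANNED_COLUMN = "damage"


def read_rows_without(csv_file, banned=BANNED_COLUMN):
    """Read a CSV file object and return its rows as dicts,
    with the banned column removed from every row."""
    reader = csv.DictReader(csv_file)
    return [
        {key: value for key, value in row.items() if key != banned}
        for row in reader
    ]


# Tiny in-memory stand-in for the real train file (made-up values).
sample = io.StringIO(
    "ID,growth_stage,season,damage,extent\n"
    "ID_A,V,S1,DR,20\n"
    "ID_B,M,S2,G,0\n"
)

rows = read_rows_without(sample)
print(rows[0].keys())  # the "damage" key is no longer present
```

If you load the data with pandas instead, passing a callable such as `usecols=lambda c: c != "damage"` to `pd.read_csv` should have the same effect, so the column never enters your pipeline.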
All submissions before 26 September 2023 14:00 GMT will not count on the leaderboard, but will remain in your Submissions tab. If you have a past, valid (i.e. does not use the "type of damage" feature) submission you would like to be considered, you will need to download that submission and resubmit it.
The total submission limit and the daily submission limit have both been increased.
Thank you for your patience and understanding, and may the best model win!
Hello,
I think our last submission has not been reset, which will make the leaderboard look very unbalanced, since it uses the "type of damage" feature.
Hello,
So, can we only use the images? Or can competitors also use the remaining columns: "growth_stage" and "season"?
Although I appreciate the effort in making this right, the data have already been leaked, and there is no guarantee that ill-intentioned competitors won't use that information in following submissions. At this point I don't think there's anything the organizers could do (apart from creating a whole new dataset) to address this issue, so I'd suggest to competitors that they largely ignore their leaderboard positions as a "tuning" feature and focus only on their own validation results.
I'll be resigning from this competition for other personal reasons, but wish the best of luck to all still working on their solutions (:
Hello amy,
In my understanding, there was no label leakage. The objective of this competition is to predict the effect of drought (DR) on crops. The crops can also encounter other types of damage, such as disease or flood, but these are set to zero in both the train and test data provided by Zindi. Therefore, it would make more sense if we were provided with only DR images in both the train and test datasets, because it makes no sense to test a model trained on DR against other types of damage. Or simply change the objective (and therefore the evaluation metric) of the competition to predicting both the type of damage and its extent.
BR,
I agree that this leaderboard kind of defeats the purpose of these competitions. We will need to move in the dark for a month and pray that the people above don't use the leak and that there is still room for improvement?
Agree. It should just be predicting the extent of damage for DR images.
I see some people are still using the "type of damage" column, because their scores haven't changed. Resetting the LB like that doesn't make any sense :)
Unfortunately, the issue will remain unless they change the image IDs in a way that makes them difficult to rematch with the old IDs. I don't know why people are still using the data leak, just to confuse the other competitors :)
Even then it will be easy to rematch the images. Each image has a size and associated metadata; simply take the mean of each channel and you will be able to find every match within an hour.
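To illustrate why renaming IDs alone would not hide anything, here is a rough sketch of the fingerprinting idea described above: combine each image's size with its per-channel mean and look for identical fingerprints. The in-memory image format (a dict with `width`, `height` and a list of RGB tuples) and the function names are assumptions made for this toy example, not anything from the competition data.

```python
def channel_means(pixels):
    """Return the mean of each RGB channel over a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def fingerprint(image):
    """Size plus per-channel means; identical pixel data gives an identical tuple."""
    return (image["width"], image["height"]) + channel_means(image["pixels"])


def rematch(old_images, new_images):
    """Map each new ID to the old ID whose fingerprint is identical."""
    index = {fingerprint(img): old_id for old_id, img in old_images.items()}
    matches = {}
    for new_id, img in new_images.items():
        fp = fingerprint(img)
        if fp in index:
            matches[new_id] = index[fp]
    return matches


# Toy data: two "images" re-released under new, shuffled IDs.
old_images = {
    "old_1": {"width": 2, "height": 1, "pixels": [(10, 20, 30), (30, 40, 50)]},
    "old_2": {"width": 1, "height": 2, "pixels": [(0, 0, 0), (255, 255, 255)]},
}
new_images = {
    "new_a": dict(old_images["old_2"]),
    "new_b": dict(old_images["old_1"]),
}

matches = rematch(old_images, new_images)
print(matches)
```

On real files the fingerprints could collide or differ slightly if the images were re-encoded, so this is only a sketch of the argument, not a robust matcher.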
I think either reverting to the rules that were in place yesterday or removing the non-DR images is the right call. I really hope they do that tbh.
This is great. Thanks for the quick response.
Should we use the 'type of damage' column in the training data? I thought the issue was that it was in the test dataset.
The announcement says not to use "damage" in your solution at all, so we can't use it even in training.
Why are people still using it??
In a standard dataset, there is often a lot of metadata for the train set that is not available during testing. This metadata can be used effectively to formulate a reasonable CV strategy.
I want to believe this is the case for the damage column now in the train set. If not, the only reasonable thing to do is to have a completely different dataset solely for the DR images provided...
I think we can't use the damage column in the train set in any way either.