Hello everyone,
Please accept the apologies of the whole Zindi team for the slow response time and timeline confusion in this challenge.
Following recent discussions regarding data from the IDRISI repo, we've taken some time to review the potential for a leak in this challenge.
Please note that data for this competition was prepared using only the json files in the repo, with the test set curated by the team in charge of the repository. It is possible that some of the test set data was sourced from some of the other resources in the repo. However, please bear in mind that access to the repository is only granted for learning purposes. The data in the repo does not form part of the approved datasets usable in this challenge.
As with all Zindi challenges, your submissions in this challenge are subject to the challenge rules and regulations; in this case, the following rule is specifically relevant:
Zindi is committed to providing solutions of value to our clients and partners. To this end, we reserve the right to disqualify your submission on the grounds of usability or value. This includes but is not limited to the use of data leaks or any other practices that we deem to compromise the inherent value of your solution.
We encourage you to try as best you can to adhere to and incorporate the challenge rules as you build out your solutions. As always, we will conduct a detailed review of top-performing solutions, and any submission found to be contravening these rules will be disqualified.
As raised by several participants, the timelines are currently contradictory. The official close date of this challenge is 13 October, and the platform has been updated to reflect that.
We wish you the best of luck for the remainder of the challenge.
Happy coding!
Thanks for clarifying the close date. This gives me more time to tackle the challenge.
Hmm, I had already started writing my solution... I guess I'll have to wait a few more weeks then.
So correct me if I'm wrong, but I assume that Train_1.csv and Test.csv from the Data page are the only approved datasets usable in this challenge, am I right?
@davidreifferscheidt Yeah, you're right. We need to know whether Train_1.csv and Test.csv are the only approved datasets usable for this challenge, because the data in this repository is the main source of the leak.
The dataset in IDRISI that has the leak is the one in the Gold Time-based sub-directory, while the data approved for this challenge is both the CSVs and the JSON data in Gold Random, as explained on the Data Info page, and I quote:
"""The data is available in JSONL format in the GitHub repository. (Full example). The full training, dev and test files, can be downloaded from here: https://github.com/rsuwaileh/IDRISI/blob/main/LMR/data/EN/gold-random-json/"""
So the contentious data here is the time-based data, which has the leak, and I think @ZINDI can clarify this too.
@Koleshjr Okay. In that case, they should confirm this by pointing us to the datasets inside this repository that must be used for this challenge.
I second this @MICADEE
It might even be simpler to forbid all use of the repository to avoid any further confusion, even though a lot of us (me included) relied on the JSON data.
@Zindi @Amy_Bray
Same here: I relied on the JSON data, but I think forbidding it would be simpler and avoid confusion.
Hello, you are correct that the only datasets usable in this challenge are Train_1.csv and Test.csv from the Data page. These correspond to the data in gold-random-json, but we recommend not using datasets from the repo at all. The Data page has been updated to reflect this. Our apologies once again for any confusion caused.
Thanks!!!
Thank you for your response. Regarding the leaderboard, I assume it is no longer relevant to the competition and does not reflect the current standings?
@Zindi Thank you for the clarification. There is also a question around the certification credits. Will they also only be presented after the close date or after the winner is announced?
Hi @MakalaMabotja, we will share certification credits with those who have made a valid submission by the closing date of the challenge.
Thank you for the response
@Zindi, I think the leaderboard needs to be reset. Please comment!
Yes, I agree, it should, because at present we have "no real idea" of what the actual peak performance is.
@Zindi
This makes things a bit more understandable. I also want to clarify this with @ZINDI:
The Data page of the competition contains the following statement: "The data is available in JSONL format in the GitHub repository. (Full example). The full training, dev and test files, can be downloaded from here: https://github.com/rsuwaileh/IDRISI/blob/main/LMR/data/EN/gold-random-json/ ... The datasets have also been provided as CSV files if you would prefer to use CSV files. The choice is yours."
So we are allowed to use only the EN-Gold-Random-BILOU-JSON dataset from IDRISI or the provided CSV files, right?
However, this clarification says: "Following recent discussions regarding data from the IDRISI repo, ... The data in the repo does not form part of the approved datasets usable in this challenge."
Does this mean we may now use only the provided CSV files, or is EN-Gold-Random-BILOU-JSON, which the Data page permitted, an exception among the IDRISI datasets?
This is still quite confusing to me.
I think every Zindi challenge has a private test set. During the challenge, we are scored on the part of the test data that forms our public score; when the challenge closes, our models are evaluated on the larger private test set.
I think this should be it: use either the CSV files or the announced GitHub link to build your model.
Hi @Ezino, we have updated the Data page to eliminate the confusion around this point. You should only use the Test.csv and Train_1.csv on the Data page, and no data from the repository can be used in your models.
@Zindi, what about resetting the leaderboard?
What about resetting the leaderboard, please? @ZINDI
@ZINDI, have the certification credits been shared with the participants?