GIZ NLP Agricultural Keyword Spotter
$7,000 USD
Classify audio utterances in Luganda and English from Uganda
476 data scientists enrolled, 139 on the leaderboard
11 September—29 November
Rules clarification - Additional datasets
published 10 Oct 2020, 12:15
edited 12 minutes later

The rules say "You may use only the datasets provided for this competition", but participants can use pretrained weights, e.g. ImageNet. If I first train my model on a public dataset that is not allowed and then use the result as a checkpoint, will that be a rule violation?

I am sorry for my ignorance; this is my first competition on this platform.

I was also asking myself the same question, but I guess the answer is yes, it would be a rule violation.

The rules aren't that clear! I think training on other public audio datasets just to get a pretrained model for this task is okay, since pretrained weights like ImageNet are already allowed. But merging a public dataset with this competition's dataset to train your final model would result in disqualification. Am I correct?

But maybe you should share the pretrained weights with everyone, i.e. open-source them.

Zindi needs to confirm. I am not sure.

With image classification, models pretrained on ImageNet are something of a standard and are often built into popular libraries. For audio there isn't an exact equivalent.

Obviously, we don't want a situation where someone wins because they had access to something the other participants didn't. So, in general, sourcing an extra dataset (even a public one) and using it to get an edge would be a potential issue. But if you have a dataset (or, even better, a pretrained model) in mind that you think would help all entrants, and it's public and free, let us know and we can look into adding it as an allowed source.