With image classification, models pre-trained on imagenet are somewhat of a standard, and often built into popular libraries. For audio there isn't an exact equivalent.
Obviously, we don't want a situation where someone wins because of access to something the other participants didn't have. So in general sourcing, an extra dataset (even a public one) and using that to get an edge would be a potential issue. But if you have a dataset (or even better a pretrained model) in mind that you think would help all entrants, and it's public+free, let us know and we can see about adding it as an allowed source.
Yes you may.
With image classification, models pre-trained on imagenet are somewhat of a standard, and often built into popular libraries. For audio there isn't an exact equivalent.
Obviously, we don't want a situation where someone wins because of access to something the other participants didn't have. So in general sourcing, an extra dataset (even a public one) and using that to get an edge would be a potential issue. But if you have a dataset (or even better a pretrained model) in mind that you think would help all entrants, and it's public+free, let us know and we can see about adding it as an allowed source.
There are free+public speech to text models available in this python library
So can we use the above mentioned library?
It works for some audio files