@zindi, in the rules it states "Your models should not use any of the metadata provided" - what precisely do you mean by this?
Please don't use the length of the utterance or any information from the file name. Using metadata will not be useful to the client.
I guess it means we cannot directly extract text from the audio and use it...
Your task is to predict the labels for the test set. You can extract text from audio.