Hello fellow aspiring data scientists! As a first-timer to audio classification, I'd like some tutorial content on the task. From my point of view, I'm finding it difficult to understand how to make models learn accurately from audio that has been converted to images by applying different image transforms. Any resource that helps with understanding the topic would be much appreciated. Personal explanations would also help a bunch. Thanks!
I have been having the same problem too. For audio, I don't think you can use many image transformations beyond resizing the spectrogram, unless I have been researching the wrong things. If you find anything helpful, please do share.
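To make the resizing idea concrete, here is a minimal sketch in plain NumPy of turning a waveform into a log-magnitude spectrogram and then resizing it along the time axis with linear interpolation, so every clip ends up the same width. The frame length, hop size, and target width are just illustrative values, not anything prescribed by a library:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram via framed FFTs (no windowing, for brevity)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len//2 + 1)
    return np.log1p(mag).T                     # (freq_bins, n_frames)

def resize_time(spec, target_frames):
    """Stretch/squeeze the spectrogram along time with linear interpolation."""
    src = np.linspace(0, spec.shape[1] - 1, target_frames)
    idx = np.arange(spec.shape[1])
    return np.stack([np.interp(src, idx, row) for row in spec])

# Example: 1 second of a 440 Hz tone at 16 kHz
sr = 16000
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
spec = spectrogram(wave)          # shape (129, 124) for these settings
fixed = resize_time(spec, 128)    # fixed width regardless of clip length
```

In practice you'd use a library routine (e.g. a mel spectrogram from torchaudio or librosa), but the shape logic is the same: the network sees a fixed-size "image" no matter how long the clip was.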
I have found the following blog post very helpful: https://www.assemblyai.com/blog/end-to-end-speech-recognition-pytorch . As far as I can tell, data augmentation is done on the raw signal before it is converted into a spectrogram: https://medium.com/@makcedward/data-augmentation-for-audio-76912b01fdf6
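For what it's worth, here is a small NumPy sketch of two of the raw-signal augmentations that kind of post typically describes, noise injection and time shifting, applied before any spectrogram conversion. The noise factor and shift amount are illustrative values, not recommendations:

```python
import numpy as np

def add_noise(signal, noise_factor=0.005, rng=None):
    """Inject Gaussian noise scaled relative to the waveform."""
    rng = rng or np.random.default_rng(0)
    return signal + noise_factor * rng.standard_normal(len(signal))

def time_shift(signal, shift):
    """Shift the waveform in time, zero-filling the vacated samples."""
    out = np.roll(signal, shift)
    if shift > 0:
        out[:shift] = 0.0
    elif shift < 0:
        out[shift:] = 0.0
    return out

sr = 16000
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s, 440 Hz tone
augmented = time_shift(add_noise(wave), shift=sr // 10)  # ~100 ms shift
```

The augmented waveform then goes through the usual spectrogram pipeline, so the model sees a slightly different image each epoch even though the label is unchanged.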