I have tried to change all the parameters, but the same error always occurs at the same step.
Can I try other models as well, or should we just stick with Whisper?
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1384, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1249, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1001, in forward
    hidden_states = inputs_embeds + positions
RuntimeError: The size of tensor a (449) must match the size of tensor b (448) at non-singleton dimension 1
Have you tried wav2vec2? There are a ton of pretrained models you can try out. Check this one out
https://huggingface.co/blog/fine-tune-wav2vec2-english
@intron is right. You can find all the pretrained models here https://huggingface.co/models?pipeline_tag=automatic-speech-recognition
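In case it helps, here's a minimal sketch of trying one of those hub checkpoints with the transformers pipeline. The checkpoint name and audio path are just placeholders, swap in whichever model you want to test:

```python
from transformers import pipeline

# any checkpoint under the automatic-speech-recognition tag should work here;
# facebook/wav2vec2-base-960h is just one example
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# "sample.wav" is a placeholder path to a local audio clip
result = asr("sample.wav")
print(result["text"])
```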
That's why I asked: I used a Wav2Vec2 model on an African Mozilla Common Voice dataset before and it gave me promising results. I wanted to know if you prefer a specific architecture.
I think we finally found the fix for this error: if you remove all samples in train/dev where the transcript is over 300 characters, that should take care of the problem.
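For anyone else hitting this: the 448 in the traceback is Whisper's decoder position limit (max_target_positions), so any transcript that tokenizes to more labels than that breaks the inputs_embeds + positions add. A rough sketch of the filtering step with datasets, assuming a Common Voice-style setup; the dataset name and the "sentence" column are placeholders for whatever you're actually using:

```python
from datasets import load_dataset

MAX_TRANSCRIPT_CHARS = 300  # threshold suggested above

# placeholder dataset/config; swap in your own train/dev data
dataset = load_dataset("mozilla-foundation/common_voice_11_0", "sw")

def short_enough(example):
    # keep only samples whose transcript stays under the character cap
    return len(example["sentence"]) <= MAX_TRANSCRIPT_CHARS

dataset["train"] = dataset["train"].filter(short_enough)
dataset["validation"] = dataset["validation"].filter(short_enough)
```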