
Intron AfriSpeech-200 Automatic Speech Recognition Challenge

$5,000 USD
Challenge completed over 2 years ago
Automatic Speech Recognition
430 joined
41 active
Start: 17 Feb 2023
Close: 28 May 2023
Reveal: 28 May 2023
Siwar_NASRI
hidden_states
Help · 27 Feb 2023, 00:27 · 4

I have tried changing all the parameters, but the same error always occurs at the same step.

Can I try other models as well, or should we just use Whisper?

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1384, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1249, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1001, in forward
    hidden_states = inputs_embeds + positions
RuntimeError: The size of tensor a (449) must match the size of tensor b (448) at non-singleton dimension 1
Discussion · 4 answers

Have you tried wav2vec2? There are a ton of pretrained models you can try out. Check this one out

https://huggingface.co/blog/fine-tune-wav2vec2-english
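
A minimal sketch of that starting point, assuming the Hugging Face transformers library; "facebook/wav2vec2-base-960h" is just the example checkpoint used in the blog post, and any wav2vec2 model from the Hub can be swapped in before fine-tuning on AfriSpeech:

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Example checkpoint only; replace with whichever wav2vec2 model you want to fine-tune.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base-960h",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
)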

27 Feb 2023, 07:06
Upvotes 1
Muhamed_Tuo
Inveniam

@intron is right. You can find all the pretrained models here https://huggingface.co/models?pipeline_tag=automatic-speech-recognition
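
For a quick sanity check of any of those checkpoints, the transformers pipeline API is enough; the model id and audio path below are only illustrations:

from transformers import pipeline

# Any model tagged automatic-speech-recognition on the Hub can be dropped in here.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-large-960h")
print(asr("path/to/audio.wav")["text"])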

Siwar_NASRI

That's why I asked: I used a Wav2Vec2 model on an African Mozilla Common Voice dataset before and it gave me good results. I wanted to know whether you prefer a specific architecture.

I think we finally found the fix for this error: if you remove all samples in train/dev where the transcript is longer than 300 characters, that should take care of the problem.
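
For context, Whisper's decoder only has 448 learned positions, so a transcript that tokenizes to more than 448 labels breaks the positional-embedding addition, which is what the 449-vs-448 mismatch in the traceback means. A minimal sketch of the filter, assuming the splits are Hugging Face datasets objects and the text lives in a "transcript" column (names may differ in your setup):

MAX_CHARS = 300

def short_enough(example):
    # Keep only samples whose transcript fits under the character cap.
    return len(example["transcript"]) <= MAX_CHARS

train_ds = train_ds.filter(short_enough)
dev_ds = dev_ds.filter(short_enough)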

4 Mar 2023, 18:40
Upvotes 0