Intron AfriSpeech-200 Automatic Speech Recognition Challenge
Can you create an automatic speech recognition (ASR) model for African accents, for use by doctors?
hidden_states
Help · 27 Feb 2023, 00:27 · 4

I have tried to change all the parameters, but the same error always occurs at the same step.

Can I try other models as well, or should we just use Whisper?

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1384, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1249, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1001, in forward
    hidden_states = inputs_embeds + positions
RuntimeError: The size of tensor a (449) must match the size of tensor b (448) at non-singleton dimension 1
Discussion 4 answers

Have you tried wav2vec2? There are a ton of pretrained models you can try out. Check this one out:

https://huggingface.co/blog/fine-tune-wav2vec2-english

27 Feb 2023, 07:06

That's why I asked: I previously used a Wav2Vec2 model on an African mozilla_common-voice dataset, and it gave me significant results. I wanted to know whether you prefer a specific architecture.

I think we finally found the fix for this error. If you remove all samples in train/dev where the transcript is longer than 300 characters, that should take care of the problem.
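
A minimal sketch of that filtering step, assuming each sample is a dict with a `transcript` field (a hypothetical name; use whatever column your copy of the dataset has). The 300-character cutoff comes from the post above. The optional token check is an extra assumption based on the traceback, where a sequence of length 449 did not fit the 448 positions available; `tokenize` is a placeholder for any callable that returns a list of tokens.

```python
MAX_CHARS = 300         # cutoff suggested in the thread
MAX_LABEL_TOKENS = 448  # position limit seen in the traceback (449 vs 448)


def keep_sample(sample, max_chars=MAX_CHARS, tokenize=None,
                max_tokens=MAX_LABEL_TOKENS):
    """Return True if the sample's transcript is short enough to train on."""
    text = sample["transcript"]  # assumed field name
    if len(text) > max_chars:
        return False
    # Optional stricter check: count tokens instead of characters.
    if tokenize is not None and len(tokenize(text)) > max_tokens:
        return False
    return True


def filter_samples(samples, **kwargs):
    """Keep only the samples that pass keep_sample."""
    return [s for s in samples if keep_sample(s, **kwargs)]


# Toy data, not taken from the competition dataset.
samples = [
    {"id": "a", "transcript": "short note"},
    {"id": "b", "transcript": "x" * 301},  # over the cutoff, dropped
    {"id": "c", "transcript": "y" * 300},  # exactly at the cutoff, kept
]
kept = filter_samples(samples)
print([s["id"] for s in kept])  # ['a', 'c']
```

If the data is loaded with the Hugging Face `datasets` library, the same predicate can probably be passed straight to `Dataset.filter`, for example `train = train.filter(keep_sample)`, applied to both the train and dev splits.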

4 Mar 2023, 18:40