I think we finally found the fix for this error:

```
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1384, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1249, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/whisper/modeling_whisper.py", line 1001, in forward
    hidden_states = inputs_embeds + positions
RuntimeError: The size of tensor a (449) must match the size of tensor b (448) at non-singleton dimension 1
```
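For context, the 448 in the error is the Whisper decoder's fixed budget of learned position embeddings (`max_target_positions` in the model config), so any tokenized transcript longer than 448 tokens overflows the position table. A quick way to confirm the budget, assuming the `openai/whisper-small` checkpoint (any Whisper checkpoint works the same way):

```python
from transformers import WhisperForConditionalGeneration

# The decoder's position budget is stored in the model config.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
print(model.config.max_target_positions)  # 448 for the official checkpoints
```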
Removing all samples in train/dev where the number of characters in the transcript is over 300 should take care of this problem.
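A minimal sketch of that filter with 🤗 Datasets; the dataset name and the `sentence` column here are placeholders, so substitute your own:

```python
from datasets import load_dataset

MAX_CHARS = 300  # the heuristic cutoff suggested above

# Hypothetical dataset; replace with whatever you are training on.
dataset = load_dataset("mozilla-foundation/common_voice_11_0", "hi")

def short_enough(example):
    # Keep only samples whose transcript should tokenize well under
    # the decoder's 448-token position budget.
    return len(example["sentence"]) <= MAX_CHARS

dataset["train"] = dataset["train"].filter(short_enough)
dataset["validation"] = dataset["validation"].filter(short_enough)
```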
Thanks @intron,
When I used truncation with max_length = 80, the error changed to "expected sequence of length 80 at dim 1 (got 90)", and I'm still looking into it. I think it would be better to remove that last truncation than to drop all samples over 300 characters.
Even after filtering out the transcripts longer than 300 characters, the error remains: "ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length."
NB: padding and truncation are already activated.
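Both of those length errors typically mean the labels are reaching the tensor-building step as ragged Python lists instead of being padded at collation time. Below is a sketch of a padding collator in the style of the Hugging Face Whisper fine-tuning examples; it assumes a `WhisperProcessor` instance and that each feature dict carries `input_features` and `labels`:

```python
from dataclasses import dataclass
from typing import Any, Dict, List

import torch


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any  # a WhisperProcessor (feature extractor + tokenizer)
    decoder_start_token_id: int  # model.config.decoder_start_token_id

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        # The log-Mel input features already have a fixed shape, so this
        # pad() call just stacks them into a batch tensor.
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # Labels are variable-length, so pad them here and replace the
        # padding with -100 so it is ignored by the loss.
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100
        )

        # If tokenization already prepended the decoder start token,
        # strip it; the model prepends it again during training.
        if (labels[:, 0] == self.decoder_start_token_id).all():
            labels = labels[:, 1:]

        batch["labels"] = labels
        return batch
```

Instantiate it with the values from your own processor and model, e.g. `DataCollatorSpeechSeq2SeqWithPadding(processor=processor, decoder_start_token_id=model.config.decoder_start_token_id)`, and pass it as the trainer's `data_collator`.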
When I find out the reason, I will let you know.
This is one of the batches where I hit the problem: the length of the input_features is 80 (correct), but the labels length is 85 (which doesn't match the label_ids), so if you have the same problem, you need to adjust your preparation function.
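For reference, here is a minimal sketch of a preparation function that keeps `input_features` and `labels` consistent. It assumes a `WhisperProcessor` and `audio`/`sentence` columns; those names are illustrative:

```python
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def prepare_dataset(example):
    audio = example["audio"]
    # Log-Mel input features: the first dimension is always the 80 mel
    # bins, independent of transcript length.
    example["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Tokenize the transcript, capped at the decoder's 448-position
    # budget so the 449-vs-448 error above cannot occur.
    example["labels"] = processor.tokenizer(
        example["sentence"], truncation=True, max_length=448
    ).input_ids
    return example

# dataset = dataset.map(prepare_dataset, remove_columns=dataset["train"].column_names)
```

Note that the variable-length labels are padded in the collator (as in the sketch above), not in prepare_dataset.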