AI4D Yorùbá Machine Translation Challenge

Helping Nigeria · $2,000 USD · Machine Translation
Challenge completed over 4 years ago
678 joined · 84 active
Start: Dec 04, 2020 · Close: May 30, 2021 · Reveal: May 30, 2021
Federal University of Technology Akure
Hard to predict on test data in Colab
Data · 20 Dec 2020, 11:42 · 3

The Colab runtime runs out of memory and restarts when trying to predict on all the test data.

Discussion · 3 answers
Muhamed_Tuo
Inveniam

Hi,

If you haven't resolved the problem yet, I would suggest reducing the batch_size and the tokenizer max_length (any value below 80 should work with a batch_size of 16 or lower).

Here is something I like to do:

# Token length of each Yoruba sentence, using the same tokenizer as your model
train['length'] = train['Yoruba'].apply(lambda x: len(tokenizer.encode(x)))
train.length.hist()       # plot the length distribution
train.length.describe()   # summary stats (mean, percentiles, max)

# Do the same for English, and choose your max_length accordingly.
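
Putting those two settings together, here is a minimal sketch of a batched prediction loop that should be easier on Colab memory. It assumes a Hugging Face seq2seq model and tokenizer and a test DataFrame with a 'Yoruba' column; all of these names are illustrative, not from the original post.

import torch

batch_size = 16
max_length = 80
predictions = []

model.eval()
with torch.no_grad():  # no gradients are kept during inference, which saves a lot of memory
    for start in range(0, len(test), batch_size):
        batch = test['Yoruba'].iloc[start:start + batch_size].tolist()
        # Tokenize only this small batch, truncating long sentences to max_length
        inputs = tokenizer(batch, return_tensors='pt', padding=True,
                           truncation=True, max_length=max_length).to(model.device)
        outputs = model.generate(**inputs, max_length=max_length)
        predictions.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))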
Muhamed_Tuo
Inveniam

It is also important to keep in mind that too low a max_length may hurt your BLEU score. But I guess that is a compromise you have to accept, depending on your training hardware.
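
To get a rough idea of that compromise, you can check how many sentences a given max_length would actually cut off, reusing the length column from the earlier snippet (assuming you have computed it):

max_length = 80
frac_truncated = (train['length'] > max_length).mean()
print(f"{frac_truncated:.1%} of Yoruba sentences exceed max_length={max_length}")

# Alternatively, pick max_length from a high percentile so only a small tail gets truncated.
max_length = int(train['length'].quantile(0.95))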

Federal University of Technology Akure

thanks