AI4D Yorùbá Machine Translation Challenge
$2,000 USD
Can you translate Yorùbá to English?
4 December 2020—11 April 2021
Hard to predict on test data in Colab

The Colab runtime runs out of memory and restarts when I try to predict on all of the test data.

Hi,

If you haven't resolved the problem yet, I would suggest reducing the batch_size and the tokenizer max_length (any max_length below 80 should work with a batch_size of 16 or lower).

Here is something I like to do:

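# Token count of each Yoruba sentence (train is a pandas DataFrame, tokenizer is already loaded)
train['length'] = train['Yoruba'].apply(lambda x: len(tokenizer.encode(x)))
train['length'].hist()      # plot the distribution of token counts
train['length'].describe()  # summary statistics (mean, quartiles, max)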

# Do the same for English, and choose your max_length accordingly.
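
One way to turn that length distribution into a concrete max_length is to pick the length that covers most of the sentences. Below is a minimal sketch of that idea, not the exact method used above: suggest_max_length, its coverage parameter, and whitespace_encode are hypothetical names, and the whitespace split only stands in for your real tokenizer.encode.

```python
import math


def suggest_max_length(texts, encode, coverage=0.95):
    """Smallest token length that fits at least `coverage` of the texts."""
    lengths = sorted(len(encode(text)) for text in texts)
    if not lengths:
        raise ValueError("texts is empty")
    # Nearest-rank percentile: index of the sentence that reaches the coverage.
    rank = max(1, math.ceil(coverage * len(lengths)))
    return lengths[rank - 1]


def whitespace_encode(text):
    # Stand-in for tokenizer.encode (assumption); swap in your real tokenizer.
    return text.split()


sample = ["a b c", "a b", "a b c d e f g h", "a"]
max_len = suggest_max_length(sample, whitespace_encode, coverage=0.75)
print(max_len)
```

Run it on both the Yoruba and the English column and keep the larger value if you use a single max_length for both sides.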

Also keep in mind that a max_length that is too low may hurt your BLEU score, since sentences longer than max_length get cut off. But I guess it is a compromise you have to accept, depending on your training hardware.
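
To make the batch_size advice concrete, here is a rough sketch of predicting on the test set in small chunks instead of all at once, so only one batch is in memory at a time. The names batched, predict_in_batches and translate_batch are hypothetical; translate_batch stands for whatever function runs your model on a list of sentences (tokenizing with your chosen max_length and decoding the output).

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(texts, translate_batch, batch_size=16):
    """Translate `texts` chunk by chunk and collect the results in order."""
    predictions = []
    for batch in batched(texts, batch_size):
        # translate_batch is an assumption: it takes a list of source
        # sentences and returns a list of translations of the same length.
        predictions.extend(translate_batch(batch))
    return predictions


def fake_translate_batch(batch):
    # Dummy model for demonstration only; replace with your real model call.
    return [sentence.upper() for sentence in batch]


test_sentences = ["sentence %d" % i for i in range(40)]
results = predict_in_batches(test_sentences, fake_translate_batch, batch_size=16)
print(len(results))
```

If the runtime still restarts, lowering batch_size further (8, 4, ...) trades speed for memory without changing the predictions.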