Dear participants,
We apologise. We have updated the rules, but you may not use the JW3000 dataset.
Apologies again.
@zindi can English translations of articles be used?
As long as it is not hardcoded and all edits are done in a notebook.
"but you may not use the JW3000 dataset." - due to the resource disparity, I take it? Because part of the dataset can still be trained on with a CPU.
@zindi are we allowed to collect Chichewa text data from the web to train a language model from scratch on a bigger dataset? And if yes, should these texts and models be shared before the end of the competition?
Unfortunately, no external data is allowed.