
Lacuna Masakhane Parts of Speech Classification Challenge

Helping Africa
$7 000 USD
Completed (over 2 years ago)
Classification
Natural Language Processing
472 joined
101 active
Start: Jun 08, 23
Close: Sep 17, 23
Reveal: Sep 17, 23
Notebook terminates during Prediction
Help · 20 Aug 2023, 08:09 · 2

This question is based on the main starter notebook, "train_pos.ipynb".

When I train the model on any language, everything works well, but when I try to predict, I get RAM errors and the notebook is terminated.

The model prints the error "Maximum sequence length exceeded: No prediction for ..." when I try words from tsn and luo.

My question is: what are the tricks or methods to predict words from tsn and luo without running out of RAM in my case? Or am I the only person in this situation? 😁 Just curious.

Discussion 2 answers

Hello! I assume you are using a tokenizer for this.

Make sure you truncate any sequence that exceeds the model's maximum sequence length (usually 512 tokens). Most sequences in the test data don't exceed 100 tokens for most byte-encoded tokenizers.
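The truncation step above can be sketched in plain Python. This is a minimal, hypothetical illustration (the `truncate` helper and the 512 limit are assumptions matching BERT-style models, not code from the starter notebook); with a Hugging Face tokenizer you would typically pass `truncation=True, max_length=512` instead.

```python
# Assumed limit for BERT-style models; check your model's config.
MAX_LEN = 512

def truncate(token_ids, max_len=MAX_LEN):
    """Keep at most max_len tokens so the model never sees an over-long input."""
    return token_ids[:max_len]

long_seq = list(range(600))      # stand-in for an over-long tokenized sentence
print(len(truncate(long_seq)))   # 512
```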

As for RAM, try not to load all the sentences at once, as your model will need to create massive embeddings at some point, which I believe your GPU won't be able to handle.

Good luck, and I hope I helped you in some way.

20 Aug 2023, 10:41
Upvotes 1
Nayal_17

Prediction with max batch size is not giving an OOM error for me, so even if you load the whole test set at once it shouldn't OOM. On the GPU there may be some silly mistakes during prediction, like not setting torch.no_grad() or model.eval(). The full error information would be helpful. And I don't understand the "Maximum sequence length exceeded" error, as none of the sequences in the test set exceed 512 tokens (at least for BPE), if you are using BERT or some other transformer-based model.
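The two settings mentioned above can be shown in a minimal sketch, assuming a PyTorch model (the `nn.Linear` stand-in and the shapes are illustrative, not the competition model): `model.eval()` switches off dropout/batch-norm training behaviour, and `torch.no_grad()` stops PyTorch from allocating gradient buffers, which noticeably cuts memory use at prediction time.

```python
import torch
from torch import nn

model = nn.Linear(8, 4)   # stand-in for the real POS-tagging model
model.eval()              # inference mode: disable dropout / BN updates

with torch.no_grad():     # no gradient buffers -> much lower memory use
    x = torch.randn(2, 8) # a dummy batch of 2 inputs
    logits = model(x)

print(tuple(logits.shape))  # (2, 4)
```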

20 Aug 2023, 12:20
Upvotes 1