This question is based on the main starter notebook, "train_pos.ipynb".
When I train the model on any language everything works well, but when I try to predict I get RAM errors and the notebook is terminated.
The model prints the error "Maximum sequence length exceeded: No prediction for ...." when I try words from tsn and luo...
My question is: what tricks or methods can I use to predict words from tsn and luo without running out of RAM? Or am I the only one in this situation? 😁 Just curious.
Hello! I assume you are using a tokenizer for this.
Make sure you truncate any sequence that exceeds the model's maximum sequence length (usually 512 tokens). Most sequences in the test data stay under 100 tokens for most byte-level tokenizers.
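As a minimal sketch of what I mean (plain Python, assuming you already have token ids; the 512 limit is typical for BERT-style models, check your own model's config):

```python
MAX_LEN = 512  # typical limit for BERT-style models (assumption)

def truncate(token_ids, max_len=MAX_LEN):
    """Drop tokens past the model's maximum sequence length."""
    # A real tokenizer (e.g. Hugging Face) can do this for you via
    # truncation=True, max_length=512; this just shows the idea.
    return token_ids[:max_len]

ids = list(range(600))     # a hypothetical over-long sequence
print(len(truncate(ids)))  # 512
```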
For the RAM issue, try not to load all the sentences at once, as your model will need to create massive embeddings at some point that I believe your GPU won't be able to handle.
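One way to avoid that is to feed the sentences to the model in small chunks instead of all at once. A minimal sketch (pure Python, the names and batch size are mine):

```python
def batches(items, batch_size=32):
    """Yield successive fixed-size chunks so only one batch is in memory at a time."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

sentences = [f"sentence {i}" for i in range(100)]  # hypothetical test data
for batch in batches(sentences, batch_size=32):
    pass  # run model prediction on `batch` here
```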
Good luck and I hope I helped you in some way.
Prediction with the max batch size is not giving an OOM error for me, so even if you load the whole test set at once it shouldn't OOM. On GPU there may be some silly mistakes during prediction, like not setting torch.no_grad() or model.eval(). The full error message would be helpful here. And I don't understand the "Maximum sequence length exceeded" error, as none of the sequences in the test set exceed 512 tokens (at least for BPE), if you are using BERT or some other transformer-based model.
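For reference, a minimal sketch of the two calls I mean, using a tiny stand-in torch model (your actual transformer would take its place):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for your real model (assumption)
model.eval()             # switch off dropout / batch-norm training behavior
with torch.no_grad():    # don't build the autograd graph: big memory savings
    out = model(torch.randn(3, 4))

print(out.requires_grad)  # False: no activations kept for backward
```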