Congratulations to the winners and many thanks to Zindi and InstaDeep for hosting this interesting competition.
My solution is a single transformer model with no ensembling or fold averaging. The code is written in PyTorch.
I trained the model in Kaggle notebooks (1 x P100 GPU). Training time was approx. 8.3 hours per epoch and inference time was approx. 2.2 hours. To stay within the 9-hour GPU runtime limit, I used 11,000 samples per class and a max length of 256.
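
For readers who want to try a similar budget, here is a minimal sketch of capping the number of samples per class and truncating inputs to a max length. It is not my actual competition code: the helper names `subsample_per_class` and `truncate`, the random sampling strategy, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def subsample_per_class(labels, max_per_class, seed=0):
    """Return sorted indices that keep at most `max_per_class` examples per label.

    Hypothetical helper: classes with fewer examples than the cap are kept whole,
    larger classes are randomly downsampled.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        if len(indices) > max_per_class:
            indices = rng.sample(indices, max_per_class)
        kept.extend(indices)
    return sorted(kept)


def truncate(tokens, max_len):
    """Keep only the first `max_len` tokens (hypothetical helper)."""
    return tokens[:max_len]
```

With the numbers above this would be called as `subsample_per_class(labels, 11000)` and `truncate(tokens, 256)`. Fewer samples per class and shorter inputs both reduce the time per epoch, which is what makes the runtime limit reachable.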