Congratulations to the winners and many thanks to Zindi and InstaDeep for hosting this interesting competition.
My solution is a single transformer model, with no ensembling or fold averaging. The code is written in PyTorch.
Approach
I trained the model in a Kaggle notebook (1 x P100 GPU). Training took approx. 8.3 hours per epoch and inference approx. 2.2 hours. I used 11,000 samples per class and a maximum sequence length of 256 in order to stay within the 9-hour GPU runtime limit.
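For readers who want a concrete picture, here is a minimal sketch of this kind of setup: a single encoder-only transformer classifier in PyTorch with sequences capped at 256 tokens. It is not the author's actual code; the dataset class, vocabulary size, number of classes, and all hyperparameters below are illustrative assumptions, and the class-balanced subsampling (11,000 samples per class) would happen upstream when preparing `token_ids` and `labels`.

```python
# Illustrative sketch only; hyperparameters and class/vocab sizes are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

MAX_LEN = 256        # max sequence length used to fit the GPU time budget
NUM_CLASSES = 10     # assumption: the post does not state the number of classes
VOCAB_SIZE = 32      # assumption: depends on how the sequences are tokenised


class SeqDataset(Dataset):
    """Holds pre-tokenised sequences already truncated/padded to MAX_LEN."""
    def __init__(self, token_ids, labels):
        self.token_ids = token_ids   # LongTensor of shape (N, MAX_LEN)
        self.labels = labels         # LongTensor of shape (N,)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.token_ids[idx], self.labels[idx]


class TransformerClassifier(nn.Module):
    """Plain encoder-only transformer followed by a linear head (single model)."""
    def __init__(self, vocab_size, num_classes, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, MAX_LEN, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        h = self.encoder(self.embed(x) + self.pos[:, : x.size(1)])
        return self.head(h.mean(dim=1))  # mean-pool over positions, then classify


def train_one_epoch(model, loader, optimizer, device="cuda"):
    """One pass over the class-balanced training subset."""
    model.train()
    loss_fn = nn.CrossEntropyLoss()
    for tokens, labels in loader:
        tokens, labels = tokens.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(tokens), labels)
        loss.backward()
        optimizer.step()
```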
Edit
Please note that this solution does not comply with the competition rules, i.e. < 8 hours for training and < 2 hours for inference.
Great solution
Thank you.