
InstaDeep Enzyme Classification Challenge

7th Place Solution Approach
Posted 23 Feb 2021, 06:14 · edited

Congratulations to the winners and many thanks to Zindi and InstaDeep for hosting this interesting competition.

My solution is a single transformer model with no ensembling or fold averaging. The code is written in PyTorch.

Approach

  • Fine-tune a pre-trained prot_bert_bfd model from Hugging Face (prot_bert_bfd was pre-trained on amino acid sequences); see the tokenization sketch after this list.
  • Map rare amino acids (U, Z, O, B) to X, matching the pre-processing used when prot_bert_bfd was trained.
  • Use a max sequence length of 256.
  • Use 11,000 samples per class and train for 2 epochs.
  • Use dropout and weight decay to regularize the model (see the fine-tuning sketch below).
  • Set aside the creature4 and creature5 data for validation.
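
Here is a minimal sketch of the pre-processing and tokenization described above. The checkpoint name Rostlab/prot_bert_bfd and the tokenizer settings are my assumptions based on the public ProtBert release, not details confirmed in this post:

```python
import re

from transformers import BertTokenizer

# Assumption: the public ProtBert-BFD checkpoint on the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd", do_lower_case=False)

def encode(sequence: str, max_length: int = 256):
    """Tokenize one amino-acid sequence for ProtBert."""
    # Map the rare amino acids U, Z, O and B to X, matching ProtBert's pre-training.
    sequence = re.sub(r"[UZOB]", "X", sequence)
    # ProtBert's vocabulary expects single residues separated by spaces.
    spaced = " ".join(sequence)
    return tokenizer(
        spaced,
        truncation=True,        # enforce the 256-token budget
        padding="max_length",
        max_length=max_length,
        return_tensors="pt",
    )

batch = encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")  # example sequence
```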

I trained the model in a Kaggle notebook (1× P100 GPU). Training time was approx. 8.3 hours per epoch and inference time was approx. 2.2 hours. I used 11,000 samples per class and a max length of 256 in order to stay within the 9-hour GPU runtime limit.
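
For completeness, a sketch of how the fine-tuning could be wired up with the dropout and weight-decay regularization mentioned above. NUM_CLASSES, the dropout rates, and the learning rate are illustrative assumptions; the post does not give these values:

```python
import torch

from transformers import BertForSequenceClassification

NUM_CLASSES = 20  # hypothetical; the post does not state the number of enzyme classes

# Dropout is configured through the BERT config kwargs; 0.1 is an assumed value.
model = BertForSequenceClassification.from_pretrained(
    "Rostlab/prot_bert_bfd",
    num_labels=NUM_CLASSES,
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# AdamW applies the weight decay the post mentions; the lr is an assumption.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

def train_step(input_ids, attention_mask, labels):
    """One optimization step over a tokenized batch."""
    model.train()
    outputs = model(
        input_ids=input_ids.to(device),
        attention_mask=attention_mask.to(device),
        labels=labels.to(device),
    )
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```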

Edit

Please note that this solution does not comply with the competition runtime rules, i.e. < 8 hours for training and < 2 hours for inference.
