
InstaDeep Enzyme Classification Challenge

Job Interview
Challenge completed over 4 years ago
Classification
520 joined
70 active
Start: Nov 17, 2020
Close: Feb 21, 2021
Reveal: Feb 21, 2021
First Place summary
22 Feb 2021, 15:51 · edited 12 days later · 9

My model is a fine-tuned pretrained model (ProtBert) that was trained on billions of protein sequences (the BFD dataset). It is based on the Transformer architecture, BERT in particular. The pretrained model can be found here. There was no need for me to use the given unlabelled sequences. There is a similar pretrained model trained on less data; both give good results. As they are integrated into the HuggingFace library, one can easily use them in the same fashion as other popular models like BERT.
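For illustration, here is a minimal sketch of loading the pretrained weights via HuggingFace in TensorFlow. The model IDs (Rostlab/prot_bert_bfd for the BFD variant, Rostlab/prot_bert for the smaller-corpus one) and the class count are assumptions for the sketch, not necessarily my exact code:

```python
# Minimal sketch: loading ProtBert for sequence classification in TensorFlow.
from transformers import BertTokenizer, TFBertForSequenceClassification

NUM_CLASSES = 20  # placeholder: set to the number of enzyme classes in the data

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd",
                                          do_lower_case=False)
model = TFBertForSequenceClassification.from_pretrained(
    "Rostlab/prot_bert_bfd", num_labels=NUM_CLASSES
)  # pass from_pt=True if only PyTorch weights are published for the checkpoint

# ProtBert's vocabulary expects amino acids separated by spaces ("M K T A Y ..."),
# so raw sequences must be split before tokenizing.
sequence = " ".join("MKTAYIAKQRQISFVK")
inputs = tokenizer(sequence, return_tensors="tf",
                   padding="max_length", truncation=True, max_length=384)
logits = model(**inputs).logits  # shape: (1, NUM_CLASSES)
```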

Not much fine-tuning was needed: one could reach about 90%+ accuracy with little to no tuning. The dataset is also large enough for the model to learn varied patterns.

Setup

Due to the number of model parameters (~420M), the large data size, and the high max sequence length (384), I had to use a TPU (v3-8) for fast model training. One epoch runs in ~60 minutes.
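For context, connecting to the TPU in TensorFlow 2 is standard boilerplate, roughly:

```python
import tensorflow as tf

# Standard TF2 TPU initialization; a v3-8 exposes 8 replicas.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print("Replicas:", strategy.num_replicas_in_sync)
```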

Model parameters

  • Max sequence length - I used 384 to boost my score; 256 and below also give good results.
  • Epochs - one epoch was enough to reach reasonable convergence and generalization, so I didn't train for longer.
  • AdamW optimizer.
  • Learning rate of 5e-5 with scheduling (see the optimizer sketch after this list).
  • Batch size - 16 per core × 8 TPU cores (global batch 128).
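A sketch of one way to set up AdamW with a learning-rate schedule is the create_optimizer helper from the transformers library (num_train_examples is a placeholder for the training-set size; not necessarily my exact setup):

```python
from transformers import create_optimizer

# Global batch size: 16 per core * 8 TPU cores = 128.
global_batch_size = 16 * strategy.num_replicas_in_sync

num_train_examples = 100_000  # placeholder: set to the actual training-set size
steps_per_epoch = num_train_examples // global_batch_size

# AdamW with a linear decay schedule starting at 5e-5 (warmup assumed 0 here).
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=steps_per_epoch,  # 1 epoch of training
    num_warmup_steps=0,
)
```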

I used TensorFlow as it's more TPU-friendly.
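Putting the pieces above together, fine-tuning with Keras on the TPU looks roughly like this (train_dataset is a placeholder tf.data.Dataset of tokenized inputs and labels; in practice the optimizer should also be created inside strategy.scope()):

```python
# Sketch: build and compile the classifier inside the TPU strategy scope
# (reusing NUM_CLASSES, strategy, and optimizer from the sketches above).
with strategy.scope():
    model = TFBertForSequenceClassification.from_pretrained(
        "Rostlab/prot_bert_bfd", num_labels=NUM_CLASSES
    )
    model.compile(
        optimizer=optimizer,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# train_dataset: placeholder tf.data.Dataset yielding (tokenized inputs, labels),
# batched to the global batch size.
model.fit(train_dataset, epochs=1)  # one epoch was enough
```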

Link to code

Discussion (9 answers)
L’Université Paris-Dauphine | Tunis

Thank you for this explanation

22 Feb 2021, 15:59
Upvotes 0

Great work

22 Feb 2021, 16:00
Upvotes 0

Congrats and thanks for sharing

22 Feb 2021, 18:32
Upvotes 0
L’Université Paris-Dauphine | Tunis

Congratulations and thank you for sharing

22 Feb 2021, 19:24
Upvotes 0
Chizurum_Olorondu
University of Lagos

Nice work, bro. Congratulations!

22 Feb 2021, 20:33
Upvotes 0
MICADEE
LAHASCOM

Wow... nice one @Femi. Great work, I must say. Congratulations, Femi. Thanks for sharing.

22 Feb 2021, 20:45
Upvotes 0
_MUFASA_

Insightful!

22 Feb 2021, 21:26
Upvotes 0

Thank you brother! Looking forward to the code as well! Best of luck on the interview! :)

23 Feb 2021, 07:28
Upvotes 0

The rules of the competition stated: "Specifically, we should be able to re-create your submission on a single-GPU machine (eg Nvidia P100) with less than 8 hours training and two hours inference." For that same reason, I didn't use transformers.

23 Feb 2021, 18:16
Upvotes 0