InstaDeep Enzyme Classification Challenge

Job Interview
Challenge completed over 4 years ago
Classification
520 joined
70 active
Start: Nov 17, 20
Close: Feb 21, 21
Reveal: Feb 21, 21
About

The data for this competition consists of labelled amino acid sequences. Each record has a unique ID, the amino acid sequence, the organism it came from and the label. You must predict the label for each sequence in the test set. Each label is one of 20 classes. There are ten organisms: eight in the training set and two in the test set, so the test organisms do not appear in the training data. Sequences above a set length have been excluded from this dataset.
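Because the two test organisms are absent from the training set, a validation split that holds out whole organisms is likely to mirror the test conditions more closely than a random split. The following is a minimal sketch of that idea using only the standard library. The function name `organism_holdout_folds` and the organism names in the example are hypothetical, and the sketch assumes the organism of each training row is available as a list.

```python
from itertools import combinations


def organism_holdout_folds(organisms, n_holdout=2):
    """Yield (held_out, train_idx, valid_idx) for every choice of held-out organisms.

    `organisms` holds one organism name per training row. Every row from a
    held-out organism goes to validation, so no organism is on both sides.
    """
    unique = sorted(set(organisms))
    for held_out in combinations(unique, n_holdout):
        held = set(held_out)
        train_idx = [i for i, org in enumerate(organisms) if org not in held]
        valid_idx = [i for i, org in enumerate(organisms) if org in held]
        yield held_out, train_idx, valid_idx


# Toy example with made-up organism names (not taken from the competition data).
organisms = ["org_a", "org_a", "org_b", "org_c", "org_c", "org_d"]
folds = list(organism_holdout_folds(organisms, n_holdout=2))
```

With eight training organisms and two held out at a time, this produces 28 folds. Using a subset of them may be enough in practice.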

In addition to the labelled data, you are provided with a large set of unlabelled sequences. You may use these for any model pre-training or data augmentation methods you choose. You may NOT use any external data for this competition.

Files available for download:

  • Train.csv - contains an ID, a string indicating the protein, and the target. This is the dataset that you will use to train your model.
  • Test.csv - resembles Train.csv but without the target-related column. This is the dataset to which you will apply your model.
  • SampleSubmission.csv - shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the target columns containing your predictions. The order of the rows does not matter, but the IDs must be correct.
  • UnlabelledSequences.zip - Additional unlabelled sequences for language modelling or other unsupervised learning tasks.
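
Since row order does not matter but the IDs must match, it can help to check a submission's IDs against Test.csv before uploading. The sketch below uses only the standard library. The helper names `read_ids` and `check_submission_ids`, the `SEQUENCE` and `LABEL` column names, and the toy rows are all assumptions for illustration; only the ‘ID’ column name comes from the description above.

```python
import csv
import io
from collections import Counter


def read_ids(csv_text, id_column="ID"):
    """Return the values of the ID column from CSV text, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[id_column] for row in reader]


def check_submission_ids(test_ids, submission_ids):
    """Compare IDs as sets (row order is ignored) and report any problems."""
    test_set, sub_set = set(test_ids), set(submission_ids)
    counts = Counter(submission_ids)
    return {
        "missing": sorted(test_set - sub_set),      # in the test set, not submitted
        "unexpected": sorted(sub_set - test_set),   # submitted, not in the test set
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
    }


# Toy data; the real files would be read from disk instead.
test_csv = "ID,SEQUENCE\nid_1,MKV\nid_2,GAL\n"
submission_csv = "ID,LABEL\nid_2,class3\nid_1,class7\n"

report = check_submission_ids(read_ids(test_csv), read_ids(submission_csv))
```

An empty list under every key means the submission covers exactly the test IDs, whatever the row order.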