
PRACTICE Advanced Challenge

Knowledge
Challenge completed over 2 years ago
Classification
68 joined
6 active
Start: Feb 15, 23
Close: Mar 16, 23
Reveal: Mar 16, 23
About

The data for this competition consists of labelled amino acid sequences. Each sequence has a unique ID, the amino acid sequence itself, the organism it came from, and its label. Your task is to predict the label for each sequence in the test set. Every label is one of 20 classes. The data cover ten organisms: eight are in the training set and the other two are in the test set. Sequences longer than a fixed maximum length have been excluded from the dataset.
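Because the two test organisms never appear in the training set, a random row-level validation split would probably give an optimistic estimate of test performance. One option is leave-organism-out validation: hold out whole organisms at a time, mirroring the unseen organisms in the test set. The sketch below assumes you have already read the organism of each training row into a list. The function name `organism_folds` is hypothetical, and the actual organism column name in Train.csv may differ.

```python
from typing import Iterator, List, Sequence, Tuple


def organism_folds(
    organisms: Sequence[str], n_holdout: int = 2
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs, holding out n_holdout organisms per fold.

    `organisms[i]` is the organism of training row i (hypothetical input; take it
    from whichever column of Train.csv stores the organism).
    """
    if n_holdout < 1:
        raise ValueError("n_holdout must be at least 1")
    names = sorted(set(organisms))
    # Partition the organisms into consecutive, non-overlapping groups.
    for start in range(0, len(names), n_holdout):
        held = set(names[start:start + n_holdout])
        # Every row of a held-out organism goes to validation, all others to training.
        val_idx = [i for i, org in enumerate(organisms) if org in held]
        train_idx = [i for i, org in enumerate(organisms) if org not in held]
        yield train_idx, val_idx
```

With eight training organisms and the default `n_holdout=2`, this yields four folds, and each organism is held out exactly once. For stratification or shuffling of the organism groups, a library utility such as scikit-learn's `GroupKFold` is an alternative.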

In addition to the labelled data, you are provided with a large set of unlabelled sequences, which you may use for model pre-training or for any data augmentation method you choose. You may NOT use any external data in this competition.
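One common way to pre-train on unlabelled sequences is masked language modelling: hide a random subset of residues and train the model to recover them. The sketch below shows only the input-corruption step, treating each residue character as one token. It is a simplified variant of BERT-style masking that always substitutes a mask token and skips the random-replacement and keep-unchanged cases. The `MASK` token string, the function name `mask_sequence`, and the 15% default rate are illustrative assumptions, not part of the competition materials.

```python
import random
from typing import List, Optional, Tuple

MASK = "<mask>"  # hypothetical mask token; use whatever your tokenizer defines


def mask_sequence(
    seq: str, mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt an amino acid sequence for masked-language-model pre-training.

    Returns (inputs, targets): `inputs` has some residues replaced by MASK, and
    `targets` holds the original residue at masked positions and None elsewhere,
    so the loss is computed only on the masked positions.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    inputs: List[str] = []
    targets: List[Optional[str]] = []
    for residue in seq:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(residue)
        else:
            inputs.append(residue)
            targets.append(None)
    return inputs, targets
```

Because a new mask is drawn each time a sequence is seen, calling this function with different seeds on every pass produces fresh training examples from the same unlabelled data.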

Files
Resembles Train.csv but without the target column. This is the dataset to which you will apply your model.
Contains an ID, a string representing the protein, and the target. This is the dataset you will use to train your model.
Additional unlabelled sequences for language modelling or other unsupervised learning tasks.
Shows the submission format for this competition: the ‘ID’ column mirrors that of test.csv, and the target columns contain your predictions. The order of the rows does not matter, but the ID values must be correct.
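A minimal sketch of writing predictions in this format with Python's standard `csv` module follows. It assumes a single prediction column. The column name `target` is a placeholder: copy the exact header from the sample submission file. If that file uses several target columns (for example, one per class), extend the header and rows accordingly. The function name `write_submission` is hypothetical.

```python
import csv
import io
from typing import Sequence, TextIO


def write_submission(
    out: TextIO, ids: Sequence[str], predictions: Sequence, target_col: str = "target"
) -> None:
    """Write an ID/prediction CSV to the file-like object `out`.

    `target_col` is a placeholder header; the real name(s) come from the sample
    submission file. IDs are written unchanged, since they must match test.csv.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out)
    writer.writerow(["ID", target_col])
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, pred])


if __name__ == "__main__":
    # Usage with an in-memory buffer; for a real file, use
    # open("submission.csv", "w", newline="") as the output object.
    buf = io.StringIO()
    write_submission(buf, ["seq_1", "seq_2"], [3, 17])
    print(buf.getvalue())
```

Because row order does not matter, the main check worth adding in practice is that the set of written IDs equals the set of IDs in test.csv.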