PRACTICE Advanced Challenge 🧬

Knowledge

Completed (almost 3 years ago)

Skills you will learn

Classification

Start

Feb 15, 23

Mar 16, 23

Reveal

Mar 16, 23

About

The data for this competition consists of labelled amino acid sequences. Each record has a unique ID, the amino acid sequence, the organism it came from, and the label. You must predict the label for each sequence in the test set. Each label is one of 20 classes. There are ten organisms: eight in the training set and two in the test set, so the test organisms do not appear in the training data. Sequences above a set length have been excluded from this dataset.
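Because the test set comes from organisms that are absent from the training set, a validation split that holds out whole organisms may mirror the test conditions more closely than a random row-level split. The following is a minimal sketch of that idea, not the organisers' method. The record layout (a list of dicts with an "organism" key) and the function name are assumptions made for illustration.

```python
import random


def organism_holdout_split(records, n_holdout=2, seed=0):
    """Hold out whole organisms for validation.

    `records` is a list of dicts; the "organism" key is a hypothetical
    field name, not confirmed by the competition files.
    """
    organisms = sorted({r["organism"] for r in records})
    if not 0 < n_holdout < len(organisms):
        raise ValueError("n_holdout must be between 1 and the number of organisms minus 1")
    rng = random.Random(seed)
    held_out = set(rng.sample(organisms, n_holdout))
    # Every record of a held-out organism goes to validation, so no
    # organism contributes rows to both sides of the split.
    train = [r for r in records if r["organism"] not in held_out]
    valid = [r for r in records if r["organism"] in held_out]
    return train, valid, held_out
```

Holding out two of the eight training organisms copies the eight-to-two structure described above on a smaller scale. Repeating the split with different seeds gives a rough idea of how much the score varies with the choice of held-out organisms.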

In addition to the labelled data, you are also provided with a large set of unlabelled sequences. You may use these for any model pre-training or data augmentation methods you choose to use. You may NOT use any external data for this competition.
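One way the unlabelled sequences could be used for pre-training is masked-token prediction: hide a fraction of the residues and train a model to recover them. The sketch below only prepares such training pairs; it is an illustration under stated assumptions, not a prescribed approach. The mask symbol `#`, the 15% mask rate, and the function name are all hypothetical choices.

```python
import random

MASK = "#"  # hypothetical mask symbol; chosen because it is not an amino acid letter


def mask_sequence(seq, mask_rate=0.15, seed=None):
    """Return (masked_sequence, targets) for masked-token pre-training.

    `targets` maps each masked position to the residue that was hidden.
    """
    if not seq:
        return seq, {}
    rng = random.Random(seed)
    # Mask at least one residue, and never more than the sequence holds.
    n_mask = min(len(seq), max(1, round(len(seq) * mask_rate)))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    chars = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = chars[pos]
        chars[pos] = MASK
    return "".join(chars), targets
```

Calling `mask_sequence` repeatedly on the same sequence with different seeds yields different masked views, which also works as a simple form of data augmentation that uses no external data.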

Files

Resembles Train.csv but without the target column. This is the dataset to which you will apply your model.

Contains an ID, a string indicating the protein, and the target. This is the dataset that you will use to train your model.

Additional unlabelled sequences for language modelling or other unsupervised learning tasks.

Shows the submission format for this competition, with the ‘ID’ column mirroring that of test.csv and the target columns containing your predictions. The order of the rows does not matter, but the IDs must match those in test.csv exactly.
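Since row order is free but the IDs must be correct, a quick set comparison before uploading can catch missing, unexpected, or duplicated IDs. This is a minimal sketch: the ‘ID’ column name comes from the description above, while the function name and the other column names used in the usage example are assumptions.

```python
import csv
import io
from collections import Counter


def check_submission_ids(test_csv_text, submission_csv_text, id_column="ID"):
    """Compare submission IDs against test IDs, ignoring row order."""
    test_ids = [row[id_column] for row in csv.DictReader(io.StringIO(test_csv_text))]
    sub_ids = [row[id_column] for row in csv.DictReader(io.StringIO(submission_csv_text))]
    counts = Counter(sub_ids)
    return {
        "missing": sorted(set(test_ids) - set(sub_ids)),      # in test, not submitted
        "unexpected": sorted(set(sub_ids) - set(test_ids)),   # submitted, not in test
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
    }


# Usage with tiny in-memory files; "sequence" and "target" are hypothetical column names.
test_text = "ID,sequence\nA1,MKT\nB2,GAV\n"
submission_text = "ID,target\nB2,3\nA1,7\n"
report = check_submission_ids(test_text, submission_text)
```

For files on disk, the text can be read with `open(path, newline="").read()` and passed in unchanged. A submission is consistent with the test set when all three lists in the report are empty.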
