UmojaHack Africa 2022: African Snake Antivenom Binding Challenge (ADVANCED) 🛡️

Helping Africa

$3 000 USD

Completed (almost 4 years ago)

Skills you will learn

Natural Language Processing

Classification

252 joined

112 active

Start

Mar 19, 22

Mar 20, 22

Reveal

Mar 20, 22

About

The challenge dataset stems from a high-density peptide microarray experiment designed to determine how cross-reactive eight commercially available snake antivenoms are, and where in the toxin sequence the antibodies they contain bind (the epitope).

Each row in the dataset represents a k-mer (a sequence of 16 amino acids within the toxin) and includes a signal column measured in the high-density peptide microarray experiment.
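To make the k-mer idea concrete, here is a minimal sketch of cutting a toxin sequence into overlapping 16-residue windows. The function name `sliding_kmers`, the step size of one residue, and the 1-based inclusive positions are assumptions made for illustration; the page does not state how the experiment tiled the sequences or how positions are indexed, and the toy sequence is made up.

```python
from typing import Iterator, Tuple


def sliding_kmers(sequence: str, k: int = 16) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, kmer) for every window of length k.

    Positions are 1-based and inclusive here (an assumption; the
    dataset's own indexing convention should be checked against the data).
    """
    for offset in range(len(sequence) - k + 1):
        yield offset + 1, offset + k, sequence[offset:offset + k]


# Made-up 21-residue sequence, used only to show the windowing.
toy_sequence = "MKTLLLTLVVVTIVCLDLGYT"
windows = list(sliding_kmers(toy_sequence))

for start, end, kmer in windows:
    print(start, end, kmer)
```

A sequence of length n yields n - 16 + 1 windows under these assumptions, so the 21-residue toy sequence gives six 16-mers.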

You will need to predict the Signal column for a given Toxin_K_mer and Antivenom. You can use any other column available in the test set to enhance your predictions or enrich your data. We also provide prot_bert protein embeddings for each row.
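As a point of reference for the prediction task, the sketch below shows one very simple baseline: predict the mean Signal observed for each Antivenom in the training rows, falling back to the overall mean for an unseen antivenom. This is not the starter notebook's method. The function names, the antivenom labels `AV_A`, `AV_B`, `AV_C` and the signal values are hypothetical, and rows are represented as plain dictionaries keyed by the column names.

```python
from collections import defaultdict
from statistics import fmean


def fit_group_mean(rows, key="Antivenom", target="Signal"):
    """Return ({group: mean target}, global mean) from training rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[target]))
    all_values = [value for values in groups.values() for value in values]
    means = {name: fmean(values) for name, values in groups.items()}
    return means, fmean(all_values)


def predict_group_mean(rows, means, global_mean, key="Antivenom"):
    """Predict the group mean, or the global mean for unseen groups."""
    return [means.get(row[key], global_mean) for row in rows]


# Hypothetical toy rows; real rows would come from the training file.
train_rows = [
    {"Antivenom": "AV_A", "Signal": 1.0},
    {"Antivenom": "AV_A", "Signal": 3.0},
    {"Antivenom": "AV_B", "Signal": 10.0},
]
test_rows = [{"Antivenom": "AV_A"}, {"Antivenom": "AV_C"}]

means, global_mean = fit_group_mean(train_rows)
predictions = predict_group_mean(test_rows, means, global_mean)
print(predictions)
```

A baseline like this ignores the k-mer sequence entirely, so it mainly serves as a floor against which sequence-aware models can be compared.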

Watch this video for a walk-through of the starter notebook and an explanation of the relevance of the challenge.

The data you are presented with includes the following columns:

ID = unique identifier for each row
Toxin_UniprotID = identifier for a specific toxin
Position_start = the start position of the k-mer in the toxin global sequence
Position_end = the end position of the k-mer in the toxin global sequence
Antivenom = name of the antivenom tested in the high-density peptide microarray experiment
Toxin_K_mer = string of 16 amino acids (16-mer) from a given toxin sequence
Signal = (target) the output of the experiment; a proxy for antivenom activity
Genus = genus of the snake the toxin stems from, e.g. Naja (cobra)
Species = species of the snake the toxin originates from, e.g. Naja nigricollis (black-necked spitting cobra)
ProteinFam = toxin protein family, e.g. three-finger toxin (3FTx)
ProteinSubFam = toxin sub-family, e.g. cytotoxin (a type of 3FTx)
ProteinSubSubFam = toxin sub-sub-family, e.g. cytotoxin IA (a type of cytotoxin)
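Since Toxin_K_mer is a raw string, it has to be turned into numbers before most models can use it. One simple option, sketched below, is the amino-acid composition: the fraction of the k-mer made up by each of the 20 standard amino acids. This is an illustrative alternative to the provided prot_bert embeddings, not something the challenge prescribes; the function name `composition` and the example k-mer are made up.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(kmer: str) -> list:
    """Return the fraction of each standard amino acid in the k-mer.

    Non-standard letters are ignored in the output vector but still
    count towards the length, so the fractions may sum to less than 1.
    """
    counts = Counter(kmer.upper())
    length = len(kmer) or 1  # avoid division by zero on an empty string
    return [counts[residue] / length for residue in AMINO_ACIDS]


# Made-up 16-mer: eight alanines followed by eight cysteines.
features = composition("AAAAAAAACCCCCCCC")
print(features[:3])
```

Composition discards residue order, so it is best seen as a cheap starting feature set rather than a replacement for position-aware encodings.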

Files

Train.csv: the dataset you will use to train your model; it contains the target.

Test.csv: the dataset to which you will apply your model; it resembles Train.csv but lacks the target column.

This shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the ‘Signal’ column containing your predictions. The order of the rows does not matter, but the ‘ID’ values must match those in Test.csv exactly.
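The two-column format described above can be produced with the standard library alone. The sketch below writes an ‘ID’ and ‘Signal’ header followed by one row per prediction; the helper name `write_submission`, the example IDs and the prediction values are hypothetical, and an in-memory buffer stands in for a real output file.

```python
import csv
import io


def write_submission(ids, predictions, handle):
    """Write ID,Signal rows to an open text handle."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ID", "Signal"])
    for row_id, prediction in zip(ids, predictions):
        writer.writerow([row_id, prediction])


# Hypothetical IDs and predictions; real IDs must come from Test.csv.
buffer = io.StringIO()
write_submission(["ID_001", "ID_002"], [0.5, 1.25], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write to disk instead, the same function can be passed a handle opened with `open(path, "w", newline="")`, where `path` is whatever file name you choose.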

This will help you make your first submission on the leaderboard.
