UmojaHack Africa 2022: African Snake Antivenom Binding Challenge (ADVANCED)

Helping Africa
$3 000 USD
Challenge completed over 3 years ago
Natural Language Processing
Classification
252 joined
112 active
Start: Mar 19, 22
Close: Mar 20, 22
Reveal: Mar 20, 22
About

The challenge dataset stems from a high-density peptide microarray experiment designed to answer two questions: how cross-reactive 8 different commercially available snake antivenoms are, and where in the toxin sequence the antibodies they contain bind (the epitope).

Each row in the dataset represents a k-mer (a 16-amino-acid sequence within the toxin) and has a signal column measured in the high-density peptide microarray experiment.
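To make the idea of a k-mer concrete, here is a minimal sketch of cutting a toxin sequence into overlapping 16-residue windows. The sequence is made up, and both the stride of 1 and the 1-based inclusive positions are assumptions; the challenge description does not state how the windows in the dataset were generated.

```python
def sliding_kmers(sequence, k=16):
    """Return (start, end, kmer) for every window of length k.

    Assumptions (not stated by the challenge): windows move one residue
    at a time, and positions are 1-based and inclusive.
    """
    rows = []
    for i in range(len(sequence) - k + 1):
        rows.append((i + 1, i + k, sequence[i:i + k]))
    return rows


# Made-up amino acid sequence, used only for illustration.
toy_sequence = "MKTLLLTLVVVTIVCLDLGYT"
windows = sliding_kmers(toy_sequence)

for start, end, kmer in windows:
    print(start, end, kmer)
```

A sequence of length n yields n - 15 windows under these assumptions, so longer toxins contribute more rows.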

You will need to predict the signal column for a given Toxin_K_mer and Antivenom. You can use any other column available in the test set to enhance your predictions or enrich your data. We also provide the prot_bert protein embeddings for each row.
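One simple way to turn a Toxin_K_mer and Antivenom pair into model input is a position-wise one-hot encoding. The sketch below is only an illustrative alternative to the supplied prot_bert embeddings, not the organisers' method; the antivenom names `AV_A` and `AV_B`, the toy k-mer, and the helper `encode_row` are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def encode_row(kmer, antivenom, antivenom_names):
    """One-hot encode each residue of the k-mer, then append a one-hot antivenom."""
    features = []
    for residue in kmer:
        one_hot = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # non-standard residues are left as all zeros
            one_hot[idx] = 1
        features.extend(one_hot)
    # One slot per antivenom; exactly one is set when the name is known.
    features.extend(1 if antivenom == name else 0 for name in antivenom_names)
    return features


antivenom_names = ["AV_A", "AV_B"]  # hypothetical placeholders, not real product names
toy_kmer = "MKTLLLTLVVVTIVCL"        # made-up 16-mer
vector = encode_row(toy_kmer, "AV_B", antivenom_names)

print(len(vector))  # 16 positions x 20 residues + 2 antivenoms = 322
```

With the 8 antivenoms mentioned in the description, the same encoding would give 16 x 20 + 8 = 328 features per row, which any standard regressor can consume.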

Watch this video for a walk-through of the starter notebook and an explanation of the relevance of the challenge.

You can also view it here.

The data you are presented with includes the following columns:

  1. ID = unique identifier for each row
  2. Toxin_UniprotID = identifier for a specific toxin
  3. Position_start = the start position of the k-mer in the toxin global sequence
  4. Position_end = the end position of the k-mer in the toxin global sequence
  5. Antivenom = name of the antivenom tested in the high-density peptide microarray experiment
  6. Toxin_K_mer = string of 16 amino acids (16-mer) from a given toxin sequence
  7. Signal = (target) the output of the experiment; a proxy for antivenom activity
  8. Genus = genus of snake the toxin stems from, e.g. Naja (cobra)
  9. Species = species of snake the toxin originates from, e.g. Naja nigricollis (Black-necked spitting cobra)
  10. ProteinFam = toxin protein family, e.g. three finger toxin (3FTx)
  11. ProteinSubFam = toxin sub-family, e.g. cytotoxin (a type of 3FTx)
  12. ProteinSubSubFam = toxin sub-sub-family, e.g. cytotoxin IA (a type of cytotoxin)
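As a quick sanity check on these columns, the sketch below parses two made-up rows, verifies that every Toxin_K_mer has 16 residues, and computes the mean Signal per Antivenom as a naive baseline. All values (IDs, positions, antivenom names, signals) are placeholders; only the column names come from the list above.

```python
import csv
import io
from collections import defaultdict

# Two fabricated rows for illustration only; real data comes from Train.csv.
toy_csv = """ID,Toxin_UniprotID,Position_start,Position_end,Antivenom,Toxin_K_mer,Signal,Genus,Species,ProteinFam,ProteinSubFam,ProteinSubSubFam
row_1,TOX_1,1,16,AV_A,MKTLLLTLVVVTIVCL,0.5,Naja,Naja nigricollis,3FTx,cytotoxin,cytotoxin IA
row_2,TOX_1,2,17,AV_B,KTLLLTLVVVTIVCLD,1.5,Naja,Naja nigricollis,3FTx,cytotoxin,cytotoxin IA
"""

signals = defaultdict(list)
bad_rows = []  # IDs whose k-mer is not 16 residues long

for row in csv.DictReader(io.StringIO(toy_csv)):
    if len(row["Toxin_K_mer"]) != 16:
        bad_rows.append(row["ID"])
    signals[row["Antivenom"]].append(float(row["Signal"]))

# Naive baseline: predict the average training signal of each antivenom.
mean_signal = {name: sum(vals) / len(vals) for name, vals in signals.items()}

print(bad_rows)
print(mean_signal)
```

To run the same check on the real data, replace `io.StringIO(toy_csv)` with an open file handle for Train.csv.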
Files

Train.csv: This is the dataset that you will use to train your model; it contains the target.
Test.csv: This is the dataset to which you will apply your model; it resembles Train.csv but without the target column.
Sample submission: This shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the ‘Signal’ column containing your predictions. The order of the rows does not matter, but the values in the ‘ID’ column must be correct.
Starter notebook: This will help you make your first submission on the leaderboard.
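The submission format described above can be sketched as follows: a two-column CSV with an ‘ID’ column and a ‘Signal’ column. The helper `build_submission`, the IDs, and the prediction values are hypothetical; in practice the IDs would be read from Test.csv and the predictions would come from your model.

```python
import csv
import io


def build_submission(ids, predictions):
    """Return CSV text with an ID column and a Signal column."""
    if len(ids) != len(predictions):
        raise ValueError("need exactly one prediction per ID")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ID", "Signal"])
    for row_id, signal in zip(ids, predictions):
        writer.writerow([row_id, signal])
    return buffer.getvalue()


toy_ids = ["row_1", "row_2"]     # placeholder IDs, not real test IDs
toy_predictions = [0.25, 1.75]   # placeholder predictions
submission_text = build_submission(toy_ids, toy_predictions)

print(submission_text)
```

Writing `submission_text` to a `.csv` file gives something that can be uploaded; since row order does not matter, only the ID values need to match those in Test.csv.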