In this challenge, InstaDeep is asking for your help to design a function to score how effectively an antibody binds to the influenza virus receptor binding domain. Identifying a neutralising antibody that effectively targets the influenza virus could offer enormous therapeutic potential.
The train file consists of 70 000 sequences with binding energy and binding energy per position in the RBD. The test file contains 10 000 sequences with binding energy per position in the RBD. The data also include 4 PDB files from the Protein Data Bank to explore, based on this article, along with pre-trained embeddings for the sequences.
The objective of this challenge is to predict how strongly each sequence in the test set binds to the influenza virus RBD.
A starter notebook is provided.
Files available for download:
Train.csv - contains the target. This is the dataset that you will use to train your model. In this case, the target can be any combination of the energies in the train dataset.
Test.csv - resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
SampleSubmission.csv - shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the ‘binding’ column containing your predictions. The order of the rows does not matter, but the ID values must match those in Test.csv exactly.
embeddings.csv - contains the ProtBert embeddings for the sequences, generated by DeepChain Playground pre-trained transformers.
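As an illustration of the submission format described above, here is a minimal Python sketch that writes an ‘ID’/‘binding’ file and checks that the predicted IDs match the test IDs. The column names come from the SampleSubmission.csv description; the helper name `write_submission`, the toy IDs (`seq_1`, ...) and the prediction values are hypothetical stand-ins, not taken from the real data.

```python
import csv
import io


def write_submission(test_ids, predictions, out):
    """Write an ID,binding CSV after checking the IDs against the test set.

    test_ids: iterable of IDs as they appear in Test.csv (assumed unique).
    predictions: dict mapping each ID to its predicted binding value.
    out: any writable text file object.
    """
    test_ids = list(test_ids)
    missing = set(test_ids) - set(predictions)
    extra = set(predictions) - set(test_ids)
    if missing or extra:
        raise ValueError(
            f"ID mismatch: {len(missing)} missing, {len(extra)} unexpected"
        )
    writer = csv.writer(out)
    writer.writerow(["ID", "binding"])
    # Row order is not important for scoring; test order is used for readability.
    for seq_id in test_ids:
        writer.writerow([seq_id, predictions[seq_id]])


# Toy stand-in data (hypothetical); real IDs would be read from Test.csv.
toy_ids = ["seq_1", "seq_2", "seq_3"]
toy_predictions = {"seq_3": -1.5, "seq_1": 0.25, "seq_2": -0.75}

buffer = io.StringIO()
write_submission(toy_ids, toy_predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

In practice `out` would be a file opened with `open("submission.csv", "w", newline="")`, and the ID check catches misspelled or missing IDs before upload.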