Indaba Grand Challenge: Curing Leishmaniasis by Deep Learning Indaba

Helping Africa
3000 Zindi Points
Completed (~5 years ago)
Reinforcement Learning
341 joined
24 active
Start: Jun 29, 2020
Close: May 31, 2021
Reveal: May 31, 2021
The official competition of the Deep Learning Indaba 2020

Leishmaniasis is a neglected disease. As a disease of poverty, it has historically received limited funding for discovery, development and delivery of new tools. Current treatment is costly, lengthy, painful and sometimes toxic. Like a handful of similar diseases, it is the scourge of the regions it affects, because we still lack cheap, safe and effective cures.

At the same time, new drug candidates are being developed and old ones are being tested every day. Today, millions of drug activity assays are available at the press of a button. In this Indaba Grand Challenge, we dare to ask you to help identify potential cures for different forms of leishmaniasis among the already known, tested and (often) approved drugs.

The goal is to propose a new treatment, comprising a Leishmania protein (present in the proteome of one or more of the Leishmania species) and a small molecule (or set of small molecules). A submission can be specified as:

  • PDB file with a structure of a target and a bound small molecule
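
Before submitting, it can help to check that a PDB file really contains both a protein and a bound small molecule. The sketch below is only an illustration, not the official validation: it assumes standard fixed-column PDB records (protein atoms as `ATOM`, ligand atoms as `HETATM`), treats only water as solvent, and uses a made-up fragment with a hypothetical ligand residue name `LIG`.

```python
from collections import Counter

# Residue names treated as solvent rather than ligand (assumption: water only).
SOLVENT = {"HOH", "WAT"}

def summarize_pdb(text):
    """Count ATOM records and non-solvent HETATM atoms per residue name."""
    protein_atoms = 0
    ligand_atoms = Counter()
    for line in text.splitlines():
        record = line[0:6].strip()      # columns 1-6: record name
        res_name = line[17:20].strip()  # columns 18-20: residue name
        if record == "ATOM":
            protein_atoms += 1
        elif record == "HETATM" and res_name not in SOLVENT:
            ligand_atoms[res_name] += 1
    return protein_atoms, dict(ligand_atoms)

# Made-up fragment for illustration; coordinates are not from a real structure.
EXAMPLE = "\n".join([
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  MET A   1      12.560   6.210  -6.300",
    "HETATM  101  C1  LIG A 201      15.200   8.100  -4.900",
    "HETATM  201  O   HOH A 301      20.000  10.000   1.000",
])

protein_atoms, ligands = summarize_pdb(EXAMPLE)
print(protein_atoms, ligands)  # 2 {'LIG': 1}
```

A file with zero `ATOM` records or no non-solvent `HETATM` residues would be missing one half of the target-plus-molecule pair described above.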

To enroll in this challenge and get the secret code to join, you must also be registered for the challenge through the Deep Learning Indaba: https://deeplearningindaba.com/grand-challenges/leishmaniasis/

This competition is sponsored by the Deep Learning Indaba and its partners.

Glossary
  • Amino acid: Amino acids are organic compounds composed of a linear chain of nitrogen, carbon, hydrogen and oxygen, along with a variable side chain group. Amino acids are building blocks of proteins. There are 20 amino acids that can form proteins. Proteins are composed of one or more long (often hundreds of positions) chains of amino acids, connected by peptide bonds.
  • Amino acid sequence: Linear arrangement of amino acids in a protein chain. Proteins are composed of one or more chains of amino acids, and the structure and function of each protein is uniquely determined by the types and combination of amino acids present in the chain. The same amino acid sequence will (nearly) always result in the same protein, with the same structure, same function, and being targeted by the same drugs.
  • Approved drug molecule: A drug molecule that has been validated for a therapeutic use by a ruling authority of a government, because it has been proven effective and safe to the patient. Such molecules are the most attractive candidates for drug repurposing.
  • Binding affinity: Binding affinity is the measure of the strength of the binding interaction between a single biomolecule (e.g. protein or DNA) and its ligand/binding partner (e.g. drug). It is usually expressed as the concentration that is sufficient for binding, so the lower, the better. Drugs with binding affinity around 10 micromolar are generally feasible. Commercially successful drugs have binding affinity in the nanomolar range.
  • Binding pocket: Proteins interact with the environment through binding small molecules (ligands). The region on the protein surface which binds the interacting ligand, with certain affinity (see binding affinity) and specificity (what partners will bind there), is called the binding pocket. It is usually a well-defined pocket on the protein surface, lined with amino acids that determine the binding specificity. Proteins with similar binding pockets bind similar ligands, even if their annotated function is different. Also known as binding site or active site.
  • Conformer: a three-dimensional molecule (isomer) that differs from another isomer by the rotation of one or more bonds in the molecule. Conformers of one molecule are composed of the same atoms and have the same connectivity. They are all the same drug. While a nearly infinite number of conformers is theoretically possible for any molecule with rotatable bonds, only an enumerable number of them are likely, because they are energetically favorable.
  • Docking: a method which predicts the preferred orientation of one molecule to a second when bound to each other to form a stable complex. One can think of molecular docking as a problem of “lock-and-key”, in which one wants to find the correct relative orientation of the “key” which will open up the “lock” (where on the surface of the lock is the key hole - binding pocket, which direction to turn the key after it is inserted, etc.). Here, the protein can be thought of as the “lock” and the ligand can be thought of as a “key”. Molecular docking may be defined as an optimization problem, which would describe the “best-fit” orientation of a ligand that binds to a particular protein of interest. However, since both the ligand and the protein are flexible, a “hand-in-glove” analogy is more appropriate than “lock-and-key”. During the course of the docking process, the ligand and the protein adjust their conformation to achieve an overall "best-fit" and this kind of conformational adjustment resulting in the overall binding is referred to as "induced-fit" (from https://en.wikipedia.org/wiki/Docking_(molecular)).
  • Docking protocol: procedure specifying how to perform docking. It is a compromise between accuracy and efficiency.
  • Drug activity assays: investigative (analytic) procedures in laboratory medicine, pharmacology, environmental biology and molecular biology for assessing drug compounds. Their results can be either qualitative (e.g. works well/works poorly/does not work) or quantitative (e.g. yielding binding affinities or activity measurements).
  • Drug candidate: A molecule (often small) which is believed to have a therapeutic effect for a particular disease. It is a product of a drug discovery process, and the subject of drug activity assays. A successful drug candidate (lead compound) can be further elaborated by medicinal chemists to improve its properties.
  • Drug repurposing: Process of discovering novel uses for known drugs and drug candidates. Can involve finding known drug-protein pairs, where the protein is similar to a protein in a pathogen of interest.
  • Drug target: Protein of the pathogenic organism causing the disease of interest, which is essential for its operation, or disruption of whose function may negatively affect the pathogen.
  • Isomers: molecules that have the same molecular formula, but have a different arrangement of the atoms in space. Conformers are a form of isomers. Isomers that differ in topology (connectivity) are different compounds. Isomers with the same connectivity are the same compound, but are not necessarily equally functional (see stereoisomers).
  • Leishmaniasis: a parasitic disease that is found in parts of the tropics, subtropics, and southern Europe. It is classified as a neglected tropical disease (NTD). Leishmaniasis is caused by infection with Leishmania parasites, which are spread by the bite of phlebotomine sand flies.
  • Ligand-based drug discovery: A drug discovery strategy that relies on the knowledge of a binding molecule alone. It is based on the assumption that similar molecules bind to similar proteins.
  • Lipinski’s rule of 5: set of rules of thumb to evaluate drug-likeness of a molecule. Molecules that violate them are unlikely to make good oral drugs. Drug-likeness can be easily computed by modern cheminformatics software. Some very successful drugs violate these rules.
  • Ligand: a substance that binds to a biomolecule in its binding pocket as (or instead of) its intended partner, and through interacting with it activates it (agonist), inactivates it (antagonist) or makes it perform opposite action (inverse agonist). Ligand binding serves a biological purpose.
  • Metabolic pathway: a linked series of chemical reactions occurring within a cell. Can be thought of as a supply chain, in which one process produces inputs for the next one. Perturbing metabolic pathways is an attractive target for drug discovery.
  • Molecular topology graph: a representation of the structural formula of a chemical compound in terms of graph theory.
  • Pose: A snapshot of atom positions in a molecule or a complex. Can be a basis for evaluating the forces acting in the system and measuring binding.
  • Proteins: large molecules consisting of one or more long chains of amino acids connected by peptide bonds.
  • Protein-protein interaction: physical contacts of high specificity established between two or more protein molecules as a result of biochemical events steered by interactions that include electrostatic forces, hydrogen bonding and the hydrophobic effect. They form after the protein is synthesized in the cell. Inhibiting protein-protein interactions with small drugs that bind to the interaction surface is a good strategy for drug discovery.
  • Proteome: the entire set of proteins that is, or can be, expressed by a genome, cell, tissue, or organism at a certain time.
  • Small molecule: a low molecular weight organic compound that may regulate a biological process, with a size on the order of 1 nanometer (approximately the size of one amino acid). Small molecules, even ones that do not exist yet, can be synthesized by medicinal chemists. It is much easier to synthesize variants of already known molecules, than to discover a new synthetic pathway.
  • Structure-guided drug discovery: a type of drug discovery that relies on knowledge of the three-dimensional structure of a particular biomolecular target.
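
Two of the glossary entries translate directly into simple numeric checks: the binding affinity thresholds and Lipinski's rule of 5. The sketch below is a hedged illustration, not part of the official evaluation. It assumes affinity is given as a dissociation constant in mol/L, uses the commonly quoted Lipinski limits (molecular weight 500 Da, logP 5, 5 hydrogen-bond donors, 10 hydrogen-bond acceptors), and tolerates one violation, which is a common convention rather than a competition rule. The compounds and their descriptor values are invented; in practice the descriptors would come from cheminformatics software.

```python
import math
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    kd_molar: float        # binding affinity as a concentration, mol/L (assumption)
    mol_weight: float      # daltons
    log_p: float           # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def p_affinity(kd_molar):
    """Negative log10 of the affinity: about 5 for 10 micromolar, 9 for 1 nanomolar."""
    return -math.log10(kd_molar)

def lipinski_violations(c):
    """Count how many of the four rule-of-5 limits the compound exceeds."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])

def passes_filter(c, max_kd_molar=10e-6, max_violations=1):
    """Keep compounds that bind at 10 micromolar or better and look drug-like."""
    return c.kd_molar <= max_kd_molar and lipinski_violations(c) <= max_violations

# Hypothetical compounds with invented values, for illustration only.
candidates = [
    Compound("cmpd_a", 2e-9, 350.0, 2.1, 2, 5),    # strong binder, drug-like
    Compound("cmpd_b", 5e-5, 300.0, 1.5, 1, 4),    # too weak (50 micromolar)
    Compound("cmpd_c", 1e-8, 720.0, 6.3, 3, 9),    # two rule-of-5 violations
]
kept = [c.name for c in candidates if passes_filter(c)]
print(kept)  # ['cmpd_a']
```

Because some very successful drugs violate the rule of 5, a filter like this is better used to prioritise candidates than to exclude them outright.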
Rules

Teams and collaboration

You may participate in this competition as an individual or in a team of up to four people. When creating a team, the team must have a total submission count less than or equal to the maximum allowable submissions as of the formation date. A team will be allowed the maximum number of submissions for the competition, minus the highest number of submissions among team members at team formation. Prizes are transferred only to the individual players or to the team leader.
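
The team submission arithmetic above can be sketched as follows. This is an interpretation, not Zindi's implementation: it reads "total submission count" as the sum of the members' individual counts, and the limits `max_allowed_to_date` and `competition_max` are hypothetical parameters, since the rules do not state their values.

```python
MAX_TEAM_SIZE = 4  # from the rules: teams of up to four people

def can_form_team(member_counts, max_allowed_to_date):
    """Check team size and the combined submission count at formation."""
    if not 1 <= len(member_counts) <= MAX_TEAM_SIZE:
        return False
    return sum(member_counts) <= max_allowed_to_date

def team_allowance(member_counts, competition_max):
    """Competition maximum minus the highest individual count at formation."""
    return competition_max - max(member_counts)

# Hypothetical numbers, for illustration only.
counts = [12, 30, 7]
print(can_form_team(counts, max_allowed_to_date=100))  # True (49 <= 100)
print(team_allowance(counts, competition_max=300))     # 270 (300 - 30)
```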

Multiple accounts per user are not permitted. Collaboration is encouraged as long as participants are registered with only one team. Individuals and their submissions originating from multiple accounts will be disqualified.

Code can be shared privately outside of a team, but we encourage sharing with all competition participants through the platform (i.e. on the discussion boards) to maximise interaction among participants.

Datasets and packages

The solution must use publicly-available, open-source packages only.

Use of external data is allowed and data sharing is encouraged. You may use pretrained models as long as they are openly available to everyone.

You are allowed to access, use, and share competition data for any commercial, non-commercial, research or education purposes, under a CC BY-SA 4.0 license.

Your solution must not infringe the rights of any third party, although compounds protected by IP laws are not automatically disqualified.

Submissions and winning

You may make a maximum of 10 submissions per day. Your highest-scoring solution on the public leaderboard at the end of the competition will be the one by which you are judged.

If you are in the top three at the time the leaderboard closes, we will email you to request your code. On receipt of email, you will have 48 hours to respond and submit your code following the submission guidelines detailed below. Failure to respond will result in disqualification.

If your solution places 1st, 2nd, or 3rd in the final ranking, you will NOT be required to assign rights of copyright to Zindi. We will however encourage the winners to share their code on GitHub as a public good to the sector.

If two solutions earn identical scores on the leaderboard, the tiebreaker will be the date and time at which the submission was made (the earlier solution will win).
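
The tiebreak rule amounts to a two-key sort. The sketch below assumes that a higher score is better (consistent with "highest-scoring" above) and uses invented teams, scores and timestamps; it is not the platform's actual ranking code.

```python
from datetime import datetime

def rank_submissions(submissions):
    """Order by score (higher first), breaking ties by earlier submission time."""
    return sorted(submissions, key=lambda s: (-s["score"], s["submitted_at"]))

# Hypothetical entries, for illustration only.
entries = [
    {"team": "A", "score": 0.91, "submitted_at": datetime(2021, 3, 1, 14, 0)},
    {"team": "B", "score": 0.91, "submitted_at": datetime(2021, 2, 27, 9, 30)},
    {"team": "C", "score": 0.95, "submitted_at": datetime(2021, 3, 5, 18, 45)},
]
ranking = [s["team"] for s in rank_submissions(entries)]
print(ranking)  # ['C', 'B', 'A']
```

Teams A and B have the same score, so B ranks higher because its submission was made earlier.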

You acknowledge and agree that Zindi may, without any obligation to do so, remove or disqualify an individual, team, or account if Zindi believes that such individual, team, or account is in violation of these rules. Entry into this competition constitutes your acceptance of these official competition rules.

Zindi is committed to providing solutions of value to our clients and partners. To this end, we reserve the right to disqualify your submission on the grounds of usability or value. This includes but is not limited to the use of data leaks or any other practices that we deem to compromise the inherent value of your solution.

Zindi also reserves the right to disqualify you and/or your submissions from any competition if we believe that you violated the rules or violated the spirit of the competition or the platform in any other way. The disqualifications are irrespective of your position on the leaderboard and completely at the discretion of Zindi.

Please refer to the FAQs and Terms of Use for additional rules that may apply to this competition. We reserve the right to update these rules at any time.

Reproducibility

  • If your submitted code does not reproduce your score on the leaderboard, we reserve the right to adjust your rank to the score generated by the code you submitted.
  • If your code does not run, you will be dropped from the top 10. Please make sure your code runs before submitting your solution.
  • Always set the seed. Rerunning your model should always place you at the same position on the leaderboard. When running your solution, if randomness shifts you down the leaderboard, we reserve the right to adjust your rank to the closest score that your submission reproduces.
  • We expect full documentation. This includes:
      • All data used
      • Output data and where they are stored
      • Explanation of features used
  • Your solution must include the original data provided by Zindi and validated external data (no processed data)
  • All editing of data must be done in a notebook (i.e. not manually in Excel)
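
As a minimal illustration of the "always set the seed" rule, the sketch below fixes the seed of Python's standard-library random generator so that a random step returns the same result on every run. The seed value, function name and candidate identifiers are arbitrary choices for this example. Other libraries (for example NumPy or deep learning frameworks) keep their own random state and need to be seeded separately.

```python
import random

SEED = 42  # arbitrary; what matters is that it is fixed and documented

def sample_candidates(candidates, k, seed=SEED):
    """Draw k candidates reproducibly using a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator, unaffected by other code
    return rng.sample(candidates, k)

# Hypothetical candidate identifiers, for illustration only.
pool = [f"compound_{i}" for i in range(100)]
first_run = sample_candidates(pool, 5)
second_run = sample_candidates(pool, 5)
print(first_run == second_run)  # True
```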

Data standards:

  • Your submitted code must run on the original train, test, and other datasets provided.
  • External data must be freely and publicly available, including pre-trained models with standard libraries. If external data is allowed, any data used should be shared on the discussion forum.
  • Packages:
      • You must use the most recent versions of packages. Custom packages in your submission notebook will not be accepted.
      • You may only use tools available to everyone, i.e. no paid services or free trials that require a credit card.

Consequences of breaking any rules of the competition or submission guidelines:

  • First offence: No prizes or points for 6 months. If you are caught cheating, all individuals involved in cheating will be disqualified from the challenge(s) you were caught in, and you will be disqualified from winning any competitions or Zindi points for the next six months.
  • Second offence: Banned from the platform. If you are caught for a second time, your Zindi account will be disabled and you will be disqualified from winning any competitions or Zindi points using any other account.

Monitoring of submissions

  • We will review at least the top three solutions of every competition when the competition ends.
  • We reserve the right to request code from any user at any time during a challenge. You will have 24 hours to submit your code following the rules for code review (see above).
  • If you do not submit your code within 24 hours, you will be disqualified from winning any competitions or Zindi points for the next six months. If you fall under suspicion again, your code is requested, and you fail to submit it within 24 hours, your Zindi account will be disabled and you will be disqualified from winning any competitions or Zindi points.
Evaluation

Solutions submitted to the Zindi Platform by the Authorised Participants will be scored by InstaDeep’s bioinformatics platform using a single evaluation algorithm.

You are allowed to use your preferred programming language, e.g. Python or R, as long as your submission is in the correct file format. However, most of the provided resources to get started focus on Python.

Starter Notebook
Prizes

There are no cash prizes for this competition. The goal is to advance science and save lives by curing a disease.

The top 10 submissions will earn up to 3000 Zindi Points.

However, there is funding for other AI projects, either related to this competition or to other areas of innovation. Please apply at www.deeplearningindaba.com/2020/ai4d-indabax-innovation-call-for-proposals.

Timeline

Competition closes on 7 March 2021.

Final submissions must be received by 11:59 PM GMT.

We reserve the right to update the contest timeline if necessary.