
Indaba Grand Challenge: Curing Leishmaniasis by Deep Learning Indaba

Start: Jun 29, 20
Close: May 31, 21
Reveal: May 31, 21
Nuclear sciences center of tunisia
Word meaning and data set description
Help · 16 Jul 2020, 08:46 · edited ~1 hour later · 2

Hi,

It would be useful to clarify some of the terms used in the dataset:

For Molecules:

1- There are 5 different sources of molecules. What do you mean by Endogenous, In-Trials and World?

1a- Does DrugBank_Leishmania mean validated anti-Leishmania molecules?

1b- Does In-Trials mean molecules currently being tested by industry?

1c- Does World mean a merge of all the molecules in the remaining 4 files?

1d- Does Endogenous mean molecules produced by Leishmania?

2- What criterion (for example keyword == Leishmania) was used to collect these molecules from the different sources? Or was it just a raw download?

For Targets:

1- What does "I.major" mean?

2- If the preferred targets are available, what is the purpose of All-Target? Is it for discovering unknown targets?

3- Does preferred targets mean the known targets of validated anti-Leishmania molecules? 30 MB seems to be a lot of proteins...

4- Are the sequences from Leishmania or from human?

From the data description:

1- Google Storage is not yet available.

2- Approved drug molecules: do we have to wait for these molecules to be published on Google Storage, or is this something we have to do ourselves during prediction?

3- Safe-for-humans drug-like molecules: what does this mean?

Thanks

Kirus

Discussion 2 answers

Hello Kirus,

There is a Google bucket with the available target and ligand PDB structures.

You can download a list of "safe" approved drugs from the bucket: https://storage.googleapis.com/indaba-challenge/molecules.zip

The PDB files describing the possible Leishmania targets are also stored in the Google bucket: https://storage.googleapis.com/indaba-challenge/

To see how we use this information in a Rosetta docking protocol, you can follow this example: https://instadeep-public.gitlab.io/grandchallenge/IndabaChallenge.html
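
For a quick look at what the archive contains, something like the following might help. It is only a minimal sketch using the Python standard library: the URL is the one quoted above, while `fetch_bytes` and `list_archive` are illustrative names, and nothing is assumed about the folder layout inside the zip.

```python
import io
import urllib.request
import zipfile

# URL quoted in the post above; availability is not guaranteed.
MOLECULES_URL = "https://storage.googleapis.com/indaba-challenge/molecules.zip"


def fetch_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Download a URL and return its raw bytes (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def list_archive(zip_bytes: bytes) -> list:
    """Return the sorted names of all regular files in an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


# Usage (needs network access):
#   names = list_archive(fetch_bytes(MOLECULES_URL))
#   print(len(names), "files; first few:", names[:5])
```

The archive is kept in memory here for simplicity; for a large download it may be preferable to write it to disk first.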

17 Jul 2020, 12:55
Nuclear sciences center of tunisia

Thanks @nlopezcarranza,

In my case I will not use the PyRosetta package. The link to the possible targets is not working.

  1. In the molecules.zip folder, each file appears 3 times with 3 different extensions: sdf, pdb and params.
  • Where can I find the target protein for each molecule?
  • What is the difference between molecules.zip in the notebook and molecules.zip in the dataset? The folders do not contain the same files.
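
One way to check the observation about the three extensions, and to compare two copies of the archive, is to group the file names by their path without extension. This is a rough sketch only: `group_by_stem` and `incomplete` are hypothetical helper names, the input is assumed to be a plain list of file names from the zip, and the expected extension set is simply the three mentioned above.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# The three extensions mentioned in the post above.
EXPECTED = frozenset({".sdf", ".pdb", ".params"})


def group_by_stem(names):
    """Map each path without its extension to the set of extensions seen for it."""
    groups = defaultdict(set)
    for name in names:
        path = PurePosixPath(name)
        # Keep the directory part so same-named files in different folders stay apart.
        groups[str(path.with_suffix(""))].add(path.suffix.lower())
    return dict(groups)


def incomplete(groups, expected=EXPECTED):
    """Return entries missing at least one expected extension, with what is missing."""
    return {
        stem: set(expected - exts)
        for stem, exts in groups.items()
        if expected - exts
    }
```

Given two listings, for example one from the notebook copy and one from the dataset copy, `set(group_by_stem(a)) - set(group_by_stem(b))` would show which molecules appear only in the first.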

Thanks,