Hi,
It would be useful to clarify some of the terms used in the dataset:
For Molecules:
1- There are 5 different sources of molecules. What do you mean by Endogenous, In-Trials, and World?
1a- Does DrugBank_Leishmania mean validated anti-Leishmania molecules?
1b- Does In-Trials mean molecules currently being tested by industry?
1c- Does World mean a merge of all the molecules in the remaining 4 files?
1d- Does Endogenous mean molecules produced by Leishmania?
2- What criterion (e.g. keyword == Leishmania) was used to collect these molecules from the different sources? Or was it just a raw download?
For Targets:
1- What does "I.major" mean?
2- If Preferred Targets is available, what is the purpose of All-Target? Is it for discovering unknown targets?
3- Does Preferred Targets mean the known targets of validated anti-Leishmania molecules? 30 MB seems to be a lot of proteins...
4- Are the sequences from Leishmania or from human?
From the data description:
1- Google Storage is not yet available.
2- Approved drug molecules: do we have to wait for these molecules from Google Storage, or is this a task we have to do during prediction?
3- "Safe for humans drug-like molecules": what does this mean?
Thanks
Kirus
Hello Kirus,
There is a Google bucket with the available target and ligand PDB structures.
You can download a list of "safe" approved drugs from the bucket: https://storage.googleapis.com/indaba-challenge/molecules.zip
The PDB files describing the possible targets for Leishmania are also stored in the Google bucket: https://storage.googleapis.com/indaba-challenge/
You can follow this example: https://instadeep-public.gitlab.io/grandchallenge/IndabaChallenge.html to see how we use this information through a Rosetta docking protocol.
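If you prefer to fetch the archive from a script rather than a browser, something like the following Python sketch might work. It only uses the standard library and the molecules.zip URL given above; the function names and the "molecules" output folder are my own choices for illustration, and I am not assuming anything about what the archive contains.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# URL quoted in the reply above; availability is not guaranteed.
MOLECULES_URL = "https://storage.googleapis.com/indaba-challenge/molecules.zip"


def fetch_bytes(url, timeout=60):
    """Download the resource at `url` and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def extract_archive(data, dest):
    """Unpack zip `data` (bytes) into `dest`; return the sorted entry names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest_path)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # "molecules" is a hypothetical local folder name.
    names = extract_archive(fetch_bytes(MOLECULES_URL), "molecules")
    print(f"Extracted {len(names)} entries")
```

Keeping the download and the extraction in separate functions means the extraction step can be checked on any local zip file, even when the bucket is unreachable.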
Thanks @nlopezcarranza,
In my case I will not use the PyRosetta package. The link to the possible targets is not working.
Thanks,