Converting from compound name to chemical structure

Hi all,

I was wondering if anybody could help me figure out a comprehensive way of converting compound names to SMILES strings.

The best I have so far is this Python script:

from cinfony import webel
import urllib2

namelist = ['rapamycin', 'aspirin']

for name in namelist:
    try:
        # Look the name up through webel and print the SMILES it returns
        compound = webel.readstring("name", name)
        print compound.smiles
    except urllib2.HTTPError:
        # The lookup service could not resolve this name
        print "Failed to resolve %s" % (name)
        compound = "NA"

My problem with this approach is that it doesn't identify the structure of 'rapamycin'.

This could be fixed by using the ChemSpider resolver (http://cactus.nci.nih.gov/blog/?p=1386) such as 

Any pointers on how to modify my script to use this resolver, or even on a completely different approach, would be gratefully received.

Cheers,

Iain

3 answers

4

baoilleach [ Admin ] from Unknown

I’ll log this as a feature request for webel. In the meantime, you’ll have to do it manually:

>>> print urllib2.urlopen("http://cactus.nci.nih.gov/chemical/structure/rapamycin/smiles?resolver=name_by_chemspider").read()
C3([C@@H]1CCCCN1C(=O)C([C@@]2(O[C@@H](CC[C]2[CH2])C[C](C(=[C][C]=[C][C]=[C][C](C[C](C([C]([C](C(=[C][C](C(C[C@H](O3)[C](C[C@@H]4CC[C]([C](C4)O[CH2])[O])[CH2])=O)[CH2])[CH2])[O])O[CH2])=O)[CH2])[CH2])[CH2])O[CH2])[O])=O)=O
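For anyone wanting to fold the manual call back into the original loop, here is a minimal sketch. It is written for Python 3 (urllib.request in place of urllib2) and assumes the service still accepts the URL pattern and the `name_by_chemspider` resolver value shown in the one-liner above. The function names `cir_smiles_url` and `name_to_smiles` are made up for illustration and are not part of webel or cinfony.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the one-liner above.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_smiles_url(name, resolver="name_by_chemspider"):
    """Build the resolver URL for a compound name (no network access)."""
    # quote() percent-encodes spaces and other unsafe characters in the name.
    return "%s/%s/smiles?resolver=%s" % (CIR_BASE, quote(name, safe=""), resolver)


def name_to_smiles(name, resolver="name_by_chemspider", timeout=30):
    """Return the SMILES string for a name, or None if the lookup fails."""
    try:
        with urlopen(cir_smiles_url(name, resolver), timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except OSError:
        # URLError and HTTPError are both subclasses of OSError,
        # so this covers unresolved names as well as network problems.
        return None
```

Calling `name_to_smiles(name)` for each entry of the original `namelist` and printing "Failed to resolve ..." whenever it returns None would reproduce the behaviour of the script in the question.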

Comments
wdiwdi

Note that this version lost double bond stereochemistry (compare with the KEGG version in my answer).
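A crude way to spot this kind of loss when processing many names is to check whether a returned SMILES string carries any directional bond markers at all. In SMILES, `/` and `\` are the symbols that encode double bond configuration, so a string without them has no double bond stereochemistry specified. The helper name below is hypothetical, and the check only detects the presence of markers; it does not verify that the stereochemistry is complete or correct.

```python
def has_double_bond_stereo(smiles):
    """Return True if the SMILES string contains '/' or '\\' bond markers."""
    return "/" in smiles or "\\" in smiles
```

Applied to the two rapamycin strings in this thread, it would flag the ChemSpider-derived one (no markers) and pass the KEGG-derived one.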

iain.m.wallace

Daniel’s comment below would be an even better feature request for webel.

3

wdiwdi [ Editor ] from Frankfurt am Main, Deutschland

A two-line Cactvs script:

lappend cactvs(lookuphosts) kegg
ens get [ens create rapamycin] E_SMILES

Result:

N24[C@H](C(OC([C](CC1C[C@H]([C@H](O)CC1)OC)C)CC([C](\C=C(\[C@H]([C@H](C([C](C[C](/C=C/C=C/C=C(/[C@H](C[C@H]3O[C@@](C(C2=O)=O)([C](CC3)C)O)OC)C)C)C)=O)OC)O)C)C)=O)=O)CCCC4

(I suggest always using KEGG as the first-line resolver for this kind of biological compound: its structures are more reliable than those from ChemSpider or PubChem, though of course the database is smaller.)


0

dan2097 [ Editor ] from Cambridge, United Kingdom

I believe your modified request will use only ChemSpider for name resolution.
You may get higher recall if you also include the Chemical Identifier Resolver’s own database. Because you can choose the order in which the resolvers are used, you can construct a request that always returns the ChemSpider structure when the name is present there. There’s a nice guide on how to do this on Markus' blog:
http://cactus.nci.nih.gov/blog/?p=1386
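As a rough sketch of what such a request might look like from Python 3: the code below assumes that the `resolver` URL parameter accepts a comma-separated list that is tried in the order given, which is my reading of the blog post, and that `name_by_cir` is the value for the resolver's own name database. Both are assumptions to check against the guide. `cir_url` is a hypothetical helper name; it only builds the URL and does not contact the service.

```python
from urllib.parse import quote

# Base URL as used elsewhere in this thread.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(name, resolvers, representation="smiles"):
    """Build a resolver URL that lists the resolvers in priority order."""
    if not resolvers:
        raise ValueError("at least one resolver is required")
    return "%s/%s/%s?resolver=%s" % (
        CIR_BASE,
        quote(name, safe=""),   # percent-encode spaces etc. in the name
        representation,
        ",".join(resolvers),    # assumed: comma-separated, first match wins
    )
```

For example, `cir_url("rapamycin", ["name_by_chemspider", "name_by_cir"])` would ask for ChemSpider first and fall back to the resolver's own database, if the assumptions above hold.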

Comments
wdiwdi

Cactvs has built-in support for name resolution by the NCI resolver, KEGG, ChemSpider (direct, not via NCI) and OPSIN. The sequence of contacting these is straightforward to configure.
