Synonyms of compounds

Asked by

Hi all,

I was wondering if anybody knows of a comprehensive, open source of synonyms for compounds? I am wondering if PubChem would be the best source.

Ultimately, I want to search PubMed for occurrences of a set of ~500 named compounds.

Thanks

Iain

NN comments
jrisse
-

There is a thesaurus available, published in this paper: http://bioinformatics.oxfordjournals.org/content/25/22/2983.long , but the sets of synonyms are not unique, in that many terms are assigned to multiple categories. The same goes for PubChem, where a lot of compounds are duplicated or not distinguishable in PubMed. Another free data source is ChEBI (www.ebi.ac.uk/chebi). KEGG used to be one too, but its FTP server is now subscription only. If you have some patience, I am about to submit a paper containing a compound thesaurus built from ChEBI and KEGG. Depending on what you ultimately want to do with it, we could discuss making it available.

fredrik wallner
-

You might want to move this to an answer instead of a comment in order for it to be marked as accepted. If you don't see the box to type an answer, click on the "Answer" symbol below. There are still some design problems with this site...

iain.m.wallace
-
That is really interesting; I would certainly like to use your thesaurus if possible. My aim is really a simple PubMed search for "(drug name OR alias1 OR alias2 etc.) AND yeast". I understand, as has been pointed out, that a lot of pruning may have to be done.
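A minimal Python sketch of how such a query string could be assembled for one compound. The function name `build_pubmed_query` and the choice to quote every alias as a phrase are assumptions made for illustration, not something specified in this thread.

```python
def build_pubmed_query(names, context="yeast"):
    """Return '("name1" OR "name2" ...) AND context' for one compound."""
    # Drop empty strings and case-insensitive duplicates, keeping first-seen order.
    seen = set()
    unique = []
    for name in names:
        cleaned = name.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    if not unique:
        raise ValueError("need at least one name")
    # Quote each alias so that multi-word names are searched as phrases.
    quoted = ['"%s"' % n.replace('"', "") for n in unique]
    return "(%s) AND %s" % (" OR ".join(quoted), context)


print(build_pubmed_query(["aspirin", "acetylsalicylic acid", "Aspirin"]))
# prints: ("aspirin" OR "acetylsalicylic acid") AND yeast
```

Very short or ambiguous aliases would probably still need pruning by hand before they go into the list.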

jrisse
-

Ah, but drug names are not included as such in my thesaurus, as the drug synsets should also contain commercial product names and I am more interested in metabolic compounds. DrugBank (http://www.drugbank.ca) might be a solution to that, and they do provide a download option for all entries in XML format.

By the way, are you familiar with the restrictions PubMed places on high-throughput activities? They do provide programmatic access for this kind of thing through the Entrez E-utilities (http://www.ncbi.nlm.nih.gov/books/NBK25500/)
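As a rough sketch of what a polite E-utilities client might look like in Python: the snippet below only builds ESearch URLs and spaces requests out; it does not fetch anything. The helper names `esearch_url` and `Throttle` are hypothetical, and the default interval assumes the commonly cited limit of about three requests per second without an API key, which should be checked against the NCBI documentation linked above.

```python
import time
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build an ESearch URL that queries PubMed for `term`."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


class Throttle:
    """Sleep as needed so that calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=0.34):  # assumed: ~3 requests per second
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle()
for term in ["aspirin AND yeast", "toluene AND yeast"]:
    throttle.wait()            # pause before each request would be sent
    print(esearch_url(term))   # a real client would fetch this URL here
```

The actual download (for example with `urllib.request.urlopen`) is left out on purpose, so the sketch can be run without touching the NCBI servers.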


5 answers

1

fredrik wallner [ Editor ] from Sverige

If you can’t use Cactvs (e.g. you’re non-academic on a tight budget), the webel module in Cinfony lets you achieve the same name lookup in Python.

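from cinfony import webel
import urllib2  # Python 2 standard library

namelist = ['aspirin', 'toluene', 'taxol', 'gregre']
for name in namelist:
    try:
        # Resolve the name and write out the aliases that come back
        print "Name: %s Aliases: %s" % (name, webel.readstring("name", name).write("names"))
    except urllib2.HTTPError:
        # Names that cannot be resolved (e.g. 'gregre') end up here
        print "Failed to resolve %s" % (name)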
NN comments
iain.m.wallace
-

Thanks for the script, looks really useful!

1

wdiwdi [ Editor ] from Frankfurt am Main, Deutschland

You could use the NCI resolver, in the “names” reporting mode.

This is easily done with the Cactvs toolkit in an implicit fashion:

Script snippet:

foreach name $namelist {
    if {[catch {ens create $name} eh]} {
        puts stderr "Failed to resolve $name"
    } else {
        puts "Name: $name Aliases: [ens get $eh E_NAMESET]"
        ens delete $eh
    }
}
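For readers without Cactvs, the resolver's "names" mode can also be reached over plain HTTP. The Python sketch below assumes the public URL pattern `https://cactus.nci.nih.gov/chemical/structure/<name>/names`, a plain-text reply with one name per line, and an HTTP error for names that cannot be resolved; `names_url` and `fetch_aliases` are hypothetical helper names.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Assumed URL pattern for the resolver's "names" representation.
RESOLVER = "https://cactus.nci.nih.gov/chemical/structure/%s/names"


def names_url(name):
    """Build the resolver URL for one compound name, percent-encoding it."""
    return RESOLVER % quote(name, safe="")


def fetch_aliases(name):
    """Return the aliases reported for `name`, or [] if it does not resolve."""
    try:
        with urlopen(names_url(name), timeout=30) as response:
            text = response.read().decode("utf-8")
    except HTTPError:
        return []
    # Assumed reply format: one alias per line.
    return [line.strip() for line in text.splitlines() if line.strip()]


print(names_url("acetylsalicylic acid"))
```

Calling `fetch_aliases("aspirin")` would perform the actual lookup; it is not called here so that the sketch runs without network access.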

Still, I do not believe that an automatic simple name search in PubMed, even with aliases (which themselves may collide) will yield quality results…
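One way to see how serious the collision problem is for a given compound set is to list every alias that is claimed by more than one compound before running any searches. A small Python sketch, with placeholder compound names and a hypothetical helper `find_colliding_aliases`; comparing aliases case-insensitively is an assumption.

```python
from collections import defaultdict


def find_colliding_aliases(synonyms):
    """Map each alias shared by two or more compounds to those compounds."""
    owners = defaultdict(set)
    for compound, aliases in synonyms.items():
        for alias in aliases:
            # Compare aliases case-insensitively and ignore stray whitespace.
            owners[alias.strip().lower()].add(compound)
    return {alias: sorted(c) for alias, c in owners.items() if len(c) > 1}


# Placeholder data, not real chemistry.
example = {
    "compound A": ["alpha", "shared name"],
    "compound B": ["beta", "Shared Name"],
}
print(find_colliding_aliases(example))
# prints: {'shared name': ['compound A', 'compound B']}
```

Aliases flagged this way could be dropped from the queries or reviewed by hand.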

NN comments
iain.m.wallace
-

Thanks for the script and the caveat. 

I understand it won’t be perfect, but it certainly leads to some useful results. Some synonyms are going to be more specific than others, but similar challenges have been overcome before in extracting protein and gene information from PubMed.

0

the.chemist.ds from United Kingdom

You can also perform this search using ChemSpider.


If you find the record for the compound of interest and open the Articles infobox, there is a tab labelled PubMed which returns results from PubMed. The results are returned by creating a query using all of the validated synonyms in the Names and Identifiers Infobox. Only 6-7 of the PubMed results are returned in the Articles Infobox, but there is a link which will take you through to the NCBI homepage where you can see all of the results and/or refine the search.
NN comments
iain.m.wallace
-

Is it possible to download all the validated names/identifiers in ChemSpider? I am searching for ~2000 compounds, so I am making use of the E-utilities facility in PubMed.

0

jrisse from Amsterdam, Nederland

Ok, now I found the answer box:
There is a thesaurus available, published in this paper: http://bioinformatics.oxfordjournals.org/content/25/22/2983.long , but the sets of synonyms are not unique, in that many terms are assigned to multiple categories. The same goes for PubChem, where a lot of compounds are duplicated or not distinguishable in PubMed. Another free data source is ChEBI (www.ebi.ac.uk/chebi). KEGG used to be one too, but its FTP server is now subscription only. If you have some patience, I am about to submit a paper containing a compound thesaurus built from ChEBI and KEGG. Depending on what you ultimately want to do with it, we could discuss making it available.
