Hi all,
Does anybody know of a comprehensive, open source of synonyms for chemical compounds? I am wondering whether PubChem would be the best source.
Ultimately, I want to search PubMed for occurrences of a set of ~500 named compounds.
Thanks
Iain
Rich Apodaca
The Chempedia Registry was an attempt to create what you’re asking about. Although new submissions aren’t being accepted at this time, the entire dataset is available under the CC0 License.
There is a thesaurus published in this paper (http://bioinformatics.oxfordjournals.org/content/25/22/2983.long), but its synonym sets are not unique: many terms are assigned to multiple categories. The same goes for PubChem, where many compounds are duplicated or cannot be distinguished in PubMed. Another free data source is ChEBI (www.ebi.ac.uk/chebi). KEGG used to be one as well, but its FTP server is now subscription-only. If you have some patience, I am about to submit a paper containing a compound thesaurus built from ChEBI and KEGG. Depending on what you ultimately want to do with it, we could discuss making it available.
You might want to move this to an answer instead of a comment in order for it to be marked as accepted. If you don't see the box to type an answer, click on the "Answer" symbol below. There are still some design problems with this site...
That is really interesting; I would certainly like to use your thesaurus if possible. My aim is really a simple PubMed search for “(drug name OR alias1 OR alias2 etc.) AND yeast”. I understand that, as has been pointed out, a lot of pruning may have to be done.
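For what it's worth, here is a rough Python sketch of how such a search string could be assembled from a list of names. The function names (`quote_term`, `build_query`) and the example compound are my own illustrations, not anything from the thesaurus mentioned above. It assumes each name should be matched as a quoted phrase and that duplicate aliases differing only in case can be dropped.

```python
def quote_term(term: str) -> str:
    """Wrap a name in double quotes so it is searched as a phrase."""
    cleaned = term.replace('"', "").strip()
    return f'"{cleaned}"'


def build_query(names, context="yeast"):
    """Combine a compound's names with OR, then restrict with AND <context>.

    `names` is the preferred name plus its aliases (hypothetical input;
    it could come from any synonym source).
    """
    seen = set()
    unique = []
    for name in names:
        key = name.strip().lower()
        # Skip empty strings and case-insensitive duplicates.
        if key and key not in seen:
            seen.add(key)
            unique.append(name.strip())
    if not unique:
        raise ValueError("need at least one compound name")
    alternatives = " OR ".join(quote_term(n) for n in unique)
    return f"({alternatives}) AND {context}"


query = build_query(["aspirin", "acetylsalicylic acid", "Aspirin"])
print(query)  # ("aspirin" OR "acetylsalicylic acid") AND yeast
```

The parentheses matter: without them the AND would only bind to the last alias. Very short or ambiguous aliases would still need pruning by hand before they go into the list.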
Ah, but drug names are not included as such in my thesaurus: drug synsets should also contain commercial product names, and I am more interested in metabolic compounds. DrugBank (http://www.drugbank.ca) might be a solution to that; they provide a download option for all entries in XML format.
By the way, are you familiar with the restrictions PubMed places on high-throughput activities? They do provide programmatic access for this kind of thing through the Entrez E-utilities (http://www.ncbi.nlm.nih.gov/books/NBK25500/).
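As a minimal sketch of what that could look like in Python: the snippet below only builds ESearch request URLs and spaces out the iteration; it does not send anything. The helper names (`esearch_url`, `throttled`), the tool name and the e-mail address are placeholders of mine. The 0.34 s default reflects NCBI's guideline of at most three requests per second without an API key; check the current usage policy before running ~500 searches.

```python
import time
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, email, tool="compound-synonym-search", retmax=20):
    """Build an ESearch URL for PubMed (tool and email values are placeholders)."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
        "tool": tool,
        "email": email,
    }
    return f"{ESEARCH}?{urlencode(params)}"


def throttled(items, min_interval=0.34):
    """Yield items no faster than one per `min_interval` seconds."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield item


queries = ['("aspirin" OR "acetylsalicylic acid") AND yeast']
for term in throttled(queries):
    # Fetching is left out on purpose; urllib.request.urlopen(url) would go here.
    print(esearch_url(term, email="you@example.org"))
```

With `retmode=json`, the response should carry the hit count and PMID list under `esearchresult`, which is probably all a simple occurrence count needs.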