How do I evaluate the effectiveness of molecular similarity methods?

Asked by [ Editor ]
(This is a concrete question on datasets and scoring calculations, not a request for understanding how similarity methods can be measured.)

For chemfp I’ve implemented interfaces to eight or so different fingerprint generation methods in several different toolkits, and come up with my own cross-platform variation of the PubChem/CACTVS fingerprints.


Now I want to estimate their respective effectiveness at finding compounds which a chemist would agree are “similar.” For example, how effective are the RDKit circular fingerprints compared to OpenBabel’s FP2 fingerprint for task XYZ?

There’s a huge number of published ways of doing this, and I don’t know the literature that well. Ideally I would like to implement a few of the most common comparisons … if only I knew what they were.

Could you tell me how to evaluate fingerprint similarity effectiveness? Preferably in the form “download dataset T, generate fingerprints Tfp, for each query Q and fingerprint Qfp find … and score the results as …”
NN comments
pathri
-
Dalke: Could you please post if you were able to find any such algorithms? Coincidentally, I had just started to think about this before I read your post, really! :)


2 answers

rajarshi guha [ Editor ] from Bethesda, United States of America

I think your approach to measuring effectiveness will be guided by which of two tasks the fingerprint is being used for: finding similar structures (database screening) or finding active compounds given a query compound (virtual screening).


If you’re doing the latter, you could try using the MUV datasets and measuring effectiveness by the ranks of the designated actives when the dataset is ordered by similarity to a query active. Since each dataset has 30 actives, you can compute some sort of overall score across queries. (MUV is probably a tough test case, since the datasets are designed to avoid the problems associated with benchmarking 2D virtual screening methods.) See slides 153-156 in http://www.slideshare.net/rguha/cheminformatics-in-r for an example.
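
A rough sketch of that ranking procedure, in plain Python, might look like the following. It is only an illustration under some assumptions: fingerprints are represented as sets of on-bit indices, the function names (`tanimoto`, `active_ranks`, `mean_active_rank`) are made up for this example, and the toy actives and decoys stand in for a real dataset such as MUV, whose loading is not shown. The mean rank is just one possible overall score.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def active_ranks(query_fp, library):
    """Sort `library` by decreasing similarity to the query and return
    the 1-based ranks of the entries flagged as active.

    `library` is a list of (identifier, fingerprint, is_active) tuples.
    Ties keep their input order, so results can depend on that order.
    """
    ordered = sorted(library,
                     key=lambda entry: tanimoto(query_fp, entry[1]),
                     reverse=True)
    return [rank
            for rank, (_, _, is_active) in enumerate(ordered, start=1)
            if is_active]


def mean_active_rank(actives, decoys):
    """Use each active in turn as the query against the remaining
    actives plus all decoys, and average the ranks of the actives."""
    all_ranks = []
    for i, (_, query_fp) in enumerate(actives):
        others = [(aid, fp, True)
                  for j, (aid, fp) in enumerate(actives) if j != i]
        library = others + [(did, fp, False) for did, fp in decoys]
        all_ranks.extend(active_ranks(query_fp, library))
    return sum(all_ranks) / len(all_ranks)


# Hypothetical toy data; a real run would use fingerprints Tfp generated
# from the dataset T by the toolkit being evaluated.
actives = [("a1", {1, 2, 3, 4}), ("a2", {1, 2, 3, 5}), ("a3", {2, 3, 4, 5})]
decoys = [("d1", {10, 11, 12}), ("d2", {1, 10, 11, 12}), ("d3", {20, 21})]

print(mean_active_rank(actives, decoys))
```

A lower mean rank means the actives are retrieved earlier. Running the same loop with fingerprints from two different methods gives a simple side-by-side comparison on the same dataset.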

baoilleach [ Admin ] from Dublin 4, Ireland

NN comments
baoilleach
-

Shapado gobbled my answer…

At Goslar, someone developed a new similarity method and then created a “phylogenetic tree” comparing it to different similarity methods. The point was that it captured a distinct measure of similarity.

rajarshi guha
-

Does the tree measure the effectiveness compared to other methods or the relationship to other methods? Do you remember who this was?

