Hello all,
I'd like to understand what people expect when they run a similarity search with different chemistry database cartridges. Unlike substructure search, which is strict and well defined, similarity search is less well defined: you cannot predict the exact similarity measure between molecules before you run the search. So the search proceeds by trial and error until you find the similar molecules you need. But what do people really need when they do a similarity search? Can you give some real use cases?
Molecule similarity is usually measured by fingerprint similarity, but there are different types of fingerprints. This leads to another question: is a fingerprint-based similarity measure what people actually need? When you run a similarity search, I expect you don't really need "molecules with a fingerprint Tanimoto distance less than 0.312"; you need molecules that are similar in some other sense. For example, there are well-defined graph distances, and maybe it would be better to use them, although they are computationally harder. Do you know of any toolkits or applications that measure molecule similarity without using fingerprints?
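To make the threshold example concrete, here is a minimal sketch of what a fingerprint Tanimoto distance computes. It assumes a fingerprint is simply a set of on-bit indices; the two fingerprints below are made up for illustration and do not come from any real toolkit or cartridge.

```python
def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


def tanimoto_distance(fp_a, fp_b):
    """Distance form used in the threshold example: 1 - similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Hypothetical fingerprints: the bit indices are invented, not real hashed features.
fp_query = {1, 4, 7, 9, 15}
fp_hit = {1, 4, 7, 9, 22}

# 4 shared bits out of 6 distinct bits in total.
print(tanimoto_distance(fp_query, fp_hit))
```

The number depends only on which bits are shared, so the same cutoff can mean very different things chemically depending on how the fingerprint was built.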
If fingerprints really are what people need, then what kind of fingerprint should be used? Should molecule size or feature counts also be encoded into the fingerprint?
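As a rough illustration of why the feature-count question matters, the sketch below compares a binary Tanimoto with one common count-based generalization (sum of minimum counts over sum of maximum counts). The feature names and counts are hypothetical, and this is not a description of how Bingo or any other cartridge handles counts.

```python
from collections import Counter


def binary_tanimoto(counts_a, counts_b):
    """Tanimoto on feature presence only; counts are ignored."""
    a, b = set(counts_a), set(counts_b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def count_tanimoto(counts_a, counts_b):
    """Count-based generalization: sum of min counts / sum of max counts."""
    keys = set(counts_a) | set(counts_b)
    num = sum(min(counts_a.get(k, 0), counts_b.get(k, 0)) for k in keys)
    den = sum(max(counts_a.get(k, 0), counts_b.get(k, 0)) for k in keys)
    return num / den if den else 1.0


# Hypothetical feature counts: a small molecule and a larger one with the
# same kinds of features but more occurrences of one of them.
small = Counter({"ring6": 1, "C-O": 1})
large = Counter({"ring6": 3, "C-O": 1})

print(binary_tanimoto(small, large))  # identical feature sets
print(count_tanimoto(small, large))   # counts differ, so similarity drops
```

With presence-only bits the two molecules look identical, while the count-aware variant separates them, which is one way size could enter the measure.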
My main goal is to understand in which direction we can extend or improve similarity search in our database cartridge, Bingo. Currently we use fingerprints that we developed ourselves, and they can be briefly described as follows: "Bingo fingerprints, as compared to Daylight fingerprints, are built not from bond paths, but from trees and rings". Sorry for the long question.