Hi, given a circular fingerprint such as ECFP, what is the accepted way to evaluate the similarity between two such fingerprints? My first thought is that the features (usually unsigned ints derived from atom environments) get mapped into a fixed-length bit string, after which one proceeds as usual.
nina
One can calculate Tanimoto scores without an explicit mapping to a bit string, because the Tanimoto formula only asks for the number of features in each molecule and the number of features they have in common, not necessarily bits.
For example, the usual formula can be used: Tanimoto = common(NA,NB) / (NA + NB - common(NA,NB)), where NA is the number of fragments (circular fingerprint features) in molecule A, NB is the number of fragments in molecule B, and common(NA,NB) is the number of common fragments between the two molecules.
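As a minimal sketch of the formula above, assuming each fingerprint is available as a collection of integer feature identifiers (the function name tanimoto_sets and the sample identifiers are made up for illustration, and returning 0.0 for two empty fingerprints is just one possible convention):

```python
def tanimoto_sets(features_a, features_b):
    """Tanimoto similarity of two fingerprints given as feature identifiers.

    No bit string is built; only the distinct features of each molecule
    and the features they share are counted.
    """
    a = set(features_a)
    b = set(features_b)
    if not a and not b:
        # Assumed convention: two empty fingerprints score 0.0.
        return 0.0
    common = len(a & b)                # common(NA,NB)
    return common / (len(a) + len(b) - common)


# Hypothetical feature identifiers, not taken from a real molecule.
mol_a = [101, 202, 303]
mol_b = [202, 303, 404]
print(tanimoto_sets(mol_a, mol_b))     # 2 / (3 + 3 - 2)
```

Python sets are used here so that a feature listed twice in one molecule is counted only once, which matches the presence/absence reading of the formula.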
This means that the length of the (implicit) bit string will be different for different pairs of molecules.
Thanks. But doesn’t this approach actually mean that the length of the ‘implicit’ bit string is constant and very large (essentially, one bit for each possible feature that can be generated)?
The bitstring representation makes one think that the 0s have some meaning. The Tanimoto coefficient is for comparing sets; we are comparing one set of fragments with another set of fragments.
Yes indeed. But 0s might be interpreted as “missing fragments”, if the bit string corresponds to the presence of particular structural features.
.nina This is only for bit fingerprints, correct? If so, can you extend your answer with a formula for count fingerprints?
The formula should be applicable to counts as well, if we recall that the Tanimoto coefficient is about comparing sets (including multisets, i.e. sets in which members can be repeated):
Tanimoto = number-of-members(intersection-of-sets(fragments-in-A, fragments-in-B)) / (NA + NB - number-of-members(intersection-of-sets(fragments-in-A, fragments-in-B)))
e.g. if fragments-in-A = {c,c,c,c,n,o} and fragments-in-B = {c,c,c,o,o}, then intersection-of-sets(fragments-in-A, fragments-in-B) = {c,c,c,o} and Tanimoto = 4 / (6 + 5 - 4) = 0.57
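The multiset example above can be reproduced with a short sketch, assuming each count fingerprint is given as a list of fragment labels with repeats. The function name tanimoto_counts is hypothetical; collections.Counter is used because its & operator keeps the minimum count of each member, which is the multiset intersection described here.

```python
from collections import Counter


def tanimoto_counts(fragments_a, fragments_b):
    """Tanimoto similarity of two count fingerprints treated as multisets."""
    counts_a = Counter(fragments_a)
    counts_b = Counter(fragments_b)
    # Multiset intersection: per fragment, the smaller of the two counts.
    common = sum((counts_a & counts_b).values())
    n_a = sum(counts_a.values())       # NA, repeats included
    n_b = sum(counts_b.values())       # NB, repeats included
    if n_a + n_b == 0:
        # Assumed convention: two empty fingerprints score 0.0.
        return 0.0
    return common / (n_a + n_b - common)


frags_a = ["c", "c", "c", "c", "n", "o"]
frags_b = ["c", "c", "c", "o", "o"]
print(round(tanimoto_counts(frags_a, frags_b), 2))   # 4 / (6 + 5 - 4) = 0.57
```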