Tanimoto("raven", "writing desk")

Asked by [ Editor ]

(For background on the title, and use of the term 'bicycle' here, see http://www.straightdope.com/columns/read/1173/why-is-a-raven-like-a-writing-desk .)

Suppose fingerprint fp has no bits set. What should Tanimoto(fp, fp) be? There are three answers I've come across:

  1. 0.0, since neither is like a bicycle (this is what I do)
  2. 1.0, since both are equally like a bicycle (this is what OpenEye and RDKit do)
  3. +infinity (this is what OpenBabel and CDK do)
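
The three choices differ only in how the 0/0 case is handled. A minimal sketch in Python, assuming fingerprints are stored as plain integers; the function name `tanimoto` and the `empty_value` parameter are illustrative and are not taken from any of the toolkits named above:

```python
def tanimoto(fp1: int, fp2: int, empty_value: float = 1.0) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints.

    empty_value is the score returned when neither fingerprint has any
    bits set (the 0/0 case). It is a hypothetical parameter for this sketch.
    """
    union = bin(fp1 | fp2).count("1")   # bits set in either fingerprint
    if union == 0:
        return empty_value              # both fingerprints are empty
    common = bin(fp1 & fp2).count("1")  # bits set in both fingerprints
    return common / union


empty = 0x00
print(tanimoto(empty, empty, empty_value=0.0))           # choice 1
print(tanimoto(empty, empty, empty_value=1.0))           # choice 2
print(tanimoto(empty, empty, empty_value=float("inf")))  # choice 3
```

Whenever at least one bit is set in either fingerprint, all three choices give the same score.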

I justify my answer in the context of search results. If the query has no bits set then there should be no preference for any target, while #2 and #3 would always favor the targets which also have no bits set.

However, I'm obviously a minority here. Does anyone have any practical experience with this? (Knowing already that 0 bits set isn't practical.)

Imported from: http://blueobelisk.stackexchange.com/questions/217

NN comments
andrew dalke
-

Okay, I’ll change my code to make it return 1.0, in agreement with OpenEye, RDKit and the upcoming change to CDK, and everyone’s responses.

2 answers

2

rajarshi guha [ Editor ]

I'd go with 1.0: the two things being compared are identical in the context of the fingerprint. I'm not sure why you would consider a comparison with respect to an external object (the bicycle).

NN comments
chem-bla-ics
-

I second that. Another nice corner case not covered by the CDK… should write a unit test… infinity is wrong anyway…

andrew dalke
-

It’s a bicycle fingerprint. Bit 0 is if the object has at least 1 wheel, bit 1 is if there are exactly two wheels, bit 2 is if there’s a saddle or seat, bit 3 for steering, bit 4 for gears, bit 5 for brakes, bit 6 for a flag coming off the saddle, bit 7 if it’s solely powered by humans. Both “raven” and “writing desk” have fingerprints of 0x00. A unicycle has fingerprint 0x85, my city bike is 0xbf, and my fiancee’s dream recumbent is 0xbd.

andrew dalke
-

A modern push scooter using roller blade wheels is 0xab, and the scooter I had as a kid, made from an old roller skate, is 0x81. A car is 0x3d. And I’m having entirely too much fun with this.
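
The bicycle fingerprints above can be compared directly. A small sketch, assuming each fingerprint is an integer with bit 0 as the least significant bit; the `tanimoto` helper and its `empty_value` default of 1.0 are illustrative choices, not any toolkit's API:

```python
def tanimoto(fp1, fp2, empty_value=1.0):
    """Tanimoto similarity; empty_value covers the case of no bits set at all."""
    union = bin(fp1 | fp2).count("1")
    if union == 0:
        return empty_value
    return bin(fp1 & fp2).count("1") / union


# Fingerprint values as given in the comments above.
fingerprints = {
    "raven": 0x00,
    "writing desk": 0x00,
    "unicycle": 0x85,
    "city bike": 0xbf,
    "recumbent": 0xbd,
    "push scooter": 0xab,
    "roller-skate scooter": 0x81,
    "car": 0x3d,
}

# Rank everything against the city bike, most similar first.
query = fingerprints["city bike"]
ranked = sorted(fingerprints.items(), key=lambda kv: -tanimoto(query, kv[1]))
for name, fp in ranked:
    print(f"{name:22s} {tanimoto(query, fp):.2f}")
```

With a non-empty query such as the city bike, the raven and the writing desk both score 0.0 whichever convention is used; the convention only matters when the query itself is 0x00.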

1

baoilleach [ Admin ]

"To infinity...and beyond!". That said, choice 1 is probably the best one in the context of similarity (is this different if one is interested in dissimilarity?).

NN comments
andrew dalke
-

The translation being that the compounds are similar because they are dissimilar from the structures expected by the fingerprint method?

baoilleach
-

I am saying that if you are interested in finding compounds similar to a query compound that has no bits set, a value of 0.0 might be appropriate when comparing two fingerprints with no bits set. However, if you are looking to select a diverse set from a database, a value of 1.0 might be better to avoid selecting both. (Does this even make sense? I don’t know)

andrew dalke
-

The only way to have this problem is if the query compound has no bits set. In that case everything which has at least one bit set will have a Tanimoto score of 0.0. If you define the Tanimoto of zero bits with zero bits as 1.0 then a diverse …. Ahh, I get it. It prevents those from being selected in a diverse context, while using 0.0 would mean there’s a chance of selecting one of those compounds. To be fair, a low chance.
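
The diversity argument can be illustrated with a toy example. This is only a sketch under stated assumptions: `pick_diverse` is a hypothetical greedy picker that keeps a fingerprint only if it is less similar than `threshold` to everything already kept, and the 0.7 threshold is arbitrary. Real diversity selection methods differ in detail.

```python
def tanimoto(fp1, fp2, empty_value):
    """Tanimoto similarity; empty_value covers the case of no bits set at all."""
    union = bin(fp1 | fp2).count("1")
    if union == 0:
        return empty_value
    return bin(fp1 & fp2).count("1") / union


def pick_diverse(fps, threshold, empty_value):
    """Greedy pick: keep an item only if its similarity to every
    already-picked item is below threshold (hypothetical helper)."""
    picked = []
    for name, fp in fps:
        if all(tanimoto(fp, other, empty_value) < threshold for _, other in picked):
            picked.append((name, fp))
    return [name for name, _ in picked]


fps = [("raven", 0x00), ("writing desk", 0x00),
       ("unicycle", 0x85), ("city bike", 0xbf)]

# With 1.0 the two empty fingerprints count as duplicates, so only one is kept.
print(pick_diverse(fps, 0.7, empty_value=1.0))
# With 0.0 they look maximally different from each other, so both can be kept.
print(pick_diverse(fps, 0.7, empty_value=0.0))
```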
