SMILES and OpenSMILES have an sp2 concept tied to the use of lowercase element symbols for the organic subset; those symbols are aimed at annotating aromaticity. Last week I had another bug report about atom type detection in the CDK, and all the failures I was emailed have basically this substructure in common: 'c1c2cc[nH]cc2nc1'.
Now, the first nitrogen in the SMILES (the right one in the image) was the atom whose type was not recognized. The problem was that the CDK was looking at the number of ring atoms and at whether the ring could be aromatic. (Yes, aromaticity again.)
That problem was easily fixed, which led to the next atom typing issue. The actual CDK atom type for the first nitrogen is N.planar3: a nitrogen with three neighbors that contributes a lone pair to a pi-system.
This brings me to my question. In the CDK, sp2 hybridization typically implies that the atom has one electron available for pi-systems; the second nitrogen is like that, and it has the atom type N.sp2.
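To make the electron bookkeeping concrete, here is a minimal sketch that counts pi electrons for the substructure and checks them against Hückel's 4n+2 rule. It is not how the CDK does it. The contribution table, the function names, and the C.sp2 label for the carbons are assumptions for illustration, and applying Hückel's rule to the perimeter of a fused system is a simplification. Reading the ring closures in 'c1c2cc[nH]cc2nc1' gives a six-membered ring holding the [nH] fused to a five-membered ring holding the bare n.

```python
# Assumed pi-electron contributions per atom type (illustrative, not CDK code).
PI_ELECTRONS = {
    "C.sp2": 1,      # aromatic carbon: one p electron
    "N.sp2": 1,      # pyridine-like nitrogen: one p electron, lone pair in plane
    "N.planar3": 2,  # pyrrole-like nitrogen: lone pair donated to the pi-system
}


def pi_electron_count(atom_types):
    """Sum the assumed pi-electron contributions of a list of atom types."""
    return sum(PI_ELECTRONS[t] for t in atom_types)


def satisfies_huckel(electrons):
    """True if the count equals 4n + 2 for some non-negative integer n."""
    return electrons >= 2 and (electrons - 2) % 4 == 0


# Atom types in SMILES order for c1c2cc[nH]cc2nc1 (indices 0..8).
atom_types = ["C.sp2", "C.sp2", "C.sp2", "C.sp2", "N.planar3",
              "C.sp2", "C.sp2", "N.sp2", "C.sp2"]

# Ring membership derived by hand from the ring-closure digits.
six_ring = [atom_types[i] for i in (1, 2, 3, 4, 5, 6)]
five_ring = [atom_types[i] for i in (0, 1, 6, 7, 8)]

for name, ring in [("six-membered ring", six_ring),
                   ("five-membered ring", five_ring),
                   ("whole fused system", atom_types)]:
    n = pi_electron_count(ring)
    print(name, n, satisfies_huckel(n))
```

Under these assumptions neither ring satisfies the rule on its own, while the ten electrons of the whole fused system do, which may be why a check that looks at one ring at a time stumbles over the [nH].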
So my question is: what does sp2 hybridization mean in the context of (Open)SMILES? Or, is the above SMILES actually valid, or should it really be 'c1c2cc[NH]cc2nc1' instead?
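The only textual difference between the two candidates is the case of one nitrogen symbol. The sketch below is a purely lexical check, not a SMILES parser: it lists each atom symbol together with whether it is written in lowercase, i.e. flagged as aromatic. The regular expression and the function name are hypothetical and only cover the organic subset plus simple bracket atoms.

```python
import re

# Bracket atom (optional isotope, then symbol), or organic-subset symbol.
# Lowercase symbols are the ones SMILES uses to mark aromatic atoms.
ATOM = re.compile(
    r"\[(\d*)(se|as|[bcnops]|[A-Z][a-z]?)[^\]]*\]"  # e.g. [nH], [NH], [13C]
    r"|(Cl|Br|[BCNOPSFI])"                          # aliphatic organic subset
    r"|([bcnops])"                                  # aromatic organic subset
)


def atom_flags(smiles):
    """Return (element, written_as_aromatic) for each atom in the string."""
    flags = []
    for match in ATOM.finditer(smiles):
        symbol = match.group(2) or match.group(3) or match.group(4)
        flags.append((symbol.capitalize(), symbol[0].islower()))
    return flags


for smiles in ("c1c2cc[nH]cc2nc1", "c1c2cc[NH]cc2nc1"):
    nitrogens = [aromatic for element, aromatic in atom_flags(smiles)
                 if element == "N"]
    print(smiles, nitrogens)
```

In the first string both nitrogens carry the aromatic flag; in the second only the bare n does, so a reader would have to treat the [NH] as an ordinary aliphatic atom sitting in an otherwise aromatic ring.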
Imported from: http://blueobelisk.stackexchange.com/questions/289