PSA compares quite well – see http://twitpic.com/3u2lm7 for a comparison on ~57K molecules using the latest CDK master and v12 of ACD Physprop. The CDK still had atom typing issues on a few molecules, so the PSA values for those cases might be wrong.
CDK's logP performance is much poorer – see http://twitpic.com/3u2os5. The AlogP implementation really needs updating. The XlogP implementation is slightly better (R² = 0.43). But I will note that even ChemAxon's logP exhibits an R² of only 0.56 with ACD.
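For anyone wanting to run this kind of comparison on their own predictions, here is a minimal sketch of an R² calculation. It assumes R² means the squared Pearson correlation between two sets of values, which may not be exactly how the plots above were scored. The function name and the numbers are made up for illustration.

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one series is constant")
    return (sxy * sxy) / (sxx * syy)


# Made-up logP values, for illustration only (not from the comparison above).
reference = [1.2, 2.5, 0.3, 3.8, -0.4]
predicted = [1.0, 2.9, 0.8, 3.1, 0.1]
r2 = pearson_r2(reference, predicted)
print(f"R^2 = {r2:.2f}")
```

Note that the squared correlation only measures linear association, so a method with a systematic offset or scaling error can still score well here.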
Update
Since Antony pointed out that a better comparison is with experimental data, I used some measured logP data (~10K compounds, but I cannot release values or structures). ACD does significantly better than CDK's XLogP: http://twitpic.com/3uxs1c.
If I remember correctly, the ACD model is a CNN. Out of curiosity I ran a quick random forest model, using CDK topological and constitutional descriptors and only minimal feature selection (to remove descriptors with undefined values): http://twitpic.com/3uxufn. Much better than XLogP, and not bad at all for minimal effort. With proper feature selection and a move to a CNN, SVM, etc., I expect one might get close to the ACD performance.
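The "minimal feature selection" step, dropping descriptors with undefined values, can be sketched roughly as below. This is not the code used for the plot, only an illustration assuming the descriptors sit in a plain list of rows (one per molecule). The function name and descriptor names are hypothetical, and model fitting is left out.

```python
import math


def drop_undefined_descriptors(names, rows):
    """Keep only descriptor columns that are finite numbers for every molecule."""

    def is_defined(value):
        # Treat None, NaN, infinities and non-numeric entries as undefined.
        return isinstance(value, (int, float)) and math.isfinite(value)

    keep = [
        j for j in range(len(names))
        if all(is_defined(row[j]) for row in rows)
    ]
    kept_names = [names[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return kept_names, kept_rows


# Hypothetical descriptor names and made-up values, one row per molecule.
names = ["desc_a", "desc_b", "desc_c"]
rows = [
    [1.0, float("nan"), 3.0],
    [2.0, 0.5, None],
    [4.0, 0.7, 1.5],
]
kept_names, kept_rows = drop_undefined_descriptors(names, rows)
print(kept_names)
```

Dropping whole columns keeps every molecule in the set, at the cost of losing any descriptor that fails on even one structure. Dropping the offending molecules instead would be the other obvious choice.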
This is the goal and scope of CDK News. A few CDK News papers detailing such results have been published, but I would love to see this widened. The incentive is typically missing, and there has been no scientific reward for publishing such results. This is what makes the new Open Research Computation so interesting!