PSA compares quite well – see http://twitpic.com/3u2lm7 for a comparison on ~57K molecules using the latest CDK master and v12 of ACD Physprop. The CDK still had atom type issues on a few molecules, so the PSAs for those cases might be wrong.
CDK's logP performance is much poorer – see http://twitpic.com/3u2os5. The AlogP implementation really needs updating. The XlogP implementation is slightly better (R² = 0.43). But I will note that even ChemAxon's logP exhibits an R² of 0.56 with ACD.
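For readers who want to reproduce this kind of comparison on their own data, here is a minimal sketch of an R² calculation. It assumes R² means the squared Pearson correlation between two sets of values for the same molecules; the comment does not say which definition was used. The function name `r_squared` and the sample numbers are hypothetical, not actual CDK or ACD output.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: this is the R^2 definition in use; the original comment
    does not specify it.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either input is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up values purely for illustration, not real logP data.
    reference = [1.2, 0.4, 3.1, 2.2, -0.5]
    predicted = [1.0, 0.9, 2.5, 2.8, 0.1]
    print(round(r_squared(reference, predicted), 3))
```

Note that the squared correlation ignores systematic offset and scale, so two methods can correlate well and still disagree in absolute terms.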
Following Antony's point that a better comparison is with experimental data, I used some measured logP data (~10K compounds, but I cannot release values or structures). ACD does significantly better than CDK's XLogP - http://twitpic.com/3uxs1c
If I remember correctly, the ACD model is a CNN. Out of curiosity I ran a quick random forest model, using CDK topological and constitutional descriptors and only minimal feature selection (to remove descriptors with undefined values) - http://twitpic.com/3uxufn. Much better than XLogP - and not bad at all for minimal effort. With proper feature selection and a move to a CNN, SVM, etc., I expect one might get close to the ACD performance.
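The minimal feature selection step mentioned above (dropping descriptors with undefined values) might look roughly like the sketch below. This is an illustration under assumptions, not the code actually used: the function name `drop_undefined_descriptors`, the descriptor names, and the values are all hypothetical, and "undefined" is taken to mean None, NaN, or infinite.

```python
import math


def drop_undefined_descriptors(names, rows):
    """Keep only descriptor columns that are defined for every molecule.

    names: list of descriptor names (one per column).
    rows:  list of rows, one per molecule, each a list of descriptor values.
    A value counts as undefined if it is None, NaN, or infinite (assumption).
    """
    def is_defined(value):
        return value is not None and math.isfinite(value)

    # Indices of columns with no undefined value in any row.
    keep = [
        j for j in range(len(names))
        if all(is_defined(row[j]) for row in rows)
    ]
    kept_names = [names[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return kept_names, kept_rows


if __name__ == "__main__":
    # Hypothetical descriptor table: three molecules, three descriptors.
    names = ["desc_a", "desc_b", "desc_c"]
    rows = [
        [1.0, float("nan"), 3.0],
        [2.0, 0.5, 4.0],
        [3.0, 0.7, None],
    ]
    print(drop_undefined_descriptors(names, rows))
```

The filtered table could then be passed to whatever regression method is at hand; the random forest itself is not shown here.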