I hope that the question is self-explanatory.
Egon Willighagen
[ Admin ]
You can also look at this from another perspective. Go for the quantity, and do the validation yourself. Actually, you would be wise to do this for the smaller, curated databases too. Your particular use case is rarely an exact match with the database; therefore, you will likely have to preprocess your input anyway. Why not just include the validation in that process?
This validation is pretty easy to do with one of the many Open Source utilities around. For example, the CDK can validate many aspects and do other kinds of filtering. Diversity analysis will not be a problem with the free tools around either (see, e.g., the question "What tools are available for clustering small molecules?").
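To make the "validate while you preprocess" idea concrete, here is a minimal sketch in plain Python. It is not the CDK's validation and does not replace it: it assumes the input is a list of (identifier, SMILES) pairs, and the function names `basic_smiles_problems` and `preprocess` are made up for this illustration. It only catches cheap, text-level problems (empty entries, unbalanced parentheses or brackets, repeated structure strings); real chemical validation and canonical duplicate detection would need a toolkit such as the CDK.

```python
def basic_smiles_problems(smiles):
    """Return a list of cheap, syntax-level problems in a SMILES string.

    This is a text check only; it knows nothing about valence or chemistry.
    """
    if not smiles or not smiles.strip():
        return ["empty structure"]

    problems = []

    # Branch parentheses must open before they close, and all must be closed.
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # Bracket atoms such as [Na+] need matching square brackets.
    if smiles.count("[") != smiles.count("]"):
        problems.append("unbalanced brackets")

    return problems


def preprocess(records):
    """Split (id, smiles) records into accepted and rejected lists.

    Duplicates are detected on the raw string only (an assumption of this
    sketch); two different SMILES for the same molecule are not caught.
    """
    seen = set()
    accepted, rejected = [], []
    for rec_id, smiles in records:
        problems = basic_smiles_problems(smiles)
        if smiles in seen:
            problems.append("duplicate structure string")
        if problems:
            rejected.append((rec_id, problems))
        else:
            seen.add(smiles)
            accepted.append((rec_id, smiles))
    return accepted, rejected


if __name__ == "__main__":
    # Hypothetical input records, for illustration only.
    demo = [("a", "CCO"), ("b", "CC(C"), ("c", "CCO"), ("d", "")]
    ok, bad = preprocess(demo)
    print("accepted:", ok)
    print("rejected:", bad)
```

Keeping the rejected records together with their reasons, rather than silently dropping them, makes it easier to see how much of a database fails and why.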
I partially agree. But… How can you validate with open source tools (or commercial tools) if the level of ambiguity is too high at the origin? Look at these structures: http://alchemoinformatics.blogspot.com/2010/05/which-is-real-ru-486-2.html. Is this a major issue or not?
@matteo floris This paper just came out: “Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research” (http://dx.doi.org/10.1021/ci100176x)