What tools are available for clustering small molecules? (I have a set of, say, 1000 small molecules and I want to select a non-redundant, diverse subset.)
Noel O'Boyle [ Admin ]
To my mind, diverse set selection is best done directly using the distance matrix. The Kennard-Stone algorithm, for example, is easy to understand and easy to implement. You start with the two molecules that are most distant from each other, then repeatedly add the molecule that is farthest from those already selected (that is, the one whose distance to its nearest selected molecule is largest). Stop when you have enough diverse molecules. I don’t know of any available tool for this though.
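For what it's worth, the selection described above can be sketched in a few lines of Python with NumPy. This is only a minimal sketch, not a reference implementation: the function name `kennard_stone` and its parameters are made up for illustration, and it assumes you already have a symmetric n x n distance matrix (for example, 1 minus the Tanimoto similarity between fingerprints, which is an assumption on my part about how you would measure distance).

```python
import numpy as np


def kennard_stone(dist, n_select):
    """Pick n_select diverse items from a symmetric distance matrix.

    Hypothetical helper for illustration: `dist` is an (n, n) array of
    pairwise distances, and the return value is a list of row indices
    in the order they were selected.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    if dist.shape != (n, n):
        raise ValueError("dist must be a square matrix")
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of items")

    # Seed with the pair of items that are most distant from each other.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # For every item, track the distance to its nearest selected item.
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -np.inf  # never pick an item twice

    while len(selected) < n_select:
        # Add the item whose nearest selected neighbour is farthest away.
        k = int(np.argmax(min_dist))
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[k] = -np.inf

    return selected
```

With 1000 molecules the full distance matrix has a million entries, which fits comfortably in memory, so calling something like `kennard_stone(dist, 100)` on it should be cheap. Ties are broken by taking the lowest index, which is an arbitrary choice in this sketch.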
By the way, if you are selecting a diverse set for use as a training set, you may want to reconsider – the results on the test set will be overly optimistic if you use anything but a random set.