How do I create a Molecular Similarity plot?

Asked by [ Editor ]

Can anyone point me in the right direction as to the protocols that are needed to create (1) a molecular similarity plot of the chemical space and (2) a molecular similarity plot of the biological space?

I am aware that we could use either fingerprints or molecular descriptors to describe the physicochemical properties of the compounds. What steps would come next (besides using these descriptors as input to PCA or Self-Organizing Maps)?

Any comments or suggestions are greatly appreciated. :)

chaninmt
-

For the purpose of this question, chemical space would refer to the landscape plot of the chemical structures in order to discern the structural diversity. Biological space would incorporate information on the biological activities.


5 answers

2

joergkurtwegner [ Editor ]

I personally like graphs, since they allow mixing chemical and biological spaces in any way. Here is a chemical similarity visualization I created using OpenBabel, CytoScape (which integrates yEd, as I mentioned in the article), and some Python scripting. Besides, CytoScape is a very nice tool for initial data exploration, since it allows filtering networks on multiple node and edge properties. In short, if you create a full similarity matrix and apply an edge filter (e.g. on chemical similarity) between pairs of nodes (e.g. chemical compounds), you automatically obtain a graph clustering. This provides a lot of insight and is something I really like for complex data sets. Expanding this with biological information should then be a piece of cake; please note the linking options CytoScape offers for biological web services.

The only ‘critical’ thing you need for getting started is knowing the simplest supported input format, which is 
NODEid INTERACTION NODEid EdgeProp1 EdgeProp2 … EdgePropX
I would not worry about edge and node properties at first; you can always map them later, which is actually recommended and makes the data handling much easier.
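To make the input format above concrete, here is a minimal sketch in plain Python that turns pairwise similarities into lines of that shape. It rests on several assumptions that are not from the answer: fingerprints are represented as sets of on-bit indices, similarity is the Tanimoto coefficient, the interaction label `similar_to` is an arbitrary name, and the toy fingerprints are invented. In practice the bit sets would come from a cheminformatics toolkit.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fingerprints, cutoff):
    """Return (id1, id2, similarity) for every pair at or above the cutoff."""
    edges = []
    for id1, id2 in combinations(sorted(fingerprints), 2):
        sim = tanimoto(fingerprints[id1], fingerprints[id2])
        if sim >= cutoff:
            edges.append((id1, id2, sim))
    return edges


def format_edge_lines(edges, interaction="similar_to"):
    """Format edges as 'NODEid INTERACTION NODEid EdgeProp1' lines."""
    return [f"{a} {interaction} {b} {sim:.3f}" for a, b, sim in edges]


# Toy fingerprints (hypothetical on-bit sets, not real molecules).
toy_fps = {
    "mol1": {1, 2, 3, 4},
    "mol2": {1, 2, 3, 5},
    "mol3": {7, 8, 9},
}

edge_lines = format_edge_lines(similarity_edges(toy_fps, cutoff=0.5))
print("\n".join(edge_lines))
```

Only the similarity is written as an edge property here; further node or edge properties can be mapped in afterwards, as suggested above.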
chaninmt
-

Thanks for an interesting and insightful answer. I will definitely give your approach a try.

To create a combined chemical/biological space graph, one would first create the graph clustering of the chemical space using the full similarity matrix as input, and then incorporate the biological information. Can you elaborate on the latter part (incorporating the biological information)?

Thanks in advance!

joergkurtwegner
-
It’s quite easy: just create a similarity matrix and apply a similarity cutoff. All nodes that remain connected (because their similarity is above the cutoff) already form a cluster. As soon as you arrange the nodes with a force-directed layout algorithm (included in most packages), similar nodes will ‘cluster’ together. It is simple and of stunning beauty, and it beats many ‘classical’ clustering algorithms on ‘cluster’ quality, since almost no heuristic is involved (besides the cutoff).
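The "still connected nodes form a cluster" idea can also be computed directly, without a layout step. The sketch below is one possible reading, assuming that a cluster means a connected component of the graph left after dropping edges below the cutoff. It uses a small union-find structure; the function name, node ids, and similarity values are invented for illustration.

```python
def cluster_by_cutoff(node_ids, weighted_edges, cutoff):
    """Group nodes into connected components using only edges with sim >= cutoff."""
    parent = {n: n for n in node_ids}

    def find(n):
        # Follow parent links to the component root, compressing the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, sim in weighted_edges:
        if sim >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for n in node_ids:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Toy similarities (hypothetical values).
toy_edges = [
    ("A", "B", 0.90),
    ("B", "C", 0.85),
    ("C", "D", 0.40),
    ("D", "E", 0.95),
]
clusters = cluster_by_cutoff(["A", "B", "C", "D", "E", "F"], toy_edges, cutoff=0.8)
print(clusters)
```

Lowering the cutoff merges components (at 0.3 the C-D edge joins the two groups), which shows how much the result depends on that single parameter.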

1

imants

Here is one simple way. Say you have 25,000 molecule structures. Choose a similarity threshold (0.8, for example) and, for each structure, calculate which other structures are similar to it. Then you can use Gephi, for example, to plot the data. That works for structure comparisons only, though.
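A rough sketch of that workflow in plain Python follows, with several assumptions: fingerprints are sets of on-bit indices, similarity is the Tanimoto coefficient, and the output is an edge table with `Source`, `Target`, and `Weight` columns, a layout Gephi's spreadsheet import is assumed to accept. The toy fingerprints are invented. Note that a naive all-pairs loop over 25,000 structures means roughly 312 million comparisons, so a real run would likely want a faster fingerprint library.

```python
import csv
import io
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def neighbor_lists(fingerprints, threshold=0.8):
    """Map each structure id to its (neighbor id, similarity) pairs at or above the threshold."""
    neighbors = {mol_id: [] for mol_id in fingerprints}
    for id1, id2 in combinations(sorted(fingerprints), 2):
        sim = tanimoto(fingerprints[id1], fingerprints[id2])
        if sim >= threshold:
            neighbors[id1].append((id2, sim))
            neighbors[id2].append((id1, sim))
    return neighbors


def write_edge_csv(neighbors, handle):
    """Write each undirected edge once as a Source,Target,Weight row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source in sorted(neighbors):
        for target, sim in neighbors[source]:
            if source < target:  # skip the mirrored copy of the edge
                writer.writerow([source, target, f"{sim:.2f}"])


# Toy fingerprints (hypothetical on-bit sets, not real molecules).
toy_fps = {
    "s1": {1, 2, 3, 4, 5},
    "s2": {1, 2, 3, 4, 5, 6},
    "s3": {1, 2, 9},
}

neighbors = neighbor_lists(toy_fps, threshold=0.8)
buffer = io.StringIO()
write_edge_csv(neighbors, buffer)
edge_csv = buffer.getvalue()
print(edge_csv)
```

Structures with no neighbor above the threshold (here `s3`) do not appear in the edge table, so they would need a separate node table if they should still be shown in the plot.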

chaninmt
-

Thanks for the tip. Gephi looks quite visually appealing. I am trying it out to see how it is.

