    Hello,

    There are two options with the Indigo toolkit.

    1. You could use Oracle (which is free for non-commercial use, albeit not open-source) and install the Bingo cartridge on it. You could then import your SMILES or SDF data set into the database, index the table, and perform similarity searches with Bingo. The indexing will take some time, but after that the search results will come up very fast.
    2. You could write a command-line utility based on Indigo's code base (for example, starting from the Bingo source code). From the 10GB dataset size, I assume you have an SDF file, right? Then this example application should be almost exactly what you need. And here is another example showing different similarity metrics between two SMILES structures. The disadvantage is that the program will recalculate the fingerprints of the molecules in the dataset 100 times (since you want to run 100 similarity searches). This can be avoided by pre-calculating the fingerprints and storing them somewhere, or by using Bingo, which does this job on Oracle.
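
To illustrate the pre-calculation idea from option 2, here is a minimal Python sketch, not Indigo code. It computes each fingerprint once, keeps the results in an in-memory index, and reuses that index for every query. The names `toy_fingerprint`, `build_index` and `search` are hypothetical, and the fingerprint itself is only a stand-in (hashed character pairs of the SMILES string) with no chemical meaning. In a real utility you would replace it with the toolkit's own similarity fingerprint and read the molecules from your SDF file.

```python
import zlib

FP_BITS = 1024  # fingerprint length in bits (an arbitrary choice for this sketch)


def toy_fingerprint(smiles: str) -> int:
    """Stand-in fingerprint: one bit per hashed character pair of the SMILES.

    NOT chemically meaningful. Replace with the toolkit's real fingerprint.
    """
    bits = 0
    for i in range(len(smiles) - 1):
        position = zlib.crc32(smiles[i:i + 2].encode("ascii")) % FP_BITS
        bits |= 1 << position
    return bits


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def build_index(dataset):
    """Compute every fingerprint exactly once; dataset is (name, smiles) pairs."""
    return [(name, toy_fingerprint(smiles)) for name, smiles in dataset]


def search(index, query_smiles: str, threshold: float = 0.7):
    """Score one query against the stored fingerprints, best hits first."""
    query_fp = toy_fingerprint(query_smiles)
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in index]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: -hit[1])


if __name__ == "__main__":
    dataset = [("ethanol", "CCO"), ("propanol", "CCCO"), ("benzene", "c1ccccc1")]
    index = build_index(dataset)        # the expensive step, done once
    for query in ["CCO", "CCCCO"]:      # each query reuses the same index
        print(query, search(index, query, threshold=0.5))
```

With 100 queries the dataset fingerprints are still computed only once. For a 10GB file the index could also be written to disk (for example with `pickle`) so that later runs skip the fingerprinting step entirely.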

    With best regards,

    Dmitry Pavlov
