Hi there,
Mikhail Rybalkin (Editor), Portland, United States of America
Since you want to find substructures that are frequent in A and not frequent in B, they must at least be frequent in A. I suggest looking at MoSS (Molecular Substructure Miner), an implementation by Christian Borgelt: http://www.borgelt.net/moss.html
Using MoSS, you can find all subgraphs that reach a predefined frequency level. After that, you can filter these substructures by the number of matches in set B.
On that page you can download the source code or a program executable. The substructure filtering can be done in various ways, including writing a script with our cheminformatics toolkit, Indigo.
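The two-step idea (mine patterns that are frequent in A, then keep only those that are rare in B) can be sketched as follows. This is a minimal sketch under stated assumptions, not MoSS or Indigo code: `candidates` stands for the patterns a miner such as MoSS would output, `matches` is a placeholder for whatever substructure search you use (for example, a script around Indigo), and the thresholds `min_support_a` and `max_support_b` are hypothetical parameters you would choose yourself. The usage example uses plain string containment on SMILES as a toy stand-in for a real substructure match.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Placeholder: "does `pattern` occur as a substructure of `molecule`?"
# In real use this would wrap a cheminformatics toolkit's substructure search.
Matcher = Callable[[str, str], bool]


def support(pattern: str, molecules: Sequence[str], matches: Matcher) -> float:
    """Fraction of molecules in the set that contain the pattern."""
    if not molecules:
        return 0.0
    hits = sum(1 for mol in molecules if matches(pattern, mol))
    return hits / len(molecules)


def filter_by_contrast(
    candidates: Iterable[str],
    set_a: Sequence[str],
    set_b: Sequence[str],
    matches: Matcher,
    min_support_a: float,  # hypothetical threshold: frequent enough in A
    max_support_b: float,  # hypothetical threshold: rare enough in B
) -> List[Tuple[str, float, float]]:
    """Keep patterns with support >= min_support_a in A and <= max_support_b in B."""
    kept = []
    for pattern in candidates:
        sup_a = support(pattern, set_a, matches)
        if sup_a < min_support_a:
            continue  # not frequent in A, so there is no need to search B
        sup_b = support(pattern, set_b, matches)
        if sup_b <= max_support_b:
            kept.append((pattern, sup_a, sup_b))
    # Largest gap between support in A and support in B comes first.
    kept.sort(key=lambda t: (t[2] - t[1], t[0]))
    return kept


if __name__ == "__main__":
    # Toy stand-in only: substring containment on SMILES is NOT a real
    # substructure match; replace it with a proper matcher in practice.
    def toy_matches(pattern: str, molecule: str) -> bool:
        return pattern in molecule

    set_a = ["c1ccccc1O", "c1ccccc1CO", "c1ccccc1N"]
    set_b = ["CCO", "CCN", "c1ccccc1C"]
    candidates = ["c1ccccc1", "O", "N"]  # e.g. patterns mined from set A

    for pattern, sup_a, sup_b in filter_by_contrast(
        candidates, set_a, set_b, toy_matches, min_support_a=0.5, max_support_b=0.4
    ):
        print(f"{pattern}: support in A = {sup_a:.2f}, support in B = {sup_b:.2f}")
```

Computing support in A first means that patterns which fail the A threshold never trigger a search in B, which matters when the substructure search is the expensive step.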
This can also be done in KNIME (an open-source visual workflow engine: http://www.knime.org/ ), where you can install the Chemistry extension and use the MoSS node interactively without writing any code. They have an example, 005001_MoSS, that reads a set of structures and finds frequent substructures. I have never worked with MoSS and do not know how to set its parameters, etc., but I know that it may be useful in your case. I can explain how to run the MoSS example in KNIME if necessary. In KNIME you can also do substructure filtering and a lot of other things.
If you want to use MoSS from source code, I think the CDK (Chemistry Development Kit) has also added that algorithm and you could use it there, but I'm not sure.
Do not hesitate to ask about details, depending on how you are going to solve this problem: via coding or via KNIME.
Could you explain your problem in more detail? At least, why do you need set B, since it is not mentioned in the problem statement? And what does it mean for a substructure to be over-represented or enriched?
I have updated the question, thanks.