In general, to get a complete answer for identical (or similar) molecules common to the two datasets, you’d need to do a full pairwise analysis. This applies to both isomorphism methods (to check for identical mols) and fingerprint methods (similarity calculation).
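To make the fingerprint case concrete, here is a minimal sketch of the full pairwise comparison. It assumes each fingerprint is already available as a Python set of on-bit indices; the function names and the 0.7 cutoff are just illustrative choices, not anything prescribed by a particular toolkit.

```python
from itertools import product


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def pairwise_hits(query_fps, target_fps, threshold=0.7):
    """Compare every query fingerprint against every target fingerprint.

    Returns (query_index, target_index, similarity) for pairs at or above
    the threshold. Cost is len(query_fps) * len(target_fps) comparisons.
    """
    hits = []
    for (qi, q), (ti, t) in product(enumerate(query_fps), enumerate(target_fps)):
        sim = tanimoto(q, t)
        if sim >= threshold:
            hits.append((qi, ti, sim))
    return hits
```

The quadratic cost is the whole problem: 10^5 queries against 10^6 targets is 10^11 comparisons.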
An alternative approach is to use approximation methods, assuming you’re OK with a certain number of false positives. One possibility is to use something like geometric hashing to identify nearest neighbors, which should correspond to similar molecules. But even then you’d need to loop over each entry in the query database to find similar molecules in the target database.
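As a rough illustration of the hashing idea, the sketch below uses MinHash with LSH banding rather than geometric hashing proper. That is a swapped-in technique, chosen because it fits set-style fingerprints and is short to write. Fingerprints are assumed to be non-empty sets of on-bit indices, and all names and parameters (number of hashes, bands, rows) are hypothetical.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # modulus for the (a*x + b) % PRIME hash family


def make_hashes(n, seed=0):
    """Create n random hash functions, each stored as an (a, b) pair."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash(fp, hashes):
    """MinHash signature of a non-empty fingerprint (set of on-bit indices)."""
    return tuple(min((a * x + b) % PRIME for x in fp) for a, b in hashes)


def band_keys(sig, bands, rows):
    """Split a signature into `bands` chunks of `rows` values each."""
    return [(band, sig[band * rows:(band + 1) * rows]) for band in range(bands)]


def build_index(target_fps, hashes, bands, rows):
    """Bucket every target fingerprint by each band of its signature."""
    index = defaultdict(set)
    for ti, fp in enumerate(target_fps):
        for key in band_keys(minhash(fp, hashes), bands, rows):
            index[key].add(ti)
    return index


def candidates(fp, index, hashes, bands, rows):
    """Target indices sharing at least one band bucket with the query."""
    found = set()
    for key in band_keys(minhash(fp, hashes), bands, rows):
        found |= index.get(key, set())
    return found
```

You still loop over every query, but each lookup only touches a few buckets instead of the whole target set. The candidates can include false positives, so you would normally re-score them with an exact similarity afterwards.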
One other approach could be to ‘map’ the target db onto the query db, using something like FastMap or even PCA to project both into the same low-dimensional space, and then identify the nearest points in the projection.
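Here is a small sketch of the projection idea using the FastMap recipe (pick two far-apart pivots, place every item along the line between them via the cosine law, repeat on the residual distances). It assumes set-style fingerprints and Jaccard distance, which is not Euclidean, so negative residuals are simply clamped to zero. Treat it as an approximation under those assumptions; the function names are made up for this example.

```python
import math


def jaccard_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def fastmap(items, dist, k=2):
    """Project items into k dimensions using FastMap-style pivot projections."""
    n = len(items)
    coords = [[0.0] * k for _ in range(n)]

    def resid_sq(i, j, depth):
        # Squared distance left over after removing the first `depth` axes.
        d2 = dist(items[i], items[j]) ** 2
        for c in range(depth):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # clamp: the input distance need not be Euclidean

    for depth in range(k):
        # Pivot heuristic: farthest item from item 0, then farthest from that.
        a = max(range(n), key=lambda i: resid_sq(0, i, depth))
        b = max(range(n), key=lambda i: resid_sq(a, i, depth))
        dab2 = resid_sq(a, b, depth)
        if dab2 == 0.0:
            break  # nothing left to separate; remaining axes stay at 0
        dab = math.sqrt(dab2)
        for i in range(n):
            coords[i][depth] = (
                resid_sq(a, i, depth) + dab2 - resid_sq(b, i, depth)
            ) / (2 * dab)
    return coords


def nearest_in_projection(query_fps, target_fps, k=2):
    """For each query, the index of the closest target in the projected space."""
    items = list(query_fps) + list(target_fps)
    coords = fastmap(items, jaccard_distance, k)
    nq = len(query_fps)
    return [
        min(range(len(target_fps)),
            key=lambda ti: math.dist(coords[qi], coords[nq + ti]))
        for qi in range(nq)
    ]
```

The nearest-point search here is a plain scan to keep the example short. In practice you would put the projected target points into a spatial index (a k-d tree, say), and then check the real similarity of whatever comes back, since closeness in the projection is only a hint.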
Finally, if your target db supports parallel queries (say via sharding etc), then you could just chunk up the query set and run it in parallel. Yeah, it’s brute force :)
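The chunk-and-parallelize version could look something like this. `search_chunk` is a hypothetical callable standing in for whatever actually sends a batch of queries to your target db and returns the hits; threads are used on the assumption that the work is mostly waiting on the database.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def parallel_search(queries, search_chunk, chunk_size=1000, workers=4):
    """Run `search_chunk` on each chunk of queries concurrently.

    `search_chunk` is a placeholder: it takes a list of queries and
    returns a list of hits. Results come back in chunk order.
    """
    chunks = list(chunked(queries, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(search_chunk, chunks)  # map preserves input order
        return [hit for hits in per_chunk for hit in hits]
```

For example, `parallel_search(query_ids, my_db_lookup, chunk_size=500, workers=8)` would issue up to eight batches at a time, where `my_db_lookup` is your own function.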