Efficient Heuristics for Maximum Common Substructure Search
Maximum common substructure search is a computationally hard optimization problem with diverse applications in the field of cheminformatics, including similarity search, lead optimization, molecule alignment, and clustering. Most of these applications have strict constraints on running time, so heuristic methods are often preferred. However, the development of an algorithm that is both fast enough and accurate enough for most practical purposes is still a challenge. Moreover, in some applications, the quality of a common substructure depends not only on its size but also on various topological features of the one-to-one atom correspondence it defines. Two state-of-the-art heuristic algorithms for finding maximum common substructures have been implemented at ChemAxon Ltd., and effective heuristics have been developed to improve both their efficiency and the relevance of the atom mappings they provide. The implementations have been thoroughly evaluated and compared with existing solutions (KCOMBU and Indigo). The heuristics have been found to greatly improve the performance and applicability of the algorithms. The purpose of this paper is to introduce the applied methods and present the experimental results.