Maximum Common Substructure-Based Data Fusion in Similarity Searching
Data fusion has been shown to work very well when applied to fingerprint-based similarity searching, yet little is known of its application to maximum common substructure (MCS)-based similarity searching. This work focuses on two similarity search applications of the MCS. Typically, the number of bonds in the MCS, together with the numbers of bonds in the two molecules being compared, is used in a similarity coefficient. The power of this technique can be extended using data fusion, where the MCS similarities of a set of reference molecules against one database molecule are fused into a single score. This “group fusion” technique forms the first application of the MCS in this work. The second application is the chemical hyperstructure, an alternative form of data fusion: a hypothetical molecule constructed from the overlap of a set of existing molecules. This paper compares fingerprint group fusion (using extended-connectivity fingerprints), MCS similarity group fusion, and hyperstructure similarity searching, and describes their relative merits and complementarity in virtual screening. It is concluded that the hyperstructure approach as implemented here is less generally effective than conventional fingerprint approaches.
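The bond-count similarity and group fusion described above can be illustrated with a minimal sketch. The abstract names neither the coefficient nor the fusion rule, so both are assumptions here: a Tanimoto-style coefficient on bond counts, and MAX or MEAN fusion of the per-reference scores. The function names `mcs_tanimoto` and `group_fusion` are hypothetical, and the bond counts are invented purely for illustration; computing the MCS itself is outside the scope of the sketch.

```python
from typing import Iterable


def mcs_tanimoto(n_mcs: int, n_a: int, n_b: int) -> float:
    """Tanimoto-style similarity from bond counts (assumed form, not confirmed by the source).

    n_mcs: number of bonds in the MCS of molecules A and B
    n_a, n_b: number of bonds in molecules A and B
    """
    denom = n_a + n_b - n_mcs
    if denom <= 0:
        return 0.0  # degenerate case, e.g. two molecules with no bonds
    return n_mcs / denom


def group_fusion(scores: Iterable[float], rule: str = "max") -> float:
    """Fuse the similarities of several reference molecules to one database molecule.

    The fusion rule is an assumption: "max" keeps the best score, "mean" averages.
    """
    values = list(scores)
    if not values:
        raise ValueError("at least one similarity score is required")
    if rule == "max":
        return max(values)
    if rule == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown fusion rule: {rule}")


# Illustrative (made-up) data: one database molecule with 10 bonds, compared
# against three reference molecules given as (MCS bonds, reference bonds).
db_bonds = 10
references = [(6, 8), (4, 10), (9, 11)]

scores = [mcs_tanimoto(n_mcs, n_ref, db_bonds) for n_mcs, n_ref in references]
fused_max = group_fusion(scores, rule="max")
fused_mean = group_fusion(scores, rule="mean")
```

In a virtual screening setting, a fused score of this kind would be computed for every database molecule and the database ranked by it; the choice between MAX and MEAN fusion here is illustrative only.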