Maximum Common Substructure-Based Data Fusion in Similarity Searching

publication · 3 years ago
by Peter Willett, Edmund Duesbury, John Holliday (University of Sheffield)
Instant JChem

Data fusion is known to work very well in fingerprint-based similarity searching, yet little is known about its application to maximum common substructure (MCS)-based similarity searching. This work focuses on two similarity-search applications of the MCS. Typically, a similarity coefficient is computed from the number of bonds in the MCS and the numbers of bonds in the two molecules being compared. The power of this technique can be extended using data fusion, in which the MCS similarities of a set of reference molecules to a single database molecule are fused. This “group fusion” technique is the first application of the MCS considered here. The second is the chemical hyperstructure, an alternative form of data fusion: a hypothetical molecule constructed from the overlap of a set of existing molecules. This paper compares fingerprint group fusion (using extended-connectivity fingerprints), MCS similarity group fusion, and hyperstructure similarity searching, and describes their relative merits and complementarity in virtual screening. It concludes that the hyperstructure approach, as implemented here, is less generally effective than conventional fingerprint approaches.
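
As a rough illustration of the bond-count coefficient and group fusion described in the abstract, the sketch below assumes a Tanimoto-style coefficient and a MAX fusion rule. Neither choice is stated in the abstract, and the function names and bond counts are hypothetical. The MCS sizes are supplied as plain integers rather than computed from structures.

```python
from typing import Callable, Iterable, List, Tuple


def mcs_tanimoto(bonds_mcs: int, bonds_a: int, bonds_b: int) -> float:
    """Tanimoto-style similarity from bond counts (assumed form, not taken from the paper)."""
    if bonds_mcs < 0 or bonds_mcs > min(bonds_a, bonds_b):
        raise ValueError("MCS bond count must lie between 0 and the smaller molecule's bond count")
    denominator = bonds_a + bonds_b - bonds_mcs
    # Two molecules without bonds share nothing measurable; report 0.0.
    return bonds_mcs / denominator if denominator > 0 else 0.0


def group_fusion(
    similarities: List[float],
    rule: Callable[[Iterable[float]], float] = max,
) -> float:
    """Fuse the similarities of several reference molecules to one database molecule."""
    if not similarities:
        raise ValueError("at least one reference similarity is required")
    return rule(similarities)


# Hypothetical data: one database molecule with 20 bonds, three references
# given as (bonds in reference, bonds in MCS with the database molecule).
database_bonds = 20
references: List[Tuple[int, int]] = [(18, 12), (25, 15), (20, 10)]

scores = [mcs_tanimoto(mcs, ref, database_bonds) for ref, mcs in references]
fused = group_fusion(scores)  # MAX rule: keep the best-matching reference
print(round(fused, 3))
```

Passing a different callable as `rule`, for example `statistics.mean`, swaps the fusion rule without touching the similarity calculation.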
