Compounds from 36 commercial supplier libraries and the NCI open database were analysed to address bias in the structural features of small molecules selected for high-throughput screening (HTS). First, a meta-dataset of 11.8 million unique structures was identified from 15.6 million compounds by eliminating redundant molecules from the individual libraries. HTS compounds were then selected from these libraries using common structural filters, physicochemical filters and recently introduced descriptors. Compound libraries from different suppliers were also analysed for their exclusiveness, ‘drug-likeness’ and scaffold similarity (using asymmetric metrics). The results show that large libraries offer the largest pool of ‘drug-like’ molecules, with an optimal trade-off between the diversity of chemotypes and the variety of analogous compounds available for biological screening.
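Two of the steps above, merging libraries into a set of unique structures and comparing libraries with an asymmetric measure, can be illustrated with a minimal sketch. It assumes every compound and scaffold has already been reduced to a canonical string identifier (for example an InChIKey), so duplicates compare equal. The abstract does not say which asymmetric metric was used. The coverage ratio below (the fraction of library A's scaffolds also found in library B) is one simple illustrative choice, not the authors' method. The function names `unique_structures` and `scaffold_coverage` are hypothetical.

```python
from typing import Dict, Set


def unique_structures(libraries: Dict[str, Set[str]]) -> Set[str]:
    """Merge supplier libraries into a meta-dataset of unique structures.

    Assumption: each compound is a canonical identifier string, so a
    molecule offered by several suppliers collapses to a single entry.
    """
    merged: Set[str] = set()
    for compound_ids in libraries.values():
        merged |= compound_ids  # set union drops redundant molecules
    return merged


def scaffold_coverage(a: Set[str], b: Set[str]) -> float:
    """Fraction of library A's scaffolds that also occur in library B.

    Hypothetical illustrative metric. It is asymmetric: in general
    scaffold_coverage(a, b) != scaffold_coverage(b, a).
    """
    if not a:
        return 0.0  # an empty library covers nothing
    return len(a & b) / len(a)


if __name__ == "__main__":
    # Toy data: 8 compound entries in total, 6 distinct structures.
    libs = {"A": {"k1", "k2", "k3"}, "B": {"k2", "k3", "k4", "k5", "k6"}}
    total = sum(len(ids) for ids in libs.values())
    print(total, len(unique_structures(libs)))  # prints: 8 6

    # A small library fully contained in a larger one, but not vice versa.
    small, large = {"s1", "s2"}, {"s1", "s2", "s3", "s4"}
    print(scaffold_coverage(small, large))  # prints: 1.0
    print(scaffold_coverage(large, small))  # prints: 0.5
```

The asymmetry is the point of the design: a symmetric score would give the small and large library the same similarity, hiding the fact that the large library contains every scaffold of the small one while offering additional chemotypes of its own.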
