Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery

publication · 10 years ago
by Tobias Fink, Jean-Louis Reymond (University of Berne)
MarvinView JChem Base
All molecules of up to 11 atoms of C, N, O, and F that are possible under simple valency, chemical stability, and synthetic feasibility rules were generated and collected in a database (GDB). GDB contains 26.4 million molecules (110.9 million stereoisomers), including three- and four-membered rings and triple bonds. By comparison, only 63,857 compounds of up to 11 atoms were found in public databases (a combination of PubChem, ChemACX, ChemSCX, NCI open database, and the Merck Index). A total of 538 of the 1208 ring systems in GDB are currently unknown in the CAS Registry and Beilstein databases in any carbon/heteroatom/multiple-bond combination or as a substructure. Over 70% of GDB molecules are chiral. Because of their small size, all compounds obey Lipinski's bioavailability rule. A total of 13.2 million compounds also follow Congreve's "Rule of 3" for lead-likeness. A Kohonen map trained with autocorrelation descriptors organizes GDB according to compound classes and shows that lead-like compounds are most abundant in chiral regions of fused carbocycles and fused heterocycles. The projection of known compounds into this map indicates large uncharted areas of chemical space. The potential of GDB for drug discovery is illustrated by virtual screening for kinase inhibitors, G-protein coupled receptor ligands, and ion-channel modulators. The database is available from the author's Web page.
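The abstract applies two descriptor-based filters: Lipinski's bioavailability rule and Congreve's "Rule of 3" for lead-likeness. The sketch below shows what such filters check. It is not the authors' code. It assumes the descriptors (molecular weight, calculated logP, H-bond donor and acceptor counts) have already been computed by some cheminformatics toolkit. The `Descriptors` class and all function names are hypothetical. The thresholds are the commonly cited ones: MW ≤ 500, logP ≤ 5, HBD ≤ 5, HBA ≤ 10 for Lipinski, and MW < 300, cLogP ≤ 3, HBD ≤ 3, HBA ≤ 3 for the core Rule of 3. Tolerating one Lipinski violation is an assumption (a common reading of the rule). The optional rotatable-bond and polar-surface-area criteria of the Rule of 3 are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for precomputed descriptors of one molecule."""
    mol_weight: float  # molecular weight in g/mol
    clogp: float       # calculated octanol/water partition coefficient
    hbd: int           # number of hydrogen-bond donors
    hba: int           # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    # Each comparison yields a bool; sum() counts the True values.
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.hbd > 5,
        d.hba > 10,
    ])


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    # Assumption: one violation is tolerated, a common reading of the rule.
    return lipinski_violations(d) <= max_violations


def passes_rule_of_three(d: Descriptors) -> bool:
    """Core Rule of 3 criteria for lead-likeness; all must hold."""
    return (
        d.mol_weight < 300
        and d.clogp <= 3
        and d.hbd <= 3
        and d.hba <= 3
    )


if __name__ == "__main__":
    # Illustrative descriptor values, not taken from GDB.
    small = Descriptors(mol_weight=150.2, clogp=1.2, hbd=1, hba=2)
    greasy = Descriptors(mol_weight=160.0, clogp=3.5, hbd=0, hba=1)
    print(passes_lipinski(small), passes_rule_of_three(small))    # True True
    print(passes_lipinski(greasy), passes_rule_of_three(greasy))  # True False
```

The example shows why the two filters select different numbers of compounds. Every molecule of at most 11 heavy atoms is far below the Lipinski molecular-weight limit. The tighter Rule of 3 limits, however, can still reject small but lipophilic molecules.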