Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research
With the recent advent of high-throughput technologies for both compound synthesis and biological screening, there
is no shortage of publicly or commercially available data
sets and databases¹ that can be used for computational drug
discovery applications (reviewed recently in Williams et al.²).
Rapid growth of large, publicly available databases (such
as PubChem³ or ChemSpider⁴, containing more than 20
million molecular records each) enabled by experimental
projects such as NIH’s Molecular Libraries and Imaging
Initiative⁵ provides new opportunities for the development
of cheminformatics methodologies and their application to
knowledge discovery in molecular databases.