Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules
Shallow machine learning methods have been applied to chemoinformatics problems with some success. As more data becomes available and more complex problems are tackled, deep machine learning methods may also become useful. Here, we present a brief overview of deep learning methods and show in particular how recursive neural network approaches can be applied to the problem of predicting molecular properties. However, molecules are typically described by undirected cyclic graphs, while recursive approaches typically use directed acyclic graphs. Thus, we develop methods to address this discrepancy, essentially by considering an ensemble of recursive neural networks associated with all possible vertex-centered acyclic orientations of the molecular graph. One advantage of this approach is that it relies only minimally on the identi?cation of suitable molecular descriptors because suitable representations are learned automatically from the data. Several variants of this approach are applied to the problem of predicting aqueous solubility and tested on four benchmark data sets. Experimental results show that the performance of the deep learning methods matches or exceeds the performance of other state-of-the-art methods according to several evaluation metrics and expose the fundamental limitations arising from training sets that are too small or too noisy. A Web-based predictor, AquaSol, is available online through the ChemDB portal (cdb.ics.uci.edu) together with additional material.