Development and Comparison of hERG Blocker Classifiers: Assessment on Different Datasets Yields Markedly Different Results
In recent years, considerable effort has been invested in developing classification models for prospective hERG inhibitors, owing to the implications of hERG blockade for cardiotoxicity and the low throughput of functional hERG assays. We present novel approaches for binary classification which seek to separate strong inhibitors (IC50 < 1 µM) from ‘non-blockers’ exhibiting moderate (1–10 µM) or weak (IC50 ≥ 10 µM) inhibition, as required by the pharmaceutical industry. Our approaches are based on (discretized) 2D descriptors, selected using Winnow, with additional models generated using Random Forest (RF) and Support Vector Machines (SVMs). We compare our models to those previously developed by Thai and Ecker and by Dubus et al. The purpose of this paper is twofold: 1. To propose that our approaches are competitive with those currently proposed in the literature, with Matthews Correlation Coefficients from 0.40 to 0.87 on truly external test sets when extrapolation beyond the applicability domain was not evident and sufficient quantities of data were available for training. 2. To highlight key issues associated with building and assessing truly predictive models, in particular the considerable variation in model performance when training and testing on different datasets.
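As an illustration of the class definition and the evaluation metric named above, the following minimal sketch labels compounds as strong inhibitors (IC50 < 1 µM) or ‘non-blockers’ and computes the Matthews Correlation Coefficient from the confusion matrix. The function names, the IC50 values and the predictions are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
import math


def label_blocker(ic50_um, threshold_um=1.0):
    """Return 1 for a strong inhibitor (IC50 < threshold), else 0 ('non-blocker')."""
    return 1 if ic50_um < threshold_um else 0


def matthews_corrcoef(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical IC50 values in µM (illustration only, not from the paper).
ic50_values = [0.2, 0.8, 5.0, 12.0, 0.05, 30.0]
y_true = [label_blocker(v) for v in ic50_values]

# Hypothetical classifier output for the same six compounds.
y_pred = [1, 0, 0, 0, 1, 1]

mcc = matthews_corrcoef(y_true, y_pred)
print(f"MCC = {mcc:.2f}")
```

MCC ranges from -1 to +1, with 0 corresponding to chance-level prediction, and it uses all four cells of the confusion matrix, which makes it informative when the two classes are of unequal size.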