Comparative analysis of local and consensus quantitative structure-activity relationship approaches for the prediction of biocon
Quantitative structure-activity relationships (QSARs) are broadly classified as global or local, depending on the molecular composition of their training sets. Global models use large and diverse training sets covering a wide range of chemical space. Local models focus on smaller subsets of structurally or chemically similar compounds, conventionally selected by human experts or, alternatively, by cluster analysis. The current study compares different clustering algorithms (expectation-maximization, K-means and hierarchical), applied to seven different descriptor sets as structural characteristics, and two rule-based approaches for selecting the subsets used to design local QSAR models. A total of 111 local QSAR models were developed for predicting the bioconcentration factor. Predictions from the local models were compared with the corresponding predictions from the global model. Comparison of the coefficients of determination (r²) and standard deviations of the local models with those of the global model for the same subsets shows improved prediction quality in 97% of cases. The descriptor content of the derived QSARs is discussed and analyzed. The local QSAR models were further consolidated within the framework of a consensus approach. All of the consensus approaches tested improved performance over both the global and the local models. The consensus approach reduced the number of strongly deviating predictions by evening out the prediction errors produced by some of the local QSARs.
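The error-averaging effect of a consensus can be illustrated with a minimal sketch. It assumes the simplest possible consensus, an unweighted arithmetic mean over local-model predictions; the abstract does not state which consensus schemes were used, so this is not a reproduction of the study's method. The function names, the 1.0 log-unit threshold and all numeric values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def consensus_predict(local_predictions):
    """Average, per compound, the predictions of several local models.

    local_predictions is a list of per-model prediction lists, all of the
    same length. The unweighted mean is an assumption, not the study's scheme.
    """
    return [mean(per_compound) for per_compound in zip(*local_predictions)]


def count_outliers(predicted, observed, threshold=1.0):
    """Count predictions whose absolute error exceeds the threshold.

    The threshold of 1.0 log unit is a hypothetical cut-off.
    """
    return sum(abs(p - o) > threshold for p, o in zip(predicted, observed))


# Hypothetical log BCF values, invented purely for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
local_predictions = [
    [1.2, 1.9, 4.5, 4.1],  # model A: strongly deviates on the third compound
    [0.9, 3.4, 2.8, 3.8],  # model B: strongly deviates on the second compound
    [1.1, 2.1, 3.1, 4.3],  # model C: no strong deviations
]

consensus = consensus_predict(local_predictions)

for name, preds in zip("ABC", local_predictions):
    print(f"model {name}: {count_outliers(preds, observed)} strong deviation(s)")
print(f"consensus: {count_outliers(consensus, observed)} strong deviation(s)")
```

In this toy data set each strong deviation comes from a single local model, so averaging with the other models pulls the consensus prediction back within the threshold. When several models err in the same direction for the same compound, averaging would not remove the deviation.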