Predicting persistence in the sediment compartment with a new automatic software based on the k-Nearest Neighbor (k-NN) algorithm
The ability of a substance to resist degradation and persist in the environment needs to be readily identified in order to protect the environment and human health. Many regulations require the assessment of persistence for substances commonly manufactured and marketed. Besides laboratory-based testing methods, in silico tools may be used to obtain a computational prediction of persistence. We present a new program to develop k-Nearest Neighbor (k-NN) models. The k-NN algorithm is a similarity-based approach that predicts the property of a substance in relation to the experimental data for its most similar compounds. We employed this software to identify persistence in the sediment compartment. Data on half-life (HL) in sediment were obtained from different sources and, after careful data pruning the final dataset, containing 297 organic compounds, was divided into four experimental classes. We developed several models giving satisfactory performances, considering that both the training and test set accuracy ranged between 0.90 and 0.96. We finally selected one model which will be made available in the near future in the freely available software platform VEGA. This model offers a valuable in silico tool that may be really useful for fast and inexpensive screening.