MOTIVATION: The metabolites of exogenous and endogenous compounds play a pivotal role in metabolism research, yet they remain unknown for most chemicals in our environment. In silico methods for predicting the site of metabolism (SOM) are considered efficient and low-cost tools for SOM discovery. However, many in silico methods focus on metabolic processes catalyzed by a few specific cytochrome P450s and apply only to substrates with particular skeletons. A SOM prediction model that imposes no special requirements on substrate structure and covers more metabolic enzymes therefore deserves more attention.
RESULTS: By combining hybrid feature selection techniques (CHI, IG, GR, Relief) with multiple classification procedures (KStar, BN, IBK, J48, RF, SVM, AdaBoostM1, Bagging), we established SOM prediction models for six oxidation reactions mediated by oxidoreductases, integrating enzyme data and chemical bond information. The advantage of the method is the introduction of unlabeled SOMs. We defined SOMs not reported in the literature as unlabeled SOMs, from which negative SOMs were filtered. Consequently, for each type of reaction, a series of SOM prediction models was built from metabolism information on 1237 heterogeneous chemicals. The optimal models were then identified by comparing these models. Finally, an independent test set was used to validate the optimal models. All models gave accuracies above 0.90, and in receiver operating characteristic analysis, the area under the curve values of all models exceeded 0.906. These results suggest that the models have good predictive power.