Assignment of EC Numbers to Enzymatic Reactions with MOLMAP Reaction Descriptors and Random Forests

publication · 7 years ago
by Diogo A. R. S. Latino, João Aires-de-Sousa (Universidade Nova de Lisboa)
The MOLMAP descriptor relies on a Kohonen self-organizing map (SOM) that defines types of covalent bonds on the basis of their physicochemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and it numerically encodes the pattern of bond changes during a chemical reaction. In this study, a genome-scale data set of enzymatic reactions from the KEGG database was encoded with MOLMAP descriptors and used to assign the official EC number from the reaction equation, with Random Forests as the machine learning algorithm. For independent test sets, EC numbers were correctly assigned in 95%, 90%, and 85% of the cases at the class, subclass, and subsubclass level, respectively, with training sets including one reaction from each available full EC number. Increasing differences between training and test sets were explored, leading to decreased percentages of correct assignments. When reactions were classified only from the main reactants and products, accuracies of 78%, 74%, and 63% were obtained at the class, subclass, and subsubclass level, respectively.
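The difference-of-MOLMAPs idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each bond has already been assigned to a neuron of a trained SOM, and it simply counts bonds per neuron. The 3x3 grid size, the function names, and the toy neuron coordinates are all hypothetical.

```python
from collections import Counter

GRID = 3  # hypothetical 3x3 map; the real SOM size is not stated in this abstract


def molmap(bond_neurons, grid=GRID):
    """Flattened grid of counts: how many bonds of one molecule hit each neuron."""
    counts = Counter(bond_neurons)
    return [counts.get((row, col), 0) for row in range(grid) for col in range(grid)]


def side_molmap(molecules, grid=GRID):
    """Sum the MOLMAPs of all molecules on one side of the reaction equation."""
    total = [0] * (grid * grid)
    for bond_neurons in molecules:
        for i, value in enumerate(molmap(bond_neurons, grid)):
            total[i] += value
    return total


def reaction_molmap(reactants, products, grid=GRID):
    """Reaction descriptor: MOLMAP of the products minus MOLMAP of the reactants."""
    reactant_map = side_molmap(reactants, grid)
    product_map = side_molmap(products, grid)
    return [p - r for p, r in zip(product_map, reactant_map)]


# Toy reaction (made-up neuron assignments): one bond type at neuron (0, 1)
# disappears and one at neuron (1, 1) appears; all other bonds are unchanged.
reactants = [[(0, 0), (0, 1)], [(2, 2)]]
products = [[(0, 0), (1, 1), (2, 2)]]
descriptor = reaction_molmap(reactants, products)
```

Unchanged bonds cancel out, so only the neurons of broken or formed bond types carry nonzero values. A fixed-length vector of this kind is the sort of input a Random Forest classifier can be trained on.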