Molecular Docking for Substrate Identification: The Short-Chain Dehydrogenases/Reductases
Protein-ligand docking has recently been investigated as a tool for protein function identification, with some success in identifying both known and unknown substrates of proteins. However, identifying a protein's substrate when cross-docking a large number of enzymes and their cognate ligands remains a challenge. To explore a more limited, yet practically important and timely, problem in more detail, we have used docking to identify the substrates of a single protein family with remarkable substrate diversity, the short-chain dehydrogenases/reductases. We examine different protocols for identifying candidate substrates for 27 short-chain dehydrogenase/reductase proteins of known catalytic function. We present the results of docking more than 900 metabolites from the human metabolome to each of these proteins, together with their known cognate substrates and products, and we investigate the ability of docking (a) to reproduce a viable binding mode for the substrate and (b) to rank the substrate highly amongst the dataset of other metabolites. In addition, we examine whether the best-scoring metabolites in the dataset provide information about the nature of the substrate. We compare two different docking methods, and two alternative scoring functions for one of those methods, and we attempt to rationalise both successes and failures. Finally, we introduce a new protocol in which we dock only a set of representative structures (medoids) to each of the proteins, in the hope of characterising each binding site in terms of its ligand preferences at a reduced computational cost. We compare the results from this protocol with our original docking experiments, and we find that, although the rank of each representative correlates well with the mean rank of the cluster to which it belongs, a simple structure-based clustering is too naïve for the purpose of substrate identification. 
Many clusters comprise ligands with widely varying affinities for the same protein; hence important candidates can be missed if a single representative is used.
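The limitation of the medoid protocol can be illustrated with a minimal sketch. It assumes a precomputed structural distance matrix, a fixed cluster assignment, and a per-ligand docking rank for one protein; the function names, the toy distances, and the toy ranks are all hypothetical and are not taken from the study. The medoid is taken here as the cluster member with the smallest summed distance to the other members, which is one common definition and not necessarily the one used by the authors.

```python
from statistics import mean


def medoid(members, dist):
    """Return the member with the smallest summed distance to its cluster."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def cluster_summary(clusters, dist, rank):
    """Compare each medoid's docking rank with the ranks of its whole cluster."""
    rows = []
    for members in clusters:
        m = medoid(members, dist)
        ranks = [rank[i] for i in members]
        rows.append({
            "medoid": m,
            "medoid_rank": rank[m],
            "mean_rank": mean(ranks),
            "best_rank": min(ranks),
            "rank_range": max(ranks) - min(ranks),
        })
    return rows


# Hypothetical toy data: five ligands, symmetric structural distances.
dist = [
    [0.0, 0.2, 0.3, 0.9, 0.8],
    [0.2, 0.0, 0.2, 0.9, 0.9],
    [0.3, 0.2, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]
clusters = [[0, 1, 2], [3, 4]]            # assumed cluster assignment
rank = {0: 1, 1: 40, 2: 55, 3: 10, 4: 12}  # assumed docking ranks (1 = best)

summary = cluster_summary(clusters, dist, rank)
for row in summary:
    print(row)
```

In this toy case the medoid of the first cluster sits close to the cluster's mean rank, yet the cluster also contains the top-ranked ligand, which would be missed if only the medoid were docked. A large `rank_range` is one simple way to flag clusters where a single representative is unlikely to be sufficient.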