pKa Prediction of Monoprotic Small Molecules the SMARTS Way

publication · 7 years ago
by Adam C. Lee, Jing-yu Yu, Gordon M. Crippen (University of Michigan)
Because of the high attrition rate of lead compounds in drug development today, achieving favorable absorption, distribution, metabolism, elimination, and toxicity profiles is a necessity. The ability to predict bioavailability accurately can save time and money during screening and optimization. As several robust programs already exist for predicting logP, we have turned our attention to the fast and robust prediction of pKa for small molecules. Using curated data from the Beilstein Database and Lange’s Handbook of Chemistry, we have created a decision tree based on a novel set of SMARTS strings that predicts the pKa of monoprotic compounds with an R² of 0.94 and a root mean squared error of 0.68. Leave-some-out (10%) cross-validation achieved a Q² of 0.91 and a root mean squared error of 0.80.
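The abstract reports two pairs of statistics: a fitted R² with its root mean squared error, and a cross-validated Q² with its own error. As a rough illustration of how the two pairs differ, the Python sketch below computes both for a stand-in model that predicts the mean value of each tree leaf. It is not the authors' decision tree: no SMARTS matching is performed, the leaf labels (`class_A` and so on) and all pKa-like values are synthetic, and every function name is hypothetical. Q² is taken here as 1 - PRESS/SS_tot over the held-out predictions, which is one common definition and may not match the one used in the publication.

```python
import math
import random
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_leaf_means(rows):
    """Fit the stand-in model: mean value per leaf, plus a global fallback mean."""
    groups = defaultdict(list)
    for leaf, value in rows:
        groups[leaf].append(value)
    leaf_means = {leaf: sum(vals) / len(vals) for leaf, vals in groups.items()}
    global_mean = sum(value for _, value in rows) / len(rows)
    return leaf_means, global_mean


def predict(model, leaf):
    """Predict the leaf mean, or the global mean for a leaf unseen in training."""
    leaf_means, global_mean = model
    return leaf_means.get(leaf, global_mean)


def cross_validate(rows, holdout_fraction=0.10, seed=0):
    """Leave-some-out CV: every row is held out exactly once, about 10% at a time."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_folds = max(2, round(1 / holdout_fraction))
    predicted = [None] * len(rows)
    for k in range(n_folds):
        held_out = set(indices[k::n_folds])
        train = [rows[i] for i in indices if i not in held_out]
        model = fit_leaf_means(train)
        for i in held_out:
            predicted[i] = predict(model, rows[i][0])
    observed = [value for _, value in rows]
    return r_squared(observed, predicted), rmse(observed, predicted)


# Synthetic data only: three hypothetical leaves with made-up centre values.
rng = random.Random(42)
centers = {"class_A": 4.0, "class_B": 9.0, "class_C": 11.0}
rows = [(leaf, c + rng.uniform(-0.5, 0.5))
        for leaf, c in centers.items() for _ in range(10)]

# Fitted statistics: the model is trained and evaluated on the same rows.
full_model = fit_leaf_means(rows)
observed = [value for _, value in rows]
fitted = [predict(full_model, leaf) for leaf, _ in rows]
r2_fit, rmse_fit = r_squared(observed, fitted), rmse(observed, fitted)

# Cross-validated statistics: each prediction comes from a model that never saw the row.
q2, rmse_cv = cross_validate(rows)

print(f"fit: R2={r2_fit:.3f} RMSE={rmse_fit:.3f}")
print(f"CV:  Q2={q2:.3f} RMSE={rmse_cv:.3f}")
```

Because each held-out value is predicted by a model that was not trained on it, Q² cannot exceed R² for this kind of leaf-mean model, and the cross-validated error cannot fall below the fitted error. The gap between 0.94/0.68 and 0.91/0.80 in the abstract has the same direction.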