ICCS 2022 - Translating data to predictive models
Biological, chemical and physical properties of molecules are encoded in their molecular structure. The challenge lies in discovering the relationships between the molecular graphs and the measured activity. Where data is measured, collected and curated for a series of compounds there is an opportunity to find the hidden relationships.
Chemical structures come in various shapes and sizes, depending on the scientists or even the algorithms that create them. Though these variations may seem subtle to a trained chemist’s eye, they can introduce inconsistencies that impair chemical search algorithms or model building. Structure normalization is therefore a key component of any cheminformatics workflow, and its significance is often underestimated. Finding relationships between chemical structures and their measured properties relies primarily on how the chemical matter is represented. Variability in the features and descriptors calculated from these representations can influence data analysis and the accuracy of predictions. In the first part of the presentation we will show the effect of chemical normalization on investigating correlations and building predictive models.
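To make the point about representation concrete, here is a minimal, purely illustrative sketch in Python. It is not the workflow presented in the talk. It assumes a toy fingerprint built from hashed character fragments of a SMILES string and a single hypothetical text rewrite rule for the nitro group; real normalizers and ECFP-style fingerprints operate on the molecular graph. The names normalize, toy_fingerprint and NORMALIZATION_RULES are invented for this example.

```python
import hashlib

# Hypothetical rewrite table mapping one drawing convention to another.
# Assumption for illustration only: real tools normalize the molecular
# graph, not the SMILES text.
NORMALIZATION_RULES = {
    "N(=O)=O": "[N+](=O)[O-]",  # pentavalent nitro -> charge-separated nitro
}


def normalize(smiles: str) -> str:
    """Apply every text rewrite rule to the SMILES string."""
    for pattern, replacement in NORMALIZATION_RULES.items():
        smiles = smiles.replace(pattern, replacement)
    return smiles


def toy_fingerprint(smiles: str, n: int = 3, n_bits: int = 1024) -> set:
    """Hash every n-character fragment of the string to a bit position.

    This is a stand-in for a structural fingerprint, not ECFP.
    """
    bits = set()
    for i in range(len(smiles) - n + 1):
        fragment = smiles[i:i + n]
        digest = hashlib.md5(fragment.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# The same compound, nitrobenzene, drawn with two nitro conventions.
nitrobenzene_a = "c1ccccc1N(=O)=O"
nitrobenzene_b = "c1ccccc1[N+](=O)[O-]"

raw_similarity = tanimoto(
    toy_fingerprint(nitrobenzene_a),
    toy_fingerprint(nitrobenzene_b),
)
normalized_similarity = tanimoto(
    toy_fingerprint(normalize(nitrobenzene_a)),
    toy_fingerprint(normalize(nitrobenzene_b)),
)

print(f"similarity before normalization: {raw_similarity:.2f}")
print(f"similarity after normalization:  {normalized_similarity:.2f}")
```

Without normalization the two drawings of the same compound yield different feature sets, so a search or model would treat them as different molecules; after normalization the features are identical.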
The second part of the talk will cover the results of model building on 163 ChEMBL targets extracted from the bioactivity benchmark set [1]. Results obtained with different descriptor generation methods, including ECFP fingerprints, MACCS keys, structural properties, geometry properties and physicochemical properties, will be discussed in detail. This part summarizes the results of more than 3,000 Random Forest models. Finally, model development for ADMET targets will be highlighted, including hERG cardiotoxicity prediction, permeability and blood-brain barrier penetration. We will describe how these models can be built, analyzed, optimized and deployed using our new machine learning platform.