Drug discovery is an iterative process: hypotheses are built on observations and validated by triggering new observations, mainly through the synthesis of new chemical entities. As an idea evolves toward selection for synthesis, evidence and prediction results are collected, assessed, and scrutinized by the project group. The success of modern drug design therefore depends on how data is turned into information and how much knowledge is extracted from it. Accordingly, efforts to connect data sources, or to make an even broader spectrum of data available in centralized data lakes with access engines operating on top, drive contemporary development and represent a key trend. Powerful data analysis methods, such as matched molecular pair (MMP) analysis, and instant search over large chemical datasets are in high demand. Depending on the volume and quality of the raw data, model building may play a crucial role in the preprocessing steps. Data analytics platforms apply supervised or unsupervised methods such as linear fitting, clustering, pattern recognition, or neural networks. These models go beyond the raw information, and the extracted correlations can be applied to novel, hypothetical structures to judge them in a triaging phase before deciding on synthesis.
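To make the idea of applying extracted correlations to hypothetical structures concrete, the following is a minimal sketch of how MMP-style statistics might be aggregated and reused in triaging. It assumes the pairs and their transformations have already been identified by an MMP tool. The record layout, the transformation strings, the function names, and all numeric values are illustrative placeholders, not data or methods from this work.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (transformation, property of molecule A, property of molecule B).
# The values are invented placeholders, not measured data.
pairs = [
    ("[*]N(C)C>>[*]O", 6.0, 5.0),
    ("[*]N(C)C>>[*]O", 7.0, 6.5),
    ("[*]Cl>>[*]F", 5.5, 5.5),
]


def summarize_transformations(records):
    """Group property changes by transformation and report count and mean delta."""
    deltas = defaultdict(list)
    for transformation, value_a, value_b in records:
        deltas[transformation].append(value_b - value_a)
    return {
        transformation: {"n": len(values), "mean_delta": mean(values)}
        for transformation, values in deltas.items()
    }


def estimate_after_transformation(value, transformation, summary):
    """Shift a known property value by the mean delta seen for this transformation.

    Returns None when the transformation has no supporting pairs.
    """
    stats = summary.get(transformation)
    if stats is None:
        return None
    return value + stats["mean_delta"]


summary = summarize_transformations(pairs)
```

In practice the number of supporting pairs (`n`) matters as much as the mean shift: a transformation backed by only a few pairs is a weak basis for a triaging decision.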
Effective coordination of hypotheses and compound series in projects where multiple groups collaborate requires access to optimized and dynamically changing information. The major problem is therefore the collection, grouping, management, and overview of the relevant information (ideas, calculated properties, related data from databases, graphics, comments, attachments, etc.) within a single application.
The goal of this presentation is to introduce Design Hub (the successor of Marvin Live), a platform for integrating a wide variety of data sources and services to augment real-time design. Design Hub offers a vendor-agnostic, real-time plugin system that can be configured to current information needs. This allows the seamless integration of in-house databases, local models, and workflow tools (KNIME, Pipeline Pilot). We present two use cases. First, we show how an MMP analysis based on ChEMBL data can support designing out hERG liability. Second, we exemplify simultaneous, instant searching in various databases (ChEMBL, SureChEMBL, PubChem) and vendor catalogs (eMolecules, Mcule, Molport, Enamine). Using novel search engines, this service returns results within seconds from a compound collection with a total size of >800M molecules. It supports the estimation of freedom to operate and novelty, and provides quick insight into reagent and purchasable compound availability.
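The simultaneous search described in the second use case can be pictured as a fan-out query: one structure is sent to every source at once and the answers are merged. The sketch below is an illustration under stated assumptions and does not represent the Design Hub plugin API. The source names, the in-memory sets standing in for remote databases, and the exact-string matching that stands in for a real structure search are all hypothetical.

```python
import asyncio

# Hypothetical in-memory stand-ins for remote databases and vendor catalogs.
SOURCES = {
    "source_a": {"CCO", "c1ccccc1"},
    "source_b": {"CCO"},
    "source_c": {"CCN"},
}


async def search_source(name, query):
    """Look up the query in one source; the sleep is a placeholder for network I/O."""
    await asyncio.sleep(0)
    return name, query in SOURCES[name]


async def search_all(query, timeout=5.0):
    """Query every source concurrently; sources that fail or time out are skipped."""
    tasks = [asyncio.wait_for(search_source(name, query), timeout) for name in SOURCES]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    hits = {}
    for result in results:
        if isinstance(result, Exception):
            continue
        name, found = result
        hits[name] = found
    return hits


def run_search(query):
    """Synchronous entry point around the concurrent search."""
    return asyncio.run(search_all(query))


def looks_novel(hits):
    """Crude novelty flag: True when no queried source reported the structure."""
    return not any(hits.values())
```

Because the requests run concurrently, the total waiting time is set by the slowest source rather than by the sum of all of them, which is what makes an instant overview across many collections plausible.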