JChem Base Structure Checker MadFast Similarity Search Standardizer ChemLocator Poster

Chemical Intelligence That Makes Hidden Knowledge Effortlessly Reachable

Posted on 2019-09-12


The knowledge produced and stored in reports, patents and scientific journal articles is expanding exponentially. However, the unstructured nature of this content constrains seamless information access and scientific decision support. Chemistry is a unique field in this regard, for two reasons. First, the nomenclature is verbose in the sense that a single chemical structure can be represented by many synonyms, for example a traditional name, an IUPAC name, a wide range of brand names, or a chemical format such as SMILES. Second, navigating the knowledge base with queries about the encapsulated chemical space calls for specialized search methods such as similarity or substructure search.
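To make the synonym problem concrete, here is a minimal sketch in Python. It is not how ChemLocator works; the lookup table, the function name `find_structures` and the plain substring matching are all assumptions made for illustration. The point is only that several names found in a text can collapse onto one structure.

```python
from collections import defaultdict

# Hypothetical synonym table: each lowercase name maps to one SMILES string.
# A real name-to-structure tool covers far more names and parses systematic
# nomenclature instead of relying on a fixed dictionary.
SYNONYMS = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "acetylsalicylic acid": "CC(=O)Oc1ccccc1C(=O)O",
    "2-acetyloxybenzoic acid": "CC(=O)Oc1ccccc1C(=O)O",
    "ethanol": "CCO",
    "ethyl alcohol": "CCO",
}


def find_structures(text):
    """Group the known names mentioned in `text` by the structure they denote.

    Uses naive case-insensitive substring matching, so "ethanol" would also
    match inside "methanol"; a real extractor needs proper tokenization.
    """
    lowered = text.lower()
    hits = defaultdict(list)
    for name in sorted(SYNONYMS):
        if name in lowered:
            hits[SYNONYMS[name]].append(name)
    return dict(hits)


sentence = "Aspirin (acetylsalicylic acid) was dissolved in ethanol."
structures = find_structures(sentence)
```

In this toy sentence two different names resolve to the same SMILES string, which is why a structure-aware index returns one compound where a text index would see two unrelated terms.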

Our study highlights computational approaches that make the chemistry-related knowledge stored in all open access articles easily accessible (Fig. 1). We present the results obtained on this large corpus through the following workflow: i) large-scale conversion of text content to chemical objects, ii) automated preparation of databases to store and organize relevant data, and iii) analysis of the collected chemical space. Chemical objects were extracted with the ChemLocator application from nearly 1.9M articles, which span the chemical space of the open access scientific literature. The chemical space was analysed by calculating a fingerprint-based chemical similarity matrix and clustering with MadFast Similarity Search. To explore the scaffold diversity of this exclusive chemical space, the obtained set was fragmented to yield rings and ring systems. Hidden relationships were explored by combining text and chemical information in a graph data model with related visualization. In summary, our use case highlights the potential of novel technologies to pre-process, search and explore the information network contained in large document sets in the field of chemistry.
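The similarity-matrix and clustering step can be illustrated with a small, self-contained Python sketch. It does not use or reproduce the MadFast Similarity Search API. The toy fingerprints (sets of on-bit indices), the Tanimoto coefficient as the similarity measure, the 0.5 threshold and the simple connected-components clustering are assumptions chosen for illustration; the poster does not state which metric or clustering method was used.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_matrix(fingerprints):
    """Symmetric all-against-all similarity matrix as a list of lists."""
    n = len(fingerprints)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = tanimoto(fingerprints[i], fingerprints[j])
    return matrix


def threshold_clusters(matrix, threshold):
    """Group items into connected components of the graph that links two
    items whenever their similarity is at least `threshold`."""
    n = len(matrix)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and matrix[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


# Toy fingerprints, invented for this example (not derived from real molecules).
toy_fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {20}]
matrix = similarity_matrix(toy_fingerprints)
clusters = threshold_clusters(matrix, threshold=0.5)
```

The full matrix grows quadratically with the number of structures, so a plain Python loop like this is only suitable for small sets; at the scale of a literature-wide corpus a dedicated similarity engine is needed.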

Our aim is to provide a method to easily access and explore the chemical space of large scientific knowledge bases stored in scientific articles, patents or reports. Chemistry is a unique field in this regard because chemical structures can be represented by various synonyms; moreover, navigating the knowledge base and the encapsulated chemical space requires special search methods such as similarity or substructure search. Our study highlights computational approaches that make the chemistry-related knowledge stored in all open access articles easily accessible. Methods based on chemical similarity and graph databases are introduced to explore and analyze the content at various levels from a chemist's point of view.

Download the poster: Chemical Intelligence That Makes Hidden Knowledge Effortlessly Reachable

