Exploring and visualisation of chemistry in patents with Marvin and Instant JChem

presentation · 8 years ago
by Alexander Klenner (Fraunhofer Institute for Algorithms and Scientific Computing)
Instant JChem
We present a grid-based solution for chemical named entity recognition (NER) in patent collections provided in PDF format. Our architecture identifies and extracts IUPAC and trivial names of chemical compounds and translates them into InChI keys, which can subsequently be used to generate a structure for each identified entity with Marvin. All structures are stamped into the original PDF as 'pop-up' chemicals, together with hyperlinks to the corresponding ChemSpider and PubMed pages. A generated bookmark tree in the PDF provides access to all identified compounds. Additionally, all retrieved chemicals are stored in a ChemAxon JChem database together with a reference to the original patent. JChem enables structure search and filtering across the processed patent collection. The workflow is based on UIMA and can easily be adapted to incorporate different chemical NER tools. UNICORE is used to access grid resources for efficient parallelization of all processes.
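The data flow described in the abstract (recognise names, map them to InChI keys, keep a reference back to the patent) can be illustrated with a minimal sketch. This is not the presented system: it uses no UIMA, Marvin, JChem, or UNICORE calls. The dictionary lookup below is a hypothetical stand-in for a real chemical NER tool and name-to-structure conversion, and the names `NAME_TO_INCHIKEY`, `annotate`, `build_index`, and the patent identifiers are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup table standing in for a chemical NER tool plus
# name-to-InChIKey translation; a real system would not use a fixed list.
NAME_TO_INCHIKEY = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "caffeine": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
}


@dataclass(frozen=True)
class CompoundMention:
    patent_id: str  # reference back to the source patent
    name: str       # recognised trivial name (lower-case dictionary form)
    inchikey: str   # identifier used to link the mention to a structure
    start: int      # character offset where the mention begins
    end: int        # character offset just past the mention


def annotate(patent_id, text):
    """Find dictionary names in the text and return mentions in reading order."""
    mentions = []
    for name, key in NAME_TO_INCHIKEY.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append(
                CompoundMention(patent_id, name, key, match.start(), match.end())
            )
    return sorted(mentions, key=lambda m: m.start)


def build_index(mentions):
    """Map each InChIKey to the set of patents that mention the compound."""
    index = {}
    for mention in mentions:
        index.setdefault(mention.inchikey, set()).add(mention.patent_id)
    return index


if __name__ == "__main__":
    # Made-up patent identifiers and text, for demonstration only.
    found = annotate("PATENT-A", "A tablet comprising aspirin and caffeine.")
    found += annotate("PATENT-B", "Caffeine is dissolved in water.")
    for key, patents in build_index(found).items():
        print(key, sorted(patents))
```

Keeping character offsets on each mention is what would let a later step attach an annotation at the right place in the document, and the InChIKey-to-patent index plays the role that the database reference to the original patent plays in the described workflow.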