From chaos to order: Collecting chemical and biologic information in the documentation space

presentation · 7 years ago
by Daniel Bonniot de Ruisselet, David Deng (ChemAxon)
chemicalize.org Naming Document to Structure
Much chemical information is buried deep within documents, scattered in no particular order. The structures may take different forms: names (IUPAC, common, generic, …), text strings (SMILES, InChI), images, numbers (CAS Registry Numbers, Enzyme Commission (EC) numbers), or embedded objects. The documents may be proprietary or publicly available, and they exist in various formats (PDF, images, PowerPoint slides, HTML, etc.). In this presentation, we demonstrate how ChemAxon goes beyond its Naming technology to extract as much chemical and biological information as possible from documents. Location information is also returned, so users can pinpoint where each structure entity appears in the document. The extraction is automated and integrated with other ChemAxon applications for indexing and searching. A public web service (chemicalize.org) provides an interactive interface for visualizing chemical information and extracting it from documents.
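The abstract does not describe how ChemAxon's extractor works internally. As a minimal, hypothetical sketch of the simplest case, the code below recognizes two of the numeric identifier types mentioned above (CAS Registry Numbers and EC numbers) in plain text and returns each hit with its character offsets, which is one way to provide the kind of location information the talk refers to. `find_identifiers`, `cas_is_valid` and `Hit` are illustrative names, not part of any ChemAxon API. The CAS check uses the standard check-digit rule: the remaining digits, read from right to left, are weighted 1, 2, 3, … and the weighted sum modulo 10 must equal the final digit. Names, SMILES, InChI and images would require real parsers and are out of scope here.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    kind: str   # "CAS" or "EC" (hypothetical labels)
    text: str   # the matched identifier
    start: int  # character offset where the identifier begins
    end: int    # character offset just past the identifier


# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit
CAS_RE = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")
# Simplified EC pattern: "EC" followed by four dot-separated integers
# (partial forms such as "EC 3.4.11.-" are deliberately not handled)
EC_RE = re.compile(r"\bEC\s+(\d+\.\d+\.\d+\.\d+)\b")


def cas_is_valid(first: str, second: str, check: str) -> bool:
    """Apply the CAS check-digit rule to the three hyphen-separated parts."""
    digits = (first + second)[::-1]  # read the non-check digits right to left
    total = sum(int(d) * (i + 1) for i, d in enumerate(digits))
    return total % 10 == int(check)


def find_identifiers(text: str) -> List[Hit]:
    """Return CAS and EC identifiers found in text, ordered by position."""
    hits: List[Hit] = []
    for m in CAS_RE.finditer(text):
        # Discard strings that look like CAS numbers but fail the checksum
        if cas_is_valid(*m.groups()):
            hits.append(Hit("CAS", m.group(0), m.start(), m.end()))
    for m in EC_RE.finditer(text):
        # Record only the number itself, not the "EC" prefix
        hits.append(Hit("EC", m.group(1), m.start(1), m.end(1)))
    return sorted(hits, key=lambda h: h.start)


if __name__ == "__main__":
    sample = "Water (CAS 7732-18-5) and a typo 7732-18-4; trypsin is EC 3.4.21.4."
    for hit in find_identifiers(sample):
        print(hit)
```

For example, the string `7732-18-5` (water) passes the check, because 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105 and 105 mod 10 = 5, while the mistyped `7732-18-4` is rejected. Returning offsets rather than bare strings lets a viewer highlight each identifier in place.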