New Software Developments on Chemical Information Extraction from Patent Documents and Markush Structure Analysis
ChemAxon has recently released Document to Structure, a tool for extracting chemical structures from documents. The latest version incorporates text OCR and can even work on non-searchable PDF documents (MS Office document support is planned for the next release). All chemical names found in a document are converted to structures, together with their location in the document. Combined with existing optical recognition technology, it offers a useful tool for patent mining. As a demonstration of its applications, a public website (chemicalize.org) has been set up with an interactive interface for visualizing and extracting chemical information from documents. ChemAxon has also improved its Markush search and enumeration technology, and the full patent Markush database from Thomson Reuters can now be searched on the Amazon cloud. These improvements will also be introduced in this presentation.
Presented at PIUG 2012.