Document-to-Structure to be trilingual: Extract, display, and search chemical information within English, Chinese, and Japanese patents

presentation · 6 years ago
by David Deng (ChemAxon)

By expanding Naming, a reliable chemical name-to-structure technology, ChemAxon has developed a suite of chemistry text mining tools. The core application is Document-to-Structure, which extracts chemical information from patents and other documents. Document-to-Structure includes numerous functions to overcome the challenges of patent mining:

  1. OCR technology for non-text patent documents. A correction algorithm identifies OCR errors and corrects the names before they are converted to structures.
  2. Easy integration with different image-to-structure software to extract structures from images.
  3. In addition to English, support for Chinese and Japanese patent mining.
  4. Annotate a document with a single mouse click: create a new document with chemical information "annotated". Mouse over the chemical name to display the structures.
  5. With ChemAxon's chemistry search function, the extracted structure information can be searched, which makes identifying a compound in a patent document much faster and easier.
This presentation will demonstrate various text mining applications, including extracting structures from chemical patents using Document-to-Structure; searching the patent structure database with Document-to-Database; and interactively displaying chemical information in patent documents with Document Annotation.
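
The OCR correction step in item 1 of the list above can be illustrated with a minimal sketch. This is not ChemAxon's algorithm, which the abstract does not describe. It assumes a simple approach: split an OCR'd name into tokens and replace each unrecognized token with its closest entry in a lexicon, using fuzzy matching from Python's standard `difflib` module. The lexicon `KNOWN_TOKENS` and the function names are hypothetical and exist only for illustration.

```python
import difflib
import re

# Hypothetical, tiny lexicon of valid name tokens (assumption for this sketch).
# A real system would need a far larger vocabulary.
KNOWN_TOKENS = ["chloro", "methyl", "ethanol", "propanol", "phenol",
                "chlorobenzene", "methylbenzene"]

# Characters that separate tokens in a chemical name.
SEPARATORS = re.compile(r"([-,\s()]+)")


def correct_token(token, vocabulary=KNOWN_TOKENS, cutoff=0.8):
    """Return the closest lexicon entry for a token, or the token unchanged."""
    # Tokens without letters (locants such as "2") are left alone.
    if not any(ch.isalpha() for ch in token):
        return token
    if token.lower() in vocabulary:
        return token
    matches = difflib.get_close_matches(token.lower(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_name(name):
    """Correct each token of an OCR'd name, keeping separators as they are."""
    parts = SEPARATORS.split(name)
    return "".join(
        part if not part or SEPARATORS.fullmatch(part) else correct_token(part)
        for part in parts
    )


if __name__ == "__main__":
    # "1" misread for "l" is a typical OCR confusion.
    print(correct_name("2-ch1oro-1-propanol"))
```

The similarity cutoff trades recall against the risk of "correcting" a valid but unlisted name into a different compound, so in this sketch a token with no sufficiently close match is left unchanged.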
