Document-to-Structure to be trilingual: Extract, display, and search chemical information within English, Chinese, and Japanese patents
By expanding Naming, a reliable chemical name-to-structure technology, ChemAxon has developed a suite of chemistry text mining tools. The core application is Document-to-Structure, which extracts chemical information from patents and other documents. Document-to-Structure includes several functions that address the challenges of patent mining:
- OCR technology for non-text patent documents. A correction algorithm identifies OCR errors and corrects the names before they are converted to structures.
- Easy integration with different image-to-structure software to extract structure images.
- Support for Chinese and Japanese patent mining, in addition to English.
- Annotate a document with a single mouse click: a new document is created with its chemical information annotated. Hover over a chemical name to display its structure.
- With ChemAxon's chemistry search function, the extracted structure information can be searched, which makes identifying a compound in a patent document much faster and easier.
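
As a rough illustration of the OCR correction step in the list above, the sketch below matches a misread chemical name against a small set of known names and then looks up a structure. It is not ChemAxon's algorithm. The dictionary `KNOWN_NAMES`, the helper names, and the 0.8 similarity cutoff are all assumptions made for the example, and a real name-to-structure engine parses nomenclature rather than using a lookup table.

```python
import difflib

# Hypothetical mini-dictionary mapping chemical names to SMILES strings.
# It is only a stand-in for a real name-to-structure converter.
KNOWN_NAMES = {
    "benzene": "c1ccccc1",
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "toluene": "Cc1ccccc1",
}


def correct_ocr_name(raw, cutoff=0.8):
    """Return the known name closest to an OCR token, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        raw.lower(), list(KNOWN_NAMES), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


def name_to_smiles(raw):
    """Correct the OCR token first, then look up its structure (None if unknown)."""
    name = correct_ocr_name(raw)
    return KNOWN_NAMES[name] if name else None


if __name__ == "__main__":
    # "benzcne" and "ethano1" imitate typical OCR misreads (e -> c, l -> 1).
    for token in ["benzcne", "ethano1", "xyzzy"]:
        print(token, "->", correct_ocr_name(token), "->", name_to_smiles(token))
```

Correcting the name before conversion matters because a single misread character usually makes a systematic name unparseable, so the cutoff trades recall against the risk of mapping a token to the wrong compound.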