Computer-assisted chemical information extraction and analysis
ChemCurator is a desktop application for extracting chemical structures and related assay data from patents, journal articles and other documents in English, Chinese and Japanese. The application combines ChemAxon’s text mining, structure handling and Markush technologies with the additional ability to recognize and highlight chemical structures in documents.
Highlighting chemistry in your documents
ChemCurator recognizes and highlights chemical information in documents in file formats such as HTML, XML and PDF, helping you locate relevant data. You can import documents from a local file, or directly from Google Patents or IFI Claims. Our Chemical Name and Structure Conversion technology supports name-to-structure conversion for English, Chinese and Japanese names, extending the scope of research. Chemical structures, even those in graphical file formats, can also be recognized using the supported third-party tools for optical structure recognition (OSR): CLiDE and OSRA.
Document chemical content side-by-side
The view of the original document is shown alongside a view of the chemical information extracted from it (see screenshot below). Chemical structures in the annotated document can be dragged and dropped from the Extraction View to the Structure Editor. Each extracted structure is linked to its original place in the document and vice versa. If you select a structure in the Extraction View, the Document View automatically scrolls to the related page and highlights the corresponding part of the text.
Real-life documents often contain OCR (optical character recognition) errors and typos. The built-in OCR error correction technology of our Name to Structure conversion can fix most of these errors, although the correction is not always perfect. In ChemCurator, any misrecognized or unrecognized structure can easily be fixed manually.
Working in a team
Using the ChemCurator Integration server, users can easily upload projects to a central in-house database or download other users' projects instantly. All modifications are logged, and a project can be restored to any previous state.
Rapid chemical content extraction with a few clicks
Extracting chemical structures one by one can be extremely time-consuming if the document covers hundreds of compounds, as most chemical patents do. ChemCurator's Extraction Wizard can speed up the process. Based on substructure or similarity filters, all of the relevant compounds can be extracted from any document with a few clicks.
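To illustrate the general idea behind a similarity filter, here is a minimal Python sketch. It is not ChemCurator's implementation. It assumes each compound is represented by a set of structural features and uses the Tanimoto coefficient, a common similarity measure in cheminformatics. The function names, the feature labels and the 0.5 threshold are all hypothetical and chosen only for this example.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_filter(
    query: FrozenSet[str],
    candidates: Dict[str, FrozenSet[str]],
    threshold: float = 0.5,
) -> List[str]:
    """Return the names of candidates whose similarity to the query meets the threshold."""
    return [
        name
        for name, features in candidates.items()
        if tanimoto(query, features) >= threshold
    ]


# Hypothetical feature sets standing in for compounds found in a document.
extracted = {
    "compound_1": frozenset({"benzene", "amide", "fluoro"}),
    "compound_2": frozenset({"benzene", "amide", "chloro"}),
    "compound_3": frozenset({"pyridine", "ester"}),
}
query = frozenset({"benzene", "amide", "fluoro"})

hits = similarity_filter(query, extracted, threshold=0.5)
print(hits)
```

In practice, the feature sets would be fingerprints computed from the structures by a cheminformatics toolkit. Raising the threshold narrows the selection to closer analogues of the query.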