Computer-assisted chemical information extraction and analysis

ChemCurator is a desktop application for extracting chemical information and compounds. It extracts Markush structures and related assay data from English, Chinese and Japanese patents, journal articles, and similar documents. The application combines ChemAxon’s text mining, structure handling and Markush technologies with the ability to highlight structures in documents, and lets users assemble Markush structures interactively in a semi-automated way.

Highlighting chemistry in your documents

ChemCurator recognizes and highlights chemical information in a wide range of documents, in file formats such as HTML, XML and PDF, to help you locate relevant data. You can import documents from a local file, or directly from Google Patents or IFI Claims. Our Chemical Name and Structure Conversion technology supports name-to-structure conversion for English, Chinese and Japanese, which extends the scope of research. Chemical structures in graphical file formats can also be recognized, using the supported native CLiDE or OSRA tools for optical structure recognition (OSR).

Document chemical content side-by-side

The view of the original document is paired with a view showing the chemical information extracted from it (see screenshot below). Structures in the annotated document can be dragged from the Extraction View and dropped into the structure editor. Each extracted chemical structure is linked to its original place in the document, and vice versa. Select a structure in the Extraction View, and the Document View automatically scrolls to the related page and highlights the structure. Click an annotated chemical name in the document, and the Extraction View jumps to the corresponding structure.

Read more about the views in ChemCurator

ChemCurator's view to extract compounds from text

Manual correction

Real-life documents quite often contain optical character recognition (OCR) errors and typos. The built-in OCR error correction of our Name to Structure conversion can fix most of these errors, though the correction is not always perfect. In ChemCurator, any misrecognized or unrecognized structure can easily be fixed or altered manually.

Read more about error fixing

Working in a Team

Using the ChemCurator Integration server, users can easily upload projects to a central in-house database or download other users' projects instantly. All modifications are safely logged, and a project can be restored to any previous state.

Read more about sharing options

Rapid chemical content extraction with a single click

Extracting chemical structures one by one can be extremely time-consuming if the document covers hundreds of compounds, as chemical patents often do. ChemCurator's Extraction Wizard can speed up the process. Based on substructure or similarity filters, all of the relevant compounds can be extracted from a document with a single click.
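
To illustrate the general idea behind a similarity filter, here is a minimal Python sketch. It is not ChemCurator's implementation, which this page does not describe. It assumes each compound is represented by a hand-written set of structural features and uses the Tanimoto coefficient, a common similarity measure in cheminformatics. The function names, feature labels and threshold are all hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient: shared features divided by the union of features."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty sets are not similar
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def filter_by_similarity(
    query: Set[str], compounds: Dict[str, Set[str]], threshold: float = 0.5
) -> List[str]:
    """Return the names of compounds whose similarity to the query meets the threshold."""
    return [
        name
        for name, features in compounds.items()
        if tanimoto(query, features) >= threshold
    ]


# Hypothetical feature sets standing in for real structural fingerprints.
extracted = {
    "compound_1": {"benzene_ring", "amide", "fluoro"},
    "compound_2": {"benzene_ring", "amide", "chloro"},
    "compound_3": {"pyridine_ring", "ester"},
}
query = {"benzene_ring", "amide", "fluoro"}

hits = filter_by_similarity(query, extracted, threshold=0.5)
print(hits)
```

A real tool would compute fingerprints from the structures themselves instead of using hand-written labels; the filtering step, however, has the same shape: score every extracted compound against the query and keep those above a cutoff.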

Read more about data extraction

Export functions of ChemCurator