Chemistry-enriched patent curation - automatized chemical and semantic analysis and elaboration of large patent sets
Currently, the analysis of large patent sets is tedious and cumbersome work. To improve and speed up this process, we developed a patent curation workflow in which relevant chemical information, such as Markush structures and chemical compound collections (e.g. exemplified structures), is extracted from a patent set and subsequently enriched with data retrieved by text mining in a semi-automatic manner. The outputs of cheminformatics, OCR/OSR and text-mining tools are combined by means of KNIME, and the joined data are finally visualized side by side with the original documents using the ChemCurator application. In addition to advanced visualization capabilities, ChemCurator offers essential functions for validating and manually refining the automatically extracted chemical information. The resulting project-specific content provides a solid information base that is of value in any phase of a drug discovery project.
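The step of combining tool outputs can be illustrated with a minimal sketch. It is not the KNIME workflow itself. It assumes that each tool emits records keyed by a patent identifier, and every field name (`patent_id`, `smiles`, `source`, `term`), the function `join_by_patent` and the sample values are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def join_by_patent(structures, annotations):
    """Attach text-mined terms to each extracted structure record.

    Both inputs are lists of dicts sharing a hypothetical 'patent_id' key.
    """
    # Group the text-mined terms by the patent they were found in.
    terms_by_patent = defaultdict(list)
    for ann in annotations:
        terms_by_patent[ann["patent_id"]].append(ann["term"])

    # Left join: keep every structure, even if no term was found for its patent.
    return [
        {**rec, "terms": sorted(terms_by_patent.get(rec["patent_id"], []))}
        for rec in structures
    ]


# Hypothetical sample data standing in for cheminformatics/OSR and text-mining output.
structures = [
    {"patent_id": "P1", "smiles": "CCO", "source": "exemplified"},
    {"patent_id": "P2", "smiles": "c1ccccc1", "source": "OSR"},
]
annotations = [
    {"patent_id": "P1", "term": "kinase"},
    {"patent_id": "P1", "term": "inhibitor"},
]

joined = join_by_patent(structures, annotations)
```

A left join is used here so that structures without any text-mined annotation remain available for manual validation rather than being dropped.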