Recent developments in the CLiDE tool for extraction of chemical structure data from patents and other documents

presentation · 8 years ago
by Peter Johnson, Anikó T. Valkó, Vilmos A. Valkó (University of Leeds, Keymodule)
Chemists routinely communicate information about structures and reactions as 2D structure diagrams. These diagrams are easily understood by readers but are not directly accessible to chemical information systems, which require chemical structures in a connection table format. CLiDE is an established optical chemical structure recognition (OCSR) tool that aims to address this problem. Recent improvements to the CLiDE system will be presented, including the way CLiDE processes different types of documents, such as patents, journal articles and internal reports. New methods for tackling problematic cases arising from degraded document quality and difficult drawing features will be discussed, as will improvements in the chemical intelligence of CLiDE's structure checker and structure fixer modules. These improvements have a considerable effect on CLiDE's accuracy and speed. A detailed study of CLiDE's performance on some widely available datasets will be presented alongside that of some publicly available OCSR systems.