Imago: Open-source toolkit for chemical structure image recognition
We present Imago, an open-source toolkit for automatically extracting chemical structures from raster images and converting them into a molecular structure representation format used in cheminformatics. We focus on recognizing photographed or scanned images that contain noise, various outlines, different spacing, non-straight lines, non-uniform lighting, and so on. The recognition procedure is organized as a series of successive approximations: at each recognition step we try to extract as much useful information as possible and reconstruct the logical layout on the fly. To resolve ambiguities, we use an optimization tree based on a distance metric between the source and the recognized elements.
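The idea of resolving ambiguities with a tree search driven by a distance metric can be illustrated with a minimal sketch. This is not Imago's actual implementation: the function name `resolve_ambiguities`, the data layout, and the distance values are all assumptions made for illustration. Each ambiguous element is given a set of candidate labels, each with a hypothetical non-negative distance to the source image region, and a best-first search over the tree of partial interpretations returns the assignment with the smallest total distance.

```python
import heapq
from typing import Dict, List, Tuple


def resolve_ambiguities(
    candidates: List[Dict[str, float]],
) -> Tuple[float, List[str]]:
    """Pick one label per ambiguous element, minimizing total distance.

    candidates[i] maps each possible label of element i to a hypothetical
    non-negative distance between the source region and that label
    (smaller means a closer match).
    """
    # Each heap entry is (accumulated distance, labels chosen so far).
    # The root of the tree is the empty interpretation.
    heap: List[Tuple[float, List[str]]] = [(0.0, [])]
    while heap:
        cost, chosen = heapq.heappop(heap)
        depth = len(chosen)
        if depth == len(candidates):
            # Distances are non-negative, so the first complete
            # interpretation popped has the minimal total distance.
            return cost, chosen
        # Expand the node: one child per candidate label of the next element.
        for label, dist in candidates[depth].items():
            heapq.heappush(heap, (cost + dist, chosen + [label]))
    raise ValueError("an element has no candidate labels")


# Illustrative input: an atom label and a bond order, both ambiguous.
# The distance values are made up for the example.
example = [
    {"O": 0.2, "C": 0.5},
    {"single": 0.1, "double": 0.4},
]
total, labels = resolve_ambiguities(example)
print(labels, round(total, 3))
```

In this sketch the per-element distances are independent, so a greedy choice would give the same answer. The tree search becomes useful once the cost of a label depends on earlier choices, for example when a chemical validity check penalizes certain atom and bond combinations.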