Bringing structures to the disorder: Adding chemical intelligence to unstructured documents
Information within organizations is scattered unpredictably across many different systems, often in unstructured documents. Chemical structures in particular may be stored in native application files, embedded in documents, or flattened into images, as in PDFs. These documents may in turn be held in formal document management systems, or on file shares and hard drives across the organization. Such documents were traditionally assumed to be unsearchable, despite holding valuable information locked up in patents, reports, conception and study documents. A central database was created to index the chemical structures added to the various file shares and systems across an organization. The database provided a single location for, and federated access to, multiple repositories of unstructured documents. Adding an optical structure recognition process brought in structures previously lost in flattened image form. We will describe the creation of this tool and its application within a major global organization to enable structure-based searching of previously inaccessible sources.
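The central index described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the tool itself: the names `DocumentRef` and `StructureIndex`, the repository labels, the file paths and the structure keys are all hypothetical, and each structure is assumed to have already been reduced to an opaque canonical key by some upstream extraction or optical recognition step.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRef:
    """Where a structure was found (all field values here are hypothetical)."""
    repository: str  # e.g. a document management system or a file share
    path: str        # location of the document within that repository
    source: str      # how the structure was stored: "native", "embedded" or "image"


class StructureIndex:
    """Maps a canonical structure key to every document containing that structure."""

    def __init__(self):
        self._hits = defaultdict(set)

    def add(self, structure_key, ref):
        # A set keeps repeated indexing of the same document from duplicating hits.
        self._hits[structure_key].add(ref)

    def search(self, structure_key):
        # Exact-match lookup only; returns hits from all repositories in a stable order.
        refs = self._hits.get(structure_key, ())
        return sorted(refs, key=lambda r: (r.repository, r.path))


# Hypothetical data: one structure found in two repositories, another in one.
index = StructureIndex()
index.add("KEY-A", DocumentRef("dms", "reports/study-01.docx", "embedded"))
index.add("KEY-A", DocumentRef("file-share", "patents/scan-17.pdf", "image"))
index.add("KEY-B", DocumentRef("file-share", "notes/draft.cdx", "native"))
```

Calling `index.search("KEY-A")` returns both documents regardless of which repository holds them, which is the federated behaviour the abstract describes. The sketch covers exact-match lookup only; substructure or similarity searching would need a chemistry toolkit and is outside its scope.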