Extract chemical information from unsearchable PDF documents - New and future features of Chemicalize.org and Document to Structure

presentation · 9 years ago
by David Deng (ChemAxon)
chemicalize.org Naming Document to Structure

Chemicalize.org is a free online service that pinpoints chemical information within webpages and documents. It extracts all chemical information from the text, and the document can then be viewed in Document Viewer with the structures displayed interactively. All structures are indexed on the server and can therefore be found through structure and/or keyword search. At the back end, Chemicalize.org is powered by ChemAxon’s Document to Structure (D2S) application. D2S can extract chemical information, together with its location in the source, from various types of documents. The latest version of D2S incorporates text OCR and can even process non-searchable PDF documents. Bundled with existing optical recognition technology, D2S is a useful tool for data mining. Future developments include structure extraction of homology groups and biological entities, as well as support for MS Office documents.
