IUPAC Naming and chemicalize.org
A very large number of web pages, patents, plain-text files, office documents, and PDFs contain chemical names, which constitute valuable information. However, because the names are embedded in running text, without specific markup or an associated chemical representation, this information can be very difficult to identify and use. We present a set of technologies that solve this problem by detecting chemical names in free text, converting the names to chemical structures, and annotating the original document with the added information. These technologies are combined to power the chemicalize.org website, a free public service that acts as a proxy, rendering any publicly accessible web page with added chemical annotations. With these annotations, it becomes possible to perform rich chemical searches (for instance, substructure and superstructure searches) on the set of indexed documents. We present the underlying name-to-structure and structure-to-name tools, and the improvements made to them over the past year.
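
As a rough illustration of the detect, convert, and annotate steps described above, the following minimal Python sketch uses a three-entry lookup table in place of a real name-to-structure converter. It is not the chemicalize.org implementation: the table, the function names (find_names, annotate), and the span markup with a data-smiles attribute are all hypothetical, chosen only to make the data flow concrete.

```python
import html
import re

# Toy lookup table standing in for a real name-to-structure converter.
# (Assumption: a real system parses the name grammar; it does not use a table.)
NAME_TO_SMILES = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "acetic acid": "CC(=O)O",
}

# Try longer names first so a multi-word name wins over a shorter overlap.
_NAMES = sorted(NAME_TO_SMILES, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _NAMES) + r")\b",
    re.IGNORECASE,
)


def find_names(text):
    """Step 1 and 2: return (start, end, matched_text, smiles) per recognized name."""
    return [
        (m.start(), m.end(), m.group(0), NAME_TO_SMILES[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]


def annotate(text):
    """Step 3: wrap each recognized name in a span carrying its SMILES string.

    The input is assumed to be plain text that is safe to embed as-is;
    only the attribute value is escaped here.
    """
    def wrap(match):
        smiles = NAME_TO_SMILES[match.group(0).lower()]
        return '<span class="chem" data-smiles="{}">{}</span>'.format(
            html.escape(smiles, quote=True), match.group(0)
        )

    return _PATTERN.sub(wrap, text)


if __name__ == "__main__":
    sample = "Dissolve benzene in Ethanol."
    print(find_names(sample))
    print(annotate(sample))
```

Storing the structure next to the original wording, rather than replacing the name, keeps the document readable while giving a structure index something to search on.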