Advances in automatic chemical spelling correction

presentation · 9 years ago
by Roger Sayle, Daniel M. Lowe (NextMove Software)
With the impressive progress made in chemical name-to-structure software, the main reason chemical entities go unidentified in documents is less and less often the complexity of the molecular structure or of the nomenclature used. It is increasingly the presence of OCR failures, human spelling errors, and hyphenation and line-breaking issues. This talk will present statistics on the prevalence of typographical issues in patent documents from different patent offices. It will also describe recent progress on algorithms for handling the high error rates that are frequently encountered in practice, sometimes ten or more character substitutions per IUPAC-like name.
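To make "character substitutions per name" concrete, here is a minimal sketch of the simplest possible correction strategy. It is not the algorithm presented in the talk. It uses plain Levenshtein edit distance to count single-character edits, then snaps an OCR-damaged name fragment to the nearest entry in a small lexicon. The names `LEXICON`, `correct_fragment` and `max_dist` are hypothetical and exist only for this illustration.

```python
# Hypothetical lexicon of name fragments (illustrative only, far from complete).
LEXICON = ("methyl", "ethyl", "propyl", "phenyl", "chloro", "bromo", "hydroxy", "amino")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct_fragment(token: str, lexicon=LEXICON, max_dist: int = 2) -> str:
    """Return the closest lexicon entry if it is within max_dist edits,
    otherwise return the token unchanged. Ties go to the earlier entry."""
    best = min(lexicon, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_dist else token


if __name__ == "__main__":
    print(correct_fragment("phenvl"))  # OCR read "y" as "v"
    print(correct_fragment("chlor0"))  # OCR read "o" as "0"
```

A practical corrector would first have to split a full name into fragments, which is itself difficult when hyphens and line breaks are damaged. It would also need to cope with the much higher error densities the abstract mentions. A natural refinement of this sketch is a weighted distance in which common OCR confusions, such as "0" for "o", cost less than arbitrary substitutions.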