Automated extraction of structure-activity relationships from chemistry patents

presentation · 9 years ago
by Lutz Weber (OntoChem)
We have developed a novel, comprehensive technology to automatically extract structure-activity relationships (SAR) from chemistry patents. In a first step, named entities are annotated: using large dictionaries and name-to-structure tools, chemical entities and compound classes from our chemical ontology are annotated. Diseases and biological, pharmaceutical and physiological effects are annotated in the same way. In a second step, potential anaphora are resolved, e.g. numbers or underdetermined entities are replaced by the more precise entity they refer to. In a third step, sentences that contain relationships between chemical compounds and effects are analyzed syntactically with automated tools, which determine potential relationship types using a fine-grained, relation-specific relationship ontology. In a last step, the normalized relationships are output as triples or n-tuples. These results are then analyzed for their quality using statistical and other criteria to derive a validated SAR that could be used as input for databases or search engines.
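The four steps described above can be illustrated with a minimal sketch. This is not the OntoChem implementation. It is a toy under stated assumptions: the dictionaries `COMPOUNDS`, `EFFECTS` and `RELATIONS`, the identifier prefixes, and all function names are hypothetical. Syntactic analysis is replaced here by simple trigger-word matching, anaphora resolution by substituting the last compound seen for the phrase "the compound", and quality analysis by a minimum occurrence count.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries. A real system would use large curated
# dictionaries, name-to-structure tools and an ontology instead.
COMPOUNDS = {"aspirin": "CHEM:aspirin", "ibuprofen": "CHEM:ibuprofen"}
EFFECTS = {"inflammation": "EFFECT:inflammation", "pain": "EFFECT:pain"}
# Hypothetical mapping from trigger words to normalized relation types.
RELATIONS = {"inhibits": "inhibits", "reduces": "inhibits", "treats": "treats"}


def tokenize(sentence):
    """Lowercase the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9\-]+", sentence.lower())


def annotate(sentence, dictionary):
    """Step 1: return normalized IDs for dictionary terms in the sentence."""
    return [dictionary[t] for t in tokenize(sentence) if t in dictionary]


def resolve_anaphora(sentences):
    """Step 2 (crude): replace 'the compound' by the last compound mentioned."""
    last = None
    resolved = []
    for s in sentences:
        if last is not None:
            s = re.sub(r"\bthe compound\b", last, s, flags=re.IGNORECASE)
        for tok in tokenize(s):
            if tok in COMPOUNDS:
                last = tok
        resolved.append(s)
    return resolved


def extract_triples(sentence):
    """Steps 3 and 4: emit normalized (compound, relation, effect) triples."""
    relations = sorted({RELATIONS[t] for t in tokenize(sentence) if t in RELATIONS})
    return [
        (c, r, e)
        for c in annotate(sentence, COMPOUNDS)
        for r in relations
        for e in annotate(sentence, EFFECTS)
    ]


def run_pipeline(sentences, min_support=1):
    """Run all steps and keep triples seen at least min_support times."""
    counts = Counter()
    for s in resolve_anaphora(sentences):
        counts.update(extract_triples(s))
    return {t: n for t, n in counts.items() if n >= min_support}
```

As a usage example, `run_pipeline(["Aspirin inhibits inflammation.", "The compound also reduces pain."])` resolves "The compound" to aspirin before extraction, so the second sentence also yields a triple for aspirin. Raising `min_support` stands in for the statistical quality criteria, which the abstract does not specify in detail.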