Chemicalize, SureChemOpen, PubChem and the InChIKey: A heavenly conjunction with transformative utility

presentation · 4 years ago
by Christopher Southan (TW2Informatics)
The ChemAxon Name to Structure functionality is not only a component of the SureChem patent extraction pipeline but also powers chemicalize. Both operations now submit structures to PubChem as sources. The former has deposited structures that bring the patent-extracted total in PubChem to 14.5 million CIDs. The deposition from chemicalize is smaller, at ~0.3 million, but it has been actively selected by users and 20% of it is unique. The final conjunction is that all three sources generate the InChIKey, which turns Google into a de facto merge of PubChem and ChemSpider covering ~50 million structures. With chemicalize, users can convert new patents, other external or internal documents and web-based text into structures. Individual results can be Googled or searched against SureChemOpen, and bulk extractions can be triaged against PubChem. It thus becomes possible to connect chemistry between patents, papers, abstracts and database records via exact match or similarity searching. When SureChem and chemicalize update their submissions, relationships with the other 47 million structures from ~200 PubChem sources (including ChEMBL and vendor databases) are recomputed and new CID links are made. The synergy between SureChem and chemicalize is powerful because the matches between them (~0.15 million), viewed via SureChemOpen, give occurrence statistics and the location of each structure within patents. The applications of chemicalize are extended by web tools such as Venny, for determining intersects between multiple extractions, and CheS-Mapper, for cluster visualization. These utility expansions will be illustrated with documents specifying BACE1 inhibitors for Alzheimer's disease.
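
The exact-match triage and intersect steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used by any of the tools named here: the function names (`valid_keys`, `skeleton`, `pairwise_intersects`) and the two small "extractions" are hypothetical, and the keys are intended as stand-ins for aspirin, caffeine and the two alanine enantiomers. The sketch relies only on the standard InChIKey layout of 14 letters, a hyphen, 10 letters, a hyphen and a final letter, where the first block is a hash of the connectivity information, so that stereoisomers share it.

```python
import re
from itertools import combinations

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def valid_keys(keys):
    """Return the set of well-formed InChIKeys, ignoring case and whitespace."""
    cleaned = (k.strip().upper() for k in keys)
    return {k for k in cleaned if INCHIKEY_RE.match(k)}


def skeleton(key):
    """Return the first 14-character block (the connectivity hash)."""
    return key.split("-")[0]


def pairwise_intersects(extractions):
    """Exact-match overlap for every pair of named InChIKey sets."""
    return {
        (a, b): extractions[a] & extractions[b]
        for a, b in combinations(sorted(extractions), 2)
    }


# Hypothetical extractions; the keys are illustrative placeholders.
patent = valid_keys([
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",   # intended: aspirin
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N",   # intended: caffeine
    "QNAYBMKLOCPYGJ-REOHAHJSSA-N",   # intended: L-alanine
])
paper = valid_keys([
    "ryyvlzvuvijvgh-uhfffaoysa-n ",  # same caffeine key, untidy formatting
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",   # intended: D-alanine
    "not-a-key",                     # malformed entries are dropped
])

# Exact match: identical full keys in both extractions.
exact = pairwise_intersects({"patent": patent, "paper": paper})[("paper", "patent")]

# Looser match: same connectivity block, so stereoisomers also pair up.
shared_skeletons = {skeleton(k) for k in patent} & {skeleton(k) for k in paper}

print(sorted(exact))
print(sorted(shared_skeletons))
```

Comparing on the first block is a much cruder notion of relatedness than the similarity searching offered by PubChem or SureChemOpen; it is shown only because it needs nothing beyond the key itself.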

