Chemicalize.org, SureChemOpen, PubChem and the InChIKey: A heavenly conjunction with transformative utility
The ChemAxon Name to Structure functionality is not only a component of the SureChem patent extraction pipeline but also powers chemicalize.org. Both operations now deposit structures in PubChem as sources. SureChem's depositions bring the patent-extracted total in PubChem to 14.5 million CIDs. The deposition from chemicalize.org is smaller (~0.3 million) but consists of structures actively selected by users, and 20% of them are unique. The final conjunction is that all three sources generate InChIKeys, which turn Google into a de facto merged index of PubChem and ChemSpider covering ~50 million structures. Chemicalize.org users can convert new patents, other external or internal documents, and web-based text. Individual results can be Googled or searched against SureChemOpen, and bulk extractions can be triaged against PubChem. It thus becomes possible to connect chemistry between patents, papers, abstracts and database records via exact match or similarity searching. When SureChem and chemicalize.org update their submissions, relationships with the other 47 million structures from ~200 PubChem sources (including ChEMBL and vendor databases) are recomputed and new CID links are made. The synergy between SureChem and chemicalize.org is powerful because the structures matching between them (~0.15 million) can be looked up in SureChemOpen, which gives occurrence statistics and the location of each structure within patents. The applications of chemicalize.org are extended by web tools such as Venny, for determining intersections between multiple extractions, and CheS-Mapper, for cluster visualization. These utility expansions will be illustrated with documents specifying BACE1 inhibitors for Alzheimer's disease.
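The intersection step described above (comparing InChIKey lists from several extractions, as a Venn-diagram tool would) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used by chemicalize.org, SureChemOpen or Venny: the function names `clean_keys` and `venn_regions` are hypothetical, the input is assumed to be plain lists of InChIKey strings exported from each extraction, and the keys shown are made-up placeholders that merely follow the standard 14-10-1 letter layout.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def clean_keys(raw_keys):
    """Return the set of well-formed InChIKeys, normalised to upper case.

    Strings that do not match the InChIKey layout are silently dropped.
    """
    keys = set()
    for raw in raw_keys:
        key = raw.strip().upper()
        if INCHIKEY_RE.match(key):
            keys.add(key)
    return keys


def venn_regions(named_sets):
    """Group keys by the exact combination of sources that contain them.

    named_sets maps a source label to a set of InChIKeys. The result maps
    a frozenset of labels to the keys found in exactly those sources.
    """
    membership = {}
    for name, keys in named_sets.items():
        for key in keys:
            membership.setdefault(key, set()).add(name)
    regions = {}
    for key, names in membership.items():
        regions.setdefault(frozenset(names), set()).add(key)
    return regions


if __name__ == "__main__":
    # Placeholder keys for illustration only; they are not real compounds.
    patent = clean_keys([
        "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
        " bbbbbbbbbbbbbb-uhfffaoysa-n ",
        "not a key",
    ])
    paper = clean_keys([
        "BBBBBBBBBBBBBB-UHFFFAOYSA-N",
        "CCCCCCCCCCCCCC-UHFFFAOYSA-N",
    ])
    regions = venn_regions({"patent": patent, "paper": paper})
    for sources, keys in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(sources), len(keys))
```

Exact string equality on the full InChIKey corresponds to the exact-match case only; similarity searching needs the structures themselves and is outside this sketch.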