Historically, the cost of access to structured chemical data extracted from patents has been prohibitively high for many researchers working in the field of drug discovery. The benefit of delivering this dataset to the scientific community in a free and open manner should not be underestimated. Aware of the demand for such a service, the European Bioinformatics Institute (EMBL-EBI) acquired the SureChem patent system from Digital Science Ltd. in December 2013. The service has been rebranded as SureChEMBL and is run by the ChEMBL group alongside existing open drug discovery and research resources such as the ChEMBL database and UniChem. This talk will provide an overview of the existing system architecture, including ChemAxon software, and describe how we go from patent literature to structured chemical data that is accessible via a web interface and an API. The challenges of migrating such a complex system will be discussed, as well as the opportunities to enhance the data processing pipeline based on prior knowledge from running large chemical resources. In addition to providing an overview of the system, we will describe our future plans for SureChEMBL. To date, these plans include extending the functionality of the entity extractor to identify additional entities important in the drug discovery process, such as protein targets, diseases and cell lines. Other plans focus on integration with existing EMBL-EBI resources, such as the ChEMBL database and Europe PubMed Central. Finally, we look towards new and exciting ways to share the data, such as integration with Semantic Web technologies and distribution via private virtual machine instances.