Table of contents

Introduction
Dealing with New Molecules
         Marvin Live deployed at Boehringer Ingelheim
         Preclinical data management. From RS3 to ChemAxon compound registration
         Screener - Compound Registration - Mosaic use case
Keynote Talk
         Driving efficiency and innovation in R&D
World of Macromolecules
         HELM-driven tools for peptide-based drug design using the ChemAxon Biomolecule Toolkit
         Macromolecules in E-Workbook
Creating and Searching Structures
         Scientific data management platform for specialty chemicals R&D
         Integrating ChemAxon software in an inventory and ELN
         ChemAxon's technology in Reaxys
         Rhea, a curated knowledgebase of biochemical reactions
Chemical Data Management
         Getting the best out of the JChem PostgreSQL cartridge
         How to use IJC as a lab notebook
         Software tools in the academic HTS workflow
         A user experience of ChemAxon software at GSK
         JChem for Office goes online
Knowledge Extraction
         ChemAxon's naming technology to accelerate extraction of chemical information from unstructured data
         Patent application management using ChemCurator and Marvin Live at Sprint Bioscience
         ChemAxon's technology integration for efficient patent searches and intellectual property (IP) landscape analyses
Partner Session
Conclusion

Introduction

The annual European meeting had a change of time, format, and venue this year. We all gathered at the Akvárium Club, Budapest, but attendees stayed in an assortment of hotels. There were about 70 user attendees from 45 organizations in 18 countries, but, judging from the long lines waiting for food, the numbers were more than doubled if you counted ChemAxon staff. The cold March weather precluded the usual garden party. Instead, after the pre-meeting workshops, there was an informal house party at ChemAxon's offices in the Graphisoft Park. On the next night, in between two days of presentations by users and ChemAxon staff, there was a dinner and wine tasting in a lofty and elegant hall at the Pesti Vigadó. On the first day of the meeting, in an introductory session, Ákos Tarcsay of ChemAxon gave an overview of the ChemAxon portfolio. The following sessions were organized around more specific themes.

Dealing with New Molecules

Marvin Live deployed at Boehringer Ingelheim

Boehringer Ingelheim (BI) needs to design compounds as part of the drug discovery process. Compound design can be guided by data, and BI has plenty of data, but the existence, availability, location, retrieval, and interpretation of those data present challenges. Alex Schmalz of BI described the eDesign system which aims to alert users to relevant data, provide convenient access, and offer easily interpretable information. The vision was to build a global state-of-the-art, modular, flexible, and sustainable design environment that provides a seamless user experience. Work started in May 2017, and the core components were rolled out in November. A second version is planned for 2018.

Marvin Live (for 2D design), Schrödinger's Maestro Elements (for 3D design), and an ideation and collaboration tool are integrated with each other, and provide the front end for registering virtual compounds into the eDesign database. A data analysis and visualization tool is used to facilitate and track compound prioritization decisions in the context of experimental and predicted data. Virtual and real compounds and data can be displayed within the same data set, and decisions can be tracked. Certara's D360 is used to combine all gathered data on virtual and real compounds.

The 2D design tools were expected to support the design of molecules as they were sketched, providing instant predictions based on chemical structure. "SmartAssistants" provide the contextual information in configurable and flexible plugin windows. Favorite ideas can be stored as snapshots, and different ideas can easily be compared to each other. Colleagues can collaborate by designing compounds in common rooms. Favorite ideas can be submitted into a specific database for virtual BI compounds. The product was delivered on time, within budget, and in scope, and met expectations in respect of quality. There are about 400 potential users, and 15 concurrent users, mainly in R&D, accessing a growing pool of about 2.1 GB of data.

Feedback from the IT viewpoint was very largely positive. The balance between the component off-the-shelf part of Marvin Live and the agile development was liked. Installation was easy and straightforward. ChemAxon was responsive; the collaboration was results-oriented and constructive, and communication between BI and ChemAxon was efficient. On the slightly less positive side, performance improvements will be needed if many structures (400 or more) need to be imported.

Alex concluded with a few "wishes" for the future. He would like to use Marvin Live on a Microsoft Surface Hub, a digital whiteboard. He would also like "Marvin JS Plus", that is, Marvin JS with plugins such as Web Services, and Structure Checker. A dynamic interface between Marvin Live and Maestro would allow changes triggered by modifying a molecule in one application to lead to changes in the other, and re-calculation by SmartAssistants. Finally, chemists would like a "push notifications" feature: plugins would be run to alert chemists to information about the existence, availability, novelty, regulatory requirements, or high certainty predictions for compounds.

Preclinical data management. From RS3 to ChemAxon compound registration

Sofia Karlström reported on the PREClinical Information System (PRECIS) project run at Medivir in 2016-2017. Medivir had 30 years' worth of unstructured data, and a compound database in an unsupported and out of date RS3 system. The Accord Enterprise Workbench was a complicated registration interface; data amendments often required help from IT staff; and some of the data in RS3 were of very poor quality because of bad database integrity, and lack of validation on registration. Medivir wanted a new solution that could also be integrated with Certara's D360 for visualization. Tight integration with the assay result database was another requirement; a decision to use BioRails instead of RS3 for the assay data was made.

User and IT requirements were captured as user stories, and non-functional requirements, and were prioritized as essential, useful, or nice to have. Vendor solutions were scored against the requirements, and ChemAxon's Compound Registration was chosen as the way ahead. A pre-migration clean-up of all the data was then carried out. ChemAxon Standardizer and JChem for Excel were used as part of this process.

Before the start of the migration, fields in RS3 had to be carefully mapped to the fields set up in Compound Registration. The recommendation was to use ChemAxon's Web Services for the task, but Compound Registration Web Services was found to be about 100 times slower than RS3 (for normal tasks). Medivir could not wait 50 hours every time a full migration was redone, so a separate migration database was set up. Compound Registration staging was very valuable for identifying and understanding types of errors.

It was difficult to know everything Medivir needed to know at the onset of the project; they needed to be agile and flexible in coping with three commercial products from three different vendors who had not collaborated previously. Another obstacle was that the assay data system was dependent on the completed migration of compounds to Compound Registration, but Medivir tried to do it in parallel. Communication was the most important factor for success, and users were also involved in the project implementation.

Medicinal chemistry project compounds are registered by batch upload in Compound Registration; each week a CRO prepares a list of completed compounds, with the data needed for registration, in JChem for Excel. Staging is a very good function that gives complete control of all potential matches, and aids users in decisions, but updating data for multiple compounds at the same time is on the wish list.

Sofia listed a few current issues. The project list on the registration page is visible only if a user logs in with a username in capital letters, but user names may be capitalized in more than one way. Validation for data fields in bulk uploads is not yet configured at Medivir. Because of this, there is the risk of creating errors (dictionaries are not used). It is not possible to add additional data to parent level fields when registering a new lot. This leads to lots getting discarded without going to staging, so "append data functions" is currently turned off. There are difficulties in the user interface for the Switcher, Standardizer and Structure Checker configuration, with a risk of creating serious errors if the administrator makes mistakes in this configuration.

Improvements in those administration pages should result when Medivir updates from Compound Registration 17.03.13. "No structures" will also be handled better, and it will be possible to register virtual compounds. In future Medivir also plans to improve validation, evaluate enhanced stereochemistry, enhance D360 integration, and evaluate the registration of virtual compounds.

Screener - Compound Registration - Mosaic use case

Anna Tomin of ChemAxon, Oliver Leven of Genedata, and Marcus Oxer of Titian gave a tripartite talk on their integrated systems. Anna started by giving a brief demo, and then outlined the features of Compound Registration integrated with the two other systems by defined interfaces including RESTful Web Service APIs.

Marcus took over and talked about the Mosaic product suite for sample management, covering inventory tracking (with real-time information on sample location, container information, and substance metadata), ordering, sample processing, assay requesting, integration with automated storage and sample handling systems, tracked shipping, and connectivity. A team of over 40 software engineers works full time on Mosaic development.

The third speaker, Oliver, summarized the features of Genedata Screener which analyzes, visualizes, and manages screening data from in vitro screening assay technologies. The software is designed to import data from any screening instrument, and is used by leading pharmaceutical companies, CROs, and academic research institutions. It is not a fully automated, black-box pipeline, but allows scientists to generate the best data from their experiments, and leaves them to decide on processing details and results analysis. It can, however, be fully automated using business rules, so that checking can quickly be performed by the Screener system and the data automatically reported if the business rules are fulfilled. Scientists are offered a user-specific result review, and interactivity allows them to make changes to the processing or analysis with any change being instantly propagated through the workflow, making for very short analysis cycles.

Oliver then explained how integration of the three systems works. Screener imports numerical results from an instrument, setting the plate barcode as identifier. The plate barcode is used to retrieve compound ID, concentration, and other well information from Mosaic, and the compound ID is then used to retrieve structure and other information from Compound Registration. Anna ended by saying that this long-term partnership offers users an out of the box, integrated solution with well-established software. Benefits include a decrease in overall system implementation costs, efficient workflows that eliminate time-consuming process steps and maintenance, and reduced errors by removal of manual file handling.
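The lookup chain described above (plate barcode to well contents, then compound ID to structure) can be sketched in a few lines. This is only an illustration: the two dictionaries are hypothetical stand-ins for Mosaic and Compound Registration, and every barcode, identifier, and field name is invented for the example. A real deployment would query the vendors' web services, which are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the inventory system: plate barcode -> well contents.
MOSAIC_WELLS = {
    "PLATE-0001": [
        {"well": "A1", "compound_id": "CMPD-1", "conc_um": 10.0},
        {"well": "A2", "compound_id": "CMPD-2", "conc_um": 10.0},
    ],
}

# Hypothetical stand-in for the registration system: compound ID -> structure.
REGISTRY = {
    "CMPD-1": {"smiles": "CCO"},
    "CMPD-2": {"smiles": "c1ccccc1"},
}


@dataclass
class AnnotatedResult:
    barcode: str
    well: str
    signal: float
    compound_id: str
    conc_um: float
    smiles: str


def annotate_plate(barcode, signals):
    """Join raw instrument signals (keyed by well) with inventory and registry data."""
    results = []
    for info in MOSAIC_WELLS[barcode]:          # step 1: barcode -> wells
        compound_id = info["compound_id"]
        structure = REGISTRY[compound_id]       # step 2: compound ID -> structure
        results.append(
            AnnotatedResult(
                barcode=barcode,
                well=info["well"],
                signal=signals[info["well"]],
                compound_id=compound_id,
                conc_um=info["conc_um"],
                smiles=structure["smiles"],
            )
        )
    return results


rows = annotate_plate("PLATE-0001", {"A1": 0.82, "A2": 0.15})
for row in rows:
    print(row)
```

The point of the design is that the plate barcode is the only key the instrument needs to supply; everything else is resolved from the systems of record, which is what removes manual file handling.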

Keynote Talk

Driving efficiency and innovation in R&D

Joe Donahue, Managing Director, Global Life Sciences R&D at Accenture, presented some industry challenges and suggested a solution. Over the past two years, Accenture has engaged with over 40 organizations to further understanding of these organizations' research informatics challenges, and needs for support. Pharma needs to stop following one-directional approaches: no more repeat experimentation, manual data collation, data silos, and long software implementations. The industry needs a decrease in infrastructure costs, and an increase in externalization. R&D IT budgets cannot keep up with the vast amount of data that now has to be collected, and lots of out of date software is being used. Collaboration is now critical.

Research support in the future must involve global data accessibility, and end-to-end data management; a secure environment to enable collaboration agility; advanced decision support using machine learning and AI; modern user interfaces; platforms for rapid application deployment and development; and a multi-tenant, open cloud platform, and shared cost structure.

A better way to support existing applications would be to take advantage of cloud computing, and wrap applications around standards for data ingestion, aggregation, search and collaboration. Pharma must adopt new technologies to accelerate research. Accenture recommends fragmentation of large traditional applications into discrete capabilities; getting real-time patient information for translational research by remote access, and control of lab instruments; better collaboration with external research centers, data sharing with CROs, and rapid deployment of firewall environments; and high throughput molecular biology, and hypervariable analysis. There must be a modern, consistent look and feel across the IT application landscape, with mobile-enabled scientific applications for use within the lab.

Pharma should consider adopting a pre-competitive platform, ecosystem approach to research informatics, in the cloud, with open, multi-tenant, shared services, and secure data that can be shared with collaborators. Accenture has already built a cloud environment for a big pharmaceutical company: the concept is proven. Precompetitive approaches are being embraced in the industry, and can provide an opportunity to drive differentiated outcomes, enabling better capabilities, faster, and at a lower cost.

World of Macromolecules

In two short talks during the meeting, Roland Knispel of ChemAxon summarized new features in ChemAxon's solutions for macromolecules. In the Biomolecule Toolkit ChemAxon is experimenting with BioJava for adding bioinformatics capabilities. There have been improvements in support for version 2 of Hierarchical Editing Language for Macromolecules (HELM), and MOL2HELM conversion. Non-structure entities (e.g., cell lines and viruses) can now be registered. In the pipeline are lot level handling, improved annotation handling, genealogy capture, and application security (authentication and role-based authorization). The goal is to establish a biologics registration solution on a par with Compound Registration. There are some new features in BioEddie: highlighting monomers with free attachment points when drawing bonds; adding and editing domain annotations; use of scroll bars when a molecule does not fit on the canvas; appending and prepending monomers to sequences, and inserting them; and displaying the chemical structure of the molecule.

HELM-driven tools for peptide-based drug design using the ChemAxon Biomolecule Toolkit

Heptares Therapeutics creates novel medicines targeting G protein-coupled receptors (GPCRs); the company has the ability to address highly validated, yet historically undruggable, GPCRs. GPCRs have been identified as targets for a broad range of diseases. GLP-1 Receptor (GLP1R) and Glucagon Receptor (GCGR) are closely related Class B GPCRs and bind long helical peptides of approximately 30 residues. GLP-1 is the endogenous ligand for GLP-1 receptor. It is closely homologous to GCG in structure and function. Several GLP-1 analogues have been approved as treatments for Type 2 diabetes.

In dealing with peptides in drug discovery, Heptares faces the challenges of registering long-chain peptides, some of them with complex modifications, and carrying out drug design procedures on those peptides. The company has to collate data about peptides, and perform SAR analysis in an automated way, integrating sequence information about peptides into the spreadsheet environment of small molecule chemistry. Conor Scully gave a highly detailed presentation about the way in which Heptares has tackled these major challenges.

Despite the structural complexity of the peptides, registration is by molfile. Heptares used to draw peptides manually. The introduction of HELM and the ChemAxon Biomolecule Toolkit has now reduced the number of errors in peptide registration. A monomer database of about 400 amino acid residues and 200 chemical groups is manually curated using the Biomolecule Toolkit RESTful API within Python scripts. BioEddie is used for single monomer registration. Construction of various monomer dictionaries directly from the monomer database using the API has proved to be indispensable for peptide informatics work. HELMs are generated from the monomer database, and molfiles and HELMs are interconvertible. HELMs are also used for sequence analysis, followed by library enumeration.

Peptides with simple modifications (marked by an asterisk below) are not too difficult to lay out in a table:

This system quickly breaks down, and ambiguity is introduced, when multiple modifications are present in a complex peptide. Heptares has devised a numerical notation for mixing peptides and chemical modifications in tables, using Python scripts to pull the HELMs apart, and inserting "n&n" within braces, instead of the simple asterisks above, to avoid ambiguities.
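As a rough idea of what "pulling the HELMs apart" can involve, the sketch below splits the first simple polymer of a HELM string into its monomers and lists them by position. It is not Heptares' script and does not reproduce their numerical notation. It assumes monomer IDs contain no nested brackets (so inline SMILES monomers are not handled), and the example sequence and the rule "a multi-character ID means a modified residue" are illustrative assumptions only.

```python
import re


def split_simple_polymer(helm):
    """Return (polymer_id, [monomer IDs]) for the first simple polymer in a HELM string."""
    match = re.match(r"([A-Z]+\d+)\{([^}]*)\}", helm)
    if match is None:
        raise ValueError("no simple polymer found in HELM string")
    polymer_id, body = match.groups()
    # Monomers are dot-separated; multi-character IDs are wrapped in [ ].
    tokens = re.findall(r"\[[^\]]*\]|[^.\[\]]+", body)
    return polymer_id, [token.strip("[]") for token in tokens]


def tabulate(helm):
    """List (position, monomer, is_modified) rows; 'modified' here is just a guess
    based on the ID being longer than one character."""
    _, monomers = split_simple_polymer(helm)
    return [(pos, m, len(m) > 1) for pos, m in enumerate(monomers, start=1)]


example = "PEPTIDE1{H.[Aib].E.G.T.F}$$$$"  # made-up fragment for illustration
for position, monomer, modified in tabulate(example):
    print(position, monomer, "modified" if modified else "")
```

Once each residue has its own position in a row, the sequence can sit alongside assay columns in the same spreadsheet used for small molecules.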

Databases such as Reaxys Medicinal Chemistry and ChEMBL hold a rich set of data for peptide analogues of GLP-1 and GCG which can be computationally linked to the chemical structures. Heptares extracted efficacy data for GLP1R and GCGR ligands from those databases in SDfiles, and filtered them for peptide molecules by substructure search for tripeptide SMARTS. The monomer database was updated to include all residues present in the GLP-1 and GCG collection. The structures were transformed to HELMs using the Biomolecule Toolkit, and subjected to a HELM processing workflow. Molecules in the GLP-1 dataset underwent sequence alignment using GLP-1 as a template. Statistical analyses and other learning methods are now available for application on sequence columns, and intelligent peptide library design has become possible.

Macromolecules in E-Workbook

Vincristine is a "small", yet very complex molecule, while insulin is a classical "large" molecule, said Ian Peirson of IDBS, but there are new hybrid forms of biomolecules such as novel (unnatural) peptides, antibody-drug conjugates, peptide-phospholipid drug conjugates, and small interfering RNA (siRNA) molecules. In the new world of pharmaceuticals, scientists can no longer adopt a unilateral mindset: chemists need to think like biologists, and vice versa. Joint work is critical to success. Chemists have to synthesize unnatural monomers and identify appropriate linkers, whilst working together with the biologists to engineer the target drug jointly.

Market figures show the increase in this next-generation of medicines, with the FDA's approval for biologicals increasing from 35% in 2005 to 42% in 2015. Biologics also accounted for six of the top eight drugs by revenues in 2017, with a growth of 10.7% compound annual growth rate (CAGR). The global market for biologics is expected to reach $386.7 billion by the end of 2019. The global market for antibody-drug conjugates is particularly interesting, being expected to reach $4.6 billion by 2017, growing at 30.5% CAGR.

There is therefore a need for systems that handle rendering in experiments, workflow and data management, registration, inventory management, and genealogy of biologics. Ian showed some screenshots and a video illustrating how E-Workbook addresses these challenges. By integrating with ChemAxon's Marvin JS, JChem, BioEddie and Biomolecule Toolkit, the E-Workbook platform helps the researcher to map out the data flows. The addition of the IDBS BioProcess Execution System templates provides full support for cell fermentation manufacturing, providing audit-by-exception and cross-batch analysis. IDBS is enhancing bio-inventory for visualization and reporting of biological genealogy. It is proposed that ChemAxon provide more to support hybrid chemical-biological molecules for rendering and registration.

Creating and Searching Structures

Scientific data management platform for specialty chemicals R&D

Gai Anbar, the founder and CEO of Comply, spoke about a system that uses Marvin JS. Comply is a provider of consulting services with expertise in quality systems, data integrity, validation of computerized systems, quality of digital health systems, and cloud systems. The company is the developer of Skyline, a platform designed to manage and analyze chemistry-based R&D from idea to production. It combines ELN, LIMS, SDMS and business intelligence features. Centered on experiments, there are modules for project management, inventory, notifications, requests, planning, and reports.

ChemAxon Marvin JS and back-end components are key elements in Skyline. They allow users to draw reactions; identify reactants and products; handle chemical structures of inventory items; calculate physicochemical properties; search the inventory based on structure and properties, and view inventory items; and search for similar compounds as part of product development. Comply has been very happy with the use of Marvin JS and related software, and plans an interface to commercial databases, and additional, deeper tools in future.

Gai showed a number of screenshots from the system. It is Web-based, and has dynamic dashboards, and a powerful search engine. It features tasks and notifications, and spreadsheet and chemical editor components. User preferences can be personalized. It is easy to customize the system to meet business needs. Skyline complies with regulatory standards such as FDA 21 CFR Part 11. Skyline Specialty Chemicals covers end-to-end R&D projects, is user friendly, implements complex business rules and workflows, and is highly integrated, and secure. Customer benefits are operational excellence and reduced time to market.

Integrating ChemAxon software in an inventory and ELN

NovAliX bought eNovalys to re-invigorate the eNovalys ELN. A new, Web-based user interface was needed, to replace the one based on ChemAxon's Marvin, Microsoft Silverlight, and Java, and an inventory system had to be integrated. Frank Hoonakker explained how this was accomplished.

The new system uses Microsoft Internet Information Services (IIS) for Windows Server for the user interface server. The data server, under Microsoft IIS, Microsoft SQL Server, and Apache, uses ChemAxon JChem to store molecules and reactions, uses Name To Structure and Structure To Name, includes Marvin JS Services for editing molecules and reactions, and has eNovalys ELN and inventory services.

Installing and using Marvin JS worked like a charm, but regenerating the database to comply with the latest version was more challenging. Frank showed the code used to update the jchem_molecule and jchem_reaction tables. It took 5-6 hours to migrate both molecules and reactions for about 300,000 reactions, but it worked perfectly, even though the software being replaced was very old. Frank also showed the JavaScript used to produce SVG depictions of molecules and reactions.

Some more work is still needed to finish the integration of the inventory and include a barcode scanner. The use of Marvin JS on an iPad has not really been tested yet. Efi Hoffmann of ChemAxon says that Marvin JS works on an iPad, but it is not optimized for iPad.

ChemAxon's technology in Reaxys

Ralph Hössel of Elsevier gave a ChemAxon-oriented talk about Reaxys. The Reaxys and Reaxys Medicinal Chemistry (RMC) solutions support chemistry, pharma, environmental, and material research. Reaxys aims to deliver immediate access to information. Its user-friendly and intuitive interface allows retrieval of bibliographic, substance, property, chemical reaction, bioactivity, and biological target data which are stored and managed in a robust database. This is supported by a wide range of chemistry-related search functionalities, including structure and reaction search, property search and NLP-supported text search based on continuously updated and expanded taxonomies. A "quick search" can be easily entered, but, alternatively, even a non-expert can use Query Builder to create targeted queries. Reaxys interprets either natural language, or truncation and operators, recognizes the search intention, and delivers a ranked list of alternative results suggestions. All relevant data are tabulated for direct use, and filtering and analysis features allow for further refinement.

Many ChemAxon solutions have been used in the Reaxys excerption and search systems. Data are both automatically and manually excerpted from the literature. Marvin and Markush enumeration are used in the excerption and database production workflow. The enumeration tool was built in collaboration with ChemAxon: the two companies established a good relationship. The Reaxys intuitive excerption interface is Windows-based, locally installed software for manual excerption of chemistry and medicinal chemistry data. It is embedded in the overall Reaxys and RMC workflow and ensures high data quality by using taxonomies or completely fixed vocabularies for nearly all text fields, with more than 1,000 check rules. The Reaxys architecture incorporates Marvin JS, JChem Web Services, and MolPrinter. MolPrinter transforms structure and reaction files into image files.

Researchers can use Marvin JS to enter complex Reaxys structure and reaction queries. The version of Marvin JS originally used was not optimized for Reaxys users, so ChemAxon and Elsevier exchanged feedback from end users and came up with an improvement plan. The new version has added icons for maximum substitution count, lock atoms, and atom mapping; it has additional templates; Reaxys generic groups are more visible; and abbreviated groups (also accessible by keyboard) are more easily found.

Future Reaxys projects might use Marvin JS, JChem Web Services, and the Chemical Structure Representation Toolkit. Elsevier is considering developing new predictive tools. For example RMC bioactivity data could be used to build tools for suggesting bioisosteric replacements, and for target prediction. The ChemAxon server tools could be used for transforming molecular sketches into SMILES and 3D structures, for handling molecular topologies, and for creating molecular images.

Rhea, a curated knowledgebase of biochemical reactions

Anne Morgat of the Swiss Institute of Bioinformatics gave a presentation about Rhea, an expert-curated resource of biochemical reactions designed for the functional annotation of enzymes, the description of metabolic networks, and omics-related analysis. ChemAxon software is a key component of the project. Rhea curators appreciate the use of Marvin Sketch to send their submissions to Chemical Entities of Biological Interest (ChEBI). The Rhea team also use plugins to compute major and microspecies, pKa, SMILES, formula, charge, image, and other facts. In future, they would like to evaluate ChemLocator on scientific papers for Rhea curation.

Rhea is freely available and is very heavily used. Its purpose is to describe biochemical reactions precisely. Most of these are enzyme-catalyzed reactions described in the International Union of Biochemistry and Molecular Biology (IUBMB) enzyme classification and published in the literature. The ChEBI ontology is used in Rhea curation. Compounds in ChEBI are normalized using the ChemAxon MajorMicrospeciesPlugin, InChI, and ChemAxon SMILES. Rhea curators also contribute to ChEBI. Rhea's specific synonym in ChEBI is called the UniProt name. Using this process, Rhea is made non-redundant, and the reactions are chemically balanced at the level of mass and charges.
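To make "balanced at the level of mass and charges" concrete, here is a minimal check that sums atoms and charges on each side of a reaction. It is a sketch, not Rhea's curation tooling: the formula parser handles only flat formulas without parentheses, and the ATP hydrolysis example uses formulas and charges for the species as they are usually written near physiological pH, which should be read as illustrative input.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C10H12N5O13P3' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(participants):
    """participants: list of (formula, charge, stoichiometric coefficient)."""
    atoms, charge = Counter(), 0
    for formula, q, coeff in participants:
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def is_balanced(left, right):
    """True when both sides carry the same atoms and the same net charge."""
    return side_totals(left) == side_totals(right)


# ATP(4-) + H2O = ADP(3-) + hydrogenphosphate(2-) + H(+)
left = [("C10H12N5O13P3", -4, 1), ("H2O", 0, 1)]
right = [("C10H12N5O10P2", -3, 1), ("HO4P", -2, 1), ("H", 1, 1)]
print(is_balanced(left, right))
```

Dropping the proton from the right-hand side makes the check fail on both hydrogen count and charge, which is why normalizing every compound to a defined protonation state matters before reactions are compared.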

There are three kinds of reaction participants: Rhea small molecules, Rhea generics, and Rhea polymers. Some of the participants are linked together thanks to a ChEBI "is a" relationship. These relationships are exploited to create relationships between Rhea reactions. The Rhea hierarchical reaction classification complements and extends the IUBMB enzyme classification. Rhea generics represent proteins and nucleic acids. Reactions between them are classified by simplifying to the functional group involved in the reaction. A reaction page on the public website summarizes the 2D structure representations of the reaction participants, cross-references, relationships between reactions, and citations that provide the evidence of the reaction.

Rhea will be used to annotate enzymes in the UniProt knowledgebase. The mission of UniProt is to provide a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. Currently, the enzyme annotation is described in text, but very soon this textual representation will be replaced by Rhea reactions, providing a more controlled vocabulary. The use of chemical structures can be seen as a bridge between the worlds of bioinformatics and cheminformatics.

Chemical Data Management

Getting the best out of the JChem PostgreSQL cartridge

Ellert van Koperen of MedChemData reported on his attempts at convincing PostgreSQL and the JChem PostgreSQL Cartridge to be good team players. Files, services, databases, and special programs can be used to store and transfer large volumes of data. Files are useful for a process that runs only once. Services are useful for operations done only a few times, or on a subset. Databases are the most suitable for operations that are done many times. Special programs can incorporate powerful options, but they are often difficult to interface with other tools except through files.

Using a database is an obvious approach for reasons of data consistency, safety and security, central storage and backup, speed, and availability of centralized tools with centralized licensing. Speed is especially important if the task has to be repeated several times. Ellert chose the PostgreSQL database in combination with the JChem PostgreSQL Cartridge. He summarized the reasons for choosing Postgres in the following table.

Ellert ran Reactor on a subset of all commercially available reactants for a Grignard reaction as a "torture test". Pre-filtering was necessary (as, otherwise, Reactor would have needed to try 1,444 million million combinations). Step 1 was to gather the data from nearly 30 chemical suppliers, load them, and create the chemindex. This is where the hard work is done. Step 2 was pre-filtering the Grignard reagent and the reactant. Ellert ran into a problem with one query for a reactant: it is not possible to use a NOT in a clause on a chemindex. He devised a "workaround": a self-join with null condition.
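The "self-join with null condition" is a standard SQL anti-join, and the pattern can be shown without the cartridge. The sketch below uses SQLite from Python's standard library purely so that it runs anywhere. The table, the column names, and the precomputed `has_acid` flag are hypothetical; in the real setup the positive condition would be a chemical search on the chemindex, whose syntax is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reactants (id INTEGER PRIMARY KEY, smiles TEXT, has_acid INTEGER)"
)
conn.executemany(
    "INSERT INTO reactants VALUES (?, ?, ?)",
    [
        (1, "CC(=O)C", 0),       # acetone
        (2, "CC(=O)O", 1),       # acetic acid: unwanted for this reaction
        (3, "O=Cc1ccccc1", 0),   # benzaldehyde
    ],
)

# Instead of "WHERE NOT <positive search>", join the table to itself on the
# positive condition and keep only the rows for which the join found nothing.
query = """
SELECT a.id, a.smiles
FROM reactants AS a
LEFT JOIN reactants AS b
       ON b.id = a.id AND b.has_acid = 1
WHERE b.id IS NULL
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The index is still only ever asked a positive question (which rows match?), and the negation is done afterwards by the join, which is the essence of the workaround.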

The method scaled but Ellert thought he could do better by using a generalized inverted index (GIN). The GIN-index method and GIN-index-in-table were not a success. He tried using a bit string but it lacked good functions, and there were scalability issues for in-table updates. So he tried adding a pivot table. This did not help. Ellert tried using a bloomindex, but this extension of PostgreSQL is not fully functional yet. In all cases, all these tricks sometimes outperformed a chemindex, but only after the slow pre-calculation and only if using exactly those pre-calculated values. In summary, steps 1 and 2 work, as long as you stick to the pre-calculated fragments. It is not as fast as Ellert may have liked, but it is acceptable if you can be bothered to jump through a lot of hoops.

Ellert now considered what else he might do. Upgrading to a later version of JChem was an idea. Chemindex generation proved to be much faster in JChem 2.9 or 3.0. Ellert decided to try the new SortedChemindex and LIMIT the result. Response times were now measured in milliseconds on substructure and similarity searches! The results were spectacular. Though not directly usable in the test case, this is such a spectacular speedup that it opens up many possibilities, such as interactive Web compound search, a system where you can see the results during the creation of workflows, and doing "nearby chemical space" exploration. Handling large volumes of chemical data in a scalable way, on a tight budget, is now possible, but the real power of the JChem PostgreSQL Cartridge is in its really fast small searches.
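
The talk summary does not describe how SortedChemindex works internally, so the following is only a generic sketch of why combining a pre-sorted index with LIMIT can make small searches fast. The function name and the toy data are hypothetical. Rows are visited lazily in their stored order, and the scan stops as soon as the requested number of hits has been found, so the expensive match test runs on only a fraction of the rows.

```python
from itertools import islice


def limited_search(sorted_rows, matches, limit):
    """Return the first `limit` matching rows and how many rows were tested.

    `sorted_rows` is assumed to be already ordered (the pre-sorted index);
    `matches` stands in for an expensive test such as a substructure check.
    """
    checked = 0

    def hits():
        nonlocal checked
        for row in sorted_rows:
            checked += 1          # count every row the expensive test sees
            if matches(row):
                yield row

    # islice stops pulling from the generator once `limit` hits are collected.
    result = list(islice(hits(), limit))
    return result, checked
```

With 1,000 toy rows of which every tenth matches, asking for three hits tests only the first 30 rows instead of all 1,000.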

How to use IJC as a lab notebook

Péter Ábrányi-Balogh is in György Miklós Keserű's medicinal chemistry research group at the Research Center for Natural Sciences in Budapest. Ákos Tarcsay of ChemAxon was a co-author of his talk. With more than 30 people working in the lab, the synthetic information accumulated in one year can be more than 5,000 reactions. All this information and valuable experience is available only on paper, and cannot be searched or shared effectively. The reactant database needs to be up to date, but for 5,700 compounds manual registration is not easy. ELN systems available commercially are much too expensive for an academic research group in Hungary, but for most of the group's competitors ELNs make synthetic work more effective.

Then ChemAxon came up with the idea of an Instant JChem (IJC) project used as a basic ELN. Péter showed some screen shots from the system, starting with a grid view for 5,700 building blocks. He created a new row in the reactions spreadsheet and showed how an erroneous structure for sodium hydride was corrected. The reaction was then analyzed, and stoichiometry calculations were carried out. A textual description of the reaction procedure was added. The current status of the reaction (e.g., "purchasing reagents", "unsuccessful") can be edited. Drawings (e.g., TLC) and literature references can be added, and yields calculated. The developers have proved that IJC can make a basic ELN but this is still a work in progress.
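
The stoichiometry and yield figures that such a notebook records follow from simple arithmetic. The sketch below is a minimal illustration, not the IJC implementation: the function names are hypothetical, and it assumes a 1:1 molar ratio between the limiting reactant and the product.

```python
def moles(mass_g, molar_mass_g_per_mol):
    """Amount of substance in mol from a weighed mass."""
    return mass_g / molar_mass_g_per_mol


def equivalents(reagent_mol, limiting_mol):
    """Molar equivalents of a reagent relative to the limiting reactant."""
    return reagent_mol / limiting_mol


def percent_yield(product_mass_g, product_molar_mass, limiting_mol):
    """Isolated yield in percent, assuming 1 mol product per mol limiting reactant."""
    theoretical_g = limiting_mol * product_molar_mass
    return 100.0 * product_mass_g / theoretical_g
```

For example, 1.0 g of a limiting reactant with molar mass 100 g/mol is 0.01 mol; isolating 1.5 g of a product with molar mass 200 g/mol then corresponds to a 75% yield.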

Software tools in the academic HTS workflow

David Sedlák of the Institute of Molecular Genetics in the Czech Republic talked about CZ-OPENSCREEN, the Czech National Infrastructure for Chemical Biology, the mission of which is to identify new molecular probes and tools for research, and proof-of-concept compounds for the development of new potential therapeutics. Services include assay development and high throughput screening (on open access); development of tool compounds for biological processes; medicinal biology support, compound profiling (open access to cytostatic, cytotoxic, apoptotic, necrotic and proliferative properties of compounds); free access to biological data; cheminformatics support; and advanced analytical tools such as SAR and data mining.

CZ-OPENSCREEN has a collection of over 85,000 compounds. ScreenX is a LIMS and data mining solution powered by PostgreSQL, RDKit, and ChemAxon software. It handles compound management, experiment design, HTS data storage and analysis, reporting, and cheminformatics intelligence. David showed some screen shots, including HTS plate view, a dose response experiment, dose response analysis, and multiple assay analysis.

Marvin JS is used in the compound management application. IJC is used for substructure and similarity searching, and JKlustor is used in clustering and diversity analysis for chemical libraries. JChem for Excel is used for SAR models. Cheminformatics analysis is available via functions in ScreenX; ChemAxon software is used with IJC to predict physicochemical properties, to handle CAS and traditional names, and for SMILES to structure conversion.

A user experience of ChemAxon software at GSK

Stephen Swanson summarized the past, present and future use of ChemAxon software at GSK. GSK began a major chemistry tools simplification effort in 2008: chemistry research IT simplification program (CRISP). There were more than 520 application components in use, with complex and fragile interapplication dependencies, and overlapping functionality. Some were obsolete and at the end of their lives. In 2009 the SAR tools replacement project began. The many tools on the chemistry desktop were replaced by Helium in Excel, Helium in Spotfire, and IJC. Structure rendering for Helium in Excel is done with JChem for Excel. The system was taken up enthusiastically by scientists, and the legacy applications were not missed.

The ChemAxon chemistry engine was chosen to replace Daylight, Accord and the MDL Relational Chemical Gateway (RCG) back-end systems, and IJC was chosen to replace ISIS, together with MarvinSketch, and JChem for Excel. Early testing indicated that the move from ISIS Base to IJC caused a significant drop in performance outside of the United Kingdom. A less than perfect solution was to access IJC on Citrix Servers in the United States and Asia. In 2011-2016 there was high uptake of IJC among U.K. chemistry groups, but poor performance outside of the United Kingdom, particularly at U.S. sites, plus a frustrating Citrix "barrier", resulted in lower uptake.

Moving to a Web solution presented a number of potential advantages, including a significant performance improvement in the United States, and the removal of Java compatibility issues. GSK and ChemAxon collaborated on "IJC Web" with Plexus Connect, and, in August 2015, phase 1 of IJC Web was initiated to deliver the five most used IJC projects via Plexus Connect. Nevertheless, Wave 1 of the launch was put on hold in 2016. There had been a rush to get Plexus Connect launched before the IT group was broken up because of a site closure, but this led to more limited engagement of business users during testing. Also, U.K. chemists raised some major concerns around Plexus Connect project forms.

In 2017 the Wave 2 project had greater business engagement. The original proposal was to use IJC for form design and admin functions, while Plexus Connect was to be the main tool for querying by program scientists. The top priorities were the performance and functionality gaps between Plexus Connect and IJC. Original migration issues were fixed, new widgets were implemented, and form design was improved, since Plexus Connect is less "fault tolerant". Plexus Connect was relaunched in August 2017.

In 2018, Plexus Connect and IJC performance are now regarded as very similar by key U.K. users, and U.S. Plexus Connect performance is comparable to that in the United Kingdom. U.S. uptake and usage of Plexus Connect has increased. IJC for form design and editing is acceptable in the United States, but IJC is no longer used for querying by scientists. U.K. usage is now split between Plexus Connect and IJC: some established projects continue to use IJC, but new projects use Plexus Connect and existing project templates during set-up. Functionality gaps between Plexus Connect and IJC (e.g., list handling, and structure search query options) are still highlighted by some IJC users.

JChem for Excel is an essential add-in for structure handling, and is part of the Helium installation. JChem for Excel functionality was not promoted at launch, but users ended up discovering it for themselves; IUPAC naming is a particular favorite. JChem for Excel is also one of the few tools available to chemists for creating and sharing SDfiles. Unfortunately, the add-in contributes to slow opening of Excel, particularly following GSK's migration to Office 2016. GSK plans to move to JChem for Office in 2018.

In the short term, GSK will move to long-term support (LTS) versions of the JChem applications. In the past, moving between Microsoft Office versions has caused some problems with Excel stability related to JChem for Excel; GSK hopes that moving to the long-term support version will help. In the medium term, to manage "big data", the Hadoop platform, better Spotfire integration, and containerization of Plexus are under consideration. In the slightly longer term, chemists want a "compound CV". They regard their Plexus Connect project form as an overarching view of a compound's properties, but the form has to be manually created and prelinked to data sources, so has limited flexibility. A "smart" form would create itself and make the necessary connections, and help chemists to keep up with the ever increasing sources of data that need to be processed and analyzed. With advances in AI, machine learning, and so on for enabling molecular design, Stephen foresees a real probability that the age of the medicinal chemist may be fast drawing to a close, so how will current, or future, ChemAxon applications fit into the new model?

JChem for Office goes online

Ákos Papp showed a video of a test version of JChem for Office. Using the JChem for Office Java API you can easily create Word and PowerPoint reports from any system that stores your structures and related data, and you do not need Microsoft Office or even Windows. You could, for example, do this on a Linux server, with structures and data taken from a warehouse or ELN, and the reports are placed into a repository or emailed to the appropriate people. The reports can be created based on Word or PowerPoint templates, where the target location of the structures and the data is identified by bookmarks. Word's own bookmark system is used, while in PowerPoint a similar one is available by a simple add-in. In the video Ákos showed that JChem for Office Java API can place a reaction structure, and the corresponding formatted text, into an existing document in the place where the corresponding bookmarks were assigned, and it can also insert a table containing multiple structures. The result is a document where all structures are editable using JChem for Office. The application can add and edit structures in Office documents, change their display properties with one click, and display the structures inside Excel cells, so they are automatically resized with cell resize. Ákos has demonstrated proof of concept for JChem for Office online but user feedback is now needed.

Knowledge Extraction

Daniel Bonniot introduced this session with a brief update on continued enhancements to ChemAxon's naming technology. Naming is easier to use. The automated format detection is now able to recognize all uppercase names. (They were previously recognized as peptide sequences.) Naming is more versatile, now having a cosmetics dictionary. It is more correct: Structure-to-Name can now use the dehydro nomenclature to name benzyne-like compounds. Soon naming will understand more domains (e.g., polymers and herbicides), and still more IUPAC standards (e.g., M/P nomenclature for axial chirality).

ChemAxon's naming technology to accelerate extraction of chemical information from unstructured data

Erwan David of DEXSTR applauded the new relationship between his company and ChemAxon. Inquiro, DEXSTR's scientific knowledge management system, turns unstructured data into actionable insight. It combines scientific capability with innovative technology, including indexing, automatic metadata generation, big data storage and analytics. It allows users to collect and archive information from any source automatically. Erwan showed a screen shot of the template mechanism and the connector to reference systems. Entities are detected by NLP. Inquiro automatically identifies key information from users' files: Erwan showed a dynamic view. This is a bit like iTunes. Some users have Pipeline Pilot. Inquiro organizes data the way the user wants them organized, and allows users to retrieve their archived data.

DEXSTR partnered with ChemAxon so that users of Inquiro could benefit from a wide range of chemistry features such as smart chemical detection, storage, indexing, and chemical structure searching. Erwan showed screen shots of chemical entity recognition, and structure editing and search.

Inquiro is a "hub" which gives users access to all the information. It lets the user know that the information exists. As an example of data awareness in Inquiro, Erwan showed a visualization of multiple companies at a conference, drawing attention to the fact that the Allotrope Foundation is significant. He gave a clinical example of data completeness in Inquiro: cells were colored in a matrix to show the user where data were missing. Inquiro is a controlled access, collaborative platform, and can highlight experts in a given field.

Inquiro uses JChem Base to check for duplicates and store structural entities. Document to Structure and Name to Structure are used to identify chemical entities and turn them into structures. Marvin JS, JChem Engines, and Name to Structure are used to manipulate chemical structures and run structural queries. DEXSTR recognizes that ChemAxon has quality products, a dedicated team, and frequent updates.

Patent application management using ChemCurator and Marvin Live at Sprint Bioscience

Sprint Bioscience is using a fragment-based drug design platform to create small molecule probes quickly, with properties suitable for drug development. The final aim is to optimize the probes into first-in-class drugs for novel targets in the cancer area. The company toolbox contains protein science, fragment screening methods, X-ray crystallography, medicinal and computational chemistry, and biochemical and cellular systems.

Jenny Viklund showed a timeline: after selecting a fragment starting point, the first in vivo compound in a series was found after 10 months; it was then further optimized and two "front-runner" series were prepared. In February 2016, two patent applications were filed, covering the synthesized compounds in the front-runner series. ChemCurator was used to double-check that the claims covered all the exemplified structures before the patent application was submitted.

Back-up series were then created, and four more patent applications were filed in August 2017. Sprint Bioscience wanted to file these applications before all the intended compounds in the back-up series were synthesized. Again, ChemCurator was used to double-check that the claims were covering both synthesized and envisioned structures. Even though the Sprint Bioscience researchers enumerated and double-checked all compounds that they thought that they would synthesize, they realized that they still might get new ideas, not covered by the claims, during the priority year. Thus, the previously generated ChemCurator Markush files, that were created to double-check the new claims, were integrated with Marvin Live, for double-checking new ideas on the fly.

Jenny changed topic and discussed a new "dream" idea called Claim Curator: a program that ChemAxon could write to facilitate both writing and double-checking claims. It is tedious to read and write all the claims in a patent. Programming functions could help. Jenny used a metaphor for patent claims: the layers of an onion, where each layer can be described by a programming function. For instance, for each of the given core structures in the claim, the inner layers of the onion contain the most important substituents in R1, R2 etc. Jenny creates a programming function called "the most important substituents in R1", where the input is the structures of these substituents, produced either by drawing, or writing, or even extracting the substituents from an SDfile covering the original compounds by some type of R-group extraction program. Moving outwards, the next layer has a function for "other substituents synthesized but somewhat less promising", and the next a function for "additional substituents not exemplified but reasonably believed to be also active". These are similarly filled with input by drawing or extracting substituents. This procedure is then repeated for all R-groups and all core-structures, until all structures (to be covered by the claims) have been input into a function with an understandable name.

You then write and double-check the claims, using the name of the functions, and let the "dream" program expand the functions into the usual patent lingo. The method would be easier and faster than plain writing; it would be intuitive for medicinal chemists; there would be fewer errors; and costs when using patent attorneys would be reduced.
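
Claim Curator is described only as an idea, so the following is a purely hypothetical sketch of the expansion step, with invented layer names, substituents, and claim wording. Each "function" is modeled as a named list of substituents, and a template written with those names is expanded into conventional claim text.

```python
def expand_claim(template, layers):
    """Replace each {layer name} in the template with its substituent listing.

    `layers` maps an understandable layer name to the substituents it holds.
    """
    text = template
    for name, substituents in layers.items():
        if not substituents:
            raise ValueError(f"layer {name!r} has no substituents")
        if len(substituents) > 1:
            listing = ", ".join(substituents[:-1]) + " or " + substituents[-1]
        else:
            listing = substituents[0]
        text = text.replace("{" + name + "}", listing)
    return text


# Hypothetical onion layers for one R-group of one core structure.
example_layers = {
    "R1 most important": ["methyl", "ethyl", "cyclopropyl"],
    "R1 less promising": ["phenyl"],
}
example_template = (
    "R1 is selected from {R1 most important}; or R1 is {R1 less promising}."
)
```

Calling `expand_claim(example_template, example_layers)` yields "R1 is selected from methyl, ethyl or cyclopropyl; or R1 is phenyl." The chemist checks the short template, while the long enumerations are generated from the layer definitions.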

ChemAxon's technology integration for efficient patent searches and intellectual property (IP) landscape analyses

Aurélie Brunet of Questel made this presentation. Orbit Intelligence is a Web-based solution for IP business intelligence. It has patents from over 100 patent authorities. It also has litigation content, licensing agreements, standards, designs, trademarks, business information, and more.

There are over nine million chemical structures in Questel's chemistry module, and 15.5 million documents are indexed. ChemAxon's JChem Base technology is integrated in the advanced search in Orbit's chemistry module, and chemical search can be easily combined with applicants' names, legal states, and keywords. Since ChemAxon's Naming software is used, you can search for all chemical synonyms at once and see a structure, rather than struggle with a name. The user enters a structure with Marvin JS; it is converted into text to retrieve all corresponding patents. Similar structures can be rapidly identified thanks to the substructure search and highlighting features. Substructure search retrieves all molecules having as core the searched molecule in the full text, that is, in the abstract, claims and description.

Molecules are quickly extracted every time a patent enters the database, but images are not processed. Orbit Chemistry indexes common names, drug names, acronyms, and IUPAC names, using Name to Structure technology. Orbit also allows search of CAS RNs and SMILES. These are not indexed but are translated to chemical names by the system.

Orbit Intellixir offers advanced statistical analyses. Chemical names have been extracted from all the data, using ChemAxon tools. Natural language processing can be used for articles, clinical trials, and in-house Excel files, and chemicals can be analyzed. Users can create their own chemical categories for in-depth competitor analyses, and technology watch.

Being able to detect newly published molecules gives Questel a competitive advantage. As ChemAxon's modules do not rely solely on thesauri, Questel can detect new patents almost immediately. There are plans to expand the chemistry module in future. Questel would like ChemAxon to expand Name to Structure to fields other than organic chemistry.

Partner Session

Nóra Lapusnyik of ChemAxon introduced this session. She said that ChemAxon does not itself provide all of the tools necessary to accomplish its vision, so it is developing an integration platform: Synergy. Synergy will have a public API to accommodate common user and role management, project management, and application-to-application communication. It will be an open interface for partnering vendors to deliver functionality into the ChemAxon platform. This platform is designed to work in a Web environment. It is currently working in the cloud; automated deployment on premises will be implemented in the near future.

There were nine partner presentations. First Jonathan Gross talked about Labguru, the ELN product of BioData, a Digital Science company. Labguru uses Marvin JS and JChem Web Services to allow structure editing, chemical registration, and property calculation. Labguru is a user-friendly ELN for planning and documenting experiments, tracking progress, streamlining lab logistics, and sharing results.

Fabian Rauscher of Certara presented D360, in which ChemAxon tools such as MarvinSketch, Marvin JS, the JChem Oracle Cartridge, JChem Web Services, Compound Registration and property calculators are supported. D360 federates data from different sources and allows analysis and visualization of those data. D360 Capture is an add-on product for the capture, prioritization and sharing of new chemical concepts using a central storage repository for the ideas. D360 Partner is an add-on that simplifies providing secure data access to external research collaborators.

ChemPass, represented by Gergely Makara, is an artificial intelligence design technology company focusing on software solutions (SynSpace and SynScope) that help chemists generate ideas, design novel scaffolds or lead analogues, and reach a significantly expanded synthesizable chemical space. Features such as chemical drawing, structure checking and standardization, searching, and enumeration are all powered by ChemAxon functionality.

The Readily AccessibLe (REAL) database (described by Andrii Buvailo) is a chemical space of 337 million synthetically accessible drug-like compounds, which is searchable using ChemAxon's MadFast tool. The compounds should have an 85% success rate of synthesis. Compounds can be ordered at fixed prices at EnamineStore.com, for delivery within four weeks.

INEON Biotech, a new company founded in 2017, supplies INEON Free and INEON Smart Screening, which integrate logistical, physical, chemical and structural information about compounds. Bruno Sargueil showed how these tools assist users in screening campaigns to edit cherry-picking worklists, and follow projects from hit-to-lead to lead optimization stages.

ChemAxon functionality is available as KNIME nodes. That functionality can be mixed and matched with other technologies and techniques. Marvin JS is in the KNIME Server WebPortal. Mihály Medzihradszky of KNIME suggested that the audience view videos about the monster model factory, a tool for automating the process of building, validating, and deploying predictive models using KNIME.

Mcule has a compound sourcing solution which uses ChemAxon tools to calculate properties and filter compounds, and will soon have integrated the Compliance Checker, reported Benjámin Kováts. The company is also working on an EU-funded project to create a database of 500 million compounds, including completely novel molecules predicted to be synthesizable, but not yet synthesized. Criteria for inclusion in the Ultimate database are a minimum of 80% success rate of synthesis, a maximum of 6-week delivery time, and fixed prices.

Quattro research has a team of about 25 people who do data management, data integration, system integration, custom solutions, and consulting. Cathrin Pautsch spoke about Quattro/CM for compound management, including support for biologicals. Marvin JS, the chemical editor in Web/CM, is liked by quattro developers because it is very easy to incorporate, and ChemAxon offers great support.

Last, but not least, Gerd Blanke of StructurePendium Technologies talked about the IUPAC International Chemical Identifier for Reactions (RInChI). RInChI is based on the IUPAC International Chemical Identifier (InChI), a unique representation of the compound it describes. InChI and InChIKeys can be read by, for example, Marvin. The RInChI format is a hierarchical, layered description of a reaction with different levels based on the Standard InChI representation of each structural component participating in the reaction. The formats and algorithms of InChI and RInChI are non-proprietary, and the software is open source. ChemAxon is discussing if and how they will introduce RInChIs into their software.

Conclusion

ChemAxon is celebrating its 20th birthday. Apart from sharing in the splendid gala dinner, we celebrated with a big birthday cake in the Akvárium Club. In an amusing talk before the "cake break", Daniel Bonniot of ChemAxon presented a "time machine", and introduced a Pac-Man-like arcade game in chemistry that he had developed as a competition for us. The game proved very popular, and there were prizes for the best entries, the first two of which were from ChemAxon, although, from the user community, Jenny Viklund was a proud winner.

It is 10 years since I attended my first ChemAxon user meeting, in Budapest in 2007. What a lot has happened since then! Now ChemAxon is aged 20. In most places, personal maturity occurs at 18 or 21 years. So is ChemAxon now "mature"? In terms of global impact it is reaching maturity, but it is still growing both in size (the company now has 150 employees) and in achievements. It has not peaked yet. In his introductory talk, Ákos Tarcsay said: "Over the last two decades ChemAxon has invested its efforts in developing technologies to bring the fascinating world of chemistry into the digital environment, and to facilitate scientific decision making, but innovation never stops: it is just getting started." That seems to be an excellent way to end my own summary.