ChemAxon’s 2013 European User Group Meeting Report

by Yvonne Martin (Martin Consulting)
Summary
Social Aspects and Demographics
Keynote
About the ChemAxon presentations
Desktop track: Applications for chemists
Platform Session: Toolkits for building systems and services
ChemAxon Science: Extending capabilities for discovery
Partner Session

Summary
The release of Version 6.0 was an important thread of the meeting. Not only does it continue the tradition of improvements in performance, but new capabilities of existing products and whole new products were announced. Marvin, JChem, JChem for Excel, Instant JChem, JChem for SharePoint, and JChem Web Services have been updated to provide new capabilities. Name to Structure now recognizes Chinese names. In addition to Compound Registration (released in January), five new products were announced at the UGM: Marvin for JavaScript, Instant JChem Web Client, REST Web Services, Metabolizer to predict human metabolites, and the Plexus project to provide a simple web application for chemists. The staff is excited about the new version. Miklós Vargyas provided an under-the-hood description of how ChemAxon develops and maintains software using the agile development method Scrum.

Another important thread of the meeting was the use of ChemAxon software both in customers’ workflows and in the products of partners. Four user talks addressed the design of compound libraries for HTS and/or software to support HTS. Collaboration was also a popular topic: either software to support collaboration within a working group, or software that exposes cheminformatic tools to outside scientists while keeping the structures of those scientists’ compounds private. Two new standards were discussed: the HELM standard for macromolecular representation, developed at Pfizer and open sourced via the Pistoia Alliance, for which ChemAxon made the editor company agnostic; and the proposal to develop an Assay Definition Standard that would define assays, experiments, and project concepts in a structured language. Other user talks highlighted Tversky similarity, conformational searching, information to be gained by searching Markush structures of patents, and chemical information on the web.

Partners continue to use ChemAxon software in ELNs, in registration systems for small and macromolecules, in structure-activity and patent databases, and for property calculations. Name-to-structure is used in three partner products, two of which use natural language processing to identify key relationships. Patcore uses ChemAxon chemical information tools in an application that checks structures that are included in government regulations. KNIME offers ChemAxon components for pipelining.




Social Aspects and Demographics

ChemAxon User Group Meetings are known for their delightful mix of serious science and technology enhanced with a spirit of fun. This is captured in the meeting thumb drive that is shaped like a duck and the orange robes again emblazoned with a duck (as well as the ChemAxon logo) that we wore at the “Gala Dinner”.

The UGM was held at the Hotel Novotel Budapest Centrum. This is an excellent facility—there are good sight lines and sound systems in the meeting rooms and the food is excellent. Of course, at a meeting such as this serious networking is also the order of the day—the hotel provided lovely miniature sweets and savory treats that fostered moving about the break room. The evening before the training sessions a large group walked to a nearby restaurant/bar for food and drink. The night before the first day of the meeting buses took us to the traditional garden party, cold and rainy as usual, held at the company site. The informal atmosphere encouraged participants to socialize. In a repeat performance, the largest event, labeled as “Gala Dinner” in the program, took place at the Széchenyi Bath House. We changed into our bathing suits, donned our orange robes emblazoned with the ChemAxon logo, slipped into ChemAxon flip-flops, and enjoyed a warm evening on the terrace of the baths. One special treat was provided by Yoshiko Matsumoto from Patcore as DJ; the second treat was a live band of ChemAxon employees. Their music encouraged enthusiastic dancing by the younger set. After the close of the meeting on the second day Alex Drijver led many of those remaining on a 90-minute subway ride and walk, with commentary, through Budapest. After this we joined the others at the Anker'T ruinpub for yet more food, drink, and networking. Photographs from these events are available on the ChemAxon web site event gallery.

Eighty-four delegates from 56 institutions registered for the meeting. They represented 44 companies and 12 non-profits or government agencies. In addition, ChemAxon had 128 representatives—much of their staff. Of major pharmaceutical companies, AstraZeneca, Boehringer Ingelheim, Eli Lilly, GlaxoSmithKline, Merck KGaA, Novartis, and Pfizer were represented. In addition there was one participant from each of six universities.


Keynote


The meeting opened with a Keynote at which Alex Drijver queried Oliver Wissdorf from Boehringer Ingelheim about how and why they decided to migrate to ChemAxon products. The role of the IT department is to provide state-of-the-art software for innovation that is both flexible and persistent. BI selected ChemAxon because it provides a balance between cost, functionality, innovation support, responsiveness and reliability. In their selection process, they emphasized the need to use the same software at all sites. They decided on the software following an analysis of the various sites and local needs, the legacy back-end and front-end systems, gaps in capabilities in existing systems, workflows, and dependencies between systems. IT worked closely with the research organization in a council established by management to formulate a plan. ChemAxon was chosen because it provides flexible, modular, analysis capabilities based on non-redundant data sources. The strong management support encourages user acceptance.

Dr. Wissdorf expects that as time goes by there will be an increasing need for data exchange with partners, for support of users' mobile devices, for the analysis of large data by bench scientists, and for the integration of public and company data. He also expects that ChemAxon will keep up with these needs.


About the ChemAxon presentations


The focus of ChemAxon’s talks was on the just-released Version 6.0. Marvin, JChem, JChem for Excel, Instant JChem, JChem for SharePoint, and JChem Web Services have been updated to provide new capabilities. Name to Structure now recognizes Chinese names. In addition to Compound Registration (released in January), five new products were announced at the UGM: Marvin for JavaScript, Instant JChem Web Client, REST Web Services, Metabolizer to predict human metabolites, and the Plexus project to provide a simple web application for chemists. The staff is excited about the new version.


Desktop track: Applications for chemists




Jon Patterson started the session by reminding the group that the ultimate measure of success for the ChemAxon platform is whether scientists actually use it. These users want software that has the functionality they need, that they can afford, and that is intuitive and easy to use. (Read the Slides)

Eufrozina Hoffmann described the enhancements to Marvin. These include the options to display lone pairs as lines, improved bond fitting, user-selected bond scaling, unlimited number of S-group (superatom) attachment points, server-side image import, and modifications to the GUI. Their plans include providing publication quality drawing and better support for text boxes.

The Marvin team has also developed Marvin for JavaScript, which is lightweight, easy to integrate, and runs in a browser. It includes many of the features of Marvin, but there are plans for more query features, support for reactions and reaction queries, lone pairs, custom templates, and ultimately, 3D display. (View the Presentation)

Petr Hamernik described Instant JChem 6.0, using a sample real-world workflow to show its integration with the range of ChemAxon products. He demonstrated improvements and simplifications to the overall user interface, including a new dashboard. Other features presented included importing structures, substructure searches and property calculations on these structures, exporting to a new table, and connecting with assay data in another file. The IJC-Spotfire integration allows one to select items in one application and see them selected in the other. Version 6 includes enhancements to form design. He also introduced their latest product: a web-based client for Instant JChem. One can use it to browse, search, manage lists, use forms saved from IJC, and export to various formats. (View the Presentation)

Anna Gulyás-Forró reported on the enhancements to JChem for Excel. The main focus has been on usability. For example, the basic search options are the same as in the IJC web client, and the R-group decomposition option screen has been simplified, with detail provided by mouse-over. Improvements to the ribbon include the option to use either standard or advanced menu sets or to use a custom Chinese or Japanese menu. She demonstrated R-group decomposition to produce a table with the R1 substituents as rows and the R2 substituents as columns. This process will be completed with a one-click SAR table option in the next release. Future work is to extend JChem to all Microsoft Office products. (View the Presentation)
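The R1-by-R2 layout described here is essentially a pivot of decomposition results. The following is a minimal Python sketch of that pivot only, not of JChem for Excel itself: the function names, the substituent labels, and the activity values are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

def sar_table(records):
    """Pivot (r1, r2, value) records into {r1: {r2: value}}."""
    table = defaultdict(dict)
    for r1, r2, value in records:
        table[r1][r2] = value
    return {r1: dict(cols) for r1, cols in table.items()}

def format_table(table):
    """Render the pivot as tab-separated text, R1 as rows and R2 as columns."""
    columns = sorted({r2 for cols in table.values() for r2 in cols})
    lines = ["R1\\R2\t" + "\t".join(columns)]
    for r1 in sorted(table):
        # Combinations that were never made are left as empty cells.
        cells = [str(table[r1].get(r2, "")) for r2 in columns]
        lines.append(r1 + "\t" + "\t".join(cells))
    return "\n".join(lines)

# Made-up decomposition output: (R1 substituent, R2 substituent, activity).
records = [("Me", "H", 6.1), ("Me", "Cl", 7.0), ("Et", "H", 5.8)]
table = sar_table(records)
print(format_table(table))
```

Empty cells in such a table are useful in their own right, since they point to substituent combinations that have not yet been synthesized or tested.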

A presentation by Tímea Polgár showed ChemAxon products for 3D visualization. MarvinSketch and MarvinSpace are familiar to many readers. MarvinSpace has the required facilities for viewing macromolecules and small molecules as well as ligand-ligand alignment for pharmacophore perception or for ligand alignment within a protein pocket. The tie between MarvinSketch and MarvinSpace allows one to edit a ligand in 2D and view it in 3D. However, the new Marvin for JavaScript and the new WebGL-based MarvinSpace expand the possibilities by supporting the integration of visualization with complex display pages. (View the Presentation)

András Strácz described the evolution and features of ChemAxon's new product, Plexus. This is a desktop application that works in browsers. It grew from ChemAxon's experience with chemicalize.org. Chemicalize.org now has a database of 378,000 structures and 642,000 names. When these structures and names were added to PubChem, it added 90,000 unique structures to the database. The intuitive interface is one attractive feature of chemicalize.org. Plexus is designed with the look and feel of a web application with clear and intuitive layout and high visibility of important actions. Marvin for JavaScript fits well. The original focus of Plexus is library design by enumeration from Markush structures. It includes a data table view with import, searching, and export functions. In addition, there is a view for all data on one structure. Plexus includes Document to Structure, JChem Base, Standardizer, Markush Enumeration, Calculator Plugins, and Screen for similarity search. Current plans are to provide project and user management, Reactor, JKlustor, and data visualization. (View the Presentation)

Yoshiko Matsumoto from Patcore described the CRAIS Board, a whiteboard that supports collaborative discussions involving chemical structures. CRAIS Board combines a collaborative drawing tool with a chemistry engine that is based on JChem Base and Marvin. It supports sharing and editing of one structure diagram by multiple users, retrieving information from databases, and storing the discussion history. Structures may be added to the viewing pane from a drawing or database lookup or structure search. Other data is also retrieved from databases and can be used as search filters. Users can select which users can see or edit chemical structures. The history log can highlight recent changes. There are plans to add property calculation via web services, a check if a drawn structure is already registered, and touch-panel control. Patcore is actively seeking partners or tools that will further enhance development of the product. (View the Presentation)

Derek Marren from Eli Lilly described their Open Innovation Drug Discovery program. It allows academic and biotech investigators to submit their compounds for screening in Lilly’s proprietary assays. Users enter or edit their compounds in a website that accesses Lilly computational tools but is not seen by Lilly personnel. Once compounds are selected to be submitted for biological testing they are evaluated with filters for compound desirability and lack of similarity to Lilly compounds. Those that pass the filters are identified not by structure but by a structure fingerprint. The testing results are reported back to the submitter, who is free to publish them. Lilly’s only stipulation is that it be granted first right of refusal if the compound is offered for development. (Read Abstract Here)

Karen Worsfold from GlaxoSmithKline reported on their latest effort at deploying Instant JChem as a replacement for ISIS Base. This involves approximately 1,200 scientists across six countries. IJC replaces 400 ISIS Hviews. By the end of 2012 users could no longer access ISIS databases and Hviews. The talk was an update on the status of an 18-month project with the main focus on performance issues. The structures and data are stored in the UK, except the ACD database stored in the US and some inventory information stored in France. The US sites immediately complained of the long time that it took to do an assay or project ID search or to move from record to record. This was solved by having users in the US, China, Singapore, France, and Spain access the data via Citrix at a cost of a longer start-up to the login screen. Now every action except start-up takes less time than ISIS. However, performance issues remain with database searching and browsing.

A specific challenge at GSK is that R&D IT has changed its support model to that in which “Business Experts”, aka scientists, self-support. They developed a SharePoint Portal to aid this change. A difficulty with this new model occurs when an IJC project schema is modified in the development environment. Although there is a tool to support this activity, GSK would prefer that it would be part of IJC. A survey revealed that Project Owners spend an average of up to five hours a month creating or changing IJC projects. Although the figure will go down with more experience, 29% complained that they have to retrain each time they need to make a change. GSK will address these issues with additional training and better on-line help.

To enhance performance GSK plans to optimize Citrix, create different views of biological data tables, and replicate data to additional sites. They also plan to upgrade to a newer version of IJC. She made a plea for ChemAxon to establish a security model for authentication and authorization so that GSK scientists and external partners can see only the data for which they are authorized. They would also appreciate improved performance of IJC so that they could eliminate the need for Citrix, as well as efforts to make IJC less “chatty” and to make projects easier to modify. (It should be noted that GSK was using IJC 5.7.2.1, so some of the performance issues may have already been addressed.) In summary, GSK has migrated from ISIS to IJC and is addressing performance and other issues. (Read Abstract Here)


Platform Session: Toolkits for building systems and services




Mihály (Medzi) Medzihradszky briefly described the ChemAxon platform showing how the various capabilities interact to make an efficient, robust, and scalable system that can handle millions of structures and thousands of users while being enterprise ready. One central node in the system connects the Java, .NET, and web services API. The other central node is JChem Base and Marvin, which connect to JChem for SharePoint; to compound registration and Standardizer & Structure Checker; to Instant JChem, JChem for Excel, Plexus, and 3rd Party Apps as well as Discovery Toolkit. (View the Presentation)

Ákos Papp next described ChemAxon's Compound Registration, which was released in January and is now updated to version 6.0. A web-based GUI supports form-based and bulk registration, corrections to submissions, audit, administration, search, and reporting. The system is highly configurable to match a corporation's business rules for standardization, preferred structure editor, and the format of the database for the structures. (View the Presentation)

István Rábel and Imre Barna described how Standardizer and Structure Checker can be integrated into corporate workflows and distributed within a company. This integration ensures that all compounds registered into the corporate structure database are standardized and checked with the same rules. They also presented how the users can write their own standardizer actions, checkers and fixers, and how these can be integrated into the system. The integration is simple: once the code is written it is packed into a Java archive, which is transferred to the server. If necessary, the archive is then integrated into a configuration, which is also copied to the server. The result is that everyone uses the same Standardizer and Structure Checker options. (View the Presentation)

In a departure from the usual talk about the capabilities of ChemAxon software, Miklós Vargyas presented an overview of Scrum, the method used at ChemAxon to develop, maintain, and support JChem Base and JChem Oracle Cartridge. The issue is how to respond to the increased size of the user community, which doubles every two years, with the same size staff. Scrum is a formal method that supports fast and robust software development. It has two main features: the specific roles of various people and specific strategies for rapid development. The Product Owner's role is to represent the customer, typically by writing and prioritizing simple user requests that detail the need in one sentence and the acceptance criteria in a second sentence. The Product Owner also maintains the list of these needs and their priorities. The role of the Development Team is to deliver software at the end of each sprint, typically a few weeks long. In any one sprint the team focuses solely on the deliverables that they have agreed on with the Product Owner. The actual coding is done by pairs of developers. The physical setup of the offices reflects the Scrum method of operation: each team of developers works together in a large room that usually has a large diagram of progress of the sprint underway. Miklós illustrated the process with the work on improving a full tautomer search to finish in reasonable time using reasonable memory. The average full fragment search time decreased from 45 seconds in 5.11.4 to less than one second while using less memory. (View the Presentation)

Gábor Guta reminded the audience that until Version 6.0, JChem Web Services were based on the SOAP protocol. ChemAxon decided to move to REST because it provides services to build JChem thin clients, offers easy-to-learn access to JChem products from non-Java environments, and gives customers a cost-effective way to integrate JChem products. Currently all features of the existing web services product have been migrated except Standardizer and Reactor, which will be in Version 6.1. In addition, the REST product supports administrative functions and will support authentication in the next version. Plans for the product include exposing all of the existing ChemAxon features, making the Java side extendable, and integration with the Registration System and IJC. (View the Presentation)

Attila Szabó described how users can collaborate and search using JChem for SharePoint. It is easy for a user to create collaboration sites to be used by colleagues at different physical locations. JChem functionality is available in blogs, discussion boards, and Wikis. JChem for SharePoint includes security features that support collaboration with CROs, facilities to track and reverse changes, and the ability to do a structure, substructure, or similarity search on SharePoint documents, including JChem for Excel Workbooks. Future development will include support for all JChem search options, integration of Marvin for JavaScript, optical structure recognition, and the ability to search ISIS and ChemDraw for Excel files. (View the Presentation)

Bernd Rupp from the Leibniz Institute for Molecular Pharmacology described their design of a library to be used for HTS. Their library now contains approximately thirty-six thousand compounds obtained from many different sources. They also maintain a virtual library of available chemical substances, now approaching 30 million structures, that contains not only structures but also ordering information. They developed the FMP-Data Management Tool to process vendor SD files, correct structures, and map data from the SD files to their database structure. This library can also be used for docking and other virtual screening exercises. They plan to connect to other public databases and to develop virtual screening and docking interfaces. (View the Presentation)

Michael Dippolito from Deltasoft discussed the challenges and opportunities offered by migration from one chemical information system to another and how his team helps manage this process for customers. A company might decide to migrate because the current software has become obsolete, because two companies with different software systems have merged, or because the current system has become too expensive to continue to use. There are several approaches to migration of chemical structures; the approach to use depends on the specific situation. (View the Presentation)


ChemAxon Science: Extending capabilities for discovery




Douglas Drake presented a quick introduction to ChemAxon science and how it can help drug discovery. He highlighted the ability of Plexus to support open innovation, Metabolizer to forecast metabolic transformations of drugs, Screen for virtual screening, Reactor to manage chemical reactions and enumerate libraries, and ChemAxon tools to transform chemical names in documents to live structures. (View the Presentation)

Anna Tomin described ChemAxon’s reaction library. Reactor uses various types of chemical intelligence to support reaction selectivity, which is encoded in the reaction library. The knowledge-based library now contains more than 300 reactions culled from journal articles and SciFinder. Each reaction provides examples, a chemical description, experimental details, and references. Reaction classes include the recently added heterocycle formation as well as heteroatom alkylation and arylation, acylation and related processes, carbon-carbon bond formation, reduction, oxidation, functional group interconversion, and functional group addition. Reactions for protection-deprotection will be added soon. Reactor is accessible from the command-line interface, JChem for Excel, KNIME, and Instant JChem, and as a standalone application; it will be available from Plexus and Web Services soon. (View the Presentation)

Miklós Szabó provided a cheminformatist’s view of how ChemAxon’s capabilities align with recent successful drug discovery strategies represented by 259 agents approved by the FDA. For example, Screen3D perfectly aligns structures that were reported as a “scaffold hop”. They showed that adding Screen3D to the analysis of the results in a JCIM paper (2010, 50:2079) revealed that Screen3D shape outperforms competing shape or fingerprint methods in identifying actives within the DUD dataset. As a current effort, Discovery Tools, including Screen and JKlustor, will be well integrated into Web GUI tools, JChem for Excel, Instant JChem, JChem Base, JChem Cartridge and into workflow tools, such as KNIME and Pipeline Pilot. New in this version is a self-describing API for both the descriptors and the similarity metrics. They also showed a prototype of a web-based GUI for a molecular profiling tool that includes list management, charts to visualize complex data, and a tabular display. (View the Presentation)

György Pirok presented the new product Metabolizer that predicts and ranks the products of human liver P450s. Metabolizer operates on a library of 159 generic rules for human biotransformation collected from 897 articles. The goal of the product is to find all experimental metabolites with no structural errors. In addition, it should identify at least one major metabolite for most of the substrates and it would be nice if it identified most of the experimental metabolites correctly. It was realized that at this stage it is impossible to enumerate all and only experimental metabolites. The difficulty of reaching these goals is compounded by the ambiguous definition of what is a major metabolite as well as the lack of reliable data both on complete identification of all metabolites and especially on the relative rates of the competing reactions. To address the latter issue, a computational model was trained on observed metabolic pathways to estimate the relative rates of competing reactions. The GUI sorts the predicted products by their relative likelihood as well as molecules metabolized by that particular pathway. The user has control over how many generations of change will be considered. A test set of 310 substrates with 826 known metabolites was assembled and processed for four generations to produce 366,795 metabolites. It found all experimental metabolites with no structural errors; identified at least one major metabolite in 95% of the examples; and identified most of the experimental metabolites correctly. The software is available as a desktop application, a command-line tool, and an API. The human phase I library can be replaced or augmented with user information. (View the Presentation)
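The combination of rule-based enumeration, a user-set generation limit, and ranking by relative likelihood can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Metabolizer's algorithm: molecules are plain strings, the two "rules" are string rewrites, the rates are invented constants rather than outputs of a trained model, and scoring a product by the product of rates along its best path is simply one plausible choice.

```python
def replace_once(old, new):
    """Build a toy rule that rewrites the first occurrence of `old` as `new`."""
    def transform(mol):
        return [mol.replace(old, new, 1)] if old in mol else []
    return transform

# (rule name, assumed relative rate, transform); every value is hypothetical.
RULES = [
    ("O-demethylation", 0.6, replace_once("OMe", "OH")),
    ("hydroxylation", 0.3, replace_once("Ph", "Ph(OH)")),
]

def enumerate_metabolites(substrate, rules, generations):
    """Apply all rules generation by generation; return products ranked by score."""
    best = {}                      # product -> best score seen so far
    frontier = {substrate: 1.0}    # molecules created in the previous generation
    for _ in range(generations):
        next_frontier = {}
        for mol, score in frontier.items():
            for _name, rate, transform in rules:
                for product in transform(mol):
                    new_score = score * rate
                    # Keep a product only if this path is more likely than any earlier one.
                    if product != substrate and new_score > best.get(product, 0.0):
                        best[product] = new_score
                        next_frontier[product] = new_score
        frontier = next_frontier
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

ranked = enumerate_metabolites("Ph-OMe", RULES, generations=2)
for product, score in ranked:
    print(f"{score:.2f}  {product}")
```

Even in this toy, raising the generation limit grows the candidate list quickly, which is in line with the report of 366,795 predicted structures from 310 substrates after four generations and shows why the ranking step matters.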

David Deng presented a summary of the new features in naming (Structure to Name and Name to Structure), including Document to Database, JChem for SharePoint, and chemicalize.org. When a structure is identified in a document not only the structure but also its location in the document is returned. Extracting chemical information from documents will be enhanced with the integration of two additional optical structure recognition engines, CLiDE and Imago, in addition to OSRA. It is often necessary to correct OCR text before it makes sense as a chemical name. Name to Structure does some of those corrections automatically. JChem for SharePoint now indexes chemical information, either as text or a structure object, in documents. Name to Structure can now use a remote web service to convert corporate IDs and names, in addition to the local custom dictionary.

The growing IP climate in China led to the development of a Chinese Name to Structure capability; this is now part of Name to Structure. There were many challenges to this project: Chinese texts have no spaces; Chinese characters can have different meanings; English names remove one of a double vowel; Chinese names are usually abbreviated; and the official numbering systems differ. Nevertheless, conversion of a test set of 38,600 Chinese names (including unusual and incorrect names, radicals and inorganic salts) with associated CAS numbers succeeded for 50–78% of the names, with an accuracy of 91%. They are searching for another test set to help further improve the conversion rate. (View the Presentation)

Péter Englert reminded the audience that the calculation of the similarity between two molecules does not always agree with one’s perception of their similarity. Identifying the Maximum Common Substructure complements similarity calculations by providing a different view of the similarity of two structures. MCS calculations are also used in reaction atom mapping and 3D alignment. The new MCS module provides improved accuracy even up to cases in which the larger of the input molecules has >275 bonds. The run time has been reduced to a few seconds even with such large molecules. At the same time, testing shows that the accuracy is improved such that 95 – 100% of the maximum common substructure is found, even for molecules with >100 atoms. A further improvement is seen with memory usage, which is reduced ten-fold in the case of 130 atoms in the larger input molecule. The examples are truly hard cases and also show that the new version produces fewer disconnected fragments in the MCS. (View the Presentation)

Roland Knispel described the work at ChemAxon on tools for handling biologics. MarvinSketch 6.0 now supports the input of structures in FASTA format and allows the user to convert to this format. MarvinSketch 6.0 also now includes provision for unlimited attachment points within S-groups (superatoms). Work in progress will allow Marvin to display bridges, cyclizations, and end groups that are generated when a bond is broken. In addition, they plan improvements to the clean-up when sequence residues are expanded into structure diagrams. (View the Presentation)

Sergio H. Rotstein from Pfizer reported on the emerging HELM standard for macromolecular representation. This is a project of the Pistoia Alliance, a precompetitive alliance that aims to lower barriers to innovation by improving the interoperability of R&D business processes. HELM, Hierarchical Editing Language for Macromolecules, was developed at Pfizer to address the gap between the representations of unmodified macromolecules and small molecules. The software has the ability to display, store, search, and manipulate the structures of unnatural or modified biopolymers. Each monomer is stored as a full structure in the database along with an abbreviation that may be used in structure displays and the location of attachment points. By releasing HELM software to Pistoia they hope to make it the industry standard for the manipulation and exchange of data on such structures. ChemAxon took on the responsibility of making HELM agnostic to its Pfizer beginnings. It is available as open source at openhelm.org. The citation for the article is: T. Zhang, H. Li, H. Xi, R. V. Stanton, and S. H. Rotstein J. Chem. Inf. Model., 2012, 52, pp 2796–2806 DOI:10.1021/ci3001925.
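The monomer record described above (full structure, display abbreviation, attachment points) maps naturally onto a small data structure. The Python sketch below is only an illustration of that idea: the class and field names are hypothetical, the dot-separated sequence string is a simplification rather than complete HELM notation, and the R1/R2 labels for the two backbone attachment points are assumed for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Monomer:
    abbreviation: str          # short label shown in sequence displays
    smiles: str                # full structure of the monomer
    attachment_points: tuple   # labels of the positions where neighbours bond

# A two-entry illustrative library: glycine and the modified residue sarcosine.
LIBRARY = {
    m.abbreviation: m
    for m in (
        Monomer("G", "NCC(=O)O", ("R1", "R2")),
        Monomer("Sar", "CNCC(=O)O", ("R1", "R2")),
    )
}

def expand(sequence, library=LIBRARY):
    """Resolve a dot-separated string of abbreviations into monomer records."""
    return [library[abbr] for abbr in sequence.split(".")]

chain = expand("G.Sar.G")
print([m.smiles for m in chain])
```

Because every abbreviation resolves to a full structure, a modified residue such as sarcosine is handled exactly like a natural one, which is the gap between sequence and small-molecule representations that the talk described.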

The registration of biologics presents distinct challenges compared to that for small molecules: (1) Representation standards for various biological entities have not been established. (2) Workflows are not static. (3) Uniqueness might depend on the genealogy of the substance. ChemAxon has taken up this challenge by starting development of a Java-based biomolecule toolkit and web services. It will provide methods to standardize, canonicalize, and compare large biomolecules. It will provide database storage and support simple queries. In addition, the toolkit will provide full support of the HELM notation, including an editor applet. The HELM editor and notation toolkit has been externalized from Pfizer for the Pistoia Alliance project. The second phase of the biomolecule toolkit will provide calculators, sequence similarity searches (e.g. BLAST), and provisional integration with ChemAxon's Compound Registration by the end of this year. Spring 2014 is the target date for bio-material registration. (View the Presentation)

Dragos Horvath from Université de Strasbourg-CNRS described his investigations of using Tversky similarity in virtual screening simulations. The commonly used Tanimoto similarity weights features present in the query and target equally; in contrast, Tversky similarity down-weights features found in only one of the molecules with a term labeled α. They investigated the effect of different choices of α on the ability of the function to identify pairs of molecules with equal potency. They found, as have others before them, that the Tanimoto similarity level that works best for virtual screening depends on the fingerprint used. Tversky at α between 0.9 and 0.7 is an excellent choice for virtual screening. It may identify actives that are more complex than the queries. A manuscript describing these studies has been accepted by the Journal of Chemical Information and Modeling. (View the Presentation)
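To make the contrast between the two measures concrete, here is a minimal Python sketch of the Tversky index on fingerprint feature sets. The two-weight form, with α applied to query-only features and β to target-only features, is the textbook definition, and Tanimoto is the special case α = β = 1. Pairing β = 1 − α in the example is an assumption made for illustration, since the report does not say how the talk set the second weight, and the feature sets are made up.

```python
def tversky(query, target, alpha, beta):
    """Tversky index: common / (common + alpha*query_only + beta*target_only)."""
    common = len(query & target)
    query_only = len(query - target)
    target_only = len(target - query)
    denominator = common + alpha * query_only + beta * target_only
    return common / denominator if denominator else 0.0

def tanimoto(query, target):
    """Tanimoto similarity is the Tversky index with alpha = beta = 1."""
    return tversky(query, target, 1.0, 1.0)

# Made-up feature sets: the target carries two features the query lacks.
query = {1, 2, 3}
target = {2, 3, 4, 5}

print(f"Tanimoto:           {tanimoto(query, target):.3f}")
print(f"Tversky (0.9, 0.1): {tversky(query, target, 0.9, 0.1):.3f}")
```

With these toy sets the Tanimoto value is 0.4, while the asymmetric weighting gives about 0.65 because the target's extra features are barely penalized. That behaviour is consistent with the remark that such settings can retrieve actives more complex than the query.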

Ödön Farkas from Eötvös Loránd University presented their investigation of using cool dynamics to search for low-energy conformations. This is the basis of the new molecular dynamics plug-in for Marvin. Cool dynamics has the advantage of being able to relax bond angles and distances, which prevents the production of artificially high-energy, distorted structures despite the high temperature that is necessary for an efficient conformation search. Users can choose to optimize the conformations with the MMFF94 or Dreiding force field. The new method outperforms RDKit, considered the best freely available tool, in finding the bioactive conformation of molecules. (View the Presentation)

Ian Berry from Evotec described their analysis of the 21 million commercially available compounds considered when they prepared EVOsource. EVOsource is their internal application for identifying compounds that are available in-house, already ordered, or available to order. Within the application one can order a compound from a supplier or request a quote. The system is based on the JChem Cartridge accessed through the Java Persistence API. He reiterated the challenges of loading supplier catalogs: mainly fixing errors and removing expired data. They classify compounds based on SMARTS substructure alerts and physical properties. Although historically the properties were calculated with MOE, they have switched to ChemAxon tools because these will be available to all scientists and are easier to update. These updates are done weekly. However, their solubility prediction is still based on MOE and is manually updated every two months. They also converted their SMARTS from MOE to ChemAxon, a process that had to be done manually. Their new process also calculates the QED (quantitative estimate of drug-likeness) of each compound, assigning it a high, medium, or low QED score to indicate whether it is a “beautiful” molecule. QED classification agrees well with their previous empirical scoring system. In addition, the selection criterion for EVOsource is validated by the observation that EVOsource molecules and orally administered drugs have a similar QED distribution. The structures have been added to ChEMBL 16. (View the Presentation)

Björn Windshügel from the European ScreeningPort GmbH reported on their service to provide academics with high-throughput and high-content screening as well as computational chemistry. They store their biological results and chemical structures in Instant JChem. They are deeply involved in the Innovative Medicines Initiative “New Drugs for Bad Bugs”. (Read the Slides)

Andrea de Souza from the Broad Institute described their work to create the Bioassay Research Database, BARD. It will augment the data in PubChem with a standardized representation of bioassays that will support sophisticated queries and data mining. The data will be represented in context with structured assay and results annotations. They have devised a simple means for data producers to identify key terms by including various ontologies in a hierarchical data dictionary. A Google-like interface provides searches over the 4000-plus assays with the ability to save the results, perform calculations on the molecules, or view the results in Starburst. From their work on BARD, it was obvious that a more structured definition of a screening protocol is needed. To address this problem, they have developed a prototype Assay Definition Standard, ADS, in an attempt to define assays, experiments, and project concepts as experimental variables in a structured language. The standard will provide a controlled vocabulary of terms to be used to describe an assay. Community input is requested. (View the Presentation)

Christopher Southan from TW2Informatics presented a talk entitled “Chemicalize.org, SureChemOpen, PubChem, and the InChIKey: A heavenly conjunction with transformative utility”. He reminded users that the chemical structures from chemicalize.org and SureChemOpen have been added to PubChem and that InChIKeys are available for these structures. Because ChemSpider, another database of chemical structures and information, also includes InChIKeys, Google now contains approximately 50 million InChIKeys. He illustrated how substructure searching these vast data sources allows one to jump from a manuscript that describes a biological property of a molecule to another article and finally to a patent. He also described starting with a Google text search, identifying the structures with chemicalize.org, performing an InChIKey search with Google, then uploading biological results from PubChem. One can cluster the structures using the publicly available CheS-Mapper, select one cluster for further analysis, and investigate the sources of data on these compounds with the free Venn diagram visualization tool Venny. He concluded by saying that recent developments have now linked chemistry, biology, and documents and made these links available in web searches. This will be augmented by open-access publishing. Complete realization of this vision is limited because extracting chemical structures from journal articles is still an ad hoc effort, many journals restrict text mining, authors are rarely required to submit the manuscript structures in a separate annotation, and reports from pharmaceutical companies of advanced testing on a compound rarely include its chemical structure. (View the Presentation)
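Part of what makes the InChIKey suitable for web searching is its fixed, text-friendly shape: 27 characters, made up of blocks of 14, 10, and 1 uppercase letters separated by hyphens. The Python sketch below, which is an illustration rather than anything shown in the talk, pulls candidate keys out of free text with a regular expression. The function names are hypothetical, and the pattern checks only the shape of a key, not whether it corresponds to a real structure.

```python
from __future__ import annotations

import re

# Shape of a standard InChIKey: 14 letters, 10 letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def find_inchikeys(text: str) -> list[str]:
    """Return every InChIKey-shaped token found in the text, in order."""
    return INCHIKEY_PATTERN.findall(text)


def first_block(inchikey: str) -> str:
    """Return the first 14-character block, which is shared by stereoisomers."""
    return inchikey.split("-")[0]


snippet = "Aspirin (BSYNRYMUTXBXSQ-UHFFFAOYSA-N) is a widely cited example."
keys = find_inchikeys(snippet)
print(keys)
print([first_block(key) for key in keys])
```

Searching on the full key retrieves exact matches, whereas searching on the first block alone is a common way to widen the search to related forms of the same skeleton.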

Steve Hajkowski from Thomson Reuters discussed the analysis of Markush structures in patents. Their database contains 1.6 million structures from patents back to 1978. These structures can be loaded into JChem and then searched and enumerated. The search identifies not only the structure but also the relevant information about the patent, including activities and mechanism of action. Such searching is used to assess the novelty of potential inventions, to identify potential white space in a patent, or to provide structures for further investigation. In 2012 there were more than 8000 patents that claimed new pharmaceutical Markush structures; Merck & Co. and Roche each contributed more than 100 of these patents. The US produced the most pharmaceutical patents, with China second and Europe third, followed by Japan, Korea, India, and the rest of the world. The contribution is different for chemical patents, with Japan, China, and the US producing three-fourths of the patents, followed by Europe, Korea, and the rest of the world. He illustrated the data-mining capabilities of Markush searching by showing a substructure search on a query structure (image not reproduced), which identified 122 patents. Of these, the most, 17, were from Merck & Co., but nine other companies also patented molecules with this substructure. There were two peak years of this patent activity, 1992 and 2008. No one Merck inventor is on all their patents of this class of molecules. In summary, Markush searching and enumeration of patent data opens the opportunity for the analysis of patent data in a whole new way. (View the Presentation)
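For readers unfamiliar with Markush enumeration, the following Python sketch shows the underlying idea on a toy scale: a scaffold with variable positions is expanded into every combination of the listed substituents. It does not use JChem or the Thomson Reuters data; the scaffold, the R-group lists, and the function name are hypothetical, and structures are built by simple string substitution into a SMILES template.

```python
from __future__ import annotations

from itertools import product


def enumerate_markush(scaffold: str, r_groups: dict[str, list[str]]) -> list[str]:
    """Expand a SMILES template with {R1}, {R2}, ... placeholders.

    Every combination of substituents is generated, so the number of
    results is the product of the R-group list lengths.
    """
    names = list(r_groups)
    choices = [r_groups[name] for name in names]
    return [
        scaffold.format(**dict(zip(names, combination)))
        for combination in product(*choices)
    ]


# Hypothetical para-disubstituted benzene with two variable positions.
scaffold = "{R1}c1ccc({R2})cc1"
r_groups = {"R1": ["C", "CC"], "R2": ["O", "N", "F"]}

structures = enumerate_markush(scaffold, r_groups)
print(len(structures))
print(structures)
```

Real patent Markush structures also include variable attachment positions, repeating units, and nested definitions, so the combinatorial space grows far faster than in this toy case, which is why dedicated search and enumeration tools are needed.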


Partner Session

Many companies incorporate ChemAxon capabilities into their products or consulting business. Nóra Lapusnyik from ChemAxon introduced the session of “lightning fast” talks by reminding potential partners of the advantages of partnership with ChemAxon. The partner list contains at least forty companies that span the IT needs of drug companies, but also support scientific publishing, text mining and online education. (View the Presentation)

Renaud Acker from AgiLab described the first of their web applications, the ELN ChemLab. It is built on the Marvin applet, JChem Server, and JChem Cartridge. ChemLab supports a Gantt view of the project plans, bibliography, and attachments. Users can store and search reactions, batches, compounds, and starting materials. The hazards of potential starting materials for a reaction are highlighted with icons. They plan to incorporate ChemAxon Chemical Terms and to provide an online bibliography via PubChem and ChemSpider. (View the Presentation)

Jan Holst Jensen from Biochemfusion ApS highlighted the informatics difficulties faced when treating substances that arise from the chemical modification of polypeptides. Their product Proteax® provides a two-way translation by recognizing the abbreviations for standard amino acids and integrating with MarvinSketch for chemical modifications. It provides a clean function for peptides within MarvinView. KNIME nodes are available. (View the Presentation)

Diana Soto from DeltaSoft, Inc. reported that they have been a partner with ChemAxon since 2005. Their ChemCart suite supports drug discovery from reagent inventory through synthesis, registration, sample handling, bioassay, and SAR analysis. Her video demonstrated creating a ChemCart application with a ChemAxon back end in less than five minutes. (View the Presentation)

Ahmed Abdelaziz from eADMET GmbH described their web-based platform for predicting the pharmacokinetic, toxicity, and physicochemical profiles of molecules. Along with self-generated modules, they use ChemAxon property calculators. They have recently added DMSO solubility, melting point, and aryl hydrocarbon toxicity models to their portfolio. (View the Presentation)

Jonathan Davies from IDBS, a research data management company, described the registration system they developed for their Inforsense Suite and the E-Workbook Suite of ELNs. ChemAxon's products, such as Marvin, Calculators, and Reactor, are key components of their chemistry applications. They are considering integration of the JChem Cartridge and the possible use of Markush features in metabolism pathways. (View the Presentation)

Anikó Valkó from Keymodule Ltd described improvements to the CLiDE software that extracts structure diagrams from documents. The challenge is to not only detect the diagrams but also to find and correct errors. The latest version increased accuracy in documents from USPTO from 60 to 90%, the Maybridge database from 77 to 91%, and non-Markush structures in WO from 12 to 91%. Keymodule has integrated MarvinSketch and also Document to Structure into CLiDE. The batch version saves the structures in various formats, including MRV. (View the Presentation)

Aaron Hart from KNIME described JChem extensions for this pipelining tool. It already contained four cheminformatics contributions: RDKit, Indigo, Erl Wood Chemoinformatics, and CDK. The specific JChem extensions cover over 90% of ChemAxon's functionality. They were implemented by Infocom with support from ChemAxon. The Marvin Family Nodes contain Marvin Sketch, Marvin View, Marvin Space and Converter. These are offered free of charge to all customers. Other nodes read and write JChem for Excel files, perform R-group decomposition and Markush enumeration, access Property Calculations, and run a combinatorial enumeration. (View the Presentation)

David Milward from Linguamatics described how they use ChemAxon tools in their natural language processing product I2E to combine chemical identification and searching with text searching—for example identifying compounds and their melting points in a particular patent. Their product makes heavy use of Name-to-Structure, Mol Converter, and substructure and similarity searches. (View the Presentation)

Márk Sándor from Mcule.com described their business as an integrated drug discovery platform with two components: first, a compound procurement search that allows users to find and order compounds with only a few clicks, and second, a web-based molecular modeling platform that provides high-quality modeling tools. They use ChemAxon property calculators in their triage of compounds for purchase. (View the Presentation)

Imants Zudans from MolPort described their company’s easy-to-use compound ordering system. They provide support for EU customs clearance for compounds ordered through them. (View the Presentation)

Lutz Weber from OntoChem GmbH gave a brief talk entitled “Big Data Analytics: Semantic Knowledge Discovery”. Their OCMiner® uses natural language processing to extract knowledge triples from documents at a rate of 25 million pages of text a week. Their chemistry ontology, one of twelve, is built on SMARTS classifications. The extracted relationships are then used for knowledge discovery. (View the Presentation)

Yoshiko Matsumoto from Patcore Inc. described their product CRAIS Checker Server, which identifies compounds that are subject to government regulations. Controlled substances might be drugs; toxic, deleterious, and dangerous compounds; compounds regulated because of a possible security threat; those related to industrial safety or environmental protection; and those subject to international conventions and regulations. The server may be accessed from a web or Windows client, via a SOAP interface, or as a batch process. The recognition is performed with ChemAxon JChem, Marvin, and Markush query language software. (View the Presentation)

Andreas Witte from Schrödinger reminded the group that they are now the distributors of Seurat, which is based on JChem Base, MarvinSketch, and other ChemAxon modules. This fits nicely into their design platform, which can show a 2D chemical structure and its calculated properties in the left pane and a 3D visualization in the right pane. (View the Presentation)

Elizabeth Piveteau from SureChem reported that since the last UGM they have deployed a new data generation and API architecture, completed regeneration of the text backfile and of images back to 2006, licensed ChemAxon property modules for chemistry filtering, and deposited the complete structure data collection into PubChem. They use Name2Structure as one of the structure-recognition programs in concert with Image2structure programs, clean up and normalize the structures with ChemAxon tools, and store them in JChem Base. Chemical text annotations date from 1976 and images from 2006. The file contains nine million structures. Elizabeth reminded the audience that SureChemOpen is free to all users, whereas SureChemDirect API and SureChemDirect DataFeed are useful for integration and batch processing. Their plans include classifying each compound based on its use in the patent, adding links to additional online public data, and creating KNIME nodes. An additional effort will focus on integrating SureChem with SciBite, a text mining application that provides semantically annotated current awareness. (View the Presentation)

Zsolt Skribanek from Sysment Kft. reported that their ELN handles both small molecules and proteins equally well. It uses Biochemfusion’s Proteax for proteins and ChemAxon’s JChem Base and Marvin for small molecules. A key feature is that its reaction tool easily handles the reaction between a protein and a small molecule. (View the Presentation)

Andrew Daniel from Certara/Tripos described their D360 product, which gives the scientist the ability to access, transform, display, and share data in many different ways. One of its main features is that it federates data from multiple sources and makes this data searchable from one desktop application. Search results are easily sent to external programs such as Spotfire, Excel, or a chemical inventory application. Certara has recently partnered with ChemAxon so that D360 can connect to MarvinSketch and store and search structures in JChem Cartridge. (View the Presentation)


Summary

The release of Version 6.0 was an important thread of the meeting. Not only does it continue the tradition of improvements in performance, but new capabilities of existing products and whole new products were announced. Marvin, JChem, JChem for Excel, Instant JChem, JChem for SharePoint, and JChem Web Services have been updated to provide new capabilities. Name to Structure now recognizes Chinese names. In addition to Compound Registration (released in January), five new products were announced at the UGM: Marvin for JavaScript, Instant JChem Web Client, REST Web Services, Metabolizer to predict human metabolites, and the Plexus project to provide a simple web application for chemists. The staff is excited about the new version. Miklós Vargyas provided an under-the-hood description of how ChemAxon develops and maintains software using the agile development method Scrum.

Another important thread of the meeting was the use of ChemAxon software both in customers’ workflows and in the products of partners. Four user talks addressed the design of compound libraries for HTS and/or software to support HTS. Collaboration was also a popular topic, whether through software that supports collaboration within a working group or software that exposes cheminformatic tools to outside scientists while keeping the structures of the compounds hidden from those scientists. Two new standards were discussed: the HELM standard for macromolecular representation, developed at Pfizer and open sourced via the Pistoia Alliance, for which ChemAxon made the editor company agnostic; and the proposal to develop an Assay Definition Standard that would define assays, experiments, and project concepts in a structured language. Other user talks highlighted Tversky similarity, conformational searching, information to be gained by searching Markush structures of patents, and chemical information on the web.

Partners continue to use ChemAxon software in ELNs, in registration systems for small molecules and macromolecules, in structure-activity and patent databases, and for property calculations. Name-to-structure is used in three partner products, two of which use natural language processing to identify key relationships. Patcore uses ChemAxon chemical information tools in an application that checks structures against government regulations. KNIME offers ChemAxon components for pipelining.



