2012 Boston UGM Meeting Report

news · 8 years ago
by Yvonne Martin

Social Aspects and Demographics
Keynote
CEO Wrap-up
Overview of Talks
Compound Registration
Compound Workflow Support
Migration from ISIS/Daylight Solutions
JChem
ChemAxon's Informatics Service Team
Integrating text and chemical structures
Marvin/querying
Processing HTS Results
Handling Patents
Using ChemAxon components in teaching
Cheminformatics in the cloud
Linking Diseases to Targets to Compounds
Open Innovation
Integration with Spotfire
NMR Predictor
Metabolizer
Partner Sessions
Pre- and Post-meeting Sessions

Social Aspects and Demographics

ChemAxon User Group Meetings are known for their delightful mix of serious science and technology with a spirit of fun. The informal tone of the meeting was established when I saw Alex Allardyce in a kilt accessorized with the appropriate shoes and stockings but topped with a Hawaiian shirt over a ChemAxon t-shirt. The photo gallery shows not only his outfit but also the informal information exchanges that are such an important part of these meetings.

The UGM was held at the Babson Executive Conference Center in Wellesley, MA. This is an excellent facility: the meeting rooms have good sight lines and sound systems, and the food is excellent. Of course, at a meeting such as this, serious networking is also the order of the day. The evening before the formal sessions we loaded into buses for a ride through the New England countryside to The Sherborn Inn in Sherborn, MA. There we were greeted by drinks and a delicious buffet followed by musical treats from various attendees. The night after the first day of the meeting we again boarded buses, this time to be taken to King's Bowling Billiards and Lounge in Boston. Several bowling lanes were reserved for us, but equally popular were the billiards tables and skee ball lanes. The jovial atmosphere and the fact that food arrived at seemingly random places encouraged participants to mingle. The dinner after the second day included mainly those who would attend training the next day. It was held at Babson, and afterward the group adjourned to a private lounge with yet another billiard table.

Ninety-four delegates from 60 institutions were registered. In addition, ChemAxon had 34 representatives, roughly a quarter of its staff. The delegates comprised 23 from major pharmaceutical companies and 24 from partner companies, with the remainder from start-ups, the chemical industry, universities, research institutes, and so on. The major pharmaceutical companies represented were AstraZeneca, Boehringer, Bristol-Myers Squibb, Eisai, Eli Lilly, GlaxoSmithKline, Merck, Novartis, Pfizer, Takeda, and Vertex; Dow Chemical, DuPont, and Honeywell were also represented. In addition there was one participant from each of six universities.

The tone of the meeting was different from that of other user-group meetings that I have attended. The most striking difference is the interaction of ChemAxon with the users: many talks emphasized that ChemAxon's employees truly listen to comments and usually respond quickly with changes to the software. As a result, the users expressed their appreciation for the interactions with ChemAxon staff. Unusually for a UGM, I heard no complaints or grumbling! In addition, there were no high-powered sales pitches, nor was there any attitude that ChemAxon knows best about a particular issue. The ChemAxon talks were straightforward reports of the current state and planned improvements of the software.

Keynote

(View the Presentation)

The meeting opened with a Keynote entitled “Future-proofing—Managing Systems in Rapidly Changing Times”, which focused on the role of cheminformatics in the future of the pharmaceutical industry. Rather than a formal talk, it took the form of a conversation between Alex Drijver, CEO of ChemAxon, and Ramesh Durvasula, Director for Molecular Sciences, Candidate Optimization & Drug Safety at Bristol-Myers Squibb. The dynamic interplay between the two speakers illustrated the key role that customer interaction plays in the vision and workings of ChemAxon. A major challenge to the pharmaceutical industry is managing systems in rapidly changing times while doing more with less. For example, automation in chemical purification changes the chemistry workflow, which in turn impacts systems designed for a more traditional workflow. In addition, registration systems are challenged by the use of impure compounds in screening, by the need to register macromolecules, and by the widespread use of electronic laboratory notebooks (ELNs). A key question is whether registration should be built into ELNs or whether ELNs make registration obsolete. A second challenge is that the pharmaceutical industry is increasing its use of cloud computing and hosted software. This raises data security questions for the companies and licensing questions for the software vendors. The take-away message for ChemAxon is that in this environment the software vendor needs to adjust to new realities, to listen to and understand customers' needs, to deliver as promised, and to become a real partner in solving problems. Later talks in the meeting showed how ChemAxon meets these challenges.

CEO Wrap-up

Alex Drijver presented the wrap-up talk. He emphasized that ChemAxon has increased its presence in the US and is opening its new East Coast Headquarters in Cambridge, MA. The company continues to grow; it now has 140 employees worldwide. This growth is a testament to both the quality of the software and the care and attention that ChemAxon pays to its customers. The expansion necessitated moving to a larger and less quaint space in Budapest.

Alex is especially proud of the Real IT Award that GSK received for its project, in which ChemAxon was a key player, to support research for cures of diseases in the developing world. It shows the power of collaboration.

By the end of the year there will be roll-outs of GSK Registration, Document to Structure and Structure to Document, Screen 3D, NMR Prediction, SharePoint implementation at customers' sites, marvin.js, and Web Services. The development philosophy is to keep the platforms and the underlying core consistent and to make key products available.

Overview of Talks

An important thread of the meeting was developing registration systems using ChemAxon tools. Although each institution has different requirements, ChemAxon will offer a customizable registration package soon. Registration is also integrated into compound progression work-flows and reagent purchasing and tracking. Four talks from customers reported their successful migration from ISIS or Daylight to JChem. Another theme of the meeting was integrating text with chemical structures and, in two cases, with targets and diseases. Although SharePoint is used at most companies, no user reported experience with JChem for SharePoint. There were also talks on Marvin/querying, processing HTS results, patent databases, Markush structures, ChemAxon in teaching and in the cloud, open innovation at Lilly, and the calculators NMR Predictor and Metabolizer.

Compound Registration

Julian Fowler from Constellation Pharmaceuticals described how they solved the problem of registering compounds so that the structures are consistent with the rules of a collaborator. At the start of the collaboration, Constellation realized that their registration system was inadequate and that they had to develop a replacement quickly. For the replacement they used ChemAxon's Structure Checker, Standardizer, and JChem Cartridge. They implemented the same complex registration rules as the collaborator within Standardizer. The new registration system was in place within six months. (View the Presentation) ·

Compound Workflow Support

Matthew Pustelnik presented how ChemAxon products facilitate the Takeda compound progression workflow process. The workflow is designed to streamline the drug discovery process by providing timely information and tools to project teams. It is essentially a concept-to-clinic chemical tracking and decision-support system that empowers project teams to make decisions by providing all relevant information about compounds “in play”, recording team decisions about these compounds, and triggering advanced testing based on team decisions. Compounds move from registration, to primary testing, to “no further interest” or to best-in-class and ultimately to lead status. All compounds not relegated to “no further interest” are discussed at compound progression team meetings facilitated by project leaders. The system records decisions made on the progressability of each compound. To manage the workflow, Takeda built custom in-house software systems for compound registration, compound management, querying, decision capture, project home pages, and action configuration. Compound registration triggers compound delivery for the first phase of testing, which is configurable by each project and changeable as project needs change. Results of the testing are captured and reported back to the project team as part of the project home pages, which contain tabs that link to advanced testing as compounds are progressed or removed from active consideration. ChemAxon JChem Base is used for compound registration business logic and structure image generation and rendering; the JChem Oracle cartridge is used for structure normalization and structure searching; the calculators support decision making; and Instant JChem is used for compound information and visualization. (View the Presentation) ·
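
The progression of a compound through stages, with each team decision recorded, can be pictured as a small state machine. The following Python sketch is only an illustration and is not Takeda's software: the stage names come from the summary above, but the table of allowed transitions, the class `Compound`, and its method `progress` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Stage names follow the talk summary; which moves are allowed is an assumption.
TRANSITIONS: Dict[str, Set[str]] = {
    "registered": {"primary testing"},
    "primary testing": {"no further interest", "best-in-class"},
    "best-in-class": {"no further interest", "lead"},
    "lead": set(),
    "no further interest": set(),
}


@dataclass
class Compound:
    """Hypothetical record of one compound and the decisions made about it."""
    compound_id: str
    stage: str = "registered"
    # Each decision is stored as (old stage, new stage, reason given by the team).
    decisions: List[Tuple[str, str, str]] = field(default_factory=list)

    def progress(self, new_stage: str, reason: str) -> None:
        """Move to new_stage if the transition is allowed, and record the decision."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage!r} -> {new_stage!r} is not allowed")
        self.decisions.append((self.stage, new_stage, reason))
        self.stage = new_stage


# Made-up example compound and reasons.
cpd = Compound("CPD-0001")
cpd.progress("primary testing", "delivered after registration")
cpd.progress("best-in-class", "team decision at progression meeting")
```

Keeping the decision log next to the current stage mirrors the idea that the system both tracks compounds and captures why the team moved them.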

Migration from ISIS/Daylight Solutions

Debra Kassabian from Novartis discussed their ongoing effort to migrate from ISIS to ChemAxon (and CambridgeSoft) software. JChem Cartridge will be used for chemical searching, whereas ChemDraw will replace ISIS/Draw and ChemDraw/Excel will replace ISIS for Excel. In addition, Instant JChem was selected as a replacement for ISIS/Base. Novartis requested a few new features, which were promptly provided. Migration of local ISIS databases to Instant JChem is being accomplished with hands-on sessions.

Debra described the more challenging issue of converting the historical collection of Novartis reactions. This required a new web application because the data are stored in several ISIS/Host databases with different designs as well as in their CambridgeSoft ELN. The solution, implemented by ChemAxon, was to bring into JChem, in standardized form, only the information needed for searching (reaction, date, chemist, source, project, yield, solvent, reagents) and to relegate other information to a “details” page that is different for each original data source. At this point the 400K historical reactions from ISIS have been entered and they are working to capture the 600K additional reactions from their ELN. Debra summarized the collaboration with ChemAxon with the phrase “great support and responsibility”. (View the Presentation) ·
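
The split between a common set of searchable fields and source-specific details can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Novartis or ChemAxon implementation: only the eight search field names come from the talk, while `ReactionRecord`, `normalize_reaction`, the column names in the sample row, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# The eight searchable fields named in the talk summary.
SEARCH_FIELDS = ("reaction", "date", "chemist", "source",
                 "project", "yield", "solvent", "reagents")


@dataclass
class ReactionRecord:
    """Common search fields plus whatever else the original source carried."""
    search: Dict[str, Any]
    details: Dict[str, Any] = field(default_factory=dict)


def normalize_reaction(raw: Dict[str, Any],
                       field_map: Dict[str, str],
                       source: str) -> ReactionRecord:
    """Split one raw record into search fields and a per-source details blob.

    field_map translates source-specific column names to the common names;
    columns that do not map to a search field are kept under their original
    names in `details`.
    """
    search: Dict[str, Any] = {name: None for name in SEARCH_FIELDS}
    details: Dict[str, Any] = {}
    for key, value in raw.items():
        common = field_map.get(key, key)
        if common in SEARCH_FIELDS:
            search[common] = value
        else:
            details[key] = value
    search["source"] = source  # always record where the reaction came from
    return ReactionRecord(search, details)


# Made-up row and column mapping for one legacy database.
isis_row = {"RXN": "CCO>>CC=O", "CHEMIST_NAME": "J. Doe",
            "YIELD_PCT": 82, "NOTEBOOK_PAGE": "NB-12/34"}
isis_map = {"RXN": "reaction", "CHEMIST_NAME": "chemist", "YIELD_PCT": "yield"}
rec = normalize_reaction(isis_row, isis_map, source="ISIS/Host")
```

Each legacy database then needs only its own column mapping; the search layer sees one schema, and the details view can render whatever remains in `details` for that source.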

Zhenbin (Benjamin) Li from Boehringer Ingelheim Pharmaceuticals, Inc. described their chemistry infrastructure migration. The main challenge at BI was to migrate away from the ISIS platform while maintaining all of the functions of the current infrastructure. Their consideration of vendors centered on the quality of the software, the availability of consulting and customization, the financial aspects of migration, and the culture of the company. As a pilot project they migrated their external compound request management system that integrates reaction parsing, inventory management, ordering from the vendor, tracking shipment, as well as customs issues. In summary, the migration was accomplished easily in 60 hours, providing a benchmark for migration of other chemistry applications. As well as the technical issues of migration, Li emphasized that it will be important to curate the data in the process and to involve key stakeholders in all decisions. (View the Presentation) ·

Continuing on the theme of integration of ChemAxon capabilities into an environment that contains legacy databases, Jim Engler from The Dow Chemical Company reported on the efforts of Becci Ruyts and himself. Dow's historical registration application consists of two separate databases, one for chemical structures and one for associated data. The challenge at Dow is that they register not only single small molecules, but also mixtures, polymers, reaction products, and biological materials. In a collaboration with ChemAxon and DeltaSoft using rapid cycles of design, test, and deploy, they developed an updated database that contains both single molecules and multiple component systems and supports structure and substructure searching of all. It supports registering indefinite structures, for example with an unknown attachment point for a substituent; coordination compounds; and polymers by molecular formula. The system is up and running with 450,000 chemicals registered, of which 281,322 are single molecules and the remainder are mixtures, polymers, etc. He concluded by stating that the JChem Cartridge is robust and flexible; that ChemAxon's partnership with DeltaSoft provided a superb registration application; that ChemAxon focuses on the customer and customer needs and expectations; and that Dow implemented a system that is easier for registration, easier to manage, and gives better service to customers. (View the Presentation) ·

Continuing the high-praise-for-ChemAxon theme, Rama Bhamidipati from GlaxoSmithKline reported on their collaboration with ChemAxon. The goal was to replace and enhance the functionality of their legacy registration system that was based on ISIS and Daylight software. One aim was to move away from the use of SD files and instead to use a submission web service to transfer chemical structures and support bulk registration. The project represented a learning experience for both ChemAxon and GSK as they struggled with issues of handling tautomers, peptides, and records with no chemical structure. They were also uncertain about their mutual ability to migrate more than five million records. The year-long process was managed by aggressively frequent communication, extensive use cases, and rapid reporting and fixing of bugs. One happy result was that the transition of the business rules in Cheshire to ChemAxon's Standardizer and Structure Checker was especially smooth. The system went live in March, with new releases scheduled to add functionality and handle issues raised in early use. The legacy system has been retired. One disappointment is that although they initially hoped that the system would be totally outside their firewall and managed by a vendor, data security issues led to the system being inside GSK. (View the Presentation) ·

JChem

ChemAxon's Szabolcs Csepregi presented an overview of the JChem family. The core components are JChem Base, JChem Oracle Cartridge, JChem Web Services, and Markush search and enumeration. The JChem GUI applications and integrations include Instant JChem, JChem for Excel, and JChem for SharePoint. In addition there are ChemAxon functions integrated with the JChem database platform: Marvin, Standardizer, Calculator plugins, Reactor, Screen, Naming, and Document to Structure. Integrations include connectors to KNIME, Pipeline Pilot, Spotfire, and InfoSense. Partners are also extending the capabilities of JChem.

Szabolcs presented a case study on the use of JChem at Evotec for compound registration, compound sourcing, and chemical information management. Hundreds of users are spread out over Germany, India, UK, and US. A second case study highlighted the polymer registration system at Dow Chemical integrated by DeltaSoft. It is being used by 2000 scientists.

Szabolcs also reported that searching speeds have increased four-fold for queries involving charge, isotope, and query properties. A new command-line program, junique, filters out duplicate structures. In addition, JChem Manager has an improved connection dialog, JChem Oracle Cartridge has improvements in error handling, and Markush searching speed has improved along with new features and increased robustness. A new Markush viewer has also been developed. (View the Presentation) ·

ChemAxon's Petr Hamernik reminded the group of the important role that Instant JChem plays in the deployment of ChemAxon tools to the user community. This was very evident in their involvement with GlaxoSmithKline, Novartis, and Bristol-Myers Squibb. Petr used a novel presentation method that you can see by following the link to his presentation at the UGM: http://bit.ly/ijc-usugm-2012. The Instant JChem developments have focussed on improving performance and visualization capabilities while also addressing interoperability with MS Office and Spotfire, improving scripting, and adding more documentation, examples, and tutorials. Especially impressive was the ability to copy-and-paste from Instant JChem to JChem for Excel. New visualizations include interactive box plots, tree tables, and the ability to connect to more than one database table. In line with ChemAxon's philosophy, the features to be added are generated from user requests. (View the Presentation) ·

ChemAxon's Iván Solt demonstrated using JChem for Excel and Instant JChem to manage compound databases. In JChem for Excel he showed the use of other sketchers as well as Structure to Name and Name/SMILES/InChI to Structure conversions. Structures need not be displayed even though they are present behind the view. A new function allows one to embed the structure diagram in a SMILES so that the original orientation is preserved when the diagram is regenerated. It is now possible to copy Marvin structures as OLE objects.

ChemAxon's Erin Bolstad continued the discussion of managing compound databases by elaborating on two ways to extend basic Instant JChem: through Java Extension Modules and through Groovy Scripting. For Java development, an event model hook for double-clicking is now available. Event model hooks in development include selection and insert/query fields as drop-down lists; future development will include hooks for right click and customized add/delete row. Of course, JChem can also be extended by accessing external libraries. As is typical for ChemAxon, they are eager for customer suggestions for options. Lastly, the scripting capabilities introduced in 2011 provide a simpler way to extend the capabilities of Instant JChem. Erin showed the example of a registration system that was designed at the Mount Sinai Hospital to register research compounds, track their location, and associate spectra with compounds. The database was cleaned with a script and another script was written to register new compounds. Scripting was further explained in the developer training later in the week. (View the Presentation) ·

Tamás Pelcz showed how ChemAxon's JChem extensions for Microsoft SharePoint differ in utility from Instant JChem and JChem for Excel. Specifically, JChem for SharePoint allows users to collaborate and share information about chemistry. ChemAxon has added the ability to provide a list of editable chemical structures with associated calculated properties; a method to track and reverse changes to structures; a substructure or similarity search filter; images of structures with calculated properties such as bond charge or polarizability; and chemistry-specific blogs, discussion boards, and wikis. JChem for SharePoint also supports searching for structures embedded in documents. (View the Presentation) ·

ChemAxon's András Strácz described how the experiment with chemicalize.org produced important results for ChemAxon. From the beginning, chemicalize.org concentrated on user experience with intuitive features, persistent settings, simple start, and speed. This, coupled with universal access, led ten thousand users to analyze 323,000 web pages that yielded 483,000 names that are associated with 291,000 unique structures, while another database gathered more than a million unique molecules. As an outgrowth of this experiment, ChemAxon will offer the all-new JChem Web Services and JChem Web GUI products in early 2013. These will offer feature parity with desktop applications, support phones and tablets, and keep chemicalize.org's intuitive and convenient user experience. The software will provide configurable workflow, collaboration tools, import and export anywhere, and cloud support. (View the Presentation) ·

ChemAxon's Informatics Service Team

ChemAxon's Tim Dudgeon described their new informatics services team that provides consultancy, application development, and migration support. Members are seasoned problem solvers with expertise in cheminformatics, bioinformatics, information technology, and ChemAxon and competitor products. To support this effort, ChemAxon is creating consultancy toolkits to assist data migration and integration of products as well as components for building applications.

Examples of their efforts include migration of ISIS forms to Instant JChem for hundreds of projects at GSK and migration of hundreds of local ISIS databases to Instant JChem at Novartis. A second example is their work at EpiTherapeutics to help build an SAR database using Instant JChem and MySQL. As noted above, they worked with Novartis to migrate and expand their reactions data warehouse. Lastly, the team used scripts to migrate and enhance the laboratory compound collection at the Mount Sinai Hospital.

The informatics services team can interact with an organization by providing hands-on assistance and specialist knowledge, by training, by singly or jointly developing an application or by enhancing a ChemAxon product. (View the Presentation) ·

Integrating text and chemical structures

ChemAxon's Daniel Bonniot de Ruisselet described the work to index chemical names and structures from documents. Key components of the work are ChemAxon's Structure to Name, Name to Structure, Document to Structure, and Document to Database. The Structure to Name (IUPAC, traditional name) technology is mature with the exception of peptides and some complex ring systems. Name to Structure technology is more complex; the structures associated with common and drug names are stored in a custom dictionary that is constantly expanding, whereas generating a structure from a systematic name requires breaking the name into tokens and parsing the results to produce a structure. If the name is retrieved from a document, then errors in optical character recognition (OCR) must be recognized and corrected.

ChemAxon's Document to Structure capability identifies chemical compounds in a document if they appear as a known name, a SMILES or InChI representation, a CAS number, or an image (using an optional third-party Image to Structure tool, such as OSRA). Both the structure and its location in the document are returned. It operates on MS Office documents, embedded structure objects, PDF, text, XML, and HTML files. The final step, Document to Database, makes a JChem database that contains the structures, the source document, the location of the structure in the document, the authors, and metadata. The power of the technology is that although it extracts valuable chemical information, it is still component-based and responsive to user requests. (View the Presentation) ·
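
To make the idea of returning both an identifier and its location concrete, here is a small Python sketch for one identifier type only, the CAS Registry Number. It is an independent illustration and does not use or describe ChemAxon's Document to Structure; the function names `cas_checksum_ok` and `find_cas_numbers` and the sample sentence are made up for the example. It relies on the standard CAS format (two to seven digits, two digits, one check digit) and the standard check-digit rule.

```python
import re
from typing import List, Tuple

# CAS Registry Number layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def cas_checksum_ok(part1: str, part2: str, check: str) -> bool:
    """Validate the check digit.

    Reading the non-check digits from right to left, multiply them by
    1, 2, 3, ... and sum; the sum modulo 10 must equal the check digit.
    """
    digits = (part1 + part2)[::-1]
    total = sum(int(d) * weight for weight, d in enumerate(digits, start=1))
    return total % 10 == int(check)


def find_cas_numbers(text: str) -> List[Tuple[str, int]]:
    """Return (CAS number, character offset) for every valid hit in text."""
    hits = []
    for match in CAS_PATTERN.finditer(text):
        if cas_checksum_ok(*match.groups()):
            hits.append((match.group(0), match.start()))
    return hits


# Made-up sentence: two valid CAS numbers and one string that fails the check digit.
sample = "Water (7732-18-5) and caffeine (58-08-2); 1234-56-7 is not valid."
hits = find_cas_numbers(sample)
```

The check-digit test filters out look-alike strings such as part numbers, and the stored offset is what would let a downstream step point back to the place in the document where the compound was mentioned.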

Neil Pearson from Wiley described their Functional Chemistry project for storing and managing chemical content of published material. The goal is to allow chemists to search Wiley content by text, structure, substructure, or chemical identifier; to use the found structures as chemical entities; and to link to public chemistry databases. The chemical structures in each article are obtained from the author or typesetter and by ChemAxon Name to Structure and stored in the JChem Oracle Cartridge. To enhance the usability of the structures they are weighted according to their role in the article. A development version will be available in November. The continuing developments will include linking of spectra from other sources to the structures, and expanding to more journals, reference works, and reaction databases. (View the Presentation) ·

David Milward presented an update of the collaboration between Linguamatics and ChemAxon for the ChiKEL project to develop a chemically aware text mining system. It uses the I2E interactive text mining platform from Linguamatics and Name to Structure as well as substructure and similarity searching from ChemAxon. The objective of text mining is to extract information from documents to identify, extract, and analyze relevant facts and relationships. It uses natural language processing, terminologies, regular expressions, and chemical substructure searches to find information. Results are presented as tables, networks, graphs, or diagrams. ChemAxon's Name to Structure tool identifies structures that can be used to answer such questions as “What chemicals with this substructure act as kinase inhibitors?”. Searching patents for structures can also identify their physical and biological properties, even if the data is linked by a text identifier such as “Example 4”. The challenge remains to correctly recognize names in documents and to associate them with the correct structure. Although every version of the ChemAxon software improves in this respect, correct recall is estimated to be only 86%, with the result that further improvements in OCR are needed. I2E will soon include Name to Structure for enterprise customers and in their hosted full-text patent content. (View the Presentation) ·

Marvin/querying

Matthew Pustelnik from Takeda California opened the second day session with a description of their chemical query tool, ChemQ. Chemists demanded this tool because it is inefficient to search many chemistry databases with different GUIs and there are intellectual property risks in searching over the web. JChem provided a good starting point because it already contained much of the needed functionality. Currently the system provides access to the Takeda Global Compound Collection, ChemNavigator Market Select, PubChem, ZINC, and the database of the top 500 brand name prescription drugs. ChemQ searching includes the ability to search by an identifier and to limit search results by such things as therapeutic area or type of molecule. At search time the user can select which fields will be displayed in the results. ChemQ provides sophisticated navigation of hits by linking to records in all of the databases, linking to record details, and the ability to show only selected fields. One can select only certain hits to be saved or further processed. The components of the system include Marvin and JChem Manager at the client level and JChem Base at the middle tier. The intention is to develop this tool to be open-source aside from the ChemAxon components and to collaborate with others to enhance it. (View the Presentation) ·

ChemAxon's Eufrozina Hoffmann elaborated on the changes needed for Marvin to move into a world in which Java is less frequently used on the client and new devices such as the iPad are being used by scientists. The goal is to use JavaScript to make a fast, platform-independent application. Marvin JS will be ready for beta testing later this year and the first version will follow soon thereafter. She demonstrated drawing, display of atomic properties, query atoms and bonds, valence check, file import/export, and basic calculation. The next version will include more drawing and query features and the ability to display macromolecules. Future milestones include providing support for touch-screens, reactions, and complex S-groups.

Not to forget traditional Marvin, new features include the ability to cut-and-paste native PDF into MS Office documents on the Macintosh platform, the inclusion of several new graphical arrows (e.g., retro-synthetic, equilibrium, curved, and dashed arrows), peptide display using 1-letter abbreviations, the option to display lone pairs as lines, and an image server for optical character recognition. (View the Presentation) ·

Processing HTS Results

Dennis Moccia from Cognitive Dataworks, which was established this year, showed screen-shots of their products for visualization of hit-to-lead results. The software is built on Java/Groovy/Grails, open source JavaScript libraries, and Oracle/Mongo databases. It uses widgets from the ChemAxon tool box. They support a modern approach to picking hits for follow-up by identifying the activity of compounds in the same cluster as the identified hits and by removing compounds with reactive groups or unfavorable physical properties. Another view of the data shows the dose-response curve and calculated potency of all compounds that share a scaffold. The results of scaffold clustering are shown with a cluster dendrogram that is color-coded to highlight the differences in scaffolds. Various views of the data support progression of individual scaffolds. (View the Presentation) ·

Carol Mulrooney from the Broad Institute described their stereochemical structure-activity viewer, S/SAR. This new viewer was needed because their diversity-oriented synthesis produces all stereoisomers of the molecules made. Viewing is based on TIBCO Spotfire and R-group Decomposition using JChem Base. S/SAR uses the heat-map paradigm to display the relative potency of all stereoisomers that share a particular covalent structure. A whole library of compounds can be shown on one screen that has the capability of drilling down to display the structures of active compounds. Because R-group Decomposition is at the heart of their analysis, this procedure is run nightly and the results are stored in a database. The TIBCO Spotfire viewer supports user-selected color assignment, selection of specific stereochemistries at each site of variation and the R-groups to include. It has a panel that displays the core structure, position of R-groups, and stereo-centers. It should be noted that the Broad Institute's chemical biology informatics platform also uses ChemAxon software for their registration and compound management systems and their screening results database. Their wish-list includes displaying R-group structures on the axes of the heat map and the ability to display on an iPad. (View the Presentation) ·

Handling Patents

Dana Vanderwall from Bristol-Myers Squibb described their application that makes the nine million structures and associated data from the IBM patent database available to research scientists, not just patent searchers. They do this principally through Instant JChem forms, but they also provide the capability to retrieve structures from an Excel spreadsheet of patent numbers. Users can also search on specific fields such as title, claims, abstract, assignee, inventor, or PubMed ID. Results can be viewed in pre-constructed IJC summary and detail forms. This new application has put patent searching in the hands of scientists, resulting in faster, higher-quality decision-making in research. It allows scientists not only to search prior art, but also to keep tabs on competitors' activities and compounds and to identify structural motifs in ligands for a target of interest. (View the Presentation) ·

ChemAxon's David Deng demonstrated the latest version of the Markush technology. He showed extracting exemplified structures from a patent and importing them into Instant JChem. There are now additional Markush query features for substitution, hydrogen count, and ring bond count on atoms; chain or ring topology of bonds; and other features such as R-group queries. In addition there are new Markush structure features including multiple attachment points for R-groups, nested R-groups, position variation bonds, link nodes and repeating units, and homology groups. He demonstrated making a Markush structure from a series of molecules by doing an R-group Decomposition based on the Markush core. He also demonstrated the new search interface for searching the Thomson-Reuters patent database. It includes buttons to export exemplified structures, to retrieve the patent document, and to add notes. There is also a Markush enumeration interface and structure search with improved R-group visualization and Markush viewer. (View the Presentation) ·

Using ChemAxon components in teaching

A team consisting of Jeanne Zalesky, Seann Ives, and Laura Loiselle from Pearson Education described the integration of MarvinSketch, MarvinView and JChem into their online homework, tutorial, and assessment system for introductory organic chemistry. Their software gives individualized coaching and remediation by identifying the reason that the student made a mistake and providing a hint tailored to that specific error. The project started with the ACE Organic program, created by a chemistry professor and a computer science professor using the JChem Base Java class libraries. It was integrated using Perl to add a new JChem service to handle student input. Currently the product is being used for 25,000 students with excellent performance on a trio of quad-core machines shared with other programs. Initial testing revealed that Marvin needed to be simplified for this task. They removed options that made it easy to cheat or that are not needed, and also rearranged toolbars to make buttons easier to find. They also added tutorials, help, and an instructor guide for using MarvinSketch. (View the Presentation) ·

Cheminformatics in the cloud

Rajarshi Guha from the NIH Center for Advancing Translational Science discussed their efforts on informatics in the cloud. He emphasized that a major advantage of the cloud is the ability to easily handle unpredictable loads. However, he also stated that it is important to support parallel computation, which will require algorithm redesign. Hadoop is an open source software framework for distributed processing of large datasets. Rajarshi pointed out the utility of Pig and its language Pig Latin, in which programs are simpler to write but can be translated to Hadoop code. Integration of the JChem library into Hadoop provided fast SMARTS matching and bioisostere identification. For bioisostere searches the run times were reduced at least four-fold. Although not every cheminformatic problem is amenable to the Hadoop approach, large virtual databases and combinatorial problems make this an approach that should be considered. (View the Presentation) ·
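The kind of algorithm redesign that parallel frameworks such as Hadoop call for can be illustrated with a small map/reduce sketch. It is not the code shown in the talk and uses neither Hadoop nor JChem: the `matches` function is a hypothetical placeholder that does a plain substring test on SMILES strings, where a real system would call a SMARTS matcher, and all function names are assumptions made for illustration. The point is only that the work is split into independent chunks (map) whose partial results are then merged (reduce).

```python
from collections import Counter
from functools import reduce


def matches(pattern: str, smiles: str) -> bool:
    """Hypothetical placeholder: substring test, NOT real SMARTS matching."""
    return pattern in smiles


def map_chunk(chunk, patterns):
    """Map step: for one chunk of molecules, count matches per pattern name."""
    counts = Counter()
    for smiles in chunk:
        for name, pattern in patterns.items():
            if matches(pattern, smiles):
                counts[name] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def split(items, n_chunks):
    """Split a list into at most n_chunks slices of roughly equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_matches(molecules, patterns, n_chunks=4):
    """Run the map step on every chunk, then fold the partial results."""
    # Each map_chunk call is independent, so a framework could run them
    # on different machines; here they simply run one after another.
    partials = [map_chunk(chunk, patterns) for chunk in split(molecules, n_chunks)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    molecules = ["CCO", "CC(=O)O", "c1ccccc1", "CCN", "CC(=O)N"]
    patterns = {"carbonyl": "C(=O)", "nitrogen": "N"}
    print(count_matches(molecules, patterns))
```

Because each chunk is processed without reference to any other, the result does not depend on how many chunks are used, which is the property that lets the map step be distributed.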

Linking Diseases to Targets to Compounds

John Irwin from the University of California at San Francisco changed the subject from cheminformatics to the larger problem of linking diagnoses of disease to compounds by way of the targets at which the compounds act. The impetus for the work was the ability of SEA (Similarity Ensemble Approach) to predict the possible biological targets of compounds in the ZINC database of purchasable compounds. SEA predicts a protein target for 25% of the 21M compounds in ZINC. However, without knowing the clinical relevance of the target it is not clear whether the predicted compounds should be purchased and tested. To address this problem the Shoichet Laboratory has developed the freely available DxTRx. DxTRx collects information on the following questions: (1) What are the molecular targets of a particular disease? (2) What compounds for the target have made the furthest clinical progress? (3) What diseases are related to this target? It also suggests compounds to purchase for screening against a target. Discovering the molecular target for a drug or disease is complicated because the literature, reference works, and drug label may not identify the molecular target of a drug and because many drugs act on more than one target. DxTRx is designed to be a database curated by expert volunteers who will annotate the current state of particular indication-target relationships. It is currently available for beta-testing and they are seeking volunteer curators. (View the Presentation) ·

Open Innovation

Daniel H Robertson from Lilly described their Open Innovation Drug Discovery Program. The objective of the program is to enable collaboration with academic groups to expand the diversity of the Lilly compound collection and to identify novel leads for important projects. Their business model considers biological data open but the chemical structures of the submitted compounds are confidential to the investigator. Academics are free to publish their results. They designed a web site that supports the design and submission of compounds for testing by Lilly, reporting the biological data, and recording Lilly's decision on the compound. There are currently 640 user accounts from 280 affiliations including 186 research universities and institutes and 78 small biotechs. To date, more than seventeen thousand compounds have been received for testing in one or more phenotypic or target-based assays. For compounds not accepted for screening, the principal reasons are failure to pass the medicinal chemistry rules and structures that are too similar to other tested compounds or to controlled substances. The submitted compounds are quite diverse in that approximately one quarter of the structures are very different from PubChem structures. If a compound is of interest to Lilly, then negotiations as to the character of a collaboration are initiated. Recent collaborations include a simple statement of “right to evaluate” further compounds, support of a post-doc, or fee-for-service. (View the Presentation) ·

Integration with Spotfire

ChemAxon's Krisztián Niesz described the integration of Instant JChem with TIBCO's Spotfire. This provides not only visualization for ChemAxon users but also gives them access to statistical capabilities such as Ward's clustering on properties and multiple regression analysis. In addition, Spotfire's selection capabilities make it easy to filter potential libraries and to evaluate the effect of various filters on the library size. (View the Presentation) ·

NMR Predictor

ChemAxon's Csaba Fábri described ChemAxon's NMR Predictor, which predicts the 13C and 1H shifts and coupling constants of organic molecules that contain H, C, N, O, F, Cl, Br, I, P, S, and Si. As is typical of ChemAxon's products, it has an easy-to-use graphical user interface. The predictions are based on a mixed model that uses HOSE codes and decision trees. HOSE codes are similar to the more familiar circular fingerprints with radius 1, 2, or 3. To assign an NMR property to an atom, the program first checks if its HOSE code, with a radius larger than a preset value, matches that of any atom in the reference dataset. If matches are found, then the values for the hits with the largest radius are used. If no hits are found, then the decision tree shift model is used. Eighty-eight percent of the 13C NMR chemical shift predictions are within 5 ppm and 97% are within 10 ppm. For 1H NMR chemical shift predictions, 91% are within 0.5 ppm and 98% are within 1 ppm. They plan to include an option for users to expand the training set with their own data. The talk also indicated that work has started on a model to predict intrinsic solubilities, which would in turn make it possible to predict pH-dependent solubility curves. (View the Presentation) ·
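The lookup-with-fallback logic described for the mixed model can be sketched as follows. This is a minimal illustration, not ChemAxon's implementation: the function and parameter names are hypothetical, the "HOSE codes" are treated as opaque strings rather than real HOSE syntax, the fallback callable stands in for the decision tree model, and averaging the hits at the largest matching radius is an assumption, since the talk summary does not say how multiple hits are combined.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Reference data: (radius, code) -> shifts observed for atoms with that code.
ReferenceTable = Dict[Tuple[int, str], List[float]]


def predict_shift(
    codes_by_radius: Dict[int, str],
    reference: ReferenceTable,
    fallback: Callable[[], float],
    preset_radius: int = 0,
) -> Tuple[float, str]:
    """Return (predicted shift, source of the prediction) for one atom."""
    # Try the largest radius first, so the most specific environment wins.
    for radius in sorted(codes_by_radius, reverse=True):
        if radius <= preset_radius:
            break  # only radii larger than the preset value are considered
        hits = reference.get((radius, codes_by_radius[radius]))
        if hits:
            # Assumption: combine the hits by taking their mean.
            return mean(hits), f"hose-r{radius}"
    # No match at any allowed radius: use the fallback model.
    return fallback(), "fallback"


if __name__ == "__main__":
    reference = {(3, "A3"): [128.0, 130.0], (2, "A2"): [120.0], (1, "B1"): [40.0]}
    atom = {1: "A1", 2: "A2", 3: "A3"}
    print(predict_shift(atom, reference, fallback=lambda: 99.0))
```

Raising `preset_radius` makes the lookup stricter: matches that exist only at small radii are ignored and the fallback model is used instead.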


Metabolizer

ChemAxon's György Pirok described their new product, Metabolizer. It includes an embedded human biotransformation library that has 169 generic human phase one biotransformations that are classified by mechanism and ranked by priority of competing biotransformations. The goal is for the program to enumerate all observed metabolites and predict at least one major metabolite. It would be nice to predict most of the observed metabolites without making false predictions, although the available data is so sparse that this is unlikely at present. The challenges faced by this effort are that metabolism is complex, reliable data is sparse, and there is neither a good definition of a major metabolite nor a good measure of prediction quality.

The application shows both the predicted metabolites and the type of transformation that produces each of them. For each compound on the pathway it lists the fraction of the compound that is converted as well as the percent that accumulates, that is, the percent that is not further metabolized.
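One way to read the relationship between the converted fraction and the accumulated percent is sketched below. This is an interpretation under stated assumptions, not Metabolizer's actual calculation: the `Node` class, the `accumulation` function, and the rule that whatever is not converted along any branch accumulates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A compound on the pathway (hypothetical structure)."""
    name: str
    # Each branch: (fraction of this compound converted, resulting metabolite).
    branches: List[Tuple[float, "Node"]] = field(default_factory=list)


def accumulation(root: Node, amount: float = 100.0) -> Dict[str, float]:
    """Return the percent of the starting amount that accumulates per compound."""
    result: Dict[str, float] = {}

    def visit(node: Node, formed: float) -> None:
        converted = sum(fraction for fraction, _ in node.branches)
        if converted > 1.0 + 1e-9:
            raise ValueError(f"branch fractions of {node.name} exceed 1")
        # Assumption: the part that is not converted further accumulates.
        result[node.name] = result.get(node.name, 0.0) + formed * (1.0 - converted)
        for fraction, child in node.branches:
            visit(child, formed * fraction)

    visit(root, amount)
    return result


if __name__ == "__main__":
    m3 = Node("M3")
    m1 = Node("M1", [(0.5, m3)])
    m2 = Node("M2")
    parent = Node("parent", [(0.6, m1), (0.3, m2)])
    for name, percent in accumulation(parent).items():
        print(f"{name}: {percent:.1f}%")
```

Under this reading the accumulated percents of all compounds on the pathway add up to the starting amount, which gives a simple consistency check on any set of branch fractions.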

Validation on a library of the human metabolism of 310 molecules indicates that Metabolizer identifies all known metabolites. The top 10 predictions for each molecule contain 56% of the known metabolites, whereas the top 100 contain 92%. Improvements in the predictions are needed, but Metabolizer does enumerate all metabolites and their likelihood of formation while supporting the use of custom biotransformation libraries. Further work will enhance the library and develop libraries for other species. (View the Presentation) ·
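A figure such as "the top 10 predictions contain 56% of the known metabolites" is a top-k recall. The sketch below shows one way such a number could be computed; the function name and data layout are hypothetical, the identifiers are toy values rather than validation data, and pooling the known metabolites over all molecules (rather than averaging per molecule) is an assumption, since the summary does not say which was used.

```python
from typing import Dict, List, Set


def top_k_recall(
    ranked: Dict[str, List[str]],
    known: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of all known metabolites found in each molecule's top k predictions."""
    found = 0
    total = 0
    for molecule, metabolites in known.items():
        # Predictions are assumed to be ordered from most to least likely.
        top = set(ranked.get(molecule, [])[:k])
        found += len(metabolites & top)
        total += len(metabolites)
    if total == 0:
        raise ValueError("no known metabolites to evaluate against")
    return found / total


if __name__ == "__main__":
    ranked = {"drugA": ["m1", "m2", "m3", "m4"], "drugB": ["n1", "n2", "n3"]}
    known = {"drugA": {"m1", "m4"}, "drugB": {"n3"}}
    for k in (2, 3, 4):
        print(k, top_k_recall(ranked, known, k))
```

Recall can only stay the same or rise as k grows, which matches the pattern of the reported numbers: a larger cutoff recovers more of the known metabolites at the cost of a longer list to inspect.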

Partner Sessions

Partner sessions highlight how various companies incorporate ChemAxon capabilities into their products or consulting business. Erica Del Monaco from ChemAxon introduced the session of “lightning fast” talks by reminding potential partners of the advantages of partnership with ChemAxon. The partner list contains at least forty companies that span the IT needs of drug companies, but also support scientific publishing, text mining and online education. (View the Presentation) ·

Christian Lang from Acelot described their PharmaMiner suite of tools to analyze and predict protein interactions: SimFinder, which does fuzzy and topological searches to find molecules that are similar to one or more query molecules; SigFinder, which automatically mines databases to find significant fragments; and ActPred, which contains models for various biological properties of compounds. They use MarvinView for structure viewing, cxcalc to generate conformers, and the Calculator Plugins. (View the Presentation) ·

Jeff Carter from Arxspan described their Software as a Service (SaaS) offerings that are based on ChemAxon software. ArxLab is a secure, collaborative platform that includes an electronic notebook and a registration system providing a searchable repository of research results. They use the Calculator Plugins, JChem Base, JChem Web Services, and Naming. (View the Presentation) ·

James Pearson from Ceiba Solutions described their Helium product that allows the user to connect and explore disparate data within Microsoft Excel, TIBCO Spotfire, OneNote and web browsers. Helium retrieves and connects data in Excel for further analysis by JChem for Excel. It also can directly leverage JChem and Marvin functionality as plug-ins. He emphasized that the architecture of Helium is such that it saves IT costs because it simplifies the IT infrastructure while increasing the productivity of the scientists. (View the Presentation) ·

Chip Allee from CeuticalSoft described OpenHTS, their add-in to Microsoft Excel for the analysis of high-throughput screening data. A newer effort for the company is their full-featured registration module for OpenHTS that is powered by JChem. (View the Presentation) ·

Yvonne Shimshock from DeltaSoft described their Discovery in a Box that includes integrated software for reagent selection, an ELN for tracking synthesis, registration, sample handling, biological testing, and SAR analysis. It can be hosted internally or in the cloud. All of this is built on the JChem Cartridge, Marvin, Structure Checker and Standardizer, Calculator Plugins, and JChem for Excel.

Janice Stevenson from Digital Science described the SureChem database of structures in patents. The structures are automatically mined from text and images. The patents are processed by ChemAxon Name to Structure, Image to Structure (complemented by CLIDE to increase reliability), Standardizer, Structure Checker, and JChem Base. They are then stored in JChem Cartridge. (View the Presentation) ·

Patrick Morrill from GVK Bio indicated that their GOSTAR product integrates 18 million biological data points, 5.7 million chemical structures, therapeutic indications, and ADME data. They use JChem Cartridge, MarvinSketch, Molconverter, and Screen in their system.

Steve Boyer from IBM Research discussed in-line tagging and classification of chemical names by inserting an InChIKey back into the original document as well as into the index. Key components of the workflow are ChemAxon's Name to Structure, Chemistry libraries and JChem Base. A complementary effort is to also tag and classify targets by their gene IDs and by MeSH terms related to diseases and to signs and symptoms. As a result of this effort, one can now search the patent database with any combination of these fields and see the results of all of the annotations for any compound. (View the Presentation) ·

Scott Mayer from IDBS, a research data management company, described the registration system they developed for their InforSense Suite and the E-Workbook Suite of ELNs. ChemAxon's products are key components of their chemistry applications. They are considering integration of the JChem Cartridge and the possible use of Markush features in metabolism pathways. (View the Presentation) ·

John McNeil from John McNeil & Co described the company as one that helps customers select and integrate commercial software, develops registration systems and LIMS, and develops custom software for analysis, QC and data mining. Their philosophy is that to do this well they must use the best tools for the science. In the first example described they used SEURAT and the ChemAxon cartridge, in the second they used the JChem plugin to Pipeline Pilot, and in the third they used Marvin and JChem Base. (View the Presentation) ·

Aaron Hart from KNIME described JChem extensions for this pipelining tool. It already contained four cheminformatics contributions: RDKit, Indigo, Erl Wood Chemoinformatics, and CDK. The specific JChem extensions cover over 90% of ChemAxon's functionality. They were implemented by Infocom with support from ChemAxon. (View the Presentation) ·

Yoshiko Matsumoto from Patcore Inc. described the CRAIS applications for drug discovery organizations. ChemAxon software is used extensively in their system; for example, CRAIS Registration is built on Marvin, the JChem Cartridge and Standardizer and can be integrated with Structure Checker and the Calculator Plugins. (View the Presentation) ·

Matthew Wessel from Schrödinger discussed the role of Seurat in the breakthrough discovery of Nimbus's development candidate in 12 months from target indication. Nimbus keeps a database of ideas for compounds and their associated calculated properties. (View the Presentation) ·

Melissa Neal from SciQuest described the use of ChemAxon tools in their Enterprise Reagent Manager (ERM). ERM supports procurement, supplier management, and chemical inventory management for research laboratories. It uses JChem Cartridge, Standardizer, Structure Checker, and Marvin. (View the Presentation) ·

Pre- and Post-meeting Sessions

The afternoon before the formal meeting was devoted to one-on-one meetings of customers with ChemAxon personnel and also a one-to-many session. The one-to-many session was divided into discussions of JChem for SharePoint and registration of biological materials. In line with ChemAxon's philosophy of close interactions with customers, in both sessions the presenters encouraged audience participation by asking questions and soliciting opinions. For the biological registration session attendees were asked to fill out a 45-item questionnaire. The registration of biological materials is complicated by the fact that biological materials span the whole universe from cell lines to molecules with known covalent structure. It also presents the challenge of how to depict the object.

The day after the formal meeting I attended the Developer Training to get a feeling for how ChemAxon tools work. This was extremely well presented by the programmers responsible for each aspect and there was ample time for questions. It was interesting even for one who has no intention of programming! At the end of the day there were tutorials on scripting, a topic that had been introduced during the regular sessions.
