ChemAxon US User Group Meeting, San Diego, September 26-29, 2011

by Wendy Warr
Preamble
Markush technology
Introduction to user meeting
Registration
Other informatics applications
Update on JChem server products
JChem for Excel
Instant JChem
SharePoint
Reactions
Chemical name related software and text mining
Computational chemistry applications
Partner session
CEO’s concluding remarks
Conclusion
UGM archive – all presentations

Conclusion
If you were hoping for the usual short report, I apologise. This time there was so much happening that I feel that more detail is justified. I could not help comparing this event to the dozens of MDL user meetings I attended in various countries in the 1980s and 1990s. Ongoing themes at the ChemAxon meeting included the guy with his list of 40 most wanted enhancements, Yet Another Registration System, some of the same old faces, useful gossip, and so on. Admittedly, the dress code is smart casual rather than three-piece suits and earrings (not combined!), and there are more Hungarian speakers than German speakers at the European event, but the elements of déjà vu are there. In the 1980s we used to have the “3D day” at the MDL user group meeting: half a dozen enthusiasts would attend and the rest of the delegates took a day off sightseeing in San Francisco. ChemAxon users are not afraid of computational chemistry and 3D. I never went to many Daylight meetings but I suggest that the ChemAxon user meeting is also a cross between a MUG and an MDL meeting. At least one of the two annual ChemAxon user meetings is now an essential date in my calendar and the dates are blocked out a year in advance. I say all this in order to give readers an idea of the significance that ChemAxon is achieving in the cheminformatics arena. I strongly recommend that you keep an eye on this company.




Preamble

About 100 attendees gathered in sunny San Diego. They came from big companies (such as Pfizer, GSK, BMS, Lilly, Roche, Novartis and Takeda), biotechs, academia, and even CAS (although CAS is not a ChemAxon user). They were an international bunch, coming not just from the United States but also from the United Kingdom, Germany, Belgium, Japan, India, and Taiwan. In addition there were 22 ChemAxon attendees: fewer than would have attended a meeting in Budapest. The Gala Dinner was held at the attractive rooftop restaurant of Stingaree in downtown San Diego. On the first night of the meeting we met in a rooftop bar to watch the sunset over the Pacific, and on the last night we met in a lively (i.e., very hot and noisy) sports bar with as many soundless TV screens per square yard as could possibly be packed in. “Mustn’t grumble” (as Bill Bryson says): ChemAxon generously pays for all this fun.


Markush technology


Markush Forum
A “Markush Forum” was held immediately before the user meeting. ChemAxon and Thomson Reuters have formed a partnership to enable storage, search and enumeration of Thomson Reuters’ Markush format (VMN) data, interlinked with Derwent World Patents Index (DWPI) and Derwent Chemistry Resource (DCR). The Markush Forum provided an opportunity for forum members to meet and discuss developments and future directions.


Use of Thomson Reuters’ Markush data
Steve Hajkowski of Thomson Reuters outlined the history of Markush searching at Thomson Reuters since 1987. (In the main meeting he gave a much more generalised presentation on the nature of patents and Markush structures, and the use of searches to determine whether an invention is patentable, whether you might have freedom to operate, and whether a third party’s patents are valid or can be “busted”.) In the current partnership with ChemAxon, Thomson Reuters supplies content, not software, and ChemAxon does the software development.

Don Walter and Steve described the Thomson Reuters content and how it is extracted. About 90 indexers build the database which currently covers Markush structures in about 550,000 patent families, plus 1.7 million related specific compounds, from 26 issuing authorities. The Markush package has been supplied as in-house data feeds, with new updates every 3-4 days, since late 2010, as VMN, cXML and SDfiles. The DWPI records are in cXML format with English language abstracts. Customers can then use all these data in conjunction with ChemAxon tools, if they wish, but there is no single product as such. In other words Thomson Reuters does not have an exclusive deal with ChemAxon, and ChemAxon could supply its tools for use with other databases. Steve introduced a 47-question survey that the attendees were asked to discuss and then complete in writing.


ChemAxon’s Markush technology
David Deng emphasised the importance of visualisation of Markush structures; it is hard to cover this in a textual report but ChemAxon has tried to make it easy to look at structures with twenty and more R-groups. Hit alignment, hit colouring and structure cleaning are used. The display of a query within an original Markush structure is quite complex, so another option is a “reduced result” picking out only the most relevant R-groups. Conditions between R-groups (e.g., if R2 is dependent on R1) add complexity that Thomson Reuters handles by assigning more than one Markush structure.

Tímea Polgár did a demonstration of Markush structure handling in Instant JChem (IJC). Markush Search is an add-on to JChem Base and IJC and allows the registration of generic structures into the database as well as substructure and full structure searching in the Markush database (without enumeration of library members). Markush search currently can handle the following generic features: R-groups (nested to any depth), atom and bond lists, link nodes and repeating units, position variation bonds, and homology variation (predefined generic atoms, such as alkyl and aryl, and user-defined ones, such as protecting group).

IJC displays are customisable. Charts can be done, for example, a bar chart for molecular weight distribution of exemplified structures. Clicking on a bar gives a display of the relevant structures. Tables can be produced of structure, registry number (in DCR), and Thomson Reuters “role” (e.g., catalysts, reagent, solvent, known compound). It is possible to work with only exemplified structures. The spreadsheet can be sorted and coloured. A results set can be exported, for example, to Excel. The Markush viewer is a new component of JChem; it can also be used standalone or with IJC. A Markush structure can be created from a set of individual structures.

The Markush enumeration plugin can be used to generate a whole library or a subset of the library of a Markush structure. It is also capable of calculating the total number of specific structures present in a Markush library. Full, random or partial enumeration can be carried out (partial enumeration by highlighting R21 and R22, for example). A customisable property filter can be applied after enumeration.
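The three enumeration modes, and the counting of library members without generating them, can be illustrated with a toy model. The sketch below does not use ChemAxon's API; the scaffold template, the R-group lists and all function names are hypothetical, and structures are treated as plain text strings.

```python
import itertools
import math
import random

# Hypothetical toy Markush structure: a scaffold template with named R-group
# positions, each with a list of alternative substituents (SMILES-like text).
SCAFFOLD = "c1ccc({R1})cc1{R2}"
R_GROUPS = {
    "R1": ["F", "Cl", "Br"],
    "R2": ["C", "CC", "OC", "N"],
}


def library_size(r_groups):
    """Count the specific structures without generating any of them."""
    return math.prod(len(options) for options in r_groups.values())


def enumerate_full(scaffold, r_groups):
    """Yield every combination of substituents (full enumeration)."""
    names = list(r_groups)
    for combo in itertools.product(*(r_groups[n] for n in names)):
        yield scaffold.format(**dict(zip(names, combo)))


def enumerate_partial(scaffold, r_groups, expand):
    """Expand only the R-groups named in `expand`; leave the rest generic."""
    names = [n for n in r_groups if n in expand]
    kept = {n: "{" + n + "}" for n in r_groups if n not in expand}
    for combo in itertools.product(*(r_groups[n] for n in names)):
        yield scaffold.format(**dict(zip(names, combo)), **kept)


def enumerate_random(scaffold, r_groups, count, seed=0):
    """Draw `count` random members (repeats are possible) from the library."""
    rng = random.Random(seed)
    for _ in range(count):
        choice = {n: rng.choice(opts) for n, opts in r_groups.items()}
        yield scaffold.format(**choice)
```

With these toy lists, `library_size(R_GROUPS)` is the product 3 × 4 = 12, and a property filter could be applied afterwards as an ordinary predicate over the generated strings.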

In the main meeting, as opposed to the Markush Forum, Tímea showed other features of ChemAxon’s Markush technology. The proposed architecture uses IJC as the client with Internet or intranet connection to JChem Server and the cartridge on the server. Conditional formatting (allowing data to be coloured based on some conditions) can be used in browsing hit lists. Users can create statistics, plot results, sort a hit list, and archive or combine hit lists. There are KNIME nodes for Markush enumeration, Markush search, and Markush structure generation.

Szabolcs Csepregi of ChemAxon described what is new in chemical representation and performance. ChemAxon has been very successful in speeding up online Markush searching; an in-house system could use a hundred or more processors and search even faster. Version 5.5 of the software does simple R-group queries in Markush databases but “cheats” by enumerating the generic structures and searching the exact structures. Enumeration is only done on the query side, and even that will be eliminated in the future. Szabolcs also outlined new query, visualisation, enumeration and interface (IJC) features. The whole of the Markush data set (as opposed to subsets) can now be searched reliably. A demonstration of Markush search implementation is available at http://revamp.chemaxon.com/jchem/examples/db_search/index.jsp.


Introduction to user meeting


Instead of starting with a “state-of-the-union” address, ChemAxon’s CEO Alex Drijver conducted an interview with Sarah Blendermann of Pfizer on how R&D informatics can best support science in an ever-changing world. Business changes in pharma include outsourcing, open innovation, collaborations, and acquisition and divestiture. Technology changes in informatics include business analytics, mobility, cloud computing, and social media. To drive a stable and strategic informatics portfolio you must move quickly and understand both domains (the pharma business and informatics); maintain a smaller, strategic portfolio; clarify the intent of customer requests; set clear expectations and manage those closely; and plan for the future. The future lies in utility R&D services, collaborative ventures, and cross-industry consortia: pre-competitive alliances such as the Pharmaceutical Research Information Systems Management Executives (PRISME) Forum Association, and the Pistoia Alliance.


Registration


GSK’s registration system
Those of us who have been going to software user group meetings for over 30 years can remember many presentations on registration systems, especially ones from GSK and its predecessor companies. So it was with some amusement that I embarked on listening to Charles Wilkins of GSK on the subject of registration. Charles did not name names of course, but let it be known that GSK has ended up with a 10-year-old legacy system based on outdated hardware and a Daylight cartridge that is going nowhere, and multiple systems bolted together with SMILES at one end and MDL molfiles at the other. In addition to the maintenance headache, GSK also faced a 40% reduction in the number of skilled registrars and loss of their knowledge of specialised business rules. Obviously it was time for YARS (“Yet Another Registration System”).

This time the company considered the novel approach of Software as a Service (SaaS). The initial choice of vendor was not ChemAxon but a company in India that was to be paid on a fee for service basis (by number of compounds registered). Liability proved to be a key issue: the risk of letting a developer group have access to GSK compounds in an Indian service centre. When GSK started to assess the value of its compound collection, the proposed registration deal no longer made sense.

In the meantime an internal team at GSK had started to evaluate JChem and within three months had come up with a prototype registration system. ChemAxon did not have an off-the-shelf registration system at that time but by mutual agreement one to be developed between ChemAxon and GSK could become such a system. SaaS delivery would have the benefits of lowered support and maintenance costs, quicker turnaround of releases, and a single payment structure based on number of compounds registered. Liability issues could be avoided in Registration as a Service if GSK data were protected at GSK sites using GSK security. The challenge was to develop and implement a JChem based registration system that would allow the quality of the registered data to be maintained while reducing the registrar support needed. Some of this “QA” burden needed to be pushed back to the chemists synthesising the compounds.

Phase I involved migration of existing registry data, re-integration of supported systems, changes in registrar business practice, registrar acceptance, and changes in end user business practices. The rules engine had to be tweaked to produce the same level of automatic registrations (or better) and chemists needed to be able to bulk load purchased compound sets. In Phase II (in 2012) there will be a feedback mechanism to chemists to assist in the definition of structure to increase automated registration percentages and registrar tools will be improved to aid productivity.

In the following talk, Jonathan Lee described the features of the registration Web Service on which ChemAxon and GlaxoSmithKline are collaborating using an agile development methodology (in Scrum). This future ChemAxon product is based on JChem Base, MySQL and Oracle databases, a Tomcat Web container and an AJAX Web client, in a Service Oriented Architecture. It will be Marvin and ChemDraw compatible. Compounds are input as V3000 molfiles with chemically significant metadata.


Registration at RTI International
Danni Harris of RTI International has developed a different registration system involving DeltaSoft’s ChemCart. He faced two challenges: getting the data in and ensuring that principal investigators (PIs) could protect access to their own data and external collaborators and project teams could collaborate securely. The data came from multiple places: 10% was in IJC, 30% in applications such as Filemaker Pro and Excel, and 60% on paper, as PDFs etc. Getting the data in was perhaps the lesser challenge: for example, Chemical Literature Data Extraction (CLiDE Pro) has been almost 100% effective, he claims, while admitting that the structures in question are not of low quality.

Providing the required fine-grained security was not possible using out of the box ChemCart so the Institutional database was partitioned into customised data vaults amongst which project-specific collaborations can also be defined. Role-based permissions ensure that postdocs have read-only access on projects within their lab, PIs can administrate their own data and submit data to the Institution, and management can view and search all data across projects. Access is customised using views and triggers. There are different privileges for collaborations of different types.
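The role rules described above can be written down as a small permission check. This is only an illustrative sketch: RTI implemented its rules with database views and triggers, whereas the role names, action names and the `is_allowed` function here are hypothetical, and cross-lab collaborations are not modelled.

```python
from dataclasses import dataclass

# Hypothetical roles and actions, loosely following the description above.
PERMISSIONS = {
    "postdoc": {"read"},                 # read-only within their own lab
    "pi": {"read", "write", "submit"},   # administers own data, submits it
    "management": {"read"},              # may view and search everything
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    lab: str


def is_allowed(user, action, record_lab):
    """Return True if `user` may perform `action` on data owned by `record_lab`."""
    if action not in PERMISSIONS.get(user.role, set()):
        return False
    # Management can see every vault; everyone else is confined to their lab.
    return user.role == "management" or user.lab == record_lab
```

For example, a postdoc in lab A may read lab A data but may not write to it, while a manager may read data from any lab.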

Using the Oracle JChem cartridge and JChem Server the RTI system pulls properties into the database automatically. Danni illustrated heat maps and pivoting in spreadsheets. He said that it would be good if next year, instead of talking about IT, he could talk about mining GPCR data and use of the Structure-Activity Landscape Index within ChemCart (Guha, R.; Van Drie, J. H. Structure-Activity Landscape Index: Identifying and Quantifying Activity Cliffs. J. Chem. Inf. Model. 2008, 48 (3), 646–658). It would also be good to integrate with GPU-accelerated state-of-the-art searches and predictions, in work similar to that demonstrated by OpenEye.


Structure Checker
Structure Checker can recognise, report or automatically fix most of the structural errors in chemical structure representations and thus, combined with Standardizer, it is a key component for registration systems, said David Deng of ChemAxon. Structure errors arise from drawing errors, optical structure recognition errors, inconsistent representations, aliases and so on. There are 42 different checkers, twelve of which are new: checkers for metallocene errors, OCR errors, atom query properties, empty structures, explicit lone pairs, molecule charges, racemates, rare elements, star atoms, substructures, unbalanced reactions and valence properties. David demonstrated some of the configuration and output options. Structure Checker can be used as a standalone GUI, in batch mode, in KNIME and Pipeline Pilot, and in MarvinSketch. It is accessible via Chemical Terms in IJC, and in JChem for Excel (JC4XL), JChem Cartridge and JChem Web Services.

Other informatics applications


Informatics at the Broad Institute
The Chemical Biology Platform within the Broad Institute screens both the Molecular Libraries Small Molecule Repository (MLSMR) and a unique compound collection synthesised internally through diversity oriented synthesis (DOS; Nielsen, T. E.; Schreiber, S. L. Towards the Optimal Screening Collection: a Synthesis Strategy. Angew. Chem. Int. Ed. Engl. 2008, 47(1), 48-56). In the DOS Build/Couple/Pair technology all possible stereoisomers of a molecule are made, if possible. Benjamin Alexander of the Broad Institute talked about the supporting informatics needs, which include sample tracking, analysis of high-throughput assay data, hit evaluation, and analysis of compounds by patterns of activity across multiple assays; indeed, nowadays, they include capture of the entire discovery life cycle (assay development, HTS, lead optimisation, ADME etc.). For Version 1 of the Chemical Biology Informatics Platform (CBIP) in March 2010 the Daylight cartridge was abandoned in favour of the ChemAxon family of products: Marvin, JChem API, Standardizer, Oracle Cartridge, Pipeline Pilot plugin, Chemical Terms and Instant JChem.

In version 1.5 in September 2011, compound registration was integrated with the CambridgeSoft ELN, PubChem data submission was automated via an in-house tool, and hit calling and cherry pick workflows are now facilitated using TIBCO Spotfire. SAR has been improved with Seurat and with a stereo-structure-activity relationship (SSAR) visualisation tool written in-house. The SSAR tool uses a file for data and structures from the database, and the JChem API for rendering and sorting of building blocks. In one example “traffic lights” colouring in an R1/R2 spreadsheet makes it clear that only two out of eight stereoisomers are active. The tool can also be used for quality control analysis of DOS libraries.

In 2012/2013 the transactional/operational CBIP model will be changed into a multi-dimensional querying and reporting architecture using a dimensional data model. The transactional CBIP will be modularised and there will be a transition to a technology stack emphasising Groovy and Grails.

Ben reported great satisfaction with the ChemAxon toolkit. The Broad has integrated many ChemAxon tools across many systems and has found that the Java API is extensive and well-documented, and ChemAxon’s support is excellent. Frequent releases of the software are nice but it is hard to keep up: testing cartridge upgrades is expensive. The Broad would love to have Grails plugins from ChemAxon, so they can integrate molecule and reaction data types into Grails Object Relational Mapping (GORM), and integrate Marvin into Groovy Server Pages (GSP).


Reagent ordering and inventory at Takeda
In 2002, when Matthew Pustelnik and his colleagues at Takeda first made out a business case to build a system to identify, procure and track reagent samples, no end-to-end solution was commercially available and the company’s ERP solution was undecided, so they planned to build a flexible “Amazon.com of reagents” in phases. They eventually developed two Web-based systems, Chemical Requisition System (CRQ) and Chemical Inventory System (CIS), which are integrated into the Enterprise Resource Planning (ERP) System. The foundation architecture consists of J2EE Servlets (WebMacro, AJAX, AppManager), ChemAxon JChem Base, the cartridge and Marvin, and Apache, Tomcat cluster and Oracle.

In 2003 the Available Chemicals Directory was imported and an Extract, Transform, Load (ETL) program was written to convert it to a JChem Base database of more than 1.2 million products (over 300,000 chemicals) after application of purchasing business rules, flagging of highly hazardous materials, and analysis and flagging to meet Export Administration Regulations (EAR). In 2003 the structure search software used the Daylight cartridge (ChemAxon had no cartridge at that time). There was no integration with the ERP system until 2006 and purchasing requests were initially handled by automatic form filling and exporting an Excel purchasing request form. In 2004 the system was upgraded to the ChemAxon cartridge and the Screening Compounds Directory (SCD) was included. In 2008 the reagent stockroom was redesigned and stockroom operations were modified. In 2009 a second stockroom was added. By 2011 more than 1.1 million chemicals (more than 22 million products) are included in CRQ. In future, structure search performance will be improved, the user interface will be redesigned (Ext JS, Dojo Toolkit), CRQ will be integrated with the ELN, cloud solutions will be considered, and eMolecules will be used to reduce costs.


Informatics at the University of Montana
The Core Laboratory for Neuromolecular Production (CLNP) at the University of Montana (UM) hired Erin Bolstad to centralise the drug discovery data of several neuroscience drug-design groups. Structural, biophysical, characterisation, and literature data, and information on stock, inventory, location and age are included. Erin used Oracle and the JChem cartridge. Multiple users access a single schema indexed in the cartridge. Chemical language is via SQL. Structure conversions in and out of the database are input to screening and docking routines. IJC version 5.5 was deployed via Java Web Start. Two structure identifiers (UM identifiers for internal and literature compounds, or ChEMBL identifiers) are used in the one-to-many links of structures to assay results, characterisation data (NMR and MS), location and quantities, and screening and docking tables.

Triggers are used in a number of cases, for example, a trigger fires off the summary table update if a new assay result is registered. As yet, data other than structures and assay results are not in the main part of the system. Grid- or form-based displays can be produced from all data calculated together, or UM-only data, or literature plus ChEMBL data. Additional data (on physical properties, other targets, screening and docking) are propagated to the summary table via a trigger. In future Erin would like to be able to bind widget data from other tables. Authentication involves the Oracle database matching the IJC local user: IJC roles and Oracle SQL privileges are used. Row-level security means that users can be restricted to viewing only compounds in the user’s group. Security is propagated across tables via triggers.
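The pattern of a trigger refreshing a summary table when an assay result is registered can be sketched as follows. The UM system uses Oracle; this example substitutes SQLite (via Python's standard `sqlite3` module) so that it is self-contained, and the table names, column names and compound identifiers are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assay_result (
    compound_id TEXT NOT NULL,
    assay       TEXT NOT NULL,
    value       REAL NOT NULL
);
CREATE TABLE summary (
    compound_id TEXT PRIMARY KEY,
    n_results   INTEGER NOT NULL,
    best_value  REAL NOT NULL
);
-- Fires once per inserted assay row and refreshes that compound's summary.
CREATE TRIGGER assay_result_ai AFTER INSERT ON assay_result
BEGIN
    INSERT OR REPLACE INTO summary (compound_id, n_results, best_value)
    SELECT compound_id, COUNT(*), MIN(value)
    FROM assay_result
    WHERE compound_id = NEW.compound_id
    GROUP BY compound_id;
END;
""")

# Registering assay results is all the caller does; the summary follows.
conn.executemany(
    "INSERT INTO assay_result VALUES (?, ?, ?)",
    [
        ("UM-0001", "binding", 120.0),
        ("UM-0001", "binding", 85.0),
        ("UM-0002", "uptake", 40.0),
    ],
)
rows = conn.execute("SELECT * FROM summary ORDER BY compound_id").fetchall()
```

The application code never touches the summary table directly, which is the appeal of the approach: any client that inserts a result keeps the summary consistent.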

Data utilisation, in 2D and 3D virtual screening, and docking, are parts of an ongoing story. The overall goal is to use IJC, JChem Web Services and JC4XL to query, search, share, and view all the data in multiple ways in a simple summary format, with conditional formatting so that different values can be readily distinguished for quick browsing. Several levels of easily managed security and access control are needed. Combining ChemAxon tools with SAR studies should lead to a better understanding of differentiating specificity and increasing potency.

Update on JChem server products



Szabolcs Csepregi of ChemAxon detailed the latest and future developments in the JChem server products: JChem Base, the cartridge, Web Services and Markush search. Since the last meeting, there have been many improvements to the search engine: hit visualisation of similarity search results using maximum common substructure; more consistent R-tables for symmetrical scaffolds; multi-threading enhancements giving faster first results and similarity search; non-tautomer duplicate search on tautomer duplicate tables; an enantiomer stereo search option; ECFP and FCFP in similarity search (with JChem Screen); sophisticated molecular formula search; support for new stereo types (syn and anti, cis/trans of cumulenes, and axial (atrop, allenes and cumulenes)); and faster R-group search.

In JChem Base there is now a duplicate filtering table option and “Jcunique” carries out duplicate filtering of a stream in a command-line. It is possible to run database administration tasks in the JChem Web Services product, including its Web-based GUI; previously this was only possible using the standalone JChem Manager application, which is part of the JChem Base product. New Web Services include molecule search in lists, retrieve or export related table data, Markush search and enumeration, batch insert and delete in a JChem table, and batch Chemical Terms evaluations. Under development in JChem Base, Web Services and the cartridge are a computational cluster/grid solution, a more usable JChem Manager, API support for an arbitrary table structure and easier installation for the cartridge.
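The idea of filtering duplicates out of a stream, as opposed to a stored table, can be sketched in a few lines. This is not how “Jcunique” is implemented and it does not reproduce its command-line options; the sketch assumes, purely for illustration, that each record already carries a canonical key so that equal keys mean duplicate structures.

```python
def unique_stream(records, key=lambda record: record):
    """Yield each record the first time its key is seen, preserving order.

    Only the keys are held in memory, so the input can be any iterator
    (for example, lines read from standard input).
    """
    seen = set()
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            yield record
```

A real tool has to decide chemically whether two structures are the same; here that decision is delegated entirely to the hypothetical `key` function.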

JChem for Excel



Migration issues
Tamás Pelcz of ChemAxon said that an Excel add-in is probably not the ideal choice for custom chemistry applications, and there are many challenges in developing under it. On the other hand, Excel is there on every desktop, almost everybody knows how to use it, it has a familiar user interface, it is the standard for spreadsheet applications, and it is used in other areas of research. Tamás presented a useful table comparing Excel add-ins:

Feature | Excel Shape Based | Window Rendering
Examples | Accord for Excel; ChemDraw for Excel | ISIS and Isentris for Excel; JChem for Excel
Structures visible without add-in | Yes | No
Print sheet as it is | Yes | No (insertion-removal required)
Structure resize with cells | No | Yes
Maximum number of structures | About 3000 | No limit
Import speed | Slow | Fast
Workbook size | Large (contains images) | Small
Advanced functionality (e.g., structure functions) | No | Yes

It is easy to convert ISIS for Excel workbooks to JC4XL but for other add-ins it is not so easy because the structures are in non-standard formats. Tamás reported on some of the issues that arose in deployment of JC4XL in GSK, Evotec and the Genomics Institute of the Novartis Research Foundation (GNF). In the case of GSK, JC4XL was successfully integrated with GSK’s own Helium for Excel. Helium uses Visual Studio Tools for Office (VSTO) but JC4XL is a Component Object Model (COM) add-in. The solution was to load the Common Language Runtime (CLR) as soon as Excel starts.

The in-house add-in interacts with GSK’s own Web Services and retrieves structures. JC4XL does the structure rendering and other chemistry related work. The in-house add-ins communicate with JC4XL through the ChemAxon API. Some extra user requirements were met in the new system: ChemDraw integration, copying structures and data in a single step to the clipboard, copying filtered structures to a new sheet, and renaming the worksheet to the file name in import from file.

In migration from ISIS for Excel, Evotec needed specific features in import to and export from file, and in converting to image: Tamás demonstrated these. At GNF, migration from Accord for Excel was only partially successful: printing and copy-paste to PowerPoint, embedded workbook mode, and sharing workbooks cause problems in the JC4XL (or ISIS for Excel) sort of add-in. ChemAxon’s goal is a hybrid mode, enabling both types of Excel add-in, so the user or administrator will be able to specify how JChem for Excel will work in certain scenarios.

Since last year the rendering in JC4XL has moved from .NET to C++ and Office 2010 is supported. Problems with connection to Oracle in .NET have been solved by using Devart’s dotConnect for Oracle. In the past JC4XL could import and search only Oracle tables; now Views are supported as well. It is also now possible to see other schemas as well as the user’s own schema. Many other enhancements are planned for the future. Suggestions from the audience were “JChem for Office” and a Macintosh version.

JC4XL at Takeda
Microsoft Excel is familiar and ubiquitous and well suited for data visualisation, said Matthew Pustelnik. It facilitates communications in the form of print, presentations and e-mail. It has proven cheminformatics add-ins. JC4XL is mature and can be integrated and extended. Matthew and his colleagues at Takeda made a business case for enterprise deployment of JC4XL because it was aligned with the company’s solution delivery strategy, it was a value proposition immediately and in the long term, a Takeda cheminformatics extension was not in place, the overhead for Excel would be low, and Excel deployment and maintenance would be easy.

Many operating systems and many versions of Microsoft Office are in use; Takeda homed in on 32-bit Windows 7 and Office 2007. The JC4XL licence file is managed through a Web server. JC4XL was pushed successfully to more than 80 workstations through a login script. Thirty percent of potential users adopted the solution after one training session; non-users get all the functionality they need through other tools. JC4XL needs further customisation and integration in future. Identifier services and chemical descriptor calculation and profiling are useful.

The GNF Excel add-in
Yingyao Zhou of GNF illustrated how the add-in idea can be further extended to integrate Excel with the rest of the cheminformatics infrastructure constructed at his company. Although Excel as a development platform has some advantages it lacks out of the box chemical intelligence, it has no fuzzy logic, it creates data silos and it leads to performance issues on large data sets. Nevertheless it is still the place where most data are captured initially, it is the tool used by most people to analyse the data, and it is the format with which most users share data. For these reasons, GNF wanted to use Excel as another application platform, but did not want to do Excel programming. So, one person built a custom Excel add-in, leaving other developers free to concentrate on server-side features.

Yingyao described some applications of the GNF Excel add-in. In one of them data are selected and reported by highlighting compound IDs, and clicking on “Compound Report”. The selected IDs are used to construct the target URL string for a Web application. Other Web application ideas include visualising dose-response curves behind an average IC50 value, or automating a data upload process.
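The step of turning the highlighted compound IDs into a target URL can be sketched as below. The GNF add-in is written for Excel and its real endpoint and parameter names were not given, so the base URL, the `ids` parameter and the function name here are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical report endpoint; the real GNF URL is not given in the talk.
REPORT_BASE_URL = "https://example.org/compound_report"


def compound_report_url(selected_ids, base_url=REPORT_BASE_URL):
    """Build the report URL for the compound IDs the user has highlighted."""
    ids = [s.strip() for s in selected_ids if s and s.strip()]
    if not ids:
        raise ValueError("no compound IDs selected")
    # Drop repeated IDs while preserving the order of selection.
    unique_ids = list(dict.fromkeys(ids))
    # urlencode percent-escapes the comma-separated list safely.
    return base_url + "?" + urlencode({"ids": ",".join(unique_ids)})
```

Because the add-in only constructs a URL, the Web application behind it can change without any change to the spreadsheet side.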

“Auto Fill” is another application of the Excel add-in. It takes a column of IDs and automatically creates additional attribute columns, so that, for example, compound IDs can be converted to SMILES. It fetches data from an Excel sheet, invokes the corresponding Web Service to obtain results in a C# DataTable format, and then modifies the original sheet with the additional attribute data. The Web Services are not aware of any Excel API. “Auto Fill” provides a generic user interface for converting attributes; new conversions can be added without redeploying the Excel add-in program. As an alternative to JC4XL, the add-in enables users to carry out various calculations without learning Excel formulae.
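The “Auto Fill” flow (read a column of IDs, ask a service for attributes, write new columns back) can be sketched in a language-neutral way. The real add-in is a C# Excel add-in calling GNF Web Services; in this sketch the sheet is a list of dictionaries, and `fake_id_to_smiles` with its data is a hypothetical stand-in for a service call.

```python
def auto_fill(rows, id_column, lookup, new_columns):
    """Return copies of `rows` with `new_columns` filled in from `lookup`.

    `lookup` takes a list of IDs and returns {id: {column: value}}; it knows
    nothing about the sheet, mirroring services that are unaware of Excel.
    """
    ids = [row[id_column] for row in rows]
    results = lookup(ids)  # one service call for the whole column
    filled = []
    for row in rows:
        attributes = results.get(row[id_column], {})
        new_row = dict(row)
        for column in new_columns:
            # Leave the cell blank when the service has no value for this ID.
            new_row[column] = attributes.get(column, "")
        filled.append(new_row)
    return filled


# Hypothetical stand-in for an ID-to-SMILES Web Service.
_FAKE_DB = {"CPD-1": {"smiles": "CCO"}, "CPD-2": {"smiles": "c1ccccc1"}}


def fake_id_to_smiles(ids):
    return {i: _FAKE_DB[i] for i in ids if i in _FAKE_DB}
```

Adding a new conversion then means supplying another `lookup` function, without changing `auto_fill` itself, which is the property the talk emphasised.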

Instant JChem



New features
Tim Dudgeon ran through some of the new features of IJC 5.4, 5.5 and 5.6, including visualisation widgets, conditional formatting, more form widgets, form builder improvements, cherry picking, calculated fields, scripting support (the scripting language is Groovy), an improved version of Reactor, the ability to train logP and other prediction routines with your own data, improved Markush structure handling, R-group decomposition, performance improvements, and more options for security. In future “more of the same” will be done to enhance core and chemistry features and IJC will implement OData integration. OData is a Web protocol supported in Excel 2010 and SharePoint, and pushed by Pistoia as an interoperability standard.

IJC at GSK
GSK has deployed IJC on an enterprise scale. Working closely with ChemAxon, GSK has moved the 200 most frequently viewed sets of project data (out of 400) from ISIS to IJC. Brett Hiemenz described how automation cut average migration time for a project from 2-3 days down to one day, although each step required some manual intervention. A custom script parsed an ISIS HView file (.hvd) and produced Oracle SQL. A Java application drove the Instant JChem Discovery Informatics Framework API to create objects needed in an IJC form and another Java application using the Framework translated controls in ISIS Forms (.frm files) to IJC widgets. User authentication integrates with the corporate Lightweight Directory Access Protocol (LDAP)/Active Directory system. ChemAxon introduced new capabilities to satisfy all GSK’s proposed authentication scenarios in IJC 5.5 but, unfortunately, the IJC 5.5 security mechanism only governs project-level security, not views.

IJC URLs are useful in that IJC launches directly from a Web page link and automatically loads a specific project. Also, some IJC application settings can be standardised at launch time. Microsoft SharePoint is used to manage the 200 URLs but one problem is that SharePoint limits URLs to 255 characters, and IJC URLs can be quite long. Building new URLs is also not straightforward.

Performance is currently limited by both GSK and IJC constraints. GSK chemical structure and biological databases are hosted at one UK site, and not replicated. Performance is thus poor in the United States and even worse in China. The databases are not in the same Oracle instance and the ISIS project database has thousands of tables and views feeding the forms. IJC cannot join across Oracle schema instances; if the JChem cartridge is to be used, the IJC schema must be located in the same database instance. Also, IJC search results are obtained individually rather than in batches, slowing record-to-record navigation.

GSK has implemented some temporary solutions to overcome these issues, but expects to eliminate these stopgaps as ChemAxon make strides toward reducing IJC’s application schema overhead, improving search result browsing performance, and supporting virtualised data. Ultimately the goal is to see a multi-tiered architectural solution, though Instant JChem in its current form is still much closer to its standalone desktop application roots.

SharePoint

Tamas Pelcz of ChemAxon gave two talks because the company has two products, but eventually there will be just one. JChem for SharePoint, now available, is a product name covering all ChemAxon components related to Microsoft SharePoint, including JChem for SharePoint Search. In future SharePoint 2007 will no longer be supported; SharePoint 2010, Firefox and Google Chrome will be supported. New features since the 2010 US UGM are inline editing, and third party editor support in the ChemAxon structure field; ChemAxon linked structure, calculated and calculated structure fields; import and export of Web Parts; and visualisation with Microsoft Chart and Visifire, in a service application based architecture.

JChem Search for SharePoint is an extension to Microsoft SharePoint Search and FAST Search to allow indexing and querying of chemical structures. Currently indexed are JChem for SharePoint content; structure files (molfiles, SDfiles, and CDX, .skc and .mrv files) extracted from IUPAC Name or SMILES; configurable words and synonyms; OLE Objects (Marvin, ISIS Draw, Symyx Draw); and JChem for Excel. Tamas demonstrated substructure search, and he listed features that should be available by November.

Reactions

David Deng gave a presentation on features of Reactor, ChemAxon’s reaction enumeration tool. Since this is a relatively mature product I will not dwell on the details. Reactor comes with over 200 synthetically feasible generic reactions and a graphical interface for users to create customised reaction schemes. David also illustrated manual selection of possible products; sequential versus combinatorial enumerations given multiple reactants; multi-step reactions using Reactor in KNIME; Reactor integration in JChem for Excel and IJC; and synthesis code generation.

Zhenbin Li of Boehringer Ingelheim talked about reaction schemes captured from the company’s external request management system (ERMS). Three levels of collaboration with a contract research organisation (CRO) are possible: the entry level in which a chemist buys a compound, the mid-level where compounds are made by a collaboration partner based on a long term service contract, and a top level where Boehringer Ingelheim and the collaborator are partners. ERMS is aimed at the middle level.

Previously, compound requests were tracked by each individual group, causing all sorts of problems, so a robust new tracking system was needed. The user enters a target structure and synthesis scheme and the system automatically checks for duplication for the target compound in various databases. If the compound is not available, the user enters the desired amount, desired purity, desired completion date, expected number of synthesis steps, and so on. The project leader approves the request and assigns the task to a chemist internally or externally. When the synthesis is complete the chemist updates the status of the request (amount, container barcode, etc.), and an email notification goes back to the requester.

The system has had many benefits but has raised one major issue: although the compound and logistics are stored in a database, individual reactions in the synthesis scheme are not. The reaction scheme is captured as an image file. The .mrv file holds all information including reaction conditions, catalysts, etc. and can be reloaded into MarvinSketch to regenerate the reaction scheme. “MDL” (Accelrys) software is used in ERMS and the RDfile format keeps the first reaction and leaves all other compounds as either reactants or products. The molfile format takes all compounds into one molfile as a mixture of seven compounds.

Boehringer Ingelheim wants to parse the individual compounds using JChem. Ideally, reactions should be parsed and ordered according to the arrow direction (up, down, right, or left) and the generated reaction should retain the conditions, catalysts, solvents, etc. This is not easy. Zhenbin had two suggestions for how ChemAxon might help.

Chemical name related software and text mining

Structure to Name, Name to Structure, Document to Structure
ChemAxon has tools for extracting names from documents, converting names to structures, and converting structures to names. Daniel Bonniot of ChemAxon gave us an update. Structure to Name is mature and converts 99.9% of structures to a name; there have been incremental improvements over the last year. Name to Structure is improving steadily; there have been major upgrades, coverage and correctness are both improved, and quality is competitive. Document to Structure has improved with Name to Structure enhancements, PDF support, and more OCR and syntax correction.

Chemically informed knowledge extraction from literature (ChiKEL)
Text mining is well established in drug discovery, and its use is growing in other applications. Patents are being used for scientific knowledge, intellectual property and competitive intelligence. Linguamatics and ChemAxon therefore wanted to combine natural language processing (NLP) based text mining with chemical search and visualisation based on structure, and they became partners in ChiKEL, a two-year project supported by the European Union EUREKA’s Eurostars programme. David Milward of Linguamatics and Daniel Bonniot reported on progress.

The existing integration between the two companies’ software enables substructure and similarity searching for known compounds. For example, it is possible to interrogate the literature to find properties of compounds that have a particular substructure, such as the targets the set of compounds inhibits, and to answer precise queries such as finding IC50 values from unstructured text, and finding EC50 and IC50 values from tables within documents.

Phase I of ChiKEL aims to evaluate and improve ChemAxon’s Name to Structure, integrate it with Linguamatics’ I2E (to ensure it can find and understand novel chemicals in, for example, patent documents), and to investigate clustering techniques. IUPAC, CAS, common and drug names and abbreviations are supported. Improvements have been made to Name to Structure in the areas of systematic names and the dictionary, and in OCR error correction and detection of punctuation, including brackets, in Document to Structure.

The capabilities are being evaluated with respect to the “gold standard” SCAI corpus; a new gold standard for patents is being developed. Version 5.6 of the ChemAxon software scores over 50% on recall, and about 78% on precision, with an F-score of 61%. The figures for OSCAR 4 are approximately 28%, 83%, and 42%, respectively. After integration with I2E, chemicals will be found by Name to Structure as well as by dictionary matching. Terminology will be created on the fly, with different matches brought together as a single chemical concept via an ID (SMILES, InChI or InChIKey). David gave an example of finding chemical names in patents and restricting the query to chemicals mentioned as the product of a reaction (e.g., “gives 2.4g of …”).
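
The F-scores quoted above are consistent with the balanced F-measure (F1), the harmonic mean of precision and recall. The report does not say which F-measure was used, so F1 is an assumption here, and the inputs are the rounded percentages given in the text (recall for ChemAxon is only stated as “over 50%”). A minimal check of the arithmetic:

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures as quoted in the report (assumed to be F1 inputs).
chemaxon_f1 = f1_score(precision=0.78, recall=0.50)
oscar_f1 = f1_score(precision=0.83, recall=0.28)

print(f"ChemAxon 5.6: F1 = {chemaxon_f1:.2f}")
print(f"OSCAR 4:      F1 = {oscar_f1:.2f}")
```

Both results round to the values quoted (61% and 42%), which suggests that the reported F-scores weight precision and recall equally.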

Clustering can be used to produce a heat map of chemical-disease associations, from association using text mining, clustering of chemicals via maximal common substructure, and agglomerative clustering of diseases. Phase II will see continuing improvements to Name to Structure, integration of chemical drawing and display of chemical structures in results, and research on image recognition, automated clustering and Markush search integration.

chemicalize.org
ChemAxon has started implementing a new service for chemicalize.org: the Document Viewer allows users to view a PDF document in the browser, find and highlight the location of structures, and extract chemical structures from the document. Alex Allardyce gave an impromptu demonstration of the new features.

Computational chemistry applications

Chemistry for biology
John Irwin of the University of California San Francisco gave a presentation about the renowned ZINC database. ZINC is a proxy for purchasable chemical space and ChEMBL is a proxy for medicinal chemistry space; where they overlap is the interesting area of purchasable medicinal chemistry space. The ZINC Web interface is now powered by ChemAxon. It has been redesigned to simplify the process of finding commercially available small molecules of biological interest and then organising and prioritising these compounds for purchase. Perhaps the single most significant feature is the use of a shopping cart metaphor to organise and manage results.

ZINC has search options other than chemical structure. John showed searches for the ChEMBL annotation MDM2_HUMAN, and for the target HDACs. There are many HDACs so he homed in on human HDAC3 with binding less than 1 µM and displayed the purchasable compounds. In the grid view of results, hovering the mouse over a structure brings up a window with vendor and physical properties for that compound. For compounds added to the shopping cart, cells are highlighted in blue. Other views are also possible. Mol2, SMILES and SDfiles can be downloaded. The purchasability report is sorted by vendor and emails can be sent automatically to vendors.

But what if there are no active compounds, i.e., there is no overlap between medicinal chemistry space and purchasable chemical space? To address this problem the Shoichet lab has developed the similarity ensemble approach (SEA; Keiser, M. J.; Roth, B. L.; Armbruster, B. N.; Ernsberger, P.; Irwin, J. J.; Shoichet, B. K. Relating protein pharmacology by ligand chemistry. Nat. Biotechnol. 2007, 25 (2), 197-206). SEA relates proteins based on the set-wise chemical similarity among their ligands. It can be used to search large compound databases rapidly and to build cross-target similarity maps. Michael Keiser and his colleagues have done retrospective tests of SEA on drugs already being used in man. From that work (Keiser, M. J. et al. Predicting new molecular targets for known drugs. Nature 2009, 462, 175-181) John showed a network of relationships among new targets, old targets, and drugs, for 32 new targets and 19 drugs, with 1.2 nM to 14 µM activity. ZINC also has a scriptable interface where more complex searches can be carried out, for example, “ligands for histamine H1, for sale, single representation at pH7, in SMILES”.
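
The talk did not go into the formula behind SEA’s set-wise similarity. As a rough sketch of the idea only, assume each ligand is represented by a set of fingerprint feature IDs and that the raw score between two targets is the sum of pairwise Tanimoto coefficients at or above a cut-off. The 0.5 cut-off, the function names and the toy fingerprints are illustrative assumptions; the published method also compares the raw score with a random background, which is omitted here.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of fingerprint features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def setwise_raw_score(ligands_a, ligands_b, threshold=0.5):
    """Sum pairwise Tanimoto values that reach the (assumed) threshold."""
    total = 0.0
    for x in ligands_a:
        for y in ligands_b:
            t = tanimoto(x, y)
            if t >= threshold:
                total += t
    return total


# Toy ligand sets for two hypothetical targets.
target_a = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 5})]
target_b = [frozenset({1, 2, 3, 4}), frozenset({7, 8})]

print(setwise_raw_score(target_a, target_b))
```

Only the identical pair contributes in this toy case; the pair with a Tanimoto value of 0.4 falls below the cut-off and is ignored.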

Patsy compilation
The identification of a specific subgraph within a graph (subgraph isomorphism) is typically used to identify a substructure in a molecule connection table. A backtracking atom-by-atom match is very efficient for matching a single pattern against a single molecule, and well known optimisations exist for scanning a large database of target molecules. Analogous to the way chemical database search systems use fingerprint pre-screens to optimise searching a single pattern against multiple molecules, pre-processing analysis and pattern compilation can be used to optimise matching multiple patterns against a single target molecule. Patsy takes a set of patterns and generates a Java class that performs this matching using the ChemAxon toolkit.

Roger Sayle showed that Patsy improves the run-time performance of subgraph matching dramatically. The biggest gains are seen when matching multiple patterns against a single target molecule. Examples of matching multiple patterns include filtering of undesirable properties, atom typing, descriptor and physical property calculation, pharmacophore perception, feature-based fingerprint generation and IUPAC name generation. In these cases, current practice is typically to match each of the SMARTS patterns independently and sequentially, but Patsy compilation allows some patterns to be matched simultaneously. The matching of patterns for radioactivity in JChem 5.5 was speeded up 700 times. ChemAxon MACCS fingerprinting was speeded up 7.8 times. The method could be applicable to Thomson Reuters CPI codes used in patents. Increasing speed could enable previously impossible queries to be carried out.
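
Patsy’s internals were not described in detail, and the sketch below is not the Patsy algorithm. It illustrates only the general idea of compiling several patterns together so that tests they share are evaluated once, using a deliberately simplified analogy: each pattern is a linear sequence of element symbols, and a pattern “matches” when it is a prefix of the target sequence. Real SMARTS matching is subgraph isomorphism; the pattern names and data here are hypothetical.

```python
def compile_patterns(patterns):
    """Merge all patterns into one trie so shared prefixes are stored once."""
    root = {"children": {}, "matches": []}
    for name, symbols in patterns.items():
        node = root
        for symbol in symbols:
            node = node["children"].setdefault(
                symbol, {"children": {}, "matches": []}
            )
        node["matches"].append(name)
    return root


def match_all(trie, atoms):
    """Return every pattern that is a prefix of `atoms`, in a single pass."""
    found = list(trie["matches"])
    node = trie
    for symbol in atoms:
        node = node["children"].get(symbol)
        if node is None:
            break  # no compiled pattern continues with this symbol
        found.extend(node["matches"])
    return found


# Three toy patterns sharing the prefix C-C.
patterns = {"p1": ["C", "C", "O"], "p2": ["C", "C", "N"], "p3": ["C", "C"]}
compiled = compile_patterns(patterns)

print(match_all(compiled, ["C", "C", "O", "C"]))
```

Matching the three patterns one after another would test the leading C-C three times; the compiled form tests it once, which is the kind of saving that grows with the number of overlapping patterns.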

Patent analysis tools
Christopher Kibbey and his colleagues at Pfizer have developed a suite of patent analysis tools and made these available to medicinal chemists throughout the research organisation. In addition to performing substructure, similarity and exact match searching against databases of exemplified structures from patents, two of these tools offer additional analysis features. Substructure and fingerprint-based similarity search results can be misleading when applied to structures in patents. Substructure search cannot account for homology, substitution, position and frequency variation, and fingerprint-based similarity search algorithms may not take connectivity of fingerprint features into account. Reduced graphs offer an alternative to fingerprint-based similarity search algorithms. Kibbey showed how a structure could be fragmented and re-assembled with the terms “ring”, “chain” and “substituent” as nodes in place of the fragments.

In Pfizer’s Feature Analysis, fingerprints are encoded for each node based on structural features. The reduced graphs of query and target can be overlaid and scored on similarity. The output of Feature Analysis also provides “substructure-like” match of ring, chain, and substituent features. Results can be visualised with “traffic lights” colouring for similarity. Users can select a query feature to display the corresponding substructure features for exemplified compounds, and per-atom differences are highlighted in blue for 100% similar fragments. Kibbey displayed coloured structures showing how a benzimidazole ring in a query structure mapped to a variety of ring nodes within the target set. Feature Analysis can be applied to ranking patents by structural relevance to one or more query molecules.

Substructure and exact match searches may be ineffective when applied to patents represented by multiple Markush structures and similarity search against Markush structures is not supported by the JChem toolkit. Enumeration of virtual libraries is an ineffective strategy for similarity analysis: random enumeration yields structures with variation far away from the core scaffold within an extremely large virtual space, and sequential enumeration generates smaller virtual libraries of close-in structures but ignores much of the chemical space within the Markush. Level enumeration generates structures through random enumeration of R-groups up to a maximum specified nesting depth. This yields smaller libraries of close-in structures while maintaining representative R-group coverage. Level enumeration plus Feature Analysis is an improved strategy for Markush analysis: the level-enumerated structures are encoded as a set of reduced-graphs.

Lastly, a Markush tool has been developed to facilitate analysis of the detailed chemical space disclosed within a patent. Using this tool, medicinal chemists can perform substructure and exact match searches against Markush structures from patents of interest and enumerate representative structures from the search hits.

Research at the University of New Mexico
Oleg Ursu of the University of New Mexico has validated ChemAxon’s Screen3D (http://revamp.chemaxon.com/products/screen/) on several targets from the ChEMBL database. Structures were cleaned up with Standardizer and converted to 3D with MolConverter. One percent of the actives were used as queries and the rest, plus 30,000 random decoys, were used as targets. Binary files were prepared and Screen3D was run. ECFP fingerprints were used for comparison. Performance and hit recall rates for Screen3D were good: better than for ECFP. For two targets the percentage of hits recalled did not reach 90%. In one case, tyrosine phosphatase, this was because of the high proportion of compounds with more than 8 rotatable bonds. In the case of thymidylate synthase bad sampling was the problem. Oleg showed some typical alignments. He recommends Screen3D as a new method to perform 3D shape virtual screening.

Oleg has also developed an automated tool to identify promiscuous patterns in a database of biological activities, based on the matching molecular pairs (MMP) concept. He downloaded and cleaned the Molecular Libraries Small Molecule Repository (MLSMR) library from PubChem, identified all non-redundant single, double, and triple cut MMPs and mapped “active” MMPs to HTS assays. He developed a promiscuity score based on the number of substances with one or more “active” MMPs, the number of “active” MMPs for each fragment, the number of distinct assays with one or more “active” MMPs and the number of samples mapped to each “active” MMP. He showed the audience structures for the top-ranked promiscuous fragments, and literature precedents for some of them. The system uses the ChemAxon JChem/Marvin API for I/O; fragmentation; SMARTS match and generation; SMIRKS generation; and chemical structure depiction. The promiscuous chemotypes identified can be used as a filter for prioritisation of HTS hits, and for compound library design.

Molecular fragments at NIH
Rajarshi Guha and Trung Nguyen of the NIH Chemical Genomics Center have been exploring how molecular fragments can be used to explore large chemical spaces. Rajarshi presented some results. Molecular fragments can be considered to be the n-grams of chemistry. Assuming you obtain fragments from a large enough collection you can learn from them in QSARs and generative models, use them as filters, as an alternative to clustering; explore chemotypes and activity; and study scaffold level promiscuity. One tool that Rajarshi and his colleagues have developed is a network-oriented view of fragment (scaffold) collections, coloured by arbitrary properties, which allows a quick assessment of the utility of a scaffold. This scaffold activity diagram can be tried out at http://tripod.nih.gov/.

Topological and physicochemical descriptors for the R-groups can be evaluated to characterise an SAR landscape, which should not be too flat or very bumpy. The fit to PLS or a ridge regression model can be studied. Scaffold QSAR, however, has some drawbacks. Rajarshi and his colleagues are interested in exploring summaries of activity, grouped by scaffolds and targets, and for these fragment activity profiles they have pre-processed the ChEMBL database, generating scaffolds by exhaustive enumeration of combinations of smallest set of smallest rings and normalising activity data so that they can compare the activity of a molecule across different assays. The pre-processing steps are available as a Java servlet at http://tripod.nih.gov/files/chembl-servlets.zip. Rajarshi showed how Z-scores are calculated for activity. In the fragment activity profiler available at http://tripod.nih.gov/ a user can draw a molecule, fragment it on the fly and use generated fragments to create activity histograms.
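
The exact normalisation Rajarshi used is not recorded in this report. As a minimal sketch, assuming a conventional Z-score computed within each assay (value minus the assay mean, divided by the assay standard deviation, here the population standard deviation), activities measured on different scales become directly comparable. The assay values are invented for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise one assay's activities to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread to normalise against.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Two hypothetical assays reporting the same ranking on different scales.
assay_a = [5.0, 6.0, 7.0]
assay_b = [50.0, 60.0, 70.0]

print(z_scores(assay_a))
print(z_scores(assay_b))
```

After normalisation the two assays give the same scores for corresponding compounds, which is what allows activity to be summarised across assays for a given scaffold.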

Target families or individual targets can be selected from the ChEMBL target hierarchy. Rajarshi found no obvious correlation between fragment similarity and activity profile similarity but the method is probably not rigorous when a scaffold has few parent molecules. He also explored profiles for fragment pairs and tried to identify fragment pairs with very different activity distributions and fragments with a preference for a certain target or target class. Rajarshi gave one example of selectivity that worked and another that did not.

Thus far he had been using the terms “fragment” and “scaffold” interchangeably, but the two are not always equivalent. To encode the idea of scaffold-like or fragment-like he uses the concept of signal-to-noise ratio (SNR), where SNR is defined as the size of the fragment divided by the standard deviation of the number of atoms not in the fragment, considered over the parent molecules. Large SNRs are associated with Murcko-like fragments but a useful SNR cut-off is an open question. Rajarshi and his colleagues labelled the parent molecules as core-like or noncore-like. If the number of atoms not in the fragment is greater than the standard deviation, the molecule is noncore-like; otherwise it is core-like. The activity distributions of the parent molecules were visualised, grouped by the label. Thus far the “corelikeness” work has not been very successful.
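
The SNR definition and the labelling rule can be sketched directly from the description above. The sketch assumes that sizes are atom counts and that the population standard deviation is meant; the talk summary does not say whether heavy atoms only are counted or which form of standard deviation was used, and the example numbers are invented.

```python
from statistics import pstdev


def extra_atoms(fragment_size, parent_sizes):
    """Number of atoms in each parent molecule that lie outside the fragment."""
    return [n - fragment_size for n in parent_sizes]


def fragment_snr(fragment_size, parent_sizes):
    """Fragment size divided by the spread of the non-fragment atom counts."""
    sigma = pstdev(extra_atoms(fragment_size, parent_sizes))
    if sigma == 0:
        return float("inf")  # every parent adds the same number of atoms
    return fragment_size / sigma


def label_parents(fragment_size, parent_sizes):
    """Label each parent noncore-like if its extra atoms exceed the spread."""
    extras = extra_atoms(fragment_size, parent_sizes)
    sigma = pstdev(extras)
    return ["noncore-like" if e > sigma else "core-like" for e in extras]


# A hypothetical 10-atom fragment found in parents of 12, 14 and 22 atoms.
snr = fragment_snr(10, [12, 14, 22])
labels = label_parents(10, [12, 14, 22])
print(round(snr, 2), labels)
```

In this toy case the largest parent carries 12 atoms outside the fragment, more than the standard deviation of about 4.3, so it alone is labelled noncore-like.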

Rajarshi and his colleagues use the JChem toolkit primarily for the core cheminformatics representations, file format loading and depictions. In the backend they also use the JChem cartridge for structure searching in Oracle.

Partner session

Keith Taylor of Accelrys presented Pipeline Pilot: a scalable, secure, Web deployable, enterprise solution, and not just for chemistry. ChemAxon components extend Pipeline Pilot’s capabilities and give more choice in chemistry. Pipeline Pilot is a good prototyping environment for ChemAxon’s advanced science.

Acelot’s tools (presented by Christian Lang) use Marvin for result visualisation. SimFinder performs chemical searching and lead generation; SigFinder is for fragment mining and lead optimisation; ActiPred carries out activity prediction; and Pharma Miner analyses pharmacophores in Joint Pharmacophore Space.

DeltaSoft’s ChemCart is a Web-based, configurable forms interface to Oracle that integrates with pipelining and analysis tools. It is an integrated suite of applications that makes use of ChemAxon components. Yvonne Shimshock presented four case studies: registration, reagent inventory, a cloud implementation and a system to be accessed by an external collaborator.

Steve Muskal of Eidogen-Sertanty presented his company’s mobile apps for iPhone or iPad (or both in some cases): iKinase, iKinasePro, iProtein, Mobile Reagents, Reaction101 and Yield101. Reaction101 and Yield101 are built collaboratively with Molecular Materials Informatics. JChem and JChem Web Services are used for structure searching in iKinasePro.

Sreenivas Devidas presented the GVK Bio Online Structure Activity Relationship Database (GOSTAR, also available in-house). Information provided includes SAR, ADME, toxicity, preclinical, clinical and structural data, for over 5.1 million compounds with over 16 million quantitative SAR points. Several analysis features have been built into GOSTAR. ChemAxon software used includes MarvinSketch, MolConverter, the JChem cartridge, the Library MCS API, Name to Structure, JChem Base, Calculator plugins, chemical workflow tools, Standardizer, IJC, JChem Web Services, Structure Checker, Screen, Reactor and Fragmenter.

Steve Boyer of Collabra said that IBM Research uses ChemAxon tools to clean up the structures generated by name to structure conversion after capture from images or names in the scientific and patent literature. IBM uses DB2 not Oracle for the database. Having built the database IBM is now concentrating on front end tools such as “SIMPLE” (later to be “SIIP”) which carries out substructure search with ChemAxon tools. IBM is starting a new project, Deep QA, a massively parallel probabilistic architecture.

Jeff Tishler of IDBS said that ChemAxon technology is used in E-WorkBook and InforSense. A solution for parallel synthesis integration in E-WorkBook uses Reactor APIs and there are spreadsheet chemistry functions such as Structure to Name. Multiple ChemAxon tools can be used in InforSense workflows. Workflow applications can be published to an integrated Web portal and interactive graphics and analytics are provided.

Takahiro Ohshima of Infocom asserted that 90% of ChemAxon functionality is available in Infocom’s JChem Extensions for KNIME. The KNIME WebPortal and QuickForms give access to workflows through a Web Interface. A customised Web UI can be easily created, to prompt for workflow variables, upload a data file, download results, and mail a notification for long-running tasks.

John McNeil has a team of eight, most of them with laboratory experience as well as programming skills. They make tools to help bench scientists be more productive, efficient, and creative. The JChem cartridge and IJC can be linked to John McNeil and Company’s LabSynch Semantic LIMS.

Kelaroo applications are “vendor agnostic” but Kelaroo urges its customers to go with ChemAxon’s solutions. Kelaroo has re-oriented its business towards products rather than software contracts over the last two years. Robert Feinstein presented the Kelaroo Synthesis Request System for customer and CRO management and the Kelaroo Reagent Management System for vendor and catalogue management. He appealed to ChemAxon to get MarvinView and MarvinSketch to work robustly with recent and present versions of the Java Runtime Environment (JRE) and all major browsers, including Internet Explorer.

Linguamatics’ I2E is used widely in the top 50 pharma companies. It allows users to discover indirect associations. For example, if cyclosporin is linked with a target in one document and that target is linked with psoriasis in a second document, I2E can link cyclosporin with psoriasis. David Milward, who did the partner presentation, also gave a talk about ChiKEL.
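
The indirect association in the cyclosporin example amounts to joining two sets of extracted relations on the shared target. The sketch below shows only that join; it says nothing about how I2E extracts or scores relations, and the function name, the data layout and the placeholder “target X” are assumptions made for illustration.

```python
from collections import defaultdict


def indirect_links(drug_target_pairs, target_disease_pairs):
    """Join drug-target and target-disease relations on the shared target."""
    diseases_by_target = defaultdict(set)
    for target, disease in target_disease_pairs:
        diseases_by_target[target].add(disease)

    links = set()
    for drug, target in drug_target_pairs:
        # .get avoids creating empty entries for targets with no disease link
        for disease in diseases_by_target.get(target, ()):
            links.add((drug, target, disease))
    return links


# Relations as they might be extracted from two different documents.
document_1 = [("cyclosporin", "target X")]
document_2 = [("target X", "psoriasis"), ("target Y", "asthma")]

print(indirect_links(document_1, document_2))
```

Keeping the intermediate target in each result makes it possible to show the user why the drug and the disease were connected.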

Mike Bossom of TIBCO Spotfire showed how Spotfire Server links to the JChem cartridge and structures are rendered in the Spotfire client with MarvinView. TIBCO Spotfire is one platform with many applications, including high throughput screening, project key criteria (“KPIs”), SAR, selectivity analysis, dose response, competitive intelligence and gene expression.

CEO’s concluding remarks

Alex Drijver wound up the 2011 US UGM by referring to my comments and our discussions in 2008. In 2008 I wrote: “As ChemAxon seeks to become a more “serious” contender in the cheminformatics marketplace I do hope that it will not lose too much of its informality and friendliness. The users will not want it to lose its justifiably high reputation for support. ChemAxon aims to retain its existing virtues with a more confident and serious business face”.

Alex said that the company has changed. It now has 60 permanent staff and 30 “consultants”. About 75 of these people are developers. There is now more need for formality in developing code. The size of the rooms in the office facility in the “house on the hill” is a problem now that the company is doing agile development: for agile development, teams have to have desks close together. So, everyone is leaving the current offices in November 2011 and moving to a science park, but this should not be allowed to affect the ethos of the company. ChemAxon is still known for its informality, approachability and good support. The ethos is in the people: very few leave and the company now has employees who have served for 10 years. The common core of software was all developed in-house. Behind the people are the company values.

ChemAxon stands for quality. It is customer-oriented and transparent: other companies do not run a public forum for support or allow everyone within the company to see everyone else’s email messages, but ChemAxon does. The company is personal and informal, not snobby or hierarchical. It is efficient and keeps costs down so that the customer gets a value for money proposition. It has no huge finance or administrative departments. It uses no-frill airlines and shares hotel rooms. Most importantly, ignore all rumours about ChemAxon being acquired. ChemAxon is not for sale; it is independent and has no debts, no investors and no tedious committees.





Conclusion

If you were hoping for the usual short report, I apologise. This time there was so much happening that I feel that more detail is justified. I could not help comparing this event to the dozens of MDL user meetings I attended in various countries in the 1980s and 1990s. Ongoing themes at the ChemAxon meeting included the guy with his list of 40 most wanted enhancements, Yet Another Registration System, some of the same old faces, useful gossip, and so on. Admittedly, the dress code is smart casual rather than three-piece suits and earrings (not combined!), and there are more Hungarian speakers than German speakers at the European event, but the elements of déjà vu are there. In the 1980s we used to have the “3D day” at the MDL User group meeting: half a dozen enthusiasts would attend and the rest of the delegates took a day off sightseeing in San Francisco. ChemAxon users are not afraid of computational chemistry and 3D. I never went to many Daylight meetings but I suggest that the ChemAxon user meeting is also a cross between a MUG and an MDL meeting. At least one of the two annual ChemAxon user meetings is now an essential date in my calendar and the dates are blocked out a year in advance. I say all this in order to give readers an idea of the significance that ChemAxon is achieving in the cheminformatics arena. I strongly recommend that you keep an eye on this company.
