Back to nature in the heart of the city

news · 12 years ago
by Wendy Warr (Wendy Warr & Associates)
Platform: JChem and Marvin applications
Cloud computing and Web Services
Similarity and the third dimension
Text Mining
Instant JChem
Pipelining and workflow
More from ChemAxon
Posters
Partner presentations
Conclusion

This year’s meeting was held at a resort hotel and spa on Margaret Island, an island in the Danube connected to Buda and Pest by the Margaret and Árpád Bridges. It is one of the most beautiful open spaces in Budapest. The meeting opened with the traditional evening garden party at ChemAxon’s offices on a hill overlooking the city. The conference dinner was held at the Gödöllő Palace, where we were treated to an interesting historical tour and a dinner in elegant surroundings.

Despite the recession, user attendance at the meeting proper was about the same as last year, and there were even more ChemAxon employees and adherents. As usual, the meeting had an informal air about it, people were very sociable, and drinks were on the house on more than one occasion. The baths (an essential feature of these Hungarian events) were perhaps less appealing than at the previous two venues, but on the whole it would be hard to fault the venue and the entertainment.

Alex Drijver, CEO of ChemAxon, opened the proceedings with his "state of the company" address. ChemAxon continues to show solid growth in its business despite the atrocious state of the market. The company has been achieving this by keeping support as the number one unique selling point; adding new products and features to the shop window; expanding the partner programme; and expanding to new markets, especially China and India. Alex told us that support is the cornerstone of the ChemAxon business; it is not only the unique selling point but also the way the company finds out what users want. Partnering is also important: 6 or 7 new companies have been added to the programme. "We provide the carrot, you make the cake", quipped Alex. The company has been profitable since 1999. It is financially stable and has no cash flow problems.

Alex was followed by David Spender, who gave us an overview of what’s new in the ChemAxon product line. Later presentations gave much more detail. There cannot be many readers unfamiliar with ChemAxon but it might be worth mentioning at this point that Marvin plus JChem is a software suite of application programming interfaces (APIs) and graphical user interfaces (GUIs) used to build chemically aware, platform independent and Web-ready enterprise informatics systems. Marvin includes structure and reaction editing, visualisation and structure based property prediction; JChem includes structure management and search, library enumeration and library profiling. Version 5.2 of the suite was released in March 2009.

Platform: JChem and Marvin applications

The first user presentation was given by Gert Thijs of Silicos. Silicos is a small company with a de novo design technology (Computational Optimisation of Small Molecule Structures (COSMOS)) implementing proprietary Spectrophore field-based molecular descriptors. Gert has used JChem and Instant JChem (an all-in-one chemical database application) to build "Simosa", a database of more than 7 million commercially available compounds, and to handle internal inventory and compound registration systems. Individual researchers use Instant JChem for their own smaller projects. ChemAxon software was used (amongst other reasons) because it allows systems to grow as the company grows. Gert commented that 80% of Accelrys’ functionality was not needed at Silicos. In these circumstances, perhaps it was not fair for Gert to complain that Instant JChem requires a huge amount of memory to browse tables, and that substructure search of the complete Simosa database is not feasible on user laptops. ChemAxon points out that the alternatives are JChem Cartridge, Instant JChem Server or Web Services. Maybe this is what "growing with the company" is all about.

Akos Papp of ChemAxon mentioned lots of new features in Marvin but the key ones were covered in other talks. Version 5.3 (due in late October) will have a .NET version of the GUI. There will be a new bracket tool and it will be possible to add data to a bracket, so that, for example, mixtures can be stored with data such as "20 mass percent" attached to a component. There will be a Chemical Terms editor in the GUI. (Chemical Terms is a language for adding chemistry and mathematical functions including property predictions, functional group recognition, isomer enumeration, conformer selection, ring and distance based topological functions, etc. The functions are currently integrated via an open plugin interface.) Template handling will be redesigned, and more import and export options will be added. Version 5.4 will have structure checking in the GUI, leaving group definition, and multistep reaction support.

It fell to Szabolcs Csepregi of ChemAxon to talk about what’s new in the JChem back end and in storage and search of Markush structures. ChemAxon’s chemical database products include JChem Base (a library for adding chemical structures into relational database systems, now available in .NET as well as Java and JSP), JChem Cartridge for Oracle, Instant JChem, JChem Web Services (the new SOAP interface to JChem Base) and JChem for Excel, which was recently launched. Apart from JChem Web Services and the .NET API, new in JChem Base are polymer storage and search, some new query options, and new metrics for similarity search (e.g., the Tversky coefficient). There are some new features in Markush search, in particular, support for repeating units and for built-in and customisable homology groups (e.g., "alkyl" and "aryl"). ChemAxon is planning to add .VMN import, multiple graphical attachment points of R-groups, homology variation queries, overlap analysis of Markush structures (without enumeration), homology group properties (e.g., number of atoms) and conditions for Markush variables.
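As a reminder of what the Tversky coefficient adds over the familiar Tanimoto coefficient, here is a minimal Python sketch that treats a fingerprint as a set of on-bit positions. The function names and the toy fingerprints are illustrative assumptions; this is not the JChem API.

```python
def tversky(query_bits, target_bits, alpha, beta):
    """Tversky similarity of two fingerprints given as sets of on-bit indices.

    alpha weights bits found only in the query, beta weights bits found
    only in the target.
    """
    common = len(query_bits & target_bits)
    only_query = len(query_bits - target_bits)
    only_target = len(target_bits - query_bits)
    denominator = common + alpha * only_query + beta * only_target
    return common / denominator if denominator else 0.0


def tanimoto(query_bits, target_bits):
    """Tanimoto is the symmetric special case alpha = beta = 1."""
    return tversky(query_bits, target_bits, 1.0, 1.0)


# Toy fingerprints (assumed): the target contains every bit of the query.
query = {1, 4, 7}
target = {1, 4, 7, 9, 12, 15}

symmetric = tanimoto(query, target)
substructure_like = tversky(query, target, alpha=1.0, beta=0.0)
```

With beta set to zero the extra bits of the larger target are not penalised, so the asymmetric score behaves more like a "query is contained in target" measure than the symmetric Tanimoto score does.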

Catherine Reisser of Evotec described two applications using ChemAxon tools. The first was a chemical ordering system, EVOsource, using a Java Web application, plus Marvin Applets, EclipseLink, Oracle, and JChem Cartridge. The former ordering system had three schemas in the same database with the same data model: the Evotec Supplier Database (ESD), the Screening Supplier Database (SSD), and Symyx ACD, for which Evotec had only 34 named user licences. There were different naming conventions for the identifiers in each schema. Users could search only one schema at a time, and only one structure at a time. Data were difficult to import; SD files were inconsistent; there was duplication of information between ESD and SSD; the ordering system was linked to ESD only and there was no easy way to order chemicals from SSD; and there was no direct link between the chemical ordering system and the company’s purchasing system. The new EVOsource system, which will be in full production by July, overcomes these limitations. All scientists will have access to ACD via ACD Web Services. ChemAxon’s Standardizer is used to apply Evotec business rules, and molecules are rendered as pictures (for memory reasons). In future it would be good to have links to other external data sources, to avoid searching public Web sites (with inherent intellectual property risks) and to have links to the new ELN and registration systems.

The second application was a registration system. The former Evotec registration system was developed externally using Daylight software. Data were stored in non-standard formats and security was inadequate: only three roles were defined and everyone had access to all the data. The new registration system has defined project and user roles and applies Evotec business rules, albeit with the flexibility necessary within a service company. The system is defined around preferences for displaying and exporting data in a required format. It uses a Java Thick Client plus Marvin Beans, PL/SQL, Oracle and JChem Cartridge. Chemists register and novelty check compounds but once the data are locked only a registration scientist can edit them. Catherine described some of the issues faced during data migration (problems with extended SMILES, tautomers, compatibility with ISIS for Excel, and aligning molecules to templates). Future plans include an option to record data for compounds without structures; modifying the security model to allow a "public" project; linking to the new ELN, to EVOsource and EVOseek (for biological data); using the Netbeans platform and perhaps Instant JChem; and splitting off the library enumeration routine so that it can be used as a standalone module.

The JChem suite now has four interfaces: Java, SQL, .NET, and Web Services, said Jonathan Lee of ChemAxon. He talked about the new .NET and Web Services interfaces. ChemAxon is offering a pure .NET solution for all non-GUI elements of JChem. (Marvin 5.3 will have .NET GUI components.) The new open source IKVM translation provides a .NET library that is simpler to use and runs faster than the earlier solution powered by JNBridge. JChem Web Services include JChem Search, Standardization, molecular conversion, and Chemical Terms evaluation. Jonathan did an Asynchronous JavaScript and XML (AJAX) demonstration of the JChemSearch Web service. He scrolled down a spreadsheet, showing how fast browsing was, and generated a URL link for sending structures and other data to a colleague.

Sorel Muresan of AstraZeneca gave a 2008 update on some work he and his colleagues have published comparing public and commercial databases of bioactive compounds in 2006 (Southan, C.; Várkonyi, P.; Muresan, S. Current Topics in Medicinal Chemistry, 2007, 7(15), 1502-1508). A manuscript has been submitted to the Journal of Cheminformatics this year. The structures in 23 databases, including public databases such as DrugBank, BindingDB, ChEBI, and PubChem, and in commercial databases such as GVK BIO, the Dictionary of Natural Products, MDL Drug Data Report, and WOMBAT, were standardised (normalised, neutralised etc.), and unique molecular hashcodes were generated using software written by Jens Sadowski and Niklas Blomberg. Unique structures were retained after comparing these hashcodes and numerous different comparisons were made.

The number of non-overlapping substances in GVK BIO, WOMBAT, and PubChem is little changed between 2006 and 2008 despite the fact that PubChem has almost doubled in size. A large overlap was expected between the GVK BIO drug database, DrugBank approved drugs and "MDDR Launched" but this is not so. Perhaps researchers in pharma need access to all three sources. Sorel and his colleagues also compared PubChem (14,965,539 structures in 2008) with a database of all commercial sources merged (2,284,461 structures in 2008); there were only 1,043,399 structures in common. In short, researchers need to look at both public and commercial sources to get the whole picture. Moreover the biological annotations may differ for the same compound in different databases. The AstraZeneca system for exploiting annotated data uses Marvin and JChem.
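Once every structure has a hashcode, a comparison of this kind reduces to set operations. The Python sketch below illustrates the idea only: it assumes each structure has already been standardised to a canonical string, uses a SHA-1 digest as a stand-in for the molecular hashcodes mentioned in the talk, and works on two tiny made-up "databases".

```python
import hashlib


def structure_hash(canonical_structure):
    """Hash an already-standardised structure string.

    Stand-in for a real molecular hashcode; standardisation itself
    (normalising, neutralising etc.) is assumed to have happened upstream.
    """
    return hashlib.sha1(canonical_structure.encode("utf-8")).hexdigest()


def unique_hashes(structures):
    """Collapse duplicates within one database by hashing into a set."""
    return {structure_hash(s) for s in structures}


def overlap_report(name_a, structures_a, name_b, structures_b):
    """Count structures unique to each database and those they share."""
    a = unique_hashes(structures_a)
    b = unique_hashes(structures_b)
    return {
        "only_" + name_a: len(a - b),
        "only_" + name_b: len(b - a),
        "shared": len(a & b),
    }


# Hypothetical toy data; "CCO" appears twice in the public set on purpose.
public_db = ["CCO", "c1ccccc1", "CC(=O)O", "CCO"]
commercial_db = ["c1ccccc1", "CCN"]

report = overlap_report("public", public_db, "commercial", commercial_db)
```

The quality of such a comparison depends almost entirely on the standardisation step, which is why the sketch keeps it outside the hashing function.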

Sorel concluded that both shared and unique content can provide value. Based on content per se, he feels that the pendulum is swinging in the public direction. Patent compound coverage is increasing in PubChem. Public and commercial sources offer different linking and mining functionality. Journal and patent compound-assay-protein mapping is covered on a larger scale by commercial databases but public sources have essential complementarity to commercial ones for the exploration of bioactive chemical space.

Cloud computing and Web Services

Michael Dippolito of DeltaSoft talked about cheminformatics in the cloud. In "cloud computing", computing resources are provided as a service over the internet. They can then be dynamically scaled to meet current needs. Payment is by usage; for example, it might cost $0.10 – 0.40 per instance hour for the server, $0.15 per GB month for storage and $0.10 – $0.17 per GB transfer, in or out, for the network. There are three layers of "cloud": software as a service (SaaS) is built on platform as a service (PaaS) which is in turn built on infrastructure as a service (IaaS). In response to increasing demand for software as a service and hosted cheminformatics solutions, DeltaSoft and ChemAxon have teamed up to provide a suite of fully hosted applications, including compound registration, inventory, bioassay, and structure activity searching and reporting. The hosted solutions free IT groups from the installation, ongoing maintenance and upgrades of hardware and software infrastructure. Users can access applications and data anywhere from a Web browser.
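To make the pay-per-use arithmetic concrete, here is a small Python sketch using the rate ranges quoted above. The workload (one server running for a 30-day month, 100 GB stored, 50 GB transferred) is a made-up assumption, and the function name is purely illustrative.

```python
def monthly_cloud_cost(instance_hours, storage_gb, transfer_gb,
                       rate_per_instance_hour, rate_per_gb_month,
                       rate_per_gb_transfer):
    """Sum the three usage-based charges for one month, in dollars."""
    compute = instance_hours * rate_per_instance_hour
    storage = storage_gb * rate_per_gb_month
    network = transfer_gb * rate_per_gb_transfer
    return {
        "compute": compute,
        "storage": storage,
        "network": network,
        "total": compute + storage + network,
    }


# Assumed workload: 24 h x 30 days = 720 instance hours, 100 GB, 50 GB.
low = monthly_cloud_cost(720, 100, 50, 0.10, 0.15, 0.10)
high = monthly_cloud_cost(720, 100, 50, 0.40, 0.15, 0.17)
```

Under these assumed figures the month comes to about $92 at the low end of the quoted rates and about $311.50 at the high end, with the server hours dominating in both cases.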

Andy Mott of Contur Software discussed another cloud computing application. The ConturELN, a server-based ELN for all disciplines, was released in 2003. A SaaS-based ELN, called iLabber, building on ConturELN and offering most of the same features, was launched in May 2009. An ELN contains a lot of unstructured data including chemical structures and reactions. A cloud-based ELN does not allow for customisation or integration with other systems. Chemical searching must be enabled in a way that allows users to find their information without restricting the way they enter data. Andy discussed the way that iLabber uses JChem to meet these requirements. His slides used Accord for Excel in examples because JChem for Excel was not available. He ended by asking the audience whether this was the best approach, which suggests that the solution is a work in progress.

Richard Bolton of GlaxoSmithKline gave an update on the Pistoia Alliance. This is an Open Source initiative established to streamline non-competitive elements of the pharmaceutical drug discovery workflow by the specification of common business terms, relationships and processes. The advent of Web Services and Web 2.0 allows proprietary data to be decoupled from technology, and a service orientated approach allows for precompetitive discussions such as those among the Pistoia collaborators. The founding parents of the Pistoia initiative are at GlaxoSmithKline, AstraZeneca, Pfizer and Novartis. The initiative was first publicised over a year ago, and I have heard at least three talks about it, so it was remarkable to discover that the Pfizer delegate in Budapest had never heard of Pistoia. The Alliance was officially launched as a not-for-profit corporation in February 2009 and is now accepting member applications. ChemAxon is already a member.

A first success is the LHASA Web Service. Until recently, each company had developed its own DEREK interface (using the Windows-based API) which might need changing after LHASA updates. After Pistoia involvement, the DEREK Web Service is available to all customers: a single interface that decouples implementations. Richard himself is co-ordinating the ELN Data Mart working group, which began work in March. Work on the biology workstream has not yet started: for this, Pistoia is seeking pragmatic people who can deliver.

Similarity and the third dimension

At the 2008 user meeting Miklós Vargyas of ChemAxon got my vote for the most talented speaker when he explained the nature of clustering and contrasted star clusters with chemical clusters. This year he gave an update on LibraryMCS clustering, showing off its performance. It scales linearly. He also gave a live demo of the use of heuristics in MCS search. In version 5.2 of the ChemAxon software, libraries of more than one million compounds can be clustered using various ring models and Bemis-Murcko frameworks.

At this year’s meeting, Miklós’ "approachable" talk (explaining a difficult topic with everyday examples) concerned similarity searching. This time he was joined by two colleagues, Gábor Imre and Adrián Kalászi, in an exposition about similarity, illustrated with toy cars of different sizes and colours. The talk was actually about 3D structures and similarity. ChemAxon may not be thought of as a "3D company" but it does have a 3D multi-conformation Calculator Plugin and a 3D structure and surface visualiser, MarvinSpace. Recently, the company has been working on 3D flexible alignment, 3D volume overlay and 3D virtual screening by 3D molecular descriptors.

Typical similarity searches are two-dimensional, with molecular descriptors and fingerprints. This is equivalent to flattening the cars and ripping them up. Two methods can be envisioned for 3D similarity: comparing shapes (a big car and a little car) or using descriptors from 3D objects (like Lego parts) and generating 3D fingerprints. Fingerprints are fast to generate and compare. Similarity by volume alignment (summing Gaussian functions etc.) is a costly multistep process. It is also important to allow for conformational flexibility. Multi-conformer rigid search is one way of tackling conformational flexibility. It is highly combinatorial: all target conformations need to be considered and there may be multiple query conformations. Multiple conformations have to be generated.

In 3D alignment, atom-pair distance constraints are minimised. Miklós used a MarvinSketch plugin to show the alignment of some challenging molecules. The Tanimoto coefficient can be used in 3D shape similarity with similarity scores in 3D, e.g., root mean squared distance, but pre-filtering is needed to speed things up. ChemAxon has developed a technology for 3D flexible virtual screening on 2D molecules with a 2D query. It maximises the coloured shape intersection of the query and the target and calculates a volume Tanimoto coefficient of the optimum 3D flexible alignment. A 3D pharmacophore fingerprint is generated by calculating maximum and minimum distances between each pair of atoms and then compressing the distance range into a histogram ("binning"). The costly calculation of the fingerprint is done only once per library. The similarity search is expected to perform as fast as with a 2D fingerprint. ChemAxon delivers some underlying 3D calculations as plugins, but not yet the full search process.
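The binning idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ChemAxon's implementation: the feature labels, the 1 Å bin width, the number of bins and the toy coordinates are all made up, and conformers are assumed to be supplied by some upstream generator.

```python
import math
from itertools import combinations


def distance_range_fingerprint(feature_types, conformers,
                               bin_width=1.0, n_bins=10):
    """Build a set of (feature pair, distance bin) keys.

    feature_types: one label per atom, e.g. "donor" or "acceptor".
    conformers: list of conformations, each a list of (x, y, z) tuples
    in the same atom order. For every atom pair, all bins between the
    minimum and maximum distance over the conformers are switched on.
    """
    bits = set()
    for i, j in combinations(range(len(feature_types)), 2):
        distances = [math.dist(conf[i], conf[j]) for conf in conformers]
        first_bin = min(int(min(distances) // bin_width), n_bins - 1)
        last_bin = min(int(max(distances) // bin_width), n_bins - 1)
        pair = tuple(sorted((feature_types[i], feature_types[j])))
        for b in range(first_bin, last_bin + 1):
            bits.add((pair, b))
    return bits


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints stored as sets."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


# Toy flexible molecule: donor-acceptor distance varies from 2.5 to 4.5.
flexible = distance_range_fingerprint(
    ["donor", "acceptor"],
    [[(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)],
     [(0.0, 0.0, 0.0), (4.5, 0.0, 0.0)]],
)
# Toy rigid molecule: a single conformer with the pair at 3.2.
rigid = distance_range_fingerprint(
    ["acceptor", "donor"],
    [[(0.0, 0.0, 0.0), (3.2, 0.0, 0.0)]],
)
score = tanimoto(flexible, rigid)
```

Because the conformers are consulted only while the fingerprint is built, the comparison itself is a plain set operation, which is consistent with the expectation that searching should be about as fast as with a 2D fingerprint.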

Text Mining

A very large number of Web pages, patents, text, office and PDF documents contain chemical names, which constitute valuable information. However, since the names are mixed in the text, without specific markup or associated chemical representation, it can be very difficult to identify and use this information. Daniel Bonniot of ChemAxon presented a set of technologies that solve this problem by detecting chemical names inside free text, converting the names to chemical structures, and annotating the original document with this added information. These technologies are combined to power a free public Web service that acts as a proxy, rendering any publicly accessible Web site with added chemical annotations. With these annotations it then becomes possible to perform rich chemical searches (for instance sub- or super-structure searches) on the set of indexed documents. Daniel reported some incremental improvements in ChemAxon’s structure-to-name and name-to-structure software and announced the "document-to-structure" product (locating and converting names in text and HTML documents), including OCR error correction. The Web service is built on top of it, adding structural information to existing public Web pages. As the mouse hovers over an underlined name, it opens a pop-up window with a structure image. Links to physical properties can be made available. The service could also be installed natively on a custom Web site, with custom features.
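The detect, convert and annotate steps can be illustrated with a deliberately naive Python sketch. It swaps in a two-entry lookup dictionary for real name recognition and name-to-structure conversion, so it shows only the shape of the pipeline; the dictionary, the function name and the `chem` CSS class are all hypothetical, and the input is assumed to be plain text without markup.

```python
import html
import re

# Hypothetical dictionary standing in for name-to-structure conversion.
NAME_TO_SMILES = {
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
}


def annotate(text, dictionary=NAME_TO_SMILES):
    """Wrap each known chemical name in a <span> carrying its SMILES."""
    # Longest names first, so a long name wins over a shorter one inside it.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        # Dictionary keys are lower case; keep the name as written in the text.
        smiles = dictionary[match.group(1).lower()]
        return '<span class="chem" title="{}">{}</span>'.format(
            html.escape(smiles, quote=True), match.group(1)
        )

    return pattern.sub(replace, text)


annotated = annotate("Dissolve the residue in benzene and add acetic acid.")
```

A browser shows the `title` attribute as a tooltip when the mouse hovers over the name, which is a crude analogue of the structure pop-up described in the talk.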

Novartis has developed a chemical entity extraction tool with "under the hood". In a remote presentation, Josef ("Sepp") Scheiber presented a vision in which competitive intelligence is extracted from images, text, and tables from all sorts of Web sites, patents and documents. The chemical information, with further annotation, stored in a data warehouse, could be used in automated chemogenomics applications, or in patent analysis for medicinal chemistry projects. Sepp concentrated on patents in the user meeting talk. He aimed to extract all molecules that are mentioned in a patent text of interest, convert them to structures and make them available in machine-readable format. He chose to use ChemAxon’s extractor and name to connection table software.

He presented a case study in which a medicinal chemist wants to synthesise a competitor compound as a tool compound for his own project and analyse substitution patterns on a scaffold. In one example 452 compounds were automatically extracted from a text-based patent. The patent reference shows that there are 636 compounds; thus 71% were found automatically. Not all patents are handled so successfully. Text extraction is not suitable for image-based patents. Different languages (especially Japanese) can cause problems. OCR software errors and typos present problems; name-to-structure software presents fewer problems.

In IBM’s ChemVerse project, Steve Boyer’s objective is to extract all molecules automatically from all patents and make them searchable in a database. IBM takes advantage of cloud computing, has access to all full-text patents, and annotates the molecules with information from freely available databases. Sepp applauds this approach. He outlined some of his own plans for the future if he is to achieve true patinformatics: "the science of analysing patent information to discover relationships and trends that would be difficult to see when working with patent documents on a one-to-one basis" (Tony Trippe’s definition).

SureChem’s database of more than 9 million chemical structures is generated by extracting chemical names from full text patents and converting them to chemical structures using a suite of name-to-structure conversion tools. SureChem has performed an evaluation of ChemAxon's name to structure toolkit, comparing the results to those obtained by the other tools in SureChem’s current production pipeline.

As an indication of a typical conversion rate, Nicholas Goncharoff mentioned a benchmark data set of 101,074 chemical entities extracted from 900 pharmaceutically relevant patents, only 58.2% of which were converted to structures. The overall and pairwise conversion rates for the four name-to-structure tools in the current comparison were as follows. (The bold numbers are the total number of names converted for each tool. The pairwise comparisons show overlap, with numbers underneath in brackets showing the differing conversions. The numbers in the column headed "unique" are the numbers of names converted by that tool alone.)


[Table not reproduced: overall and pairwise conversion rates for Tool 1, Tool 2, Tool 3 and the ChemAxon tool, with a final "unique" column.]

ChemAxon’s tool produced more unique structures than any of the other tools. This could mean that ChemAxon generates more questionable structures, but it could also indicate high value data. An estimated 1,800 structures appear to be exemplified compounds. A manual review of 200 name-structure pairs extracted from the ChemAxon uniquely generated structures showed that 70% are unambiguously correct; 20% are ambiguous or relate to conversion of a fragment of a structure; and 10% are incorrect, due either to the nature of the names or to how the tool handled them. ChemAxon is still refining the tool: changes made for version 5.2.2 led to a 10% reduction in incorrectly generated structures (measured against other tools) compared with version 5.2.1. There was also a 10% reduction in conversion of fragmented names in the set of unique structures.

Nicko concluded that, in a short time, ChemAxon has developed a tool that is comparable to longstanding competitors. SureChem finds ChemAxon’s tool the easiest to use, with a good range of settings options. ChemAxon is quick to make improvements: SureChem had worked with three new releases in the previous two weeks, each one yielding better performance. It is likely that SureChem will be licensing ChemAxon’s name-to-structure software.

Lutz Weber described OCMiner, Ontogen’s chemistry aware semantic search engine, based on IBM’s Unstructured Information Management Architecture (UIMA) framework, a platform for building analytic solutions that process unstructured information to find latent meaning, relationships and relevant facts. The OCMiner toolbox provides a chemical structure aware ontology search engine: exact, substructure and similarity searching over any document based on dictionaries, and rule-based name-to-structure libraries. It also provides ontologies, fast annotation of large volumes of documents (in PDF, HTML, XML, Microsoft Word formats), and very large dictionaries for named entities (compounds, diseases, species etc.). Lutz claims that when compared to OCMiner, Peter Murray-Rust’s Open Source Chemistry Analysis Routines (OSCAR3) is too slow, annotates non-essential terms and fails to annotate important items. Once chemical entities have been detected, their names are converted to structures. Lutz claims that ChemAxon’s naming software for doing this is much better than OPSIN, a tool for name-to-structure, commonly distributed with the OSCAR3 package, but also available as a standalone library. A test version of OCMiner is available on the Web. Linguamatics is also using ChemAxon’s name to structure software; they gave a partner presentation.

Instant JChem

According to Ian Berry of Evotec, there is no clear answer on whether to buy or build an in-house cheminformatics system. It depends on what you want to do, how quickly you want to do it, what is available commercially and what resources you have available. Evotec chose to buy an ELN to work with ChemAxon tools and they chose ConturELN. They have not yet decided what to do about their EVOseek system (the corporate registry with HTS data, target data etc.). It has become monolithic and complicated to maintain and it uses old versions of JChem and Marvin. In future Evotec will move toward a component-based architecture and build on top of Netbeans and perhaps Instant JChem.

Ian presented his wish list for Instant JChem. When administrative changes have to be made to the database, everybody has to log out; this is very inconvenient. Version 2.4.x software upgrades do not work with the database 2.4. The limited number of characters for database column names is a problem, as some assay names can be long. Incompatibility with ISIS for Excel is a problem although there is a workaround. There is also some concern that Oracle buying Sun may have an effect on the future of the Netbeans-based Instant JChem. Easy conversion and export of an Oracle Instant JChem database to a local database is a wish already in the pipeline.

Ian thinks that operations such as the Pistoia Alliance will be good for everyone. It is a great idea for pharma companies to force software providers to allow collaboration, and common and consistent formats for data transfer are a good thing too, but will it take too long to get agreement on something, and will Pistoia become too commercially driven? Fortunately it looks as if pharma will drive Pistoia. Finally, Ian recommends that if you are writing Java code, you should look at Java Persistence API (JPA) as a means of database interaction: it has a lot of nice features.

Tim Dudgeon of ChemAxon exemplified the 10 things that best describe Instant JChem: simple and flexible deployment; creating and managing structure databases; importing, exporting, merging and editing data; building tabular and form-based reports; running combined structure and data searches; structure-based predictions; managing relational data; sophisticated chemistry features; collaboration; and extensibility. He then talked about short and medium term developments. Soon, a URL field feature will allow users to pull in data from external sources such as ChemSpider, to handle new data types, and to drill out to external sites. Reactor will be incorporated in Instant JChem soon and there will be improvements to the schema editor. Calculations and fine grain security will be added later. Instant JChem Server is coming later this year. This will provide a three-tier architecture, will reduce memory and CPU requirements on the client, and will have the advantage of a faster start-up time.

Pipelining and workflow

Szilárd Dóránt described some new features in the ChemAxon for Pipeline Pilot component collection. Calculator Plugins were already in an earlier version of the component collection but it was not flexible enough so a new component, Chemical Terms Calculator, was added in July 2008. An improved error reporting system was also implemented at that time. The Reactor component underwent a major upgrade in November 2008. LibMCS Clustering, Molecule to IUPAC Name, and Molecule from IUPAC Name components were also added. In May 2009, further new components were added: MolConvert, Tautomerization, and Markush Enumeration.

ChemAxon nodes can also be used in KNIME. Dimitar Hristozov is doing an industrial postdoc with Eli Lilly, extending the work published in Patel, H.; Bodkin, M. J.; Chen, B.; Gillet, V. J. A Knowledge-Based Approach to de Novo Design Using Reaction Vectors. J. Chem. Inf. Model., 2009, 49, 1163-1184. The method is implemented in KNIME nodes and ChemAxon libraries. Dimitar hopes eventually to publish the KNIME nodes.

More from ChemAxon

ChemAxon’s pKa and logP prediction methods are dependent on the molecule types in the training set. József Szegezdi showed that a user-trained logP local model based on 25 molecules outperforms all of the standard models. A user defined pKa model is also more accurate than the built-in default model. Instant JChem can be used for curating input data for the training. The new model is only a refinement of the default model, so the training assumes a robust base model that is provided in Marvin. I would have been happier with this presentation if József had not presented an R² of 0.99 in both his examples.

Gyorgy Pirok demonstrated some of ChemAxon’s command line utilities. These are particularly useful when working with structure files in batch mode. In another presentation, Gyorgy talked about the new product Metabolizer. The prediction of metabolic fate can help in the evaluation of experimental results, in the estimation of metabolic stability, and in the early identification of potential toxicity risks, but it is a complex problem. As part of the KnowTox project being carried out in collaboration with Aureus Pharma and sanofi-aventis, ChemAxon has developed a new software application called Metabolizer to enumerate all metabolites and to predict the major ones of drugs or other xenobiotics. The results can be influenced by the biotransformation libraries used. ChemAxon currently sells a demonstration database with the software, and a facility for users to build their own biotransformations.

Tamás Pelcz described JChem for Excel, a new product that is being implemented in C# .NET and Visual Studio. It features handling of structures inside Excel; import from databases; import from, and export to files; structure filtering; R-group decomposition; and custom chemical Excel functions. Most Chemical Terms functions and Marvin calculations are implemented as Excel functions. There are lots of plans for future developments; development will be easier when the .NET interface to the JChem suite is fully implemented. Tamás ended with a screen shot of JChem in SharePoint. There is no release date for this application yet but ChemAxon is working on it.

From the number of comments at this user meeting about speeding up the software, I am assuming that performance has been an issue in the past. Developments in core search features include a new chemical hashed fingerprint to improve search performance. Szabolcs Csepregi gave some typical registration and search times. JChem Base 5.2 has 40% faster substructure search but cartridge performance needs improving. Spreadsheet view has been speeded up so that scrolling 1 million structures is now possible. It is expected that the Marvin applet version will load 2-3 times faster in version 5.3. In version 5.4, a modular system, Marvin Lite, will load 5-6 times faster.

Posters


Posters

Aureus Pharma has used AurPROFILER and the ChemAxon software suite in similarity searching. Results are displayed as interactive heat maps for drug repurposing. PathwayExplorer in Genostar’s Metabolic Pathway Builder Suite makes use of JChem Base for managing and displaying compounds and metabolic reactions. Gedeon Richter has developed a JChem-based Web application for searching compounds in vendor databases. Researchers at Eötvös Loránd University, Budapest presented some mathematical tools capable of evaluating the binding of molecular libraries to protein molecules, with applications in drug discovery and design.

Partner presentations

Agilent presented the Kalabie ELN, which comes bundled with ChemAxon software but could be integrated with an alternative package.

Aureus Pharma’s AurQUEST and AurPROFILER integrate ChemAxon Java modules to assess compound-target polypharmacology and carry out 2D pharmacophoric similarity.

Biochemfusion sells Proteax, a protein data cartridge for Oracle which uses the Marvin chemical structure editor.

ViSoR (Virtual Screening optimizing the Reality) from c.a.r.u.s. IT is a browser-based IT platform that implements TrixX (the next generation of FlexX from the University of Hamburg), and other virtual screening software, and also uses Marvin and JChem. "ViSoR is the Christmas tree that you can decorate with TrixX, Autodock, whatever."

Jürgen Swienty-Busch of Elsevier presented Reaxys, a new workflow solution for synthetic chemists. It is based on merged data from CrossFire Beilstein, CrossFire Gmelin and Patent Chemistry Database, with additional functionalities, advanced results handling, and a redesigned interface in which MarvinView and MarvinSketch are instrumental. Features include workflow and decision making support for synthesis design and planning; quick access to key data by displaying results in a unique tabulated overview; and output of data in most common formats. Reaxys also contains a synthesis planner.

Founder, one of the largest IT companies in China (second only to Lenovo), is supporting ChemAxon’s products in China and elsewhere. Sufang Zhao, VP International Business, showed examples of chemical registration and chemical supplier tracking software developed from the ChemAxon API, plus an incubator project based on FAST and the ChemAxon API.

Sophie Huet of Genostar demonstrated PathwayExplorer plus ChemAxon software. She did a substructure search for benzoate, displayed biochemical reactions that involve benzoate, and viewed KEGG maps of metabolic processes. Genostar’s software colours those maps and makes them interactive.

Linguamatics is working on chemistry-enabled text mining with ChemAxon. The software demonstrated is a work in progress: the integration has to be made more seamless. Paul Milligan demonstrated structure input and ontology lookup. He drew a structure, took the SMILES into I2E (Linguamatics’ knowledge discovery platform, which is based on natural language processing), and did a substructure search in 10,000 documents. Similarity search is also possible.

MolPort has yet another molecular search engine and online chemical marketplace. Imants Zudans says that the unique selling point is the actual ordering of the chemicals: MolPort organises your purchases for you.

SEURAT from Synaptic Science has now been split into Personal SEURAT and Enterprise SEURAT. Hosted SEURAT is also available. Abbott sang the praises of the product at BioIT World.


Conclusion

How does this meeting differ from other user meetings? Recently I have been to two user meetings where the underlying theme was "ELN ELN ELN", so it was interesting to sit through two days of a programme that was not ELN-centric. The ChemAxon meeting is also less marketing-oriented than a typical American user meeting and there is less business jargon. Some of the talks did have lengthy lists of features but the underlying approach is, in general, centred on computer science and computational chemistry.

As ChemAxon seeks to become a more "serious" contender in the cheminformatics marketplace, I do hope that it will not throw the baby out with the bath water and lose the very special characteristics that its faithful customers appreciate. Alex Drijver says that ChemAxon aims to retain its existing virtues but to develop a more confident and serious business face as well. He speaks with the confidence that one would expect of a CEO but neither he nor the other speakers sound like smooth or slick marketing executives. I hope that ChemAxon will not begin to learn marketing-speak: currently it is a pleasure to find that no one at this user meeting thinks that "leverage" (sic) is a verb. Gradually ChemAxon is moving towards "enterprise strength" (another bit of business-speak you do not hear at this user meeting) but it still seems to be doing so in an approachable way. Customers clearly do not see ChemAxon as inflexible, arrogant or monolithic.

Are there any negatives? The variety of material covered in this user meeting did make me think that ChemAxon wants a finger in every pie (except perhaps a ChemAxon ELN). Could this lead to the developers being over-stretched? I hope not. Thus far the company has shown how fast it can grow its new ideas; it has low overheads compared with its major US competitors and it has taken on more developers in the Czech Republic and India. Thus far it has a good record for delivering on its promises. No company can afford to stay still; ChemAxon still seems to be managing the balancing act pretty well and it is remarkable how its revenues are increasing, despite the very difficult market conditions. Competition among vendors is good for users and I am positive about ChemAxon’s plans for the coming year.