Chemical Structure Representation Toolkit

Canonicalization and correction of chemical structures

Adding new compounds to an existing chemical database always raises the issue of uniqueness and correctness. When registering a new item, migrating legacy data, or uploading a set of compounds to a database, the new entries have to fulfill certain requirements. Chemical structures can be represented in many different ways. These differences affect not only the graphical appearance of the molecules but can also influence more fundamental details of the topology, making compound identification even more problematic. Structural mistakes and errors have to be addressed as well.
ChemAxon's chemical structure representation toolkit has two major components: Standardizer, transforming chemical structures into customized, canonical representations; and Structure Checker, offering numerous checkers and fixers to search and correct various structural issues.

Standardizer - canonicalizing chemical structures

Standardizer's main purpose is to transform chemical structures into representations that obey certain chemical business rules, so that inconsistencies in a chemical database are avoided. The tool is typically used in workflows where new compounds are registered or where structure-based virtual screening is performed. There are 40 pre-defined Standardizer actions available, covering, among others, the following issues:

  • Adding or removing explicit hydrogen atoms
  • Neutralizing charged fragments or functional groups
  • Recognizing and converting legacy representations of functional groups (like aliases)
  • Removing certain fragments (like water and salt counterions)
  • 2D cleaning and expanding abbreviated groups
  • Unified representation of aromatic rings, tautomers and mesomers

Read more about Standardizer's actions. Besides the pre-defined rules, you can also implement your own.

Different representations of nitrobenzene

Structure Checker - correcting errors in chemical structures

Structure Checker searches molecules for structural problems. When it finds an issue, the error is highlighted in the structure and a fix is offered immediately. The reported problem can be fixed automatically by the built-in fixer that is offered, or manually by the user. This tool is crucial for filtering out drawing errors and incorrect features when a new compound is registered. The system is made up of more than 40 checkers, which detect issues like:

  • Invalid bond length
  • Overlapping bonds or atoms
  • Molecule charges
  • Incorrect chiral flags
  • Invalid valences
  • OCR errors
  • Given substructures defined by a SMARTS string, which a configurable substructure checker can also transform

Read more about checkers. Structure Checker is highly customizable, so adding your own checker is also possible.
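
The check-then-fix flow described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Structure Checker API: the `Atom`, `Molecule`, and `Issue` classes, the simplified `MAX_VALENCE` table, and the functions `check_valence`, `check_chiral_flag`, and `check_and_fix` are all hypothetical. One checker comes with an automatic fixer (a chiral flag set on a molecule without stereocenters is cleared); the other has none, so its issue is returned for manual correction.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Atom:
    element: str
    bond_order_sum: int  # sum of bond orders around the atom


@dataclass
class Molecule:
    atoms: List[Atom]
    chiral_flag: bool = False
    stereocenters: int = 0


@dataclass
class Issue:
    name: str
    detail: str
    # An automatic fixer, if one exists; None means "fix by hand".
    fixer: Optional[Callable[[Molecule], None]] = None


# Assumption: a deliberately tiny table for neutral atoms only.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def check_valence(mol: Molecule) -> Optional[Issue]:
    """Report atoms whose bond order sum exceeds the table value."""
    bad = [i for i, a in enumerate(mol.atoms)
           if a.bond_order_sum > MAX_VALENCE.get(a.element, 8)]
    if bad:
        return Issue("valence", f"atoms {bad} exceed their maximum valence")
    return None


def clear_chiral_flag(mol: Molecule) -> None:
    mol.chiral_flag = False


def check_chiral_flag(mol: Molecule) -> Optional[Issue]:
    """Report a chiral flag on a molecule that has no stereocenters."""
    if mol.chiral_flag and mol.stereocenters == 0:
        return Issue("chiral_flag", "chiral flag set without stereocenters",
                     fixer=clear_chiral_flag)
    return None


CHECKERS = [check_valence, check_chiral_flag]


def check_and_fix(mol: Molecule) -> List[Issue]:
    """Run every checker, apply available fixers, return unresolved issues."""
    unresolved = []
    for checker in CHECKERS:
        issue = checker(mol)
        if issue is None:
            continue
        if issue.fixer is not None:
            issue.fixer(mol)  # automatic fix
        else:
            unresolved.append(issue)  # left for the user
    return unresolved


mol = Molecule(atoms=[Atom("C", 5), Atom("O", 2)], chiral_flag=True)
remaining = check_and_fix(mol)
```

After the call, the chiral flag on `mol` has been cleared automatically, while `remaining` holds the valence issue that still needs a manual decision. Adding a custom checker in this sketch means appending one more function to `CHECKERS`.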

Structure Checker workflow


Toolkit components can be easily tailored to unique business needs and regulations. Both Standardizer and Structure Checker have a full-featured Application Programming Interface (API) in Java and in .NET, which makes the toolkit easy to integrate with in-house or third-party applications (see the Standardizer Java API and .NET API, and the Structure Checker Java API and .NET API). This means that with some programming you can add your own standardizer actions, checkers, and fixers to your own system. Such custom solutions are usually required if specific atoms, functional groups, or patterns in the structures need to be removed or replaced with Standardizer, or if you have to handle chemical features that cannot be handled by the built-in checkers and fixers in Structure Checker.

Read more about custom standardizer actions and how to implement them, and about implementing fixers.

Implementing custom fixers with Structure Checker API


The Chemical Structure Representation Toolkit has individual applications for both Standardizer and Structure Checker. However, thanks to their extensive APIs, these tools are most often paired with other ChemAxon software: