Chemical Structure Representation Toolkit
Canonicalization and correction of chemical structures
Adding new compounds to an existing chemical database always raises the issue of uniqueness and correctness. Whether you are registering a new item, migrating legacy data or uploading a set of compounds to a database, the new entries have to fulfill certain requirements. Chemical structures can be represented in many different ways. These differences affect not only the graphic appearance of the molecules but can also influence more fundamental details of the topology, making compound identification even more problematic. Structural mistakes and errors have to be addressed too.
ChemAxon's chemical structure representation toolkit has two major components: Standardizer, which transforms chemical structures into customized, canonical representations; and Structure Checker, which offers numerous checkers and fixers to find and correct various structural issues.
Standardizer - canonicalizing chemical structures
Standardizer's main purpose is to transform chemical structures into representations that obey certain chemical business rules, in order to avoid inconsistencies in a chemical database. The tool is typically used in workflows where new compounds are registered, or where structure-based virtual screening is performed. The 40 pre-defined Standardizer actions cover, among others, the following issues:
- Adding or removing explicit Hydrogen atoms
- Neutralizing charged fragments or functional groups
- Recognizing and converting legacy representations of functional groups (like aliases)
- Removing certain fragments (like water and salt counterions)
- 2D cleaning and expanding abbreviated groups
- Unified representation of aromatic rings, tautomers and mesomers
Read more about Standardizer's actions. Besides the pre-defined rules, you can also implement your own.
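To make the idea of chained standardizer actions concrete, here is a minimal Python sketch. It is not ChemAxon's API: it treats structures as plain SMILES strings, and the names UNWANTED_FRAGMENTS, remove_fragments, neutralize_carboxylate and standardize are hypothetical, chosen only for illustration. It shows two differently drawn inputs collapsing to the same representation once fragments are removed and a charged group is neutralized.

```python
from typing import Callable, List

# Hypothetical fragment list for illustration; a real rule set would be far larger.
UNWANTED_FRAGMENTS = {"O", "[Na+]", "[K+]", "[Cl-]"}

Action = Callable[[str], str]


def remove_fragments(smiles: str) -> str:
    """Drop dot-separated fragments (water, counterions) listed in UNWANTED_FRAGMENTS."""
    kept = [frag for frag in smiles.split(".") if frag not in UNWANTED_FRAGMENTS]
    return ".".join(kept)


def neutralize_carboxylate(smiles: str) -> str:
    """Rewrite a carboxylate written literally as C(=O)[O-] to the neutral acid."""
    return smiles.replace("C(=O)[O-]", "C(=O)O")


def standardize(smiles: str, actions: List[Action]) -> str:
    """Apply each action in order and return the resulting string."""
    for action in actions:
        smiles = action(smiles)
    return smiles


PIPELINE: List[Action] = [remove_fragments, neutralize_carboxylate]

sodium_acetate = "CC(=O)[O-].[Na+]"
acetic_acid_hydrate = "CC(=O)O.O"

standardized = {standardize(s, PIPELINE) for s in (sodium_acetate, acetic_acid_hydrate)}
print(standardized)
```

Plain string replacement only works here because both inputs happen to be written in the same atom order. A real standardizer operates on the molecular graph, so this sketch illustrates the workflow rather than the chemistry.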
Structure Checker - correcting errors in chemical structures
Structure Checker searches molecules for structural problems. When it finds an issue, the error is highlighted in the structure and a fix is offered immediately. The reported problem can be fixed automatically by a built-in fixer or manually by the user. This tool is crucial in filtering out drawing errors and incorrect features when a new compound is registered. More than 40 checkers make up the system, detecting and correcting issues like:
- Invalid bond length
- Overlapping bonds or atoms
- Molecule charges
- Incorrect chiral flags
- Invalid valences
- OCR errors
- Unwanted substructures, found by a substructure checker that can be configured to transform a given substructure defined by a SMARTS string
Read more about the checkers. Structure Checker is highly customizable, so adding your own checker is also possible.
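The check-then-fix workflow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Structure Checker's API: the Molecule and Issue classes, the OCR_CONFUSIONS and MAX_VALENCE tables, and all function names are hypothetical. One checker (OCR errors) carries an automatic fixer, while the other (valence) only reports, leaving the correction to the user.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Molecule:
    atoms: List[str]                    # atom labels as drawn
    bonds: List[Tuple[int, int, int]]   # (atom index, atom index, bond order)


@dataclass
class Issue:
    checker: str
    atom_index: int
    message: str
    fix: Optional[Callable[[Molecule], None]] = None  # None means manual fix only


# Simplified, illustrative tables (neutral atoms only).
OCR_CONFUSIONS = {"C1": "Cl", "0": "O", "8r": "Br"}
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "Cl": 1, "Br": 1}


def check_ocr(mol: Molecule) -> List[Issue]:
    """Flag labels that look like misread element symbols and attach a fixer."""
    issues = []
    for i, label in enumerate(mol.atoms):
        if label in OCR_CONFUSIONS:
            corrected = OCR_CONFUSIONS[label]

            def fix(m: Molecule, i: int = i, corrected: str = corrected) -> None:
                m.atoms[i] = corrected

            issues.append(Issue("ocr", i, f"'{label}' looks like a misread '{corrected}'", fix))
    return issues


def check_valence(mol: Molecule) -> List[Issue]:
    """Flag atoms whose summed bond orders exceed the table limit; no automatic fix."""
    used = [0] * len(mol.atoms)
    for a, b, order in mol.bonds:
        used[a] += order
        used[b] += order
    issues = []
    for i, (label, count) in enumerate(zip(mol.atoms, used)):
        limit = MAX_VALENCE.get(label)
        if limit is not None and count > limit:
            issues.append(Issue("valence", i, f"{label} uses {count} bonds, limit is {limit}"))
    return issues


CHECKERS = [check_ocr, check_valence]


def run_checkers(mol: Molecule) -> List[Issue]:
    return [issue for checker in CHECKERS for issue in checker(mol)]


def apply_fixes(mol: Molecule) -> List[Issue]:
    """Apply every automatic fix, then re-check and return what still needs attention."""
    for issue in run_checkers(mol):
        if issue.fix is not None:
            issue.fix(mol)
    return run_checkers(mol)
```

For example, a molecule with atoms ["C", "C1", "O"] and bonds [(0, 1, 1), (0, 2, 3)] yields one OCR issue and one valence issue. After apply_fixes, the label "C1" becomes "Cl" and only the valence issue on the oxygen remains for manual correction.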
Toolkit components can be easily tailored to unique business needs and regulations. Both Standardizer and Structure Checker have a full-featured Application Programming Interface (API) in Java and in .NET, which makes the toolkit easy to integrate with in-house or third-party applications (see the Standardizer Java API and .NET API, and the Structure Checker Java API and .NET API). With some programming, you can add your own standardizer actions, checkers, and fixers to your system. Such custom solutions are usually required when specific atoms, functional groups or patterns in the structures need to be removed or replaced with Standardizer, or when you have to handle chemical features that cannot be handled by the built-in checkers and fixers in Structure Checker.
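One common way to support user-defined actions next to built-in ones is a name-based registry. The Python sketch below illustrates that pattern only. It does not reproduce the Standardizer Java or .NET API, and register_action, ACTIONS, strip_water, clear_deuterium and run are all hypothetical names. The custom action replaces a specific atom pattern, here a deuterium label in a SMILES string.

```python
from typing import Callable, Dict, List

# Registry mapping action names to functions (hypothetical, for illustration).
ACTIONS: Dict[str, Callable[[str], str]] = {}


def register_action(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator that stores an action in the registry under the given name."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        ACTIONS[name] = func
        return func
    return decorator


@register_action("strip_water")
def strip_water(smiles: str) -> str:
    """Stand-in for a built-in action: drop water fragments written as 'O'."""
    return ".".join(frag for frag in smiles.split(".") if frag != "O")


@register_action("clear_deuterium")
def clear_deuterium(smiles: str) -> str:
    """Custom action: replace explicit deuterium atoms with ordinary hydrogen."""
    return smiles.replace("[2H]", "[H]")


def run(smiles: str, action_names: List[str]) -> str:
    """Apply the named actions in order; unknown names raise ValueError."""
    for name in action_names:
        if name not in ACTIONS:
            raise ValueError(f"unknown action: {name}")
        smiles = ACTIONS[name](smiles)
    return smiles


result = run("[2H]OC.O", ["strip_water", "clear_deuterium"])
print(result)
```

Keeping actions behind names means a workflow can be described as a list of strings in a configuration file, and a site-specific rule can be added without touching the code that runs the pipeline.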
The Chemical Structure Representation Toolkit has individual applications for both Standardizer and Structure Checker. However, thanks to their extensive APIs, these tools are most often paired with other ChemAxon software:
- Both components play a key role in registration and chemical structure search, so Compound Registration and the JChem Engines include their functionality.
- Chemical database management relies on the representation toolkit on desktop (Instant JChem, JChem for Office) and online (JChem for SharePoint, Plexus Suite).
- Marvin Live and Chemicalize use the toolkit's capabilities too.
- Structure checking is an important feature within our Marvin chemical drawing tool.
- Both functionalities are available in workflow tools such as KNIME and Pipeline Pilot.
- The toolkit is also available from the command line (Standardizer CL and Structure Checker CL).