Enhanced Stereochemistry Features - How to Use Them, and Handle Unknowns and Mixtures

Posted by
Christopherson
on 13 09 2021

The V3000 version of MDL’s molfile specification is hardly new to cheminformatics, having debuted as early as 1995, yet adoption of the format and of its enhanced stereochemical representations remains a work in progress for many institutions and end users. Beyond the usual backend and migration challenges of moving to a different structural file type, the V3000 format adds a further hurdle: it confronts the end user with extra decisions to make and extra details to add to each structure. In previous file types, information such as mixtures or uncertainty about a structure was represented by a number of different techniques, including drawing multiple structures, writing extensive contextual comments, or using a flat bond at an uncertain stereocenter. The V3000 enhanced stereochemical representation aims to overcome these issues in a more elegant fashion, but it requires users to be trained in the proper use of the new atom labels. In this piece, we provide an overview of the proper use of enhanced stereochemical labels, both in the context of ChemAxon products and in cheminformatics workflows as a whole. If you find the article useful, a more in-depth set of tutorial slides is available at the bottom of the article.

A note on the ABSOLUTE label

The absolute (ABS) label can be added to a chiral center to denote that the sample is unambiguously a pure sample of the drawn stereoisomer at that center. This is equivalent to the “Chiral Flag” in the earlier V2000 molfile specification, except that it applies to an individual chiral center rather than to the entire molecule. Whether the label is necessary may be a point of contention: the official MDL specification states that any chiral center without an ABS label is not pure, or is some form of unknown, whereas many practicing chemists (as well as the IUPAC recommendations) hold that a pure stereoisomer should be assumed when no further information is given. For the purposes of this document, we consistently add the ABS label for the sake of completeness; note, however, that most ChemAxon applications have a default configuration setting of “assume absolute stereochemistry”.

Illustration 1

Absolute label

Pure unknowns and the use of the OR label

The OR label by itself handles a simple problem: what to do when you have a pure substance but are unsure of the exact configuration of the stereocenter. For example, you may have been shipped a compound as part of a larger library, which you know is pure, but the vendor failed to attach stereochemical information for one of the structures. Many readers may previously have indicated such uncertainty by drawing all flat, or in some cases wavy, bonds around the stereocenter. Such vague bonds leave two further readings open: that the sample is a mixture, or that we know nothing at all about the stereocenter. The OR label rules out both. By properly applying it, we have not only defined our structure unambiguously, but also clearly communicated the depth of knowledge we possess regarding sample characterisation.

Or label

Illustration 4

The structure above indicates one of the 4 structures below.

Illustration 5

Illustration 6

Illustration 2

Illustration 3

Mixed samples and the use of the AND label

Your product contains a mixture of stereoisomers, but you need to register it now. How do we draw a mixture? This problem most commonly occurs following in-house synthesis with incomplete stereoselectivity. As explained above for the OR use case, drawing a flat bond is probably not the best solution here, as it can lead to ambiguity and is in some scenarios entirely unacceptable.

And label

Illustration 7

In this case, the AND label can be used to clearly communicate that both isomers are present. In doing so, we have depicted 2 different chemical structures in only one diagram. This may not seem like a substantial saving of effort, but the number of depictions needed to communicate such a mixture explodes for compounds with two, three, four or more stereocenters, and simply adding a few labels then becomes highly preferable.
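For readers curious how the ABS, OR and AND labels end up in a file, the following is a minimal sketch in Python. To our understanding of the V3000 specification, the labels are stored as collections inside the CTAB, named STEABS (absolute), STERACn (AND groups) and STERELn (OR groups), each followed by an atom count and the atom indices. The function name `stereo_collection_block` and the input format are our own illustrative assumptions, not part of any ChemAxon API.

```python
from collections import defaultdict

# Label families used in this article mapped to V3000 collection names.
# ABS carries no group number; AND and OR groups are numbered (AND1, OR2, ...).
COLLECTION_NAMES = {"ABS": "STEABS", "AND": "STERAC", "OR": "STEREL"}


def stereo_collection_block(labels):
    """Build COLLECTION block lines from a {atom_index: label} mapping.

    `labels` maps 1-based atom indices to strings such as "ABS", "AND1", "OR2".
    This is a hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for atom_index, label in labels.items():
        kind = label.rstrip("0123456789")
        number = label[len(kind):]
        if kind not in COLLECTION_NAMES:
            raise ValueError(f"unknown stereo label: {label!r}")
        # ABS must have no number; AND/OR must have one.
        if (kind == "ABS") != (number == ""):
            raise ValueError(f"malformed stereo label: {label!r}")
        groups[(kind, number)].append(atom_index)

    lines = ["M  V30 BEGIN COLLECTION"]
    ordered = sorted(groups.items(), key=lambda item: (item[0][0], int(item[0][1] or 0)))
    for (kind, number), atoms in ordered:
        atoms = sorted(atoms)
        # The parenthesised list starts with the number of atoms that follow.
        atom_list = " ".join(str(value) for value in [len(atoms)] + atoms)
        lines.append(f"M  V30 MDLV30/{COLLECTION_NAMES[kind]}{number} ATOMS=({atom_list})")
    lines.append("M  V30 END COLLECTION")
    return lines


# Atom 2 is absolute, atoms 5 and 7 share AND1, atom 9 is OR1.
for line in stereo_collection_block({2: "ABS", 5: "AND1", 7: "AND1", 9: "OR1"}):
    print(line)
# M  V30 BEGIN COLLECTION
# M  V30 MDLV30/STEABS ATOMS=(1 2)
# M  V30 MDLV30/STERAC1 ATOMS=(2 5 7)
# M  V30 MDLV30/STEREL1 ATOMS=(1 9)
# M  V30 END COLLECTION
```

In practice a sketcher or toolkit writes these lines for you; the sketch is only meant to show that each label is a named group of atoms rather than a change to the bonds themselves.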

Diastereomers and the use of multiple labels

Following the synthesis of compounds containing two stereocenters, a mixture of 4 stereoisomers (two diastereomeric pairs of enantiomers) may be obtained. Typically, an initial achiral separation takes place. This leads to two samples, each containing a pair of enantiomers, that is, 2 of the 4 possible stereoisomers. How can we represent the different samples accurately in one sketch? Appropriate use of multiple labels helps us to depict the compounds at all stages of the separation. Two AND labels, one at each stereocenter, will suffice in this case. For the initial mixture of 4 compounds, giving the AND labels different numerical values shows that the centers are independent of each other. Following achiral separation, AND labels with the same value indicate that the two stereocenters are fixed relative to each other and can only change in concert.
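The effect of the group numbers can be checked with a small enumeration. The sketch below is a toy model under stated assumptions: configurations are written as the letters R and S purely as a stand-in for drawn wedge bonds, and the function `expand` is our own hypothetical helper. Centers that share a label are inverted together, centers with different labels are inverted independently, and ABS centers are never inverted.

```python
from itertools import product


def expand(centers):
    """Enumerate the concrete stereoisomers covered by a labelled drawing.

    `centers` is a list of (drawn_configuration, label) pairs, for example
    [("R", "AND1"), ("S", "AND2")]. Hypothetical helper for illustration.
    """
    flip = {"R": "S", "S": "R"}
    # Every distinct non-ABS label is one group that is kept or inverted as a whole.
    groups = sorted({label for _, label in centers if label != "ABS"})
    isomers = set()
    for choice in product([False, True], repeat=len(groups)):
        inverted = dict(zip(groups, choice))
        isomers.add("".join(
            flip[config] if inverted.get(label, False) else config
            for config, label in centers
        ))
    return sorted(isomers)


# Crude product: different group numbers, so the centers vary independently.
print(expand([("R", "AND1"), ("S", "AND2")]))  # ['RR', 'RS', 'SR', 'SS']

# After achiral separation: same group number, so the centers change in concert.
print(expand([("R", "AND1"), ("S", "AND1")]))  # ['RS', 'SR']

# Fully defined compound: nothing is allowed to vary.
print(expand([("R", "ABS"), ("S", "ABS")]))    # ['RS']
```

An OR group expands to the same list as the corresponding AND group; the difference lies in the reading, with AND meaning that all listed isomers are present and OR meaning that the sample is exactly one of them.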

Finally, once chiral separation has been performed and pure structures are obtained, using multiple OR labels (OR indicates purity) with the same numerical value indicates that a sample is one of a pair of pure enantiomers, although which of the two is not known. There might be additional labels that you wish to add to your structures, such as the order of elution. Such arbitrary labels are not natively supported by the V3000 format, but they are often available in certain applications. For example, ChemAxon’s Compound Registration provides the Chemically Significant Text (CST) field, in which additional labels can be added either as free text or from a pick list. This is also useful in situations where institutional rules require a certain method of annotating information, or where enhanced stereochemistry cannot satisfactorily describe the structure. It should, however, be used with care: arbitrary annotations that seem obvious to one scientist may not be obvious to another, and methods of annotating may not be compatible across applications.

Mixing multiple labels in a database query

You want to search your compound database using enhanced stereochemistry labels, but aren’t sure how to narrow down the search appropriately. The key here is understanding the settings your chemical search engine applies to the enhanced stereochemical features. This matters most when entering a structure with multiple AND/OR combinations, as the one query structure represents several different concrete compounds. In a default ChemAxon JChem configuration, for instance, any compound that is a part of the chemical space allowed by the query will be matched. Additionally, there may be multiple different query structures (with different combinations of AND/OR labels) that return the same hits, which may be a benefit or a hindrance.
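The idea of a query defining a chemical space can be illustrated with a toy search. This is a sketch under our own assumptions and not a description of how JChem is implemented: the registry contains only fully defined compounds, configurations are written as R/S letters in place of wedge bonds, and the names `query_space`, `search` and `DATABASE` are hypothetical.

```python
from itertools import product

FLIP = {"R": "S", "S": "R"}


def query_space(centers):
    """Return the set of concrete isomers allowed by a labelled query.

    `centers` is a list of (drawn_configuration, label) pairs. Centers that
    share a label invert together; ABS centers stay as drawn.
    """
    groups = sorted({label for _, label in centers if label != "ABS"})
    space = set()
    for choice in product([False, True], repeat=len(groups)):
        inverted = dict(zip(groups, choice))
        space.add("".join(
            FLIP[config] if inverted.get(label, False) else config
            for config, label in centers
        ))
    return space


def search(database, query):
    """Return the IDs of registered compounds that fall inside the query space."""
    space = query_space(query)
    return sorted(cid for cid, isomer in database.items() if isomer in space)


# Hypothetical registry holding the four fully defined stereoisomers.
DATABASE = {"CPD-1": "RR", "CPD-2": "RS", "CPD-3": "SR", "CPD-4": "SS"}

# Two differently drawn and differently labelled queries...
print(search(DATABASE, [("R", "AND1"), ("S", "AND1")]))  # ['CPD-2', 'CPD-3']
print(search(DATABASE, [("S", "OR1"), ("R", "OR1")]))    # ['CPD-2', 'CPD-3']

# ...and a broader query with independent groups that matches everything.
print(search(DATABASE, [("R", "AND1"), ("S", "AND2")]))
# ['CPD-1', 'CPD-2', 'CPD-3', 'CPD-4']
```

In this toy model the first two queries return identical hits because they expand to the same set of concrete structures. Whether a real search engine treats AND and OR queries identically depends on its stereo search settings, which is why checking those settings matters.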

In the presentation attached below, we cover the fundamentals of using enhanced stereochemistry labels in more depth, and provide a few in-depth questions to test your abilities.
