Stereochemistry in Compound Registration

tutorial · 4 days ago

The JChem engines are heavily used throughout the cheminformatics industry to handle chemical representation and searching in end-user applications such as ELNs and registries. Many of the features discussed here can be configured through a number of different settings, and it is impractical to provide an exhaustive list of examples outside of the documentation.

As such, this piece focuses on the settings and handling used by Chemaxon’s Compound Registration application. If you are using a different application that also runs on a JChem engine, much of this information may still apply; however, you should verify this independently.

A basic familiarity with Chemaxon’s Compound Registration as well as the use and purpose of enhanced stereochemistry labels is recommended for readers.

It is important to note that Compound Registration has a default setting of “Assume Absolute Stereochemistry”. This means that old V2000 “Chiral” flags are not necessary, and for V3000 representations the “Abs” label is assumed.
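To make the effect of this setting more concrete, the sketch below reads the enhanced stereo collections from a V3000 collection block and treats every stereocenter that is not listed in an OR (STEREL) or AND (STERAC) collection as absolute. The collection names come from the V3000 molfile format; the function name, the sample block and the list of stereocenter atoms are assumptions made for illustration, and this is not how Compound Registration itself is implemented.

```python
import re

# Matches V3000 enhanced stereo collection lines such as
# "M  V30 MDLV30/STEREL1 ATOMS=(1 4)". The first number inside the
# parentheses is the atom count, the remaining numbers are atom indices.
COLLECTION = re.compile(r"MDLV30/(STEABS|STEREL|STERAC)(\d*)\s+ATOMS=\(([\d\s]+)\)")

# V3000 collection name -> enhanced stereo label prefix
PREFIX = {"STEABS": "ABS", "STEREL": "OR", "STERAC": "AND"}


def stereo_labels(collection_lines, stereocenters):
    """Map each stereocenter atom index to its enhanced stereo label."""
    labels = {}
    for line in collection_lines:
        match = COLLECTION.search(line)
        if match is None:
            continue
        kind, number, atoms = match.groups()
        tokens = [int(tok) for tok in atoms.split()]
        for index in tokens[1:]:  # skip the leading atom count
            labels[index] = PREFIX[kind] + number
    # "Assume Absolute Stereochemistry": centers without an explicit
    # label are treated as ABS (illustrative assumption of this sketch).
    return {index: labels.get(index, "ABS") for index in stereocenters}


# Hypothetical collection block and stereocenter list.
block = [
    "M  V30 BEGIN COLLECTION",
    "M  V30 MDLV30/STEREL1 ATOMS=(1 4)",
    "M  V30 MDLV30/STERAC2 ATOMS=(2 6 9)",
    "M  V30 END COLLECTION",
]
result = stereo_labels(block, stereocenters=[2, 4, 6, 9])
print(result)  # {2: 'ABS', 4: 'OR1', 6: 'AND2', 9: 'AND2'}
```

Atom 2 carries a wedge but appears in no collection, so it comes out as ABS without any explicit flag.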

A number of the listed functionalities are associated with particular user permissions. If you cannot find a particular option/setting/view, the suggested first course of action is to contact your application administrator and determine whether a lack of permissions is the cause.

A good illustration of the basic functionality is shown by the below compounds and their behaviour upon registration.

The only two compounds that will be registered to the same parent ID are the compound marked absolute and the wedge-bonded compound without any additional flags.

Flat bonds

Many users initially have questions about why a flat bond is registered separately from OR or AND wedge bonds. Flat bonds on stereo centers are commonly used in a number of different ways; most commonly to represent either racemic mixtures, or unknown or unspecified stereochemistry.

In Compound Registration, we aim to take the safest route and do not assume a specific interpretation, as this largely depends on conventions that each organization has developed over time. To achieve this, it is necessary to enforce the separate registration of structures containing flat bonds at stereocenters and those with wedged AND labels.

If the adoption of the enhanced stereochemistry labels is preferred, we recommend using flat bonds only to indicate unknown or unspecified stereogenic centers, while an AND label shows that we know both configurations are present.

If the use of flat bonds or AND labels is not desired, it is suggested to use the integrated standardization or structure fixing actions to flag or alter undesired features during registration.

OR labels communicate that we have a single configuration present (we just don’t yet know which one), which does not match a number of the common flat bond use cases (unknown, unspecified or racemic mixture), so it is inappropriate to match these structures during registration.

In Advanced Registration mode, you can use a manual Structure Fixer to indicate that a particular isomer is the major form present. This is discussed in the section dedicated to Advanced Registration.

You can also register structures as Mixtures or Formulations, instead of Single Structures.

Formulation can be used when defined amounts of substances are present together, and its use should be limited to when the chemist has control over these ratios.

Mixtures can be used to represent different structures that are present with variable composition.

This can, for example, be the outcome of a synthetic process over which the chemist does not have strong control.

These special registration types should only be used as a last resort when using a single structure with the enhanced features is still incapable of creating a satisfactory representation.

OR structure wedge types

When depicting a compound using one or more OR labels, users often ask about the best practice in choosing dashed versus solid wedge bonds. If there is only one stereogenic center present, the structure with the solid wedge bond and the structure with the dashed wedge bond both represent the same set of possible structures.

In Compound Registration, there is no distinction between the two wedge bond options when used with an OR label, so the choice on which option to use lies solely with the end user. The caveat to this is that the types of wedge bonds in structures with multiple ORs may be important - this is addressed below.

OR structure replicates

Many people ask why replicates of an OR structure should register with different parent compound numbers.

The representation covers the same chemical space in duplicates, so why not assign them to the same parent? The OR label indicates a degree of uncertainty: the stereochemical configuration at that center cannot be fully specified.

We have to consider the chance that further investigation at a later date will lead to a fuller structure elucidation that distinguishes between the possible stereoisomers.

It would be problematic for most organizations if the same compound ID were used for two structures that are later found to be different. By contrast, it is more acceptable to have multiple IDs for compounds that later turn out to be the same; these are then treated as Alias IDs.

Some organizations have a strong preference for registering compounds with OR labels under the same parent compound in all cases. If this is true for you, there is a setting to change this behavior. From the administration interface, navigate to “Chemistry >> General” and set “Allow isomer identifier generation” to OFF.

It should be noted that this setting applies to every single “OR” compound. There are alternative application features that can be used to match OR compounds on a more selective basis, which will be discussed later.

Multiple OR labels

Some structures may include multiple OR labels. In such a case it is important to consider the numbering that occurs after the OR. Recall from our previous primer on enhanced stereochemistry labels that unequal OR values (e.g. OR1, OR2, or OR4, OR7) mean that the centers are independent of each other, whereas equal OR values (e.g. OR1, OR1, or OR3, OR3) mean that the two centers may only change in tandem.

The consequence of this is that two structures that are otherwise depicted identically would represent a different chemical space.

If the default settings are being used, meaning that “Allow isomer identifier generation” is ON, this makes no difference, since all of the structures will be registered as separate parents anyway. However, if “Allow isomer identifier generation” has been turned OFF as described in the previous section, the fact that they represent a different chemical space means that they will still be stored under different parent identifiers.
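The difference in chemical space between equal and unequal OR numbers can be made concrete by enumerating the configurations each labelling allows. The sketch below is a deliberately simplified model rather than Chemaxon's algorithm: each stereocenter is reduced to an "R" or "S" letter, and an OR group is assumed to mean "either all of its centers as drawn, or all of them inverted together". The function name `or_space` is hypothetical.

```python
from itertools import product

FLIP = {"R": "S", "S": "R"}


def or_space(drawn, groups):
    """Return the set of configurations a multi-OR drawing can stand for.

    drawn:  configuration as drawn, one letter per center, e.g. "RS"
    groups: OR group number of each center, e.g. (1, 1) or (1, 2)
    """
    group_ids = sorted(set(groups))
    space = set()
    # Each OR group is independently kept as drawn or inverted as a whole.
    for choice in product((False, True), repeat=len(group_ids)):
        inverted = dict(zip(group_ids, choice))
        space.add(
            "".join(FLIP[c] if inverted[g] else c for c, g in zip(drawn, groups))
        )
    return space


same = or_space("RS", (1, 1))         # OR1, OR1: centers change in tandem
independent = or_space("RS", (1, 2))  # OR1, OR2: centers change independently
print(sorted(same))         # ['RS', 'SR']
print(sorted(independent))  # ['RR', 'RS', 'SR', 'SS']
```

In the same model, `or_space("R", (1,))` and `or_space("S", (1,))` are identical, which mirrors the earlier point that the wedge type does not matter for a single OR center, whereas `or_space("RS", (1, 1))` and `or_space("RR", (1, 1))` differ, so the wedge types do matter once several centers share an OR number.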

Input standardization upon matching

We are about to describe a number of ways to match compounds being registered to existing ones. It is important to note that if such a match occurs, only a new lot is created.

If there are differences between the existing and newly registered structures, Compound Registration will register and display the structure that is already stored.

However, the originally drawn structure can still be seen in the compound history.

The originally drawn structure is also preserved if you choose an entry from the suggestion list to register.

Isomer Numbers and Chemically Significant Text

There are several ways to force matches for compounds with OR labels. The main ones are the matching of ISOMER labels that are appended to the compounds, adding and matching chemically significant text fields, or using advanced registration mode in the case of single structure registration.

We have already discussed how Compound Registration saves compounds that have OR labels under separate Parent Compounds. Since they are depicted identically, the system adds ISOMER labels to the different compounds to more clearly differentiate them. (ISOMER labels are stored as attached data S-groups. See the Structure Revisions section for information on removing ISOMER information.)

A similar action can be performed, either manually on an individual basis or through field mapping for bulk uploads, using the “Chemically Significant Text” (CST) field.

These fields are taken into account along with the chemical structure of your compounds when attempting to match compounds during registration.

CST can also be used in several other ways, as it is a more general feature than ISOMER numbering. For example, it can differentiate otherwise identical compounds that come from different sources which the user wishes to distinguish (in-house synthesis, external purchase or high-throughput synthesis).

CST may also be used to register compounds labeled with an OR flag under the same parent compound number (PCN). In this case, the user needs to choose an identifier to apply as the CST (e.g. ‘Primary Isomer’), and use the same CST for subsequent registrations.
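The role of CST in matching can be pictured with a toy registry in which a parent is matched on the pair (structure, label). Everything in this sketch is hypothetical: the `ToyRegistry` class, the ID formats, the single global ISOMER counter and the placeholder structure string merely stand in for the real application, whose matching logic is considerably richer.

```python
from itertools import count


class ToyRegistry:
    """Hypothetical model: a parent is matched on (structure, label)."""

    def __init__(self):
        self.parents = {}  # (structure, label) -> parent ID
        self.lots = {}     # parent ID -> number of lots registered
        self._next_parent = count(1)
        self._next_isomer = count(1)

    def register(self, structure, cst=None, has_or_label=False):
        if has_or_label and cst is None:
            # Default behaviour described above: an OR structure without
            # CST receives a fresh ISOMER label, so it never matches an
            # earlier registration of the same drawing.
            label = f"ISOMER {next(self._next_isomer)}"
        else:
            label = cst
        key = (structure, label)
        if key not in self.parents:
            self.parents[key] = f"P-{next(self._next_parent)}"
        parent = self.parents[key]
        self.lots[parent] = self.lots.get(parent, 0) + 1
        return parent, self.lots[parent]


reg = ToyRegistry()
a = reg.register("or-structure", has_or_label=True)
b = reg.register("or-structure", has_or_label=True)
c = reg.register("or-structure", cst="Primary Isomer", has_or_label=True)
d = reg.register("or-structure", cst="Primary Isomer", has_or_label=True)
print(a, b)  # ('P-1', 1) ('P-2', 1): replicates become separate parents
print(c, d)  # ('P-3', 1) ('P-3', 2): same CST, second one becomes a new lot
```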

Advanced Registration Mode

When performing single structure registration, an advanced mode can be toggled, which enables additional functionality. This mode enables two key features for the current topic.

The first is the Stereo Analyzer, which gives you an overview of how Compound Registration has understood the stereochemistry in your structure. This includes indicating the number of “Resolved Unknown” stereocenters (OR flags) and “Resolved Unknown, known relative” stereocenters (OR flags with the same number). This is useful for those still getting started with enhanced stereochemistry labels to ensure they are indicating such information properly.

The second important feature is the suggested matches popup. Upon registering the compound, similar compounds are shown, and the user is prompted whether they would like to register their structure as a new parent compound, or a new lot of an existing compound.

Another useful feature is the “Major” structure fixer. Using this adds data to the structure indicating that it is the Major stereoisomer. While an exact ratio is not required, the data can be edited to reflect such information if it is available.

Structures with different ranges or major enantiomers indicated will register into distinct trees, generating different compound IDs.

There are cases where you would generally register all OR labelled compounds as separate parents, but would on some occasions want to group them as several lots under the same parent compound. In this case, CST is the preferred method, but if this is unsuitable for you then the above described Advanced Registration Mode is a good workaround. The caveat is that you must register these compounds one-by-one manually.

No Structure

It is also possible to register compounds without a structure in Compound Registration. This is a natural extension of the idea of not having certainty about a stereocenter, but extends it to having little to no certainty about larger parts of the chemical structure.

Again, this is a setting that may need to be changed in your Administration interface (“Administration >> Chemical Structures >> Structure Types” and set “No Structure” to “ON”).

Compounds registered as “No Structure”, that is, without structural information, are each saved under a unique parent compound number. This is because while you may have multiple compounds without structural information, there is no guarantee that they share structural features.

As with OR structures, however, this can be overcome. Adding chemically significant text or mapping parent compound numbers during bulk upload allows you to force register “No Structure” compounds as lots of existing compounds rather than as new parents.

Structure Revisions

How do I:

Edit a registered structure?

It is straightforward to perform structure revision in Compound Registration. After navigating to the compound’s structure tree, simply select “Edit”, modify the structure, then “Save”. The key point here is to verify that you are making edits at the appropriate level of the tree.

Assuming no matches for the new parent already exist, a change to the Parent will propagate the change through all versions and lots in the tree, modifying them all, but retaining their IDs.

You should take care when making parent-level changes, since these may lead to cases where the additional information held in a version (e.g. salt information) can no longer be carried over properly.

A change to a lot will cause it to be re-registered if the change causes it to now match a different tree. It is registered either as a new lot of an existing parent if a match exists, or under a new parent if no match exists. This re-registration will also lead to a modification of its ID.

Edits to versions perform a combination of the two. Changes are propagated downwards to the lots assigned to the version, and the version and all associated lots will be moved to a new parent.

The exception is if you only make changes to a salt, without changing the compound itself, in which case the version and lots will be updated but remain in the same tree, with their original IDs.
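The rules above can be condensed into a small lookup, sketched here as a reading aid rather than as application code. The `EditOutcome` type and the `edit_outcome` function are hypothetical, the parent case assumes that no match for the new parent exists, the lot case assumes the change makes the lot match a different tree, and the ID change for a moved version is inferred from the salt-only exception.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOutcome:
    propagates_down: bool       # change is pushed to the levels below
    moves_to_other_tree: bool   # item ends up under a different parent
    ids_change: bool            # existing IDs are modified


def edit_outcome(level, salt_only=False):
    """Summarise the consequence of a structure edit at a given tree level."""
    if level == "parent":
        # Assumes no existing match for the edited parent.
        return EditOutcome(True, False, False)
    if level == "version":
        if salt_only:
            # Salt-only change: same tree, original IDs kept.
            return EditOutcome(True, False, False)
        # ID change inferred from the salt-only exception.
        return EditOutcome(True, True, True)
    if level == "lot":
        # Assumes the edit makes the lot match a different tree,
        # so it is re-registered and receives a new ID.
        return EditOutcome(False, True, True)
    raise ValueError(f"unknown tree level: {level!r}")


for level in ("parent", "version", "lot"):
    print(level, edit_outcome(level))
print("version (salt only)", edit_outcome("version", salt_only=True))
```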

If changes have been made to a structure, you can select “More actions >> View history” to see the structural and ID changes that were made.

Reassign a mis-assigned lot to a different parent?

It may be the case that a lot has been assigned to the wrong parent. One example may be if you are using the Advanced Registration Mode to manually assign parents.

If an error was made, you can navigate to that particular lot and select “More Actions >> Move this Lot”. You can then assign it to a different version (and thus parent). The history of the move can be viewed as described previously.

Bulk movement can also be performed by using the “Bulk move lots” feature.

It is important to note that bulk movement can only be performed between single structures which are stereo matches of one another, while the individual lot movement can be used for all structures.


Apply new information to parents as further structure elucidation occurs?

Parents can be updated in the same way as lots; simply navigate to the structure tree and perform your alterations by selecting the “Edit” button. Note that any changes made at the parent level will be propagated to the versions and lots in their tree.

Register OR compounds as separate parents, except in a few cases where I want to match them?

There are two possible ways to do this. You can either use chemically significant text (CST), or use the advanced mode during registration. CST requires prior knowledge/addition of CST to the existing parent, while advanced registration mode requires you to perform the matching manually.

The two are detailed further in their appropriate sections.


Remove an isomer number and replace it with CST?

ISOMER numbers are saved as “attached data” in the structure field.

You can remove them by entering the structure editing mode, right-clicking on the atom where the ISOMER label is attached, navigating to the “Attached Data” option and clearing all the data.

Remember that you need to carefully select which level in the tree you wish to edit!

You can then add a CST value as you would normally do.