Harmonization of the R-group query search and Markush search

news · 5 years ago
by András Volford, Krisztina Vajda, Norbert Sas
R-groups provide useful functionality for creating query structures with variable functional groups for searching molecule or reaction targets (tables). Besides query structures, R-groups are also used in Markush structures to represent large numbers of specific chemical structures in one complex Markush structure. In Markush structures, the occurrence of the R-atoms is fully determined: a Markush structure represents all those specific structures in which all R-atoms are substituted according to their definitions.

Structures with R-groups in more detail

R-group structures consist of a scaffold, R-group definitions, and R-logic conditions.

[Figure: R-group_1]

R-group structure as query structure

When this R-group structure is used as a query structure, the search matches not only the scaffold and the ligands given in the R-group definitions, but also the R-logic conditions. “R1 2” defines the occurrence of R1: exactly two R1 atoms of the scaffold must be substituted with OH and/or OMe. “restH” specifies that, apart from the R1 ligands, no substituent other than hydrogen may be present on the scaffold carbon atoms where R1 atoms are attached. “R2 1” defines the occurrence of R2: exactly one R2 must be substituted with F, Cl, or Br.

If the user does not specify R-logic conditions, the default R-logic condition is applied during the search. Until version 6.2, the default R-logic occurrence value was >0.
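The R-logic conditions above boil down to counting how many R-sites of a target carry one of the defined ligands and comparing that count with the occurrence value. The sketch below illustrates this idea; it is not the JChem implementation. The function names are hypothetical, only the occurrence forms used in this article ('*', '>n', and an exact count 'n') are handled, and the example assumes three R1 and two R2 positions on the scaffold, as in the later figures.

```python
def occurrence_matches(spec: str, substituted: int, total_sites: int) -> bool:
    """Return True if the number of substituted R-sites satisfies the
    R-logic occurrence spec. Simplified (assumption): only '*' (all),
    '>n' and an exact count 'n' are supported."""
    if not 0 <= substituted <= total_sites:
        raise ValueError("substituted must be between 0 and total_sites")
    spec = spec.strip()
    if spec == "*":
        # ALL: every occurrence of this R-atom must carry a ligand
        return substituted == total_sites
    if spec.startswith(">"):
        return substituted > int(spec[1:])
    return substituted == int(spec)


def rgroup_matches(spec, ligands, site_contents, rest_h=False):
    """Check one R-group of the query against a target.

    site_contents lists what the target has at each scaffold position
    occupied by this R-atom, e.g. ["OH", "H", "OMe"]. In this sketch,
    positions without a ligand are only constrained when rest_h is set,
    in which case they must be hydrogen."""
    substituted = sum(1 for c in site_contents if c in ligands)
    if rest_h and any(c not in ligands and c != "H" for c in site_contents):
        return False
    return occurrence_matches(spec, substituted, len(site_contents))


# The example query: "R1 2" with restH, and "R2 1"
R1 = {"OH", "OMe"}
R2 = {"F", "Cl", "Br"}
print(rgroup_matches("2", R1, ["OH", "OMe", "H"], rest_h=True))   # True
print(rgroup_matches("2", R1, ["OH", "OMe", "Me"], rest_h=True))  # False: Me is not H
print(rgroup_matches("1", R2, ["Cl", "H"]))                       # True
```

A target is a hit only if every R-group of the query matches, so in this model the overall result is the logical AND of the per-R-group checks.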

R-group structure with former default R-logic:

[Figure: R-group_2]

This query structure finds those structures in which at least one R1 and at least one R2 are substituted with the given ligands.

Markush structures for representing large numbers of specific chemical structures

Besides query structures, R-group structures are used in Markush structures to represent large numbers of specific chemical structures in one complex Markush structure or library. In Markush structures, the occurrence of the R-atoms is fully determined: a Markush structure represents all those specific structures in which all R-atoms are substituted according to their definitions.

[Figure: R-group_3]

This Markush structure represents the structures in which all three R1 atoms and both R2 atoms are substituted with the given ligands.
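To make "all R-atoms are substituted" concrete, the sketch below enumerates the specific structures a Markush structure stands for by choosing one ligand for every R-site. It assumes the ligand sets of the first example (R1: OH or OMe; R2: F, Cl or Br) and three R1 and two R2 positions; the actual definitions in the figure may differ, and the function name is hypothetical.

```python
from itertools import product

# Assumed ligand sets, borrowed from the first example in this article
R_DEFINITIONS = {
    "R1": ["OH", "OMe"],
    "R2": ["F", "Cl", "Br"],
}
# Assumed number of occurrences of each R-atom on the scaffold
SITE_COUNTS = {"R1": 3, "R2": 2}


def enumerate_markush(definitions, site_counts):
    """Yield one dict per specific structure, mapping every R-site to a
    ligand. Every R-site is substituted, as in a Markush structure."""
    sites = [f"{r}#{i + 1}" for r, n in site_counts.items() for i in range(n)]
    choices = [definitions[site.split("#")[0]] for site in sites]
    for combo in product(*choices):
        yield dict(zip(sites, combo))


structures = list(enumerate_markush(R_DEFINITIONS, SITE_COUNTS))
print(len(structures))  # 72 = 2**3 * 3**2
```

The count is of positional assignments only; if the scaffold is symmetric, some of these assignments describe the same compound, and this sketch does not remove such duplicates.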

Harmonizing both behaviours

As you can see, structures with R-groups are interpreted differently depending on whether they are used as query or as Markush structures. To eliminate this discrepancy, we changed the default R-logic occurrence range to ALL (*), because chemists tend to interpret R-group structures the way they are used in Markush structures.

New default R-logic occurrence value ALL

Since version 6.2.0, the default R-logic occurrence value is * (meaning all).

[Figure: R-group_4]

This query structure finds only those structures in which all three R1 atoms and both R2 atoms are substituted with the given ligands.
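The practical effect of the new default can be shown with a toy check. The hypothetical helper below evaluates the two default occurrence values on a target in which only one of the three R1 positions and one of the two R2 positions carry a ligand; the site counts are assumed from the figures, and this is an illustration, not JChem code.

```python
def matches_default(substituted: int, total_sites: int, default: str) -> bool:
    """Evaluate an R-group under one of the two default occurrence values."""
    if default == ">0":
        # Default before 6.2.0: at least one R-site substituted
        return substituted > 0
    if default == "*":
        # Default since 6.2.0: all R-sites substituted
        return substituted == total_sites
    raise ValueError(f"unsupported default: {default!r}")


# Target: 1 of 3 R1 sites and 1 of 2 R2 sites carry a ligand
hit_old = matches_default(1, 3, ">0") and matches_default(1, 2, ">0")
hit_new = matches_default(1, 3, "*") and matches_default(1, 2, "*")
print(hit_old, hit_new)  # True False
```

The same target is a hit under the former default and a no-hit under the new one, which is exactly the change in search behavior that the following section addresses.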

Possible problem cases

When R-group structures are used as query structures

R-group structures saved in mrv format do not explicitly store the old default '>0', so when you open them with JChem/Marvin 6.2.0, the new default occurrence value '*' is applied. To restore the former search behavior (i.e., the hits and no-hits expected from earlier versions), change the '*' (unspecified) occurrence to '>0'.
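If many query structures need this change, the rule can be expressed as a simple transformation. The sketch below uses a hypothetical dictionary that maps each R-group to its stored occurrence, with None meaning "not specified in the file"; it is not a Marvin or JChem API, only an illustration of which entries would need '>0'.

```python
from typing import Dict, Optional


def restore_old_default(occurrences: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return a copy of an R-logic table in which every R-group without an
    explicitly stored occurrence gets the pre-6.2.0 default '>0'.
    The dict-based representation is hypothetical."""
    return {
        rgroup: occurrence if occurrence is not None else ">0"
        for rgroup, occurrence in occurrences.items()
    }


# R1 has an explicit occurrence; R2 was left unspecified in the mrv file
print(restore_old_default({"R1": "2", "R2": None}))  # {'R1': '2', 'R2': '>0'}
```

Explicitly set occurrences are left untouched, so only the queries that relied on the implicit default change behavior.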

In MarvinSketch, the occurrence can be changed under Structure > Attribute > R-logic:

[Figure: R-group_5]

No modification is needed for structures stored in RGfile format, because this format saves the '>0' value explicitly.

When R-group structures are stored in or imported into Markush library database tables

If your database tables already contain R-group structures imported from RGfiles, or you want to import RGfiles in which the old default value '>0' is stored explicitly, note that since JChem version 6.2.0 this stored '>0' occurrence range is treated as the new default occurrence range '*'.

If the R-group structures were originally imported from mrv files, no special action is needed. As mentioned above, R-group structures saved in mrv format do not explicitly store the old default '>0', so the new default occurrence value '*' is applied.