R-group decomposition is a special kind of substructure search that aims to find a central structure (the scaffold) and to identify its ligands at certain attachment positions. The query molecule consists of the scaffold and ligand attachment points represented by R-groups. In most cases these R-groups are simple R-group atoms without R-group definitions. An example query structure is shown below:
[Figure: example query structure]
Note that there are two R1 atoms referring to symmetrical ligand positions.
By default, this means that the matching ligands must be identical. You can change this
behavior by setting the --skip-same-structure-check parameter, in which case
the same R-group index only denotes symmetrical positions on the scaffold, and
different ligands at these positions are also accepted.
Ligand attachments are allowed at implicit H atoms,
but these attachment points are not stored and are not shown in the output.
To allow attachments only at R-group positions, add explicit H atoms
in place of all implicit hydrogens. This can also be done automatically with the
H query modification (command line option -m H). The resulting query structure is shown below:
[Figure: query structure with explicit hydrogens]
To achieve just the opposite, there is another query modification
that adds R-groups in place of implicit hydrogens, allowing and storing all attachments
at implicit hydrogens. This is useful if you are interested in all scaffold-ligand attachments.
In this case you do not need to add R-group connections manually to your original query
if you use the Ra or Rs query modification (command line option -m Ra or -m Rs). The resulting query structure is shown below:
[Figure: query structure with R-groups added in place of implicit hydrogens]
As an example, take the following targets:
[Figure: target structures]
Decompositions using the different query options (no modification, hydrogenize, add R-groups)
are shown below. By default, decomposition is generated for the first hit only. To process all
hits, set the --allHits option.
Standardization may be necessary, and the aromatization task is usually needed: substructure search
requires aromatized query and target structures and also assumes that the same functional group
representation is used in the query and the target molecules (e.g. nitro groups;
also consider tautomer and mesomer forms). If your input file format contains the non-aromatized
form of the molecules (e.g. SDF), then aromatization should be specified. Standardization is
specified with the --standardize option.
The following examples show some decomposition tables that can be
obtained by running the rgdecomp command line tool or directly using the
R-group Decomposition API.
Ligand attachment points are represented by a connection to an any-atom in the scaffold.
Atom color codes are defined in Colors.ini, and coloring data is
stored in the molecule property "DMAP". In the examples below, we choose an output format that
can store molecule properties (SDF or MRV) so that this color data is kept in the output, and we specify
this property name together with the color definition file when running mview. To get a nice table output, we also
specify the number of columns in the -c parameter.
Alternative decomposition output styles for the above query and targets are shown later.
To run these examples, refer to the preparation instructions.
rgdecomp -S "aromatize" -q query.mol targets.sdf -a P -f sdf:-a -o result.sdf
mview -t DMAP -p Colors.ini -c 4 -r 4 result.sdf
You can also pipe the output of rgdecomp directly to mview under Linux/Unix systems:
rgdecomp -S "aromatize" -q query.mol targets.sdf -a P -f sdf:-a | mview -t DMAP -p Colors.ini -c 4 -r 4 -
Note that we have to set aromatization since our molecules are in dearomatized form (SDF).
To store the results in dearomatized form, we have to specify dearomatization in the output format:
-f sdf:-a. By default, attachment points are denoted by newly added any-atoms,
since these can be stored in any output format. We have chosen the attachment point representation instead
by setting -a P.
[Figure: decomposition table without query modification]
rgdecomp -S "aromatize" -m H -q query.mol targets.sdf -a P -f sdf:-a -o resultH.sdf
mview -t DMAP -p Colors.ini -c 4 -r 4 resultH.sdf
Observe that the query now matches only the first target.
[Figure: decomposition table with the hydrogenized query (-m H)]
rgdecomp -S "aromatize" -m Rs -q query.mol targets.sdf -a P -f sdf:-a -o resultR.sdf
mview -t DMAP -p Colors.ini -c 6 -r 4 resultR.sdf
[Figure: decomposition table with R-groups added to the query (-m Rs)]
Note that there can now be many hits with most R-groups matching single hydrogens,
but by default decomposition is generated for the first hit only. You can see all
possibilities by adding the --allHits option:
rgdecomp -S "aromatize" -m Rs -q query.mol targets.sdf -a P -f sdf:-a --allHits | mview -t DMAP -p Colors.ini -c 6 -r 4 -
You can allow the two R1 query nodes to match different ligands by setting the
-p option:
rgdecomp -S "aromatize" -m Rs -p -q query.mol targets.sdf -a P -f sdf:-a --allHits | mview -t DMAP -p Colors.ini -c 6 -r 4 -
Usage: rgdecomp [options] -q <query file/string> [target file(s)/string(s)]
Set up the rgdecomp script or batch file
as described in Preparing the Usage of JChem
Batch Files and Shell Scripts.
Search options are identical to those of jcsearch.
In this section we describe the command line options specific to R-group decomposition.
Options:
  -h, --help                             this help message
Input options:
  -q, --query <query>                    query SMARTS string or file
  -m, --query-modification <H|Ra|Rs>     query modification options:
                                           H: add explicit hydrogens
                                           Ra, Rs: attach unique rgroup nodes
                                                   in place of missing bonds
                                             Ra: with any-bonds
                                             Rs: with single-bonds
  -S, --standardize <file/string>        standardize query and target
                                         according to configuration file/string
  -g, --ignore-error                     continue with next molecule on error
Output options:
  -a, --attachment-symbol <N|P|A|M|L>    attachment symbol on ligands:
                                           N: none
                                           P: attachment point
                                           A: any-atom (default)
                                           M: atom map
                                           L: atom label
  -s, --style <HTS>                      output style (multiple choice):
                                           H: include header (default)
                                           T: include target (default)
                                           S: include scaffold
  -p, --skip-same-structure-check        allow different structures
                                         matching identical rgroup nodes
  -i, --id <ID field>                    ID field in target,
                                         to be displayed in SMILES table output
                                         set '=' to display target index as ID
  -f, --format <format>                  output file format:
                                         SMILES table if omitted,
                                         molecule series output otherwise
  -o, --output <filepath>                output file (default: standard output)
Search options:
  -A, --allHits                          process all hits
  ...
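When rgdecomp is driven from a script, it can help to assemble the argument list programmatically. The following Python sketch only builds the argument list from the options listed above and does not run the tool. The function name build_rgdecomp_args and its keyword names are hypothetical and not part of JChem. The option order simply mirrors the examples in this document.

```python
import shlex


def build_rgdecomp_args(query, targets, *, standardize=None, modification=None,
                        skip_same_check=False, attachment=None, style=None,
                        id_field=None, fmt=None, output=None, all_hits=False):
    """Assemble an rgdecomp argument list (hypothetical helper, not a JChem API)."""
    if modification not in (None, "H", "Ra", "Rs"):
        raise ValueError(f"unknown query modification: {modification!r}")
    if attachment not in (None, "N", "P", "A", "M", "L"):
        raise ValueError(f"unknown attachment symbol: {attachment!r}")
    if style is not None and (not style or set(style) - set("HTS")):
        raise ValueError(f"style must combine the letters H, T, S: {style!r}")

    args = ["rgdecomp"]
    if standardize:
        args += ["-S", standardize]      # e.g. "aromatize"
    if modification:
        args += ["-m", modification]     # H, Ra or Rs
    if skip_same_check:
        args.append("-p")                # --skip-same-structure-check
    args += ["-q", query, *targets]
    if attachment:
        args += ["-a", attachment]
    if style:
        args += ["-s", style]
    if id_field:
        args += ["-i", id_field]         # '=' displays the target index
    if fmt:
        args += ["-f", fmt]              # SMILES table if omitted
    if output:
        args += ["-o", output]
    if all_hits:
        args.append("--allHits")
    return args


args = build_rgdecomp_args("query.mol", ["targets.sdf"], standardize="aromatize",
                           modification="Rs", skip_same_check=True,
                           attachment="P", fmt="sdf:-a", all_hits=True)
print(shlex.join(args))
```

The resulting list can be passed to subprocess.run once the rgdecomp script is on the PATH.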
Query and target standardization can be specified in the
--standardize option: the
standardization configuration is given either
directly in a simple action string
or as a configuration XML file path.
Note that substructure search requires aromatized molecules; therefore, if your input file
format does not support the aromatized form (e.g. SDF), you have to specify
-S "aromatize" as a minimum.
We can request a query modification by setting the --query-modification option:
H for hydrogenize: forces ligand attachments to be at R-group positions
Ra for adding R-groups: allows and stores all scaffold-ligand any-bond attachments
Rs for adding R-groups: allows and stores all scaffold-ligand single-bond attachments
If the query has no R-group nodes, then the Rs modification is applied automatically.
We can set the attachment symbols by the --attachment-symbol option:
N: none
P: attachment point - a small mark beside the attachment atom
A: any-atom (default) - an any-atom is attached to the attachment atom, representing the connection to the scaffold
M: atom map representing the corresponding R-group index
L: an atom label representing the corresponding R-group index
Note that the default any-atom representation and atom maps can be exported in all molecule file formats, while the attachment point is not available in SMILES and atom labels are only supported in MRV.
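The format restrictions in the note above can be checked before launching a run. The following Python sketch encodes only the three rules stated there. The function name attachment_symbol_supported is hypothetical, and treating the part of the --format value before the colon as the format name (for example sdf in sdf:-a) is an assumption made for illustration.

```python
def attachment_symbol_supported(symbol, fmt=None):
    """Return True if the attachment symbol can be stored in the output format.

    symbol is one of N, P, A, M, L; fmt is the --format value (e.g. "sdf:-a").
    fmt=None stands for the default SMILES table output.
    Hypothetical helper, not part of JChem.
    """
    if symbol not in ("N", "P", "A", "M", "L"):
        raise ValueError(f"unknown attachment symbol: {symbol!r}")
    # Assumption: the text before ':' names the file format.
    base = "smiles" if fmt is None else fmt.split(":", 1)[0].lower()
    if symbol == "P":
        return base != "smiles"   # attachment point is not available in SMILES
    if symbol == "L":
        return base == "mrv"      # atom labels are only supported in MRV
    return True                   # any-atom and atom map export everywhere; N adds nothing


for symbol, fmt in [("P", "sdf:-a"), ("P", None), ("L", "mrv:-a"), ("L", "sdf")]:
    print(symbol, fmt, attachment_symbol_supported(symbol, fmt))
```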
We can set the output format in the --format option. The output is
a SMILES table if the format is omitted; in this case the target ID can be displayed by setting the --id parameter
a molecule series in the specified format otherwise; this output can be displayed by mview with appropriate options defining the color palette, the color symbol molecule property name and the number of table columns
In both cases, data included in the output can be specified in the
--style option (set any combination of the following letters):
H: include query header
T: include targets
S: include scaffold
The default is HT.
When the query contains R-group nodes with the same R-group index, these nodes
represent identical ligand structures by default. If we set the
--skip-same-structure-check option, then we allow different structures to match
these nodes. In this case the identical R-group indexes represent symmetrical attachment
positions on the scaffold and have no implication for the matching target structures.
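The default rule can be illustrated with a short Python sketch. It is not how rgdecomp is implemented: the tool compares structures, while this sketch uses plain string equality of ligand SMILES as a stand-in, and the function name passes_same_structure_check is hypothetical. The ligand strings are taken from the example tables in this document.

```python
def passes_same_structure_check(rgroup_indexes, ligands):
    """Return True if ligands sharing an R-group index are identical.

    String equality of SMILES is used here as a simplification of
    structure identity (an assumption for illustration only).
    """
    first_seen = {}
    for index, ligand in zip(rgroup_indexes, ligands):
        # Remember the first ligand for each index, compare later ones to it.
        if first_seen.setdefault(index, ligand) != ligand:
            return False
    return True


indexes = [1, 1, 2]  # the query has two R1 nodes and one R2 node

# Both R1 positions carry the same ligand: accepted by default.
print(passes_same_structure_check(indexes, ["CCC(N)*", "CCC(N)*", "Br*"]))

# The R1 positions carry different ligands: accepted only with
# --skip-same-structure-check.
print(passes_same_structure_check(indexes, ["CC(*)c1cc(Cl)cc(C)c1Br", "NC*", "*[H]"]))
```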
By default, only one decomposition for each target corresponding to the first search hit is
presented in the output. If the rgdecomp command line option --allHits
is specified, then all possible decompositions are listed.
If the command line parameter --ignore-error is specified, then import/export errors
do not stop the processing: the error is written to the console and the molecule is skipped.
By default, the program exits in case of molecule import/export errors.
To run these examples:
The PATH (all systems) and the JCHEMHOME (under Windows) environment variables have to be set as described in the Preparing and Running JChem's Batch Files and Shell Scripts manual.
The current directory has to be the RGroupDecomposition_files subdirectory. Under Linux/Unix:
cd jchem/doc/user/RGroupDecomposition_files
In Windows:
cd jchem\doc\user\RGroupDecomposition_files
In the following examples we use the query and targets
from the introduction. You can type these examples and see the results yourself
in the subdirectory RGroupDecomposition_files where you can find the input files
query.mol and
targets.sdf.
SMILES table output (no -f parameter is specified):
rgdecomp -S "aromatize" -q query.mol targets.sdf
Clc1cc(c(c(c1)[*:1])[*:2])[*:1] [*:1] [*:1] [*:2]
CCC(N)c1cc(Cl)cc(C(N)CC)c1Br CCC(N)* CCC(N)* Br*
Oc1c(Cl)cc2CCCC3CCCc1c23 *CCCC(*)CCC* *CCCC(*)CCC* *CCCC(*)CCC*
CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(Cl)cc(C)c1Br CC(*)c1cc(Cl)cc(C)c1Br *[H]
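The SMILES table can be post-processed with a few lines of code. The following Python sketch assumes one row per line and whitespace-separated columns, with the query in the first header cell followed by R-group labels of the form [*:n]. It covers only the default layout (no ID or scaffold column). The function name parse_smiles_table is hypothetical and not part of JChem.

```python
import re

RGROUP_LABEL = re.compile(r"\[\*:(\d+)\]")

TABLE = """\
Clc1cc(c(c(c1)[*:1])[*:2])[*:1] [*:1] [*:1] [*:2]
CCC(N)c1cc(Cl)cc(C(N)CC)c1Br CCC(N)* CCC(N)* Br*
Oc1c(Cl)cc2CCCC3CCCc1c23 *CCCC(*)CCC* *CCCC(*)CCC* *CCCC(*)CCC*
CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(Cl)cc(C)c1Br CC(*)c1cc(Cl)cc(C)c1Br *[H]
"""


def parse_smiles_table(text):
    """Split a default rgdecomp SMILES table into the query and per-target rows."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, body = lines[0], lines[1:]
    query = header[0]
    # Header cells after the query are R-group labels such as [*:1].
    indexes = [int(RGROUP_LABEL.fullmatch(label).group(1)) for label in header[1:]]
    rows = []
    for cells in body:
        if len(cells) != len(header):
            raise ValueError(f"unexpected column count in row: {cells!r}")
        rows.append({"target": cells[0],
                     "ligands": list(zip(indexes, cells[1:]))})
    return query, rows


query, rows = parse_smiles_table(TABLE)
for row in rows:
    print(row["target"], row["ligands"])
```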
All hits, with the R1 query nodes matching
different ligands, displaying the target index in the ID column:
rgdecomp -S "aromatize" -p -q query.mol targets.sdf -i = --allHits
ID Clc1cc(c(c(c1)[*:1])[*:2])[*:1] [*:1] [*:1] [*:2]
1 CCC(N)c1cc(Cl)cc(C(N)CC)c1Br CCC(N)* CCC(N)* Br*
2 Oc1c(Cl)cc2CCCC3CCCc1c23 *CCCC(*)CCC* *CCCC(*)CCC* *CCCC(*)CCC*
3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(Cl)cc(C)c1Br CC(*)c1cc(Cl)cc(C)c1Br *[H]
3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br C* Br*
3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(cc(Cl)c1O)C(C)c2cc(Cl)cc(C)c2Br C* Br*
4 CC(c1cc(CN)cc(Cl)c1O)c2cc(Cl)cc(C)c2Br CC(*)c1cc(Cl)cc(C)c1Br NC* *[H]
4 CC(c1cc(CN)cc(Cl)c1O)c2cc(Cl)cc(C)c2Br CC(*)c1cc(CN)cc(Cl)c1O C* Br*
rgdecomp -S "aromatize" -p -q query.mol targets.sdf -i ID --allHits
ID Clc1cc(c(c(c1)[*:1])[*:2])[*:1] [*:1] [*:1] [*:2]
id1 CCC(N)c1cc(Cl)cc(C(N)CC)c1Br CCC(N)* CCC(N)* Br*
id2 Oc1c(Cl)cc2CCCC3CCCc1c23 *CCCC(*)CCC* *CCCC(*)CCC* *CCCC(*)CCC*
id3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(Cl)cc(C)c1Br CC(*)c1cc(Cl)cc(C)c1Br *[H]
id3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br C* Br*
id3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(cc(Cl)c1O)C(C)c2cc(Cl)cc(C)c2Br C* Br*
id4 CC(c1cc(CN)cc(Cl)c1O)c2cc(Cl)cc(C)c2Br CC(*)c1cc(Cl)cc(C)c1Br NC* *[H]
id4 CC(c1cc(CN)cc(Cl)c1O)c2cc(Cl)cc(C)c2Br CC(*)c1cc(CN)cc(Cl)c1O C* Br*
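With --allHits, one target can contribute several rows to the table. A minimal Python sketch for counting the decompositions per target is given below. It assumes one row per line, whitespace-separated columns and the ID in the first column, with the first line being the header. The function name decompositions_per_target is hypothetical.

```python
from collections import Counter

TABLE = """\
ID Clc1cc(c(c(c1)[*:1])[*:2])[*:1] [*:1] [*:1] [*:2]
id1 CCC(N)c1cc(Cl)cc(C(N)CC)c1Br CCC(N)* CCC(N)* Br*
id2 Oc1c(Cl)cc2CCCC3CCCc1c23 *CCCC(*)CCC* *CCCC(*)CCC* *CCCC(*)CCC*
id3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(Cl)cc(C)c1Br CC(*)c1cc(Cl)cc(C)c1Br *[H]
id3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br C* Br*
id3 CC(c1cc(Cl)c(O)c(c1)C(C)c2cc(Cl)cc(C)c2Br)c3cc(Cl)cc(C)c3Br CC(*)c1cc(cc(Cl)c1O)C(C)c2cc(Cl)cc(C)c2Br C* Br*
id4 CC(c1cc(CN)cc(Cl)c1O)c2cc(Cl)cc(C)c2Br CC(*)c1cc(Cl)cc(C)c1Br NC* *[H]
id4 CC(c1cc(CN)cc(Cl)c1O)c2cc(Cl)cc(C)c2Br CC(*)c1cc(CN)cc(Cl)c1O C* Br*
"""


def decompositions_per_target(table_text):
    """Count table rows per target ID, skipping the header line."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    return Counter(line.split()[0] for line in lines[1:])


counts = decompositions_per_target(TABLE)
for target_id, n in counts.items():
    print(target_id, n)
```

Targets with a count above one are those for which the first hit alone does not tell the whole story.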
rgdecomp -S "aromatize" -m H -a M -s HTS -q query.mol targets.sdf
[H]c1c(Cl)c([H])c(c(c1[*:1])[*:2])[*:1] [H]c1cccc([H])c1Cl [*:1] [*:1] [*:2]
CCC(N)c1cc(Cl)cc(C(N)CC)c1Br Clc1ccccc1 CC[CH2:1]N CC[CH2:1]N [BrH:2]
All hits, with the R1 query nodes matching different ligands, showing results in MView:
rgdecomp -S "aromatize" -p -q query.mol targets.sdf -a P -f mrv:-a --allHits -o result3.mrv
mview -t DMAP -p Colors.ini -c 4 -r 5 result3.mrv
You can also pipe the output of rgdecomp directly to mview under Linux/Unix systems:
rgdecomp -S "aromatize" -p -q query.mol targets.sdf -a P -f mrv:-a --allHits | mview -t DMAP -p Colors.ini -c 4 -r 5 -
Note that by specifying MRV output format in the -f parameter we automatically
switch to molecule series output and also enable the storage of atom
color data if the output format is capable of storing molecule fields (as are e.g. SDF and MRV).
Atom color data is stored in the DMAP MRV tag and the color palette is
defined in Colors.ini. We also specify the
number of table columns in the mview option -c. The decompositions
of the third and fourth target molecules are shown below:
[Figure: decompositions of the third and fourth target molecules]
rgdecomp -S "aromatize" -p -m H -s T -q query.mol targets.sdf -a P -f sdf:-a --allHits -o result4.sdf
mview -t DMAP -p Colors.ini -c 4 -r 4 result4.sdf
With piping:
rgdecomp -S "aromatize" -p -m H -s T -q query.mol targets.sdf -a P -f sdf:-a --allHits | mview -t DMAP -p Colors.ini -c 4 -r 4 -
The result is shown below:
[Figure: resulting decomposition table]
rgdecomp -S "aromatize" -p -m Rs -a P -q query.mol targets.sdf -f sdf:-a -o result5.sdf
mview -t DMAP -p Colors.ini -c 6 -r 5 result5.sdf
With piping:
rgdecomp -S "aromatize" -p -m Rs -a P -q query.mol targets.sdf -f sdf:-a | mview -t DMAP -p Colors.ini -c 6 -r 5 -
The result is shown below:
[Figure: resulting decomposition table]