Standardization

To ensure that search results are correct, query and target (database) molecules must share a common representation. JChem Standardizer can be used to bring both molecules involved in a search to this common format, based on a configuration XML or an action string.

Aromatization

Aromatization is the most important standardization step for searching. It converts the different resonance forms of an aromatic system to a canonical aromatic representation by changing the alternating single and double bonds to the aromatic bond type. For the theory of aromaticity detection, see this link.
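The idea of a canonical aromatic form can be illustrated with a minimal sketch. This is a toy model, not the JChem algorithm: the class AromatizeSketch, the method aromatizeRing and the bond constants are hypothetical names, and the sketch assumes a single six-membered ring given as a list of bond orders in ring order. It only shows that the two alternating forms of such a ring end up identical, while a ring without a complete alternating pattern is left unchanged.

```java
import java.util.Arrays;

public class AromatizeSketch {
    // Bond type markers used only inside this sketch (hypothetical values).
    public static final int SINGLE = 1;
    public static final int DOUBLE = 2;
    public static final int AROMATIC = 4;

    /**
     * Returns a copy of the ring bonds. If the ring has six bonds that strictly
     * alternate between single and double all the way around, every bond in the
     * copy is set to AROMATIC. Any other input is copied unchanged.
     */
    public static int[] aromatizeRing(int[] ringBonds) {
        if (ringBonds.length != 6) {
            return ringBonds.clone(); // toy model: only six-membered rings
        }
        for (int i = 0; i < 6; i++) {
            int current = ringBonds[i];
            int next = ringBonds[(i + 1) % 6]; // wrap around to close the ring
            boolean alternates = (current == SINGLE && next == DOUBLE)
                    || (current == DOUBLE && next == SINGLE);
            if (!alternates) {
                return ringBonds.clone(); // incomplete pattern: leave as is
            }
        }
        int[] aromatic = new int[6];
        Arrays.fill(aromatic, AROMATIC);
        return aromatic;
    }

    public static void main(String[] args) {
        // The two resonance forms of the same six-membered aromatic ring.
        int[] formA = {DOUBLE, SINGLE, DOUBLE, SINGLE, DOUBLE, SINGLE};
        int[] formB = {SINGLE, DOUBLE, SINGLE, DOUBLE, SINGLE, DOUBLE};
        // A ring with only two double bonds: the pattern is not complete.
        int[] diene = {DOUBLE, SINGLE, DOUBLE, SINGLE, SINGLE, SINGLE};

        System.out.println(Arrays.equals(aromatizeRing(formA), aromatizeRing(formB)));
        System.out.println(Arrays.equals(aromatizeRing(diene), diene));
    }
}
```

Both comparisons in main evaluate to true: the two resonance forms become the same representation, which is what allows a bond-by-bond comparison between query and target to succeed regardless of how the ring was originally drawn.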

Aromatization can be performed either by calling the MoleculeGraph.aromatize(true) function directly or by using the "aromatize" action of Standardizer. It has to be done manually when you use the MolSearch class, but the higher-level StandardizedMolSearch and JChemSearch classes do it automatically when default standardization is used. If you use SMARTS queries and need Daylight compatibility, a Daylight-compatible aromatization may be important for you. In this case, see MoleculeGraph.aromatize(int).

Special care has to be taken when assembling a custom standardization configuration. In this case the "aromatize" action should be present in the configuration, and it is safest to put it first.

Furthermore, the alternating single and double bond representation of aromaticity should be used with caution in queries. It cannot be used to describe only part of an aromatic ring, because in that case the query cannot be aromatized.

Table 1. Target and query structures.

Standardization in the database

In JChem databases, standardization is done automatically using the table standardizer configuration.

The database molecules are standardized during structure import into a JChem table (and also during structure update). First, the original source of the chemical structure is stored in the cd_structure field, which can then be used for display and export. The standardized form is then stored in the cd_smiles field in a compact format. This representation is used by the search process. All additional structure-dependent data (fingerprints, molecular weight and formula, Chemical Terms calculated columns) are also calculated from the standardized form. In the case of a JChem index in the Cartridge, this process is done during index creation (and during structure insert/update in an indexed structure column), and the standardized form is stored within the index.

Query structures are standardized automatically before the search.
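The flow described above (keep the original source, search on the standardized form, standardize the query the same way) can be sketched with a small in-memory model. Everything here is an assumption made for illustration: StandardizedTableSketch, importStructure, search and toyStandardize are hypothetical names, the "standardizer" is a lookup that only recognizes two spellings of one ring, and matching is plain string equality rather than a real structure search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class StandardizedTableSketch {
    /** One row: the source as imported plus the form used for searching. */
    private static final class Row {
        final String originalSource;   // plays the role of cd_structure
        final String standardizedForm; // plays the role of cd_smiles

        Row(String originalSource, String standardizedForm) {
            this.originalSource = originalSource;
            this.standardizedForm = standardizedForm;
        }
    }

    private final UnaryOperator<String> standardizer;
    private final List<Row> rows = new ArrayList<>();

    public StandardizedTableSketch(UnaryOperator<String> standardizer) {
        this.standardizer = standardizer;
    }

    /** Stores the original source and its standardized form side by side. */
    public void importStructure(String source) {
        rows.add(new Row(source, standardizer.apply(source)));
    }

    /** Standardizes the query, compares standardized forms, returns original sources. */
    public List<String> search(String query) {
        String standardizedQuery = standardizer.apply(query);
        List<String> hits = new ArrayList<>();
        for (Row row : rows) {
            if (row.standardizedForm.equals(standardizedQuery)) {
                hits.add(row.originalSource);
            }
        }
        return hits;
    }

    /** Stand-in standardizer: maps two alternating-bond spellings of one ring to one form. */
    public static String toyStandardize(String smiles) {
        if (smiles.equals("C1=CC=CC=C1") || smiles.equals("C1C=CC=CC=1")) {
            return "c1ccccc1";
        }
        return smiles;
    }

    public static void main(String[] args) {
        StandardizedTableSketch table =
                new StandardizedTableSketch(StandardizedTableSketch::toyStandardize);
        table.importStructure("C1=CC=CC=C1");
        // The query uses the other resonance form; the hit is reported in its original form.
        System.out.println(table.search("C1C=CC=CC=1"));
    }
}
```

The point of the sketch is the symmetry: the same standardizer is applied to stored structures and to queries, while the untouched original is what gets returned for display.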

There are two types of standardization in the database:
