JChem Base, JChem Oracle Cartridge: Speed-up of combined searches

news · 6 years ago
by Gábor Hornyák

New improvements in our search algorithm result speed-up of search cases when chemical structure search is combined with non-structural criterion.

Combined, structural and non-structural compound query in SQL using JChem Oracle Cartridge  (JOC) can be defined as:

SELECT * FROM tableName WHERE jc_compare(‘CCC’, cd_structure, ‘t:s’) AND id<4000.

In this case the Oracle optimizer decides the priority of the evaluation. Oracle usually uses one index to execute a part of the query and the other part is running in function or row scan mode. Since JChem’s jc_compare function is rather slow in function mode, this may yield poor performance.

To  execute similar combined compound queries in JChem Base the filterquery option is provided, which is also available in JOC

SELECT * FROM tableName WHERE jc_compare(‘CCC’, cd_structure, ‘t:s; filterQuery: SELECT tableName.rowid FROM tableName WHERE id<4000’)

Using the filter query option the Oracle’s optimizer is not used. The JChem server always executes the chemical search in index mode and merges its result with the result of the non-chemical filter.

Combined queries can be divided into four types:

  1. The chemical search returns many hits (non-selective) the non-chemical part is selective.
  2. The chemical filter criterion is selective and the non-chemical filter is not selective.
  3. Both parts are not selective.
  4. Both parts are selective.

Before version 15.1.26, JChem was working in so called pre-filter mode. First, the filter query (non-chemical filter) was executed and then the chemical search part was running on the filtered set one by one. This works fine in many cases, but if the filter defined in the filterquery option is not selective enough (type 2) this execution order is not optimal.

From version 15.1.26 we have changed this behavior. Not only the pre-filter mode (when the non-chemical filters are running first) available but also the opposite case, the so called post-filter mode, when chemical search part is running first and then the filter query (non-chemical filters) is executed.

Thanks to the post-filter mode, type 2 queries can be substantially faster (up to 3-4 times according to our measurements). Type 3 queries require moving lots of data between Oracle and JChem server so you should avoid this case if possible. Type 4 queries are the best in terms of performance. As a general rule try to make your non-chemical filter criterion selective.