What Structures are Claimed in Patents? - Use cases for patent literature MMP transformations
Medicinal chemistry transformations from patent literature
Designing and optimizing novel drugs requires both creativity and knowledge. The Matched Molecular Pairs (MMP) method is one way of supporting this process. Commonly, MMP is used to connect structural changes in drug molecules to the corresponding changes in assay readouts (Figure 1).
The MMP method was used to extract all synthetically available transformations described in the patent database SureChEMBL. This makes it possible to get an overview of how often medicinal chemists have used certain transformations, irrespective of the parameters they were optimizing (Table 1).
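To make the counting idea concrete, here is a minimal sketch of how transformation frequencies could be tallied once compounds have been fragmented into a core and a substituent. It is not the pipeline used on SureChEMBL: the records in `FRAGMENTS`, the SMILES-like strings, and the function name `count_transformations` are all illustrative assumptions, and the fragmentation step itself (normally done with a cheminformatics toolkit) is taken as given.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy, hand-written fragmentation records: (compound_id, core, substituent).
# These strings are illustrative assumptions, not data from SureChEMBL.
FRAGMENTS = [
    ("cpd1", "c1ccccc1[*:1]", "[*:1]Cl"),
    ("cpd2", "c1ccccc1[*:1]", "[*:1]F"),
    ("cpd3", "c1ccccc1[*:1]", "[*:1]C"),
    ("cpd4", "c1ccncc1[*:1]", "[*:1]Cl"),
    ("cpd5", "c1ccncc1[*:1]", "[*:1]F"),
]


def count_transformations(fragments):
    """Group substituents by shared core, then count each substituent exchange."""
    by_core = defaultdict(set)
    for _compound_id, core, substituent in fragments:
        by_core[core].add(substituent)

    counts = Counter()
    for substituents in by_core.values():
        # Sorting makes A>>B and B>>A count as the same undirected transformation.
        for left, right in combinations(sorted(substituents), 2):
            counts[f"{left}>>{right}"] += 1
    return counts


TRANSFORMATION_COUNTS = count_transformations(FRAGMENTS)

# List transformations from most to least frequent.
for transformation, n_examples in TRANSFORMATION_COUNTS.most_common():
    print(n_examples, transformation)
```

In this toy set the chlorine-to-fluorine exchange appears on two different cores, so it ranks first. The same kind of ranking, applied to a full patent data set, is what gives the overview of frequently used transformations.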
SureChEMBL is a publicly available, large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature by an automated text- and image-mining pipeline and are updated daily.
| Numbers | Data points used to create MMP set |
| --- | --- |
| 61 | Years of deposited patent applications |
| 600 MB | of text |
| 1.35 M | Patent applications |
| 20 M | Exemplified compounds |
| 1.4 M | Unique transformations |
| 1-20 | Transformation size (# atoms) |
| > 1000 | Transformations with > 1000 examples |
| > 50,000 | Transformations with > 50 examples |
Table 1. Data behind extracted transformations
Figure 2. Number of occurrences of the top 300 transformations (blue bars) in small-molecule drug discovery projects, extracted from SureChEMBL, with a few selected examples highlighted (orange bars)
Use cases for patent literature MMP transformations
The MMP transformations from SureChEMBL can be used in different ways to create analogues of a seed compound:
- Based on the most common transformations [2]: automatic creation of the compounds that are “expected” to be made in a project, so that none of them is forgotten.
- Based on the least common transformations: creation of “unexpected” analogues, compounds that a medicinal chemist would not immediately think of but that could increase the chance of finding novel compounds.
These analogues can then be filtered through any additional virtual screening cascade prior to selection for synthesis (Figure 3).
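The two use cases and the subsequent filtering step can be sketched as follows. This is a toy illustration under stated assumptions, not the workflow shown in Figure 3: the frequency table `COUNTS`, the helper names, the `[*]` attachment-point convention, and the length-based filter are all hypothetical, and analogues are built by plain string substitution rather than by a cheminformatics toolkit.

```python
# Hypothetical frequency table: (from_substituent, to_substituent) -> example count.
COUNTS = {
    ("Cl", "F"): 1200,
    ("Cl", "C"): 800,
    ("Cl", "OC"): 300,
    ("Cl", "C#N"): 40,
    ("F", "C"): 900,
}


def rank_transformations(counts, from_substituent, n, least_common=False):
    """Return up to n replacement substituents, most common first by default."""
    applicable = [
        (to_sub, count)
        for (from_sub, to_sub), count in counts.items()
        if from_sub == from_substituent
    ]
    applicable.sort(key=lambda item: item[1], reverse=not least_common)
    return [to_sub for to_sub, _count in applicable[:n]]


def make_analogues(core, substituents):
    """Attach each substituent at the single '[*]' placeholder of the core."""
    return [core.replace("[*]", substituent) for substituent in substituents]


def apply_filters(analogues, filters):
    """Keep only the analogues that pass every filter function."""
    return [a for a in analogues if all(f(a) for f in filters)]


SEED_CORE = "c1ccccc1[*]"  # seed compound is chlorobenzene, written as core + Cl

EXPECTED = make_analogues(SEED_CORE, rank_transformations(COUNTS, "Cl", 2))
UNEXPECTED = make_analogues(
    SEED_CORE, rank_transformations(COUNTS, "Cl", 2, least_common=True)
)

# Stand-in for a virtual screening cascade: a trivial size cut-off on the string.
DESIGN_SET = apply_filters(EXPECTED + UNEXPECTED, [lambda smiles: len(smiles) <= 10])
print(DESIGN_SET)
```

In practice the filter list would hold real property or docking predicates, and the substitution step would be replaced by a proper reaction-based transformation so that attachment points and valences are handled correctly.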
Figure 3. Example workflows applying SureChEMBL MMP transformations to create design sets
Download the 500 most common transformations from SureChEMBL
—
[1] Hussain and Rea, Journal of Chemical Information and Modeling 2010 50 (3), 339-348