To be most successful, which type of model should be built?
The obvious answer is any model that is prospectively predictive. It is also very valuable if the model is interpretable, since this facilitates the design of new compounds with improved properties.
Another important parameter is the size of the applicability domain. A large domain means that the input compounds can be highly diverse yet still be recognized and correctly predicted by the model, which allows extrapolation into more promising chemical space.
How, then, would a medicinal or computational chemist know which type of model is most likely to succeed?
Usually, there is a trade-off between applicability domain and predictive power: a larger domain (a global model) often comes at the expense of accuracy, whereas accurate predictions are easier to achieve with a more local model (Figure 1).
Figure 1. The size of the applicability domain of a global and a local model is relative and depends on the training set used. The key aspect is not the domain size but rather whether the model is valid for the molecules of interest.
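One common way to ask whether a model is valid for a molecule of interest is to check how similar that molecule is to its nearest neighbor in the training set. The following is a minimal sketch of that idea, not the method used in this article. It assumes that each compound is already encoded as a set of fingerprint "on" bits; the `Fingerprint` type, the `in_domain` helper, the 0.5 threshold, and the example bit sets are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable

# Assumption: a compound is represented by the indices of its "on" fingerprint bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_domain(query: Fingerprint,
              training: Iterable[Fingerprint],
              threshold: float = 0.5) -> bool:
    """Treat the query as inside the applicability domain if its most
    similar training compound reaches the (hypothetical) threshold."""
    nearest = max((tanimoto(query, fp) for fp in training), default=0.0)
    return nearest >= threshold


# Made-up bit sets, for illustration only.
training_set = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5})]
close_analogue = frozenset({1, 2, 3, 5})   # shares most bits with the training set
distant_compound = frozenset({1, 7, 8, 9})  # shares almost nothing

print(in_domain(close_analogue, training_set))
print(in_domain(distant_compound, training_set))
```

With this kind of check, the absolute size of the domain matters less than whether the compounds being designed fall inside it, which is the point made in Figure 1. In practice the threshold and the fingerprint type would need to be calibrated against the model's observed errors.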
Validated, production-grade predictions can be made available as a Design Hub plugin to help select the most viable idea molecules and novel designs (Fig. 2).
Figure 2. Integration overview.
Simulation of local and global model building, with re-training and comparison (Fig. 3).
Figure 3. Data preparation workflow
Figure 4. Local and global model performance on external data
Figure 5. Performance tested on scaffold analogues
Figure 6. Re-trained model performance tested on second round of scaffold analogues
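The local versus global comparison with re-training can be illustrated on toy data. The sketch below is not the simulation behind the figures above and does not use the PAMPA data. It assumes a single hypothetical descriptor `x`, a synthetic property `y`, and a plain least-squares line as the "model"; all numbers are invented so that the scaffold series follows a different trend from the diverse set.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (descriptor x, measured property y), both synthetic


def fit_line(data: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model: Tuple[float, float], data: List[Point]) -> float:
    """Mean absolute error of a fitted line on the given points."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in data) / len(data)


# Synthetic data: a diverse set and two rounds of analogues of one scaffold.
diverse = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
scaffold_round1 = [(10, 20), (11, 19), (12, 18)]
scaffold_round2 = [(13, 17), (14, 16)]

global_model = fit_line(diverse + scaffold_round1)  # broad training set
local_model = fit_line(scaffold_round1)             # one scaffold only

# On new analogues of the scaffold, the local model is the more accurate one.
local_err = mean_abs_error(local_model, scaffold_round2)
global_err = mean_abs_error(global_model, scaffold_round2)
print("scaffold analogues:", local_err, global_err)

# On diverse compounds, the local model is outside its domain.
local_err_diverse = mean_abs_error(local_model, diverse)
global_err_diverse = mean_abs_error(global_model, diverse)
print("diverse compounds:", local_err_diverse, global_err_diverse)

# Re-training: refit the local model once round 2 has been measured.
retrained_local = fit_line(scaffold_round1 + scaffold_round2)
```

On this toy data the local model wins on the scaffold analogues and loses on the diverse compounds, mirroring the trade-off between accuracy and domain size. Real series are noisier, so the gap between the two models has to be measured on held-out compounds at each round rather than assumed.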
—
[1] NCATS Parallel Artificial Membrane Permeability Assay (PAMPA) (1508612)