Development of a Reverse Phase HPLC Retention Index Model for Nontargeted Metabolomics Using Synthetic Compounds.
The MolFind application has been developed as a nontargeted metabolomics chemometric tool to facilitate structure identification when HPLC analysis of biofluids reveals a feature of interest. Here, synthetic compounds are selected and measured to form the basis of a new, more accurate HPLC retention index model for use with MolFind. We show that relatively inexpensive synthetic screening compounds with simple structures can be used to develop an artificial neural network (ANN) model that makes quality predictions for human metabolites. A total of 1955 compounds were obtained and measured for the model. A separate set of 202 human metabolites was used for independent validation. The new ANN model showed improved accuracy over previous models. Although based on relatively simple compounds, the model was able to make quality predictions for complex compounds dissimilar to the training data. Independent validation metabolites with feature combinations found in three or more training compounds were predicted with 97% sensitivity, while metabolites with feature combinations found in fewer than three training compounds were predicted with >90% sensitivity. The study describes the method used to select synthetic compounds and the new descriptors developed to encode the relationship between lipophilic molecular subgraphs and HPLC retention. Finally, we introduce the QRI (qualitative range of interest) modification of neural network backpropagation learning, which generates models based simultaneously on quantitative and qualitative data.
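The abstract does not spell out how QRI changes backpropagation, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a quantitative target is a measured value, that a qualitative target is known only as a range, and that a prediction falling inside that range contributes no error. The names Target, output_error, and fit_linear are hypothetical, and a single linear unit stands in for the ANN.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """Hypothetical training target: an exact value or a range (assumption)."""
    value: Optional[float] = None  # quantitative: measured retention index
    low: Optional[float] = None    # qualitative: lower bound of the range
    high: Optional[float] = None   # qualitative: upper bound of the range


def output_error(prediction: float, target: Target) -> float:
    """Signed output error that would be fed into backpropagation.

    Quantitative targets use the ordinary difference. Qualitative targets
    give zero error inside the range and the distance to the nearest bound
    outside it (an assumed rule, not taken from the paper).
    """
    if target.value is not None:
        return prediction - target.value
    if prediction < target.low:
        return prediction - target.low
    if prediction > target.high:
        return prediction - target.high
    return 0.0


def fit_linear(samples, lr=0.05, epochs=5000):
    """Fit y = w*x + b by per-sample gradient descent on 0.5 * error**2."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = output_error(w * x + b, target)
            w -= lr * err * x  # gradient of 0.5*err^2 with respect to w
            b -= lr * err      # gradient of 0.5*err^2 with respect to b
    return w, b


# Two measured values plus one compound known only to fall in a range.
samples = [
    (1.0, Target(value=2.0)),
    (2.0, Target(value=4.0)),
    (3.0, Target(low=5.0, high=7.0)),
]
w, b = fit_linear(samples)
```

Under these assumptions, the range-only sample pulls the fit toward its interval while the prediction lies outside it and stops contributing once the prediction falls inside, so both kinds of data shape the same model.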