Data Visualization, Regression, Applicability Domains and Inverse Analysis Based on Generative Topographic Mapping

Posted by
Hiromasa Kaneko
on 2019-09-12

Data Visualization, Regression, Applicability Domains and Inverse Analysis Based on Generative Topographic Mapping

This paper introduces two generative topographic mapping (GTM) methods that can be used for data visualization, regression analysis, inverse analysis, and the determination of applicability domains (ADs). In GTM-multiple linear regression (GTM-MLR), the prior probability distribution of the descriptors or explanatory variables (X) is calculated with GTM, and the posterior probability distribution of the property/activity or objective variable (y) given X is calculated with MLR; inverse analysis is then performed using the product rule and Bayes’ theorem. In GTM-regression (GTMR), X and y are combined and GTM is performed to obtain the joint probability distribution of X and y; this leads to the posterior probability distributions of y given X and of X given y, which are used for regression and inverse analysis, respectively. Simulations using linear and nonlinear datasets and quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) datasets confirm that GTM-MLR and GTMR enable data visualization, regression analysis, and inverse analysis considering appropriate ADs.

Visit the publication

 

This paper introduces two generative topographic mapping (GTM) methods that can be used for data visualization, regression analysis, inverse analysis, and the determination of applicability domains (ADs). In GTM-multiple linear regression (GTM-MLR), the prior probability distribution of the descriptors or explanatory variables (X) is calculated with GTM, and the posterior probability distribution of the property/activity or objective variable (y) given X is calculated with MLR; inverse analysis is then performed using the product rule and Bayes’ theorem. In GTM-regression (GTMR), X and y are combined and GTM is performed to obtain the joint probability distribution of X and y; this leads to the posterior probability distributions of y given X and of X given y, which are used for regression and inverse analysis, respectively. Simulations using linear and nonlinear datasets and quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) datasets confirm that GTM-MLR and GTMR enable data visualization, regression analysis, and inverse analysis considering appropriate ADs.

Visit the publication