A machine learning pipeline for substructure detection in unknown mass spectra
Detecting unknown compounds in complex mixtures with mass spectrometry is a cumbersome and lengthy process. A single comprehensive GCxGC-MS chromatogram can contain up to 10,000 chromatographic peaks and associated mass spectra. Identification rates for unknown small molecules in such complex samples are usually below 1%. The ultimate goal of metabolomics is a comprehensive overview of all small molecules in a given sample, so better software for de novo identification of the true isomer structure of small molecules is urgently needed. We developed an automated classification workflow that recognizes substructures in unknown electron impact mass spectra.
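To make the idea of substructure classification concrete, the following is a minimal sketch, not the workflow described here. It assumes that each spectrum is a list of (m/z, intensity) pairs, that spectra are binned to unit mass and scaled to the base peak, and that a simple nearest-neighbour vote stands in for the classifier, which this abstract does not specify. The function names `bin_spectrum`, `cosine`, and `predict_substructures`, the parameters `max_mz`, `k`, and `threshold`, and the toy reference spectra are all hypothetical.

```python
import math
from typing import List, Set, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def bin_spectrum(peaks: Spectrum, max_mz: int = 500) -> List[float]:
    """Turn a peak list into a fixed-length vector of unit-mass bins,
    scaled so that the base peak equals 1.0 (assumed preprocessing)."""
    vector = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        index = int(round(mz))
        if 0 <= index <= max_mz:
            vector[index] += intensity
    base_peak = max(vector)
    if base_peak > 0:
        vector = [v / base_peak for v in vector]
    return vector


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def predict_substructures(
    query: Spectrum,
    references: List[Tuple[Spectrum, Set[str]]],
    k: int = 3,
    threshold: float = 0.5,
) -> Set[str]:
    """Hypothetical stand-in classifier: a substructure label is predicted
    when at least `threshold` of the k most similar references carry it."""
    q = bin_spectrum(query)
    ranked = sorted(
        references,
        key=lambda ref: cosine(q, bin_spectrum(ref[0])),
        reverse=True,
    )
    neighbours = ranked[:k]
    votes = {}
    for _, labels in neighbours:
        for label in labels:
            votes[label] = votes.get(label, 0) + 1
    return {lab for lab, n in votes.items() if n / len(neighbours) >= threshold}


# Toy reference library with made-up intensities, for illustration only.
REFERENCES = [
    ([(73.0, 100.0), (147.0, 40.0)], {"trimethylsilyl"}),
    ([(73.0, 100.0), (75.0, 30.0), (147.0, 20.0)], {"trimethylsilyl"}),
    ([(91.0, 100.0), (65.0, 15.0)], {"benzyl"}),
    ([(91.0, 100.0), (92.0, 60.0), (65.0, 10.0)], {"benzyl"}),
]

unknown = [(73.0, 90.0), (147.0, 35.0), (45.0, 5.0)]
print(predict_substructures(unknown, REFERENCES))  # {'trimethylsilyl'}
```

The sketch treats substructure recognition as a multi-label problem: one spectrum can receive several labels, or none, depending on how many of its nearest reference spectra share each label. A real pipeline would need a large, curated spectral library and a validated classifier in place of this vote.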