From raw ion mobility measurements to disease classification: a comparison of analysis processes
(2015)
Ion mobility spectrometry (IMS) is a technology for detecting volatile compounds in exhaled breath, and it is increasingly used in medical applications. One major goal is to classify patients into disease groups, for example diseased versus healthy, from simple breath samples. Raw IMS measurements are data matrices in which the peak regions representing the compounds have to be identified and quantified. A typical analysis process consists of pre-processing and peak detection in single experiments, peak clustering to obtain consensus peaks across several experiments, and classification of samples based on the resulting multivariate peak intensities. Recently, several automated algorithms for peak detection and peak clustering have been introduced to overcome the current reliance on human-based analysis, which is slow, subjective and sometimes not reproducible. We present an unbiased comparison of many combinations of peak processing and multivariate classification algorithms on a disease dataset. The specific combination of algorithms chosen for the different analysis steps determines the classification accuracy, with the encouraging result that certain fully automated combinations perform even better than current manual approaches.
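The analysis process described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not any of the methods compared in the study: each measurement is assumed to be a 2D intensity matrix (rows and columns standing in for, e.g., retention time and drift time), pre-processing is omitted, peaks are taken as strict local maxima above a hypothetical `threshold`, consensus peaks are formed by a simple greedy grouping within a hypothetical `radius`, and each sample's feature for a consensus peak is the largest intensity among its peaks within that radius. All function names and example values are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]     # one measurement: rows x columns of intensities
Peak = Tuple[int, int, float]  # (row, column, intensity)
Centre = Tuple[float, float]   # position of a consensus peak


def detect_peaks(m: Matrix, threshold: float) -> List[Peak]:
    """Step 1: strict local maxima (8-neighbourhood) whose intensity exceeds threshold."""
    rows, cols = len(m), len(m[0])
    peaks: List[Peak] = []
    for i in range(rows):
        for j in range(cols):
            v = m[i][j]
            if v <= threshold:
                continue
            neighbours = [m[a][b]
                          for a in range(max(0, i - 1), min(rows, i + 2))
                          for b in range(max(0, j - 1), min(cols, j + 2))
                          if (a, b) != (i, j)]
            if all(v > n for n in neighbours):
                peaks.append((i, j, v))
    return peaks


def _close(i: float, j: float, c: Centre, radius: float) -> bool:
    """True if position (i, j) lies within radius of centre c (Euclidean distance)."""
    return (i - c[0]) ** 2 + (j - c[1]) ** 2 <= radius ** 2


def group_peaks(all_peaks: List[List[Peak]], radius: float) -> List[Centre]:
    """Step 2 (simplified stand-in): a peak starts a new consensus peak
    unless it lies within radius of an existing one."""
    centres: List[Centre] = []
    for peaks in all_peaks:
        for i, j, _ in peaks:
            if not any(_close(i, j, c, radius) for c in centres):
                centres.append((float(i), float(j)))
    return centres


def feature_matrix(all_peaks: List[List[Peak]], centres: List[Centre],
                   radius: float) -> List[List[float]]:
    """Input for step 3: one row per measurement, one column per consensus peak;
    each entry is the largest intensity of a peak within radius, else 0."""
    return [[max((v for i, j, v in peaks if _close(i, j, c, radius)), default=0.0)
             for c in centres]
            for peaks in all_peaks]


# Two tiny hypothetical 4 x 4 measurements.
a = [[0.0] * 4 for _ in range(4)]
a[1][1], a[3][3] = 5.0, 9.0
b = [[0.0] * 4 for _ in range(4)]
b[2][1] = 6.0

peaks = [detect_peaks(m, threshold=1.0) for m in (a, b)]
centres = group_peaks(peaks, radius=1.5)
X = feature_matrix(peaks, centres, radius=1.5)
```

Here `X` is `[[5.0, 9.0], [6.0, 0.0]]`: one row per sample and one column per consensus peak, which is the kind of matrix a multivariate classifier would consume in the final step. Unlike model-based or density-based clustering, this greedy grouping depends on the order in which peaks are visited, which is one reason dedicated clustering algorithms are preferable in practice.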
The best fully automated analysis process achieves better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM (Expectation Maximization) clustering and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step, and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process, enabling unbiased, high-throughput use of the technology.
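DBSCAN groups peak positions pooled from all experiments so that dense groups become consensus peaks and isolated positions are labelled as noise. The following is a minimal textbook version in plain Python, not the implementation or parameter settings used in the study; `eps`, `min_pts`, the function names and the example coordinates are hypothetical, and each point counts itself toward `min_pts`.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical (retention time, drift time) of one detected peak
UNVISITED = -2
NOISE = -1


def region_query(points: List[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx], including idx itself."""
    px, py = points[idx]
    return [k for k, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = [k for k in neighbours if k != i]
        while seeds:
            k = seeds.pop()
            if labels[k] == NOISE:
                labels[k] = cluster  # border point: joins the cluster but is not expanded
            if labels[k] != UNVISITED:
                continue
            labels[k] = cluster
            k_neighbours = region_query(points, k, eps)
            if len(k_neighbours) >= min_pts:  # k is also a core point: keep expanding
                seeds.extend(k_neighbours)
        cluster += 1
    return labels


# Hypothetical peak positions pooled from several measurements.
peak_positions = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
labels = dbscan(peak_positions, eps=0.5, min_pts=2)
```

On this input the labels are `[0, 0, 0, 1, 1, -1]`: two consensus peaks and one isolated position treated as noise. Because the two coordinates of a peak generally have different units, a real application would likely need to scale them before choosing `eps`.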