Chemometric analysis in Raman spectroscopy from experimental design to machine learning–based modeling

Shuxia Guo, Jürgen Popp, Thomas Bocklitz

Published: 2021-11-04 DOI: 10.1038/s41596-021-00620-3

Extended data

Extended Data Fig. 1 An example of data structure.

The data are structured hierarchically in the order device, replicate, group. The calibration files are saved along with the sample spectra in the folder of each group. The date and time of each measurement are marked in the file names in the format ‘ddmmyy_hhmmss’. The ‘Info’ file in each folder contains the necessary records of the measurement.
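As a minimal sketch of how such a layout could be read programmatically, the snippet below extracts the ‘ddmmyy_hhmmss’ stamp from a file name and splits a path into its device, replicate and group levels. The example path, the file name prefix and the function names are hypothetical; the caption only specifies the folder hierarchy and the timestamp format.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Matches a 'ddmmyy_hhmmss' stamp anywhere in a file name.
# Assumption: the stamp is not directly adjacent to other digits.
STAMP = re.compile(r"(?<!\d)(\d{6}_\d{6})(?!\d)")


def parse_measurement_time(file_name):
    """Return the measurement datetime encoded in file_name, or None."""
    match = STAMP.search(file_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d%m%y_%H%M%S")
    except ValueError:
        # Twelve digits that do not form a valid date and time.
        return None


def describe(path):
    """Split a device/replicate/group/file path into its levels."""
    device, replicate, group, file_name = PurePosixPath(path).parts[-4:]
    return {
        "device": device,
        "replicate": replicate,
        "group": group,
        "measured_at": parse_measurement_time(file_name),
    }


# Hypothetical path following the device-replicate-group hierarchy.
record = describe("device1/replicate2/groupA/spectrum_041121_153045.txt")
print(record)
```

Keeping the parser tolerant (returning `None` instead of raising) lets the same loop pass over the ‘Info’ and calibration files, whose names may not carry a stamp.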

Extended Data Fig. 2 Results of model validation and evaluation based on two dimension reduction methods and different sampling mechanisms for the bacterial dataset (Dataset 2).

The classification was performed using two dimension reduction methods and four classifiers under different sampling mechanisms. Each box contains 9 values representing the mean sensitivity of the validation and testing results produced during the 9 iterations of the 9-fold/9-replicate external validation. The internal validation is considered unbiased if the testing and validation results are comparable; otherwise, it is considered biased.
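The replicate-wise external validation described here can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: ‘mean sensitivity’ is taken to be the per-class sensitivity averaged over classes, each iteration is assumed to hold out exactly one replicate, and the function names are hypothetical.

```python
from collections import defaultdict


def mean_sensitivity(y_true, y_pred):
    """Average of the per-class sensitivities (true-positive rates)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def leave_one_replicate_out(replicate_ids):
    """Yield (held_out, train_indices, test_indices), one split per replicate."""
    for held_out in sorted(set(replicate_ids)):
        train = [i for i, r in enumerate(replicate_ids) if r != held_out]
        test = [i for i, r in enumerate(replicate_ids) if r == held_out]
        yield held_out, train, test


# Toy labels: 9 replicates with two spectra each give 9 iterations.
replicate_ids = [r for r in range(9) for _ in range(2)]
splits = list(leave_one_replicate_out(replicate_ids))
print(len(splits))

# Per-class sensitivities here are 1/2 for "a" and 2/2 for "b".
score = mean_sensitivity(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(score)
```

Splitting by replicate rather than by individual spectrum keeps all spectra of one replicate on the same side of the split, which is what makes the held-out result an external test.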

Extended Data Fig. 3 Results of model validation and evaluation based on two dimension reduction methods and different sampling mechanisms for the cell dataset (Dataset 1).

Each box contains 9 values representing the mean sensitivity of the validation and testing results produced during the 9 iterations of the 9-fold/9-replicate external validation. The internal validation is considered unbiased if the testing and validation results are comparable; otherwise, it is considered biased.
