Characterization and feature selection of volatile metabolites in Yangxian pigmented rice varieties through GC-MS and machine learning algorithms


Cheng K., Dong R., Pan F., Su W., Xi L., Zhang M., ...

FRONTIERS IN NUTRITION, vol.12, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 12
  • Publication Date: 2025
  • DOI: 10.3389/fnut.2025.1598875
  • Journal Name: FRONTIERS IN NUTRITION
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Directory of Open Access Journals
  • Affiliated with Atatürk University: Yes

Abstract

Introduction: Pigmented rice appeals to consumers for its abundant phytochemicals and unique aroma.

Methods: In this study, GC-MS-based metabolomics was performed on Yangxian colored rice varieties to characterize their volatile metabolites using multivariate statistics and machine learning algorithms.

Results: A total of 357 volatile metabolites were detected and classified into 9 groups, including 96 organooxygen compounds (26.89%), 52 carboxylic acids and derivatives (14.57%), 42 fatty acyls (11.76%), 16 benzene and substituted derivatives (4.48%), and 11 hydroxy acids and derivatives (3.08%). PLS-DA screening identified 127 differentially abundant metabolites. In principal component analysis, PC1 and PC2 accounted for 52.48% and 27.09% of the variance, respectively. After screening the differential metabolites for high multicollinearity (above 0.8) and applying the chi-square test (20% of the feature number), only 7 metabolites were found to represent the overall metabolite profile across the colored rice varieties. Four machine learning models were then used to classify the colored rice varieties, and the random forest model performed best, with an accuracy of 0.97. Moreover, Shapley additive explanations (SHAP) analysis indicated that the 7 metabolites can serve as potential markers representing the metabolomic profiles.

Conclusions: These results suggest that GC-MS-based metabolomics combined with random forest might be effective for extracting key features among different pigmented rice varieties.
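The feature-selection step described in the Results (a multicollinearity cutoff of 0.8 followed by a chi-square test keeping a fraction of the features) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that multicollinearity is measured by the absolute Pearson correlation between metabolite abundances, that "20%" means keeping the top fifth of features ranked by chi-square score, and that the chi-square statistic compares per-class abundance sums with their expected values. The metabolite names (`m1` to `m5`), class labels, and abundance values are hypothetical toy data. The random forest and SHAP steps are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / (sx * sy)


def drop_collinear(features, threshold=0.8):
    """Greedy filter: keep a feature only if |r| <= threshold
    against every feature that has already been kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def chi2_score(values, labels):
    """Chi-square statistic between a non-negative feature and class labels.

    Assumption: observed = per-class sum of the feature,
    expected = class frequency times the feature total.
    """
    total = sum(values)
    n = len(labels)
    score = 0.0
    for c in set(labels):
        observed = sum(v for v, l in zip(values, labels) if l == c)
        expected = total * labels.count(c) / n
        if expected > 0:
            score += (observed - expected) ** 2 / expected
    return score


def select_top_fraction(features, labels, fraction=0.2):
    """Rank features by chi-square score and keep the top fraction (at least one)."""
    ranked = sorted(features, key=lambda k: chi2_score(features[k], labels),
                    reverse=True)
    n_keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:n_keep]


# Hypothetical toy data: 6 samples from 3 rice classes, 5 metabolites.
labels = ["A", "A", "B", "B", "C", "C"]
features = {
    "m1": [10, 11, 1, 2, 1, 1],   # elevated in class A
    "m2": [20, 22, 2, 4, 2, 2],   # exactly 2 * m1, so fully collinear with m1
    "m3": [1, 2, 9, 10, 1, 2],    # elevated in class B
    "m4": [5, 5, 5, 5, 5, 5],     # constant, carries no class signal
    "m5": [3, 4, 3, 4, 3, 4],     # varies, but identically in every class
}

kept = drop_collinear(features, threshold=0.8)
filtered = {name: features[name] for name in kept}
selected = select_top_fraction(filtered, labels, fraction=0.2)
print("after collinearity filter:", kept)
print("after chi-square selection:", selected)
```

The greedy filter depends on feature order, since the first member of a correlated group is the one retained. A real analysis would likely need a principled tie-break, such as keeping the member with the higher chi-square score.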