Chinese Journal of Tissue Engineering Research, 2025, Vol. 29, Issue (35): 7499-7510. doi: 10.12307/2025.947

• Bone tissue construction

对比6种适用于医学领域使用的机器学习模型:支持骨质疏松症筛查和初步诊断

杨  磊1,刘三毛2,3,孙焕伟3,车  超1,唐  琳1   

  1. 1大连大学软件工程学院,辽宁省大连市  116622;2广西医科大学第一附属医院,广西壮族自治区南宁市  530021;3大连理工大学附属中心医院,辽宁省大连市  116033


  • 收稿日期:2024-10-11 接受日期:2024-12-10 出版日期:2025-12-18 发布日期:2025-04-30
  • 通讯作者: 唐琳,博士,副教授,大连大学软件工程学院,辽宁省大连市 116622
  • 作者简介:杨磊,男,1998年生,内蒙古自治区呼伦贝尔市人,大连大学在读硕士,主要从事医学信息学和机器学习方法的研究。
  • 基金资助:
    国家自然科学基金面上项目(62076045),项目负责人:车超;大连大学学科交叉项目(DLUXK-2023-YB-003),项目负责人:车超

Comparison of six machine learning models suitable for use in medicine: support for osteoporosis screening and initial diagnosis

Yang Lei1, Liu Sanmao2, 3, Sun Huanwei3, Che Chao1, Tang Lin1   

  1School of Software Engineering, Dalian University, Dalian 116622, Liaoning Province, China; 2The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi Zhuang Autonomous Region, China; 3The Affiliated Central Hospital of Dalian University of Technology, Dalian 116033, Liaoning Province, China
  • Received:2024-10-11 Accepted:2024-12-10 Online:2025-12-18 Published:2025-04-30
  • Contact: Tang Lin, PhD, Associate professor, School of Software Engineering, Dalian University, Dalian 116622, Liaoning Province, China
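  • About author: Yang Lei, male, born in 1998, a native of Hulunbuir, Inner Mongolia Autonomous Region, China; Master candidate, School of Software Engineering, Dalian University, Dalian 116622, Liaoning Province, China. His research focuses on medical informatics and machine learning methods.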
  • Supported by:
    National Natural Science Foundation of China, No. 62076045 (to CC); Dalian University Discipline Crossing Project, No. DLUXK-2023-YB-003 (to CC) 

摘要:

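Term explanations:
Ensemble learning: a machine learning approach that combines multiple models to improve predictive performance and robustness. An ensemble typically comprises several types of sub-models that are trained separately, and their predictions are integrated by voting, averaging, or weighted combination. This can reduce the risk of overfitting, improve the model's generalization ability, and achieve higher predictive accuracy.
SHAP: SHapley Additive exPlanations, an interpretability framework commonly used to explain the predictions of machine learning models. It is based on Shapley values from game theory and explains a prediction by quantifying the contribution of each feature to it. SHAP helps researchers and clinicians understand why a model made a particular prediction, thereby improving the model's transparency and trustworthiness.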
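The voting and weighted combination mentioned in the ensemble learning entry above can be illustrated with a minimal sketch. It is not the study's implementation: the function names `majority_vote` and `weighted_average`, the three sub-model outputs, and the weights are all hypothetical values chosen for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most sub-models."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_average(probabilities, weights):
    """Soft voting: weighted mean of the sub-models' predicted probabilities."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Hypothetical outputs of three sub-models for one patient (illustrative only).
labels = [1, 0, 1]          # 1 = osteoporosis predicted, 0 = not predicted
probs = [0.80, 0.40, 0.60]  # each sub-model's predicted probability
weights = [2.0, 1.0, 1.0]   # assumed trust placed in each sub-model

voted_label = majority_vote(labels)
blended_prob = weighted_average(probs, weights)
print(voted_label, round(blended_prob, 2))
```

Hard voting only uses the predicted labels, while the weighted average keeps each sub-model's confidence, which is why the two can disagree when sub-models are uncertain.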


背景:随着社会人口老龄化程度加剧,骨质疏松症发病率正逐年递增,相应的筛查和诊断需求给医疗系统带来了巨大挑战,也增加了患者接受检查的时间成本、经济负担和辐射暴露的风险。
目的:构建基于传统CT检查数据和人口统计学数据的新型可解释性预测方法。
方法:设计了一个两阶段可解释性骨质疏松预测框架。第1阶段,采用人机协同标注CT图像,创新性地提出了椎骨7点CT值测量方法,并将患者的性别与年龄作为关键人口统计学特征纳入特征集,显著丰富了模型的输入信息;第2阶段,在LightGBM模型的基础上,引入了SHAP(SHapley Additive exPlanations)方法,对特征重要性进行定量分析,从而增强模型预测结果的可解释性,提升临床可操作性与信任度。通过系统性实验,对不同特征组合与6种机器学习模型进行对比分析,验证所提出框架的有效性与最优特征组合的稳定性。为进一步评估模型的泛化能力,研究还在外部独立数据集上进行了验证。
结果与结论:实验对比了6种适用于医学领域使用的机器学习模型,结果显示LightGBM模型F1分数为0.902 2,曲线下面积为0.938 7,高于其他模型。在可解释性方面,通过排序并可视化输入特征对结果的贡献程度,提升了模型在临床应用中的可信度和可操作性。此外,该研究实现了原型系统,测试结果显示系统操作简便,能快速处理数据给出预测结果,且可视化结果具有较好的可解释性,能够有效辅助医生进行临床决策,为骨质疏松症的筛查和初步诊断提供了有力支持。

https://orcid.org/0000-0002-4538-0225 (Tang Lin)


关键词: 骨质疏松, CT, 临床辅助决策, 临床决策支持, 可解释性预测模型, 集成学习, LightGBM模型, SHAP

Abstract: BACKGROUND: As population aging intensifies in China, the incidence of osteoporosis is rising year by year. The resulting demand for screening and diagnosis poses significant challenges to the healthcare system and increases the time cost, financial burden, and radiation exposure risk for patients.
OBJECTIVE: To develop a novel interpretable prediction method based on traditional CT examination data and demographic data, aiming to reduce the number of patient examinations and enable multiple screenings from one examination.
METHODS: A two-stage interpretable framework for osteoporosis prediction was designed. In the first stage, CT images were annotated through human-computer collaboration using a newly proposed 7-point vertebral CT value measurement method, and patient sex and age were added as key demographic features to enrich the model's input. In the second stage, SHapley Additive exPlanations (SHAP) was applied on top of the LightGBM model to quantify feature importance, improving the interpretability of predictions and increasing clinical trust. Systematic experiments comparing different feature combinations across six machine learning models were used to validate the effectiveness of the framework and the stability of the optimal feature set. To assess generalization ability, the model was also tested on an external independent dataset.
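The second stage relies on Shapley values to attribute a prediction to individual features. The sketch below computes exact Shapley values by enumerating feature coalitions, under stated assumptions: a toy linear score stands in for the LightGBM model, absent features are replaced by an assumed baseline value, and the names `toy_model`, `BASELINE`, and `PATIENT` and all numbers are hypothetical. It is not the algorithm used by the SHAP library for tree models, and exhaustive enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values of a set function over n_features players."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi


# Hypothetical inputs: [age in years, vertebral CT value in HU].
BASELINE = [60.0, 150.0]  # assumed reference patient
PATIENT = [75.0, 90.0]    # assumed patient to explain


def toy_model(x):
    """Toy linear risk score standing in for a trained model."""
    age, ct_value = x
    return 0.02 * age - 0.01 * ct_value


def value_fn(coalition):
    """Model output when only the features in `coalition` take patient values."""
    x = [PATIENT[j] if j in coalition else BASELINE[j] for j in range(2)]
    return toy_model(x)


phi = shapley_values(value_fn, 2)
print([round(v, 3) for v in phi])
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes a per-feature ranking and visualization meaningful.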
RESULTS AND CONCLUSION: Among the six machine learning models suitable for medical applications that were compared, the LightGBM model achieved an F1 score of 0.9022 and an area under the curve of 0.9387, outperforming the other models. In terms of interpretability, ranking and visualizing the contribution of each input feature to the result increased the credibility and operability of the model in clinical use. In addition, a prototype system was implemented. Testing indicated that the system is easy to operate, processes data quickly to return predictions, and presents visualized results with good interpretability. The system can effectively assist doctors in clinical decision-making and supports the screening and preliminary diagnosis of osteoporosis.
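For readers less familiar with the two reported metrics, the following sketch shows how an F1 score and an area under the ROC curve are computed for a binary classifier. The labels, scores, and the 0.5 decision threshold are hypothetical and unrelated to the study's data; the helper names `f1_score` and `roc_auc` are defined here and are not taken from any library.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities (illustrative only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 4), round(auc, 4))
```

F1 depends on the chosen decision threshold, whereas the area under the curve summarizes ranking quality across all thresholds, so the two figures describe different aspects of model performance.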

Key words: osteoporosis, CT, clinical decision aid, clinical decision support, interpretable predictive modeling, ensemble learning, LightGBM model, SHapley Additive exPlanations
