Chinese Journal of Tissue Engineering Research, 2025, Vol. 29, Issue (35): 7679-7689. doi: 10.12307/2025.971

Data analysis related to tissue construction

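Machine learning combined with bioinformatics to screen autophagy-related key genes in pulmonary fibrosis, with experimental validation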

Gong Yuehong¹,², Wang Mengjun³, Ren Hang³, Zheng Hui³, Sun Jiajia³, Liu Junpeng³, Zhang Fei³, Yang Jianhua¹,², Hu Junping⁴

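  ¹Pharmaceutical Department, The First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; ²Xinjiang Key Laboratory of Clinical Drug Research, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; ³School of Clinical Medicine and ⁴School of Pharmacy, Xinjiang Medical University, Urumqi 830017, Xinjiang Uygur Autonomous Region, China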
  • Received: 2024-11-04; Accepted: 2024-12-31; Publication date: 2025-12-18; Release date: 2025-05-07
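  • Corresponding author: Hu Junping, PhD, Professor, Doctoral supervisor, School of Pharmacy, Xinjiang Medical University, Urumqi 830017, Xinjiang Uygur Autonomous Region, China. Co-corresponding author: Yang Jianhua, PhD, Professor, Doctoral supervisor, Pharmaceutical Department, The First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; Head of the Xinjiang Key Laboratory of Clinical Drug Research, Urumqi 830011, Xinjiang Uygur Autonomous Region, China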
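  • About the authors: Gong Yuehong, female, born in 1976, from Urumqi, Xinjiang Uygur Autonomous Region, Han ethnicity, Master's degree, Associate chief pharmacist, mainly engaged in research on new anti-pulmonary-fibrosis drugs derived from Xinjiang-produced medicinal resources. Co-first author: Wang Mengjun, female, born in 2002, from Zhangjiakou, Hebei Province, Han ethnicity, mainly engaged in research on new anti-pulmonary-fibrosis drugs derived from Xinjiang-produced medicinal resources.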
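  • Funding:
    The “Tianshan Talents” High-level Medical and Health Talent Training Program of the Health Commission of Xinjiang Uygur Autonomous Region (TSYC202301B095), project leader: Gong Yuehong; Xinjiang Uygur Autonomous Region College Student Innovation Training Program (S202310760059), project leader: Wang Mengjun; Key Project of the Natural Science Foundation of the Science and Technology Department of Xinjiang Uygur Autonomous Region (2021D01D11), project leader: Hu Junping; Innovation Team Training Project of the First Affiliated Hospital of Xinjiang Medical University (Dang Zi [2023] No. 52), project leader: Yang Jianhua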

Machine learning combined with bioinformatics screening of key genes for pulmonary fibrosis associated with cellular autophagy and experimental validation

Gong Yuehong¹,², Wang Mengjun³, Ren Hang³, Zheng Hui³, Sun Jiajia³, Liu Junpeng³, Zhang Fei³, Yang Jianhua¹,², Hu Junping⁴

  ¹Pharmaceutical Department of the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; ²Xinjiang Key Laboratory of Clinical Drug Research, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; ³School of Clinical Medicine, ⁴School of Pharmacy, Xinjiang Medical University, Urumqi 830017, Xinjiang Uygur Autonomous Region, China
  • Received: 2024-11-04 Accepted: 2024-12-31 Online: 2025-12-18 Published: 2025-05-07
  • Contact: Hu Junping, PhD, Professor, Doctoral supervisor, School of Pharmacy, Xinjiang Medical University, Urumqi 830017, Xinjiang Uygur Autonomous Region, China. Co-corresponding author: Yang Jianhua, PhD, Professor, Doctoral supervisor, Pharmaceutical Department of the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; Xinjiang Key Laboratory of Clinical Drug Research, Urumqi 830011, Xinjiang Uygur Autonomous Region, China
  • About the authors: Gong Yuehong, Master's degree, Associate chief pharmacist, Pharmaceutical Department of the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; Xinjiang Key Laboratory of Clinical Drug Research, Urumqi 830011, Xinjiang Uygur Autonomous Region, China. Wang Mengjun, School of Clinical Medicine, Xinjiang Medical University, Urumqi 830017, Xinjiang Uygur Autonomous Region, China. Gong Yuehong and Wang Mengjun contributed equally to this work.
  • Supported by:
    The “Tianshan Talents” high-level medical and health personnel training plan of the Health Commission of Xinjiang Uygur Autonomous Region, No. TSYC202301B095 (to GYH); Xinjiang Medical University Student Innovation Training Program, No. S202310760059 (to WMJ); Natural Science Foundation of Xinjiang Uygur Autonomous Region, No. 2021D01D11 (to HJP); Innovation Team Training Project of the First Affiliated Hospital of Xinjiang Medical University, No. [2023]52 (to YJH)

Abstract:


Explanation of title terms:
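Pulmonary fibrosis: a class of lung diseases characterized by fibroblast proliferation and massive accumulation of extracellular matrix, accompanied by inflammatory injury and destruction of tissue architecture.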
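Autophagy: also known as type II programmed cell death; a self-regulatory process in which cells, in response to external stimuli such as starvation and hypoxia, degrade proteins and organelles while the cell membrane remains intact.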

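BACKGROUND: Early diagnosis of pulmonary fibrosis is the basis for timely antifibrotic drug therapy. Exploring and identifying ideal biomarkers that can be effectively applied to the early diagnosis of pulmonary fibrosis is therefore crucial for treating the disease.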
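OBJECTIVE: To analyze in depth the key autophagy-related genes involved in pulmonary fibrosis using bioinformatics and machine learning, and to explore whether autophagy-related core genes of pulmonary fibrosis can serve as reliable biomarkers for assessing its progression.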
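METHODS: Two pulmonary fibrosis datasets, GSE24206 and GSE110147, were downloaded from the GEO database (a public database developed and maintained by the U.S. National Center for Biotechnology Information for storing and sharing bioinformatics data), and the two gene expression matrices were normalized with the “limma” package in R. An autophagy-related gene set was extracted from the GeneCards database (a knowledgebase developed by the Weizmann Institute of Science that automatically integrates gene-centric data from about 200 web sources, including genomic, transcriptomic, proteomic, genetic, clinical, and functional information). Differentially expressed genes were identified in the pulmonary fibrosis datasets and intersected with the autophagy gene set to obtain common genes, i.e., autophagy genes that may play a role in pulmonary fibrosis. The intersecting genes were subjected to GO and KEGG functional enrichment analysis and to immune cell infiltration analysis. Autophagy-related core genes of pulmonary fibrosis were screened by protein-protein interaction analysis and machine learning, and gene set enrichment analysis was performed on the core genes. A diagnostic model was built from the core genes, and calibration curves were used to assess the predictive ability of the nomogram model. Receiver operating characteristic curve analysis in the external dataset GSE21369 was used to validate the expression characteristics of the autophagy-related pulmonary fibrosis genes, and traditional Chinese medicines related to IL6 and COL1A2 were predicted with the Coremine database. Human embryonic lung fibroblasts were cultured and treated with transforming growth factor-β1 to establish a model, and the relative expression of IL6 and COL1A2 in the model cells was verified by qRT-PCR.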
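RESULTS AND CONCLUSION: (1) Fifty-one differentially expressed genes in pulmonary fibrosis were obtained, 25 of which intersected with the autophagy gene set. GO analysis showed that the 25 intersecting genes were associated with extracellular matrix organization, collagen metabolic process, collagen fibril organization, growth factor binding, and other processes; KEGG analysis showed that they were mainly associated with the phosphatidylinositol 3-kinase/protein kinase B signaling pathway, extracellular matrix-receptor interaction, and other pathways. (2) Immune infiltration analysis showed that the levels of activated memory CD4+ T cells, M0 macrophages, and resting dendritic cells were significantly elevated in the pulmonary fibrosis group (P < 0.05), showing a strong correlation. (3) Two autophagy signature genes involved in pulmonary fibrosis progression were screened: COL1A2 and IL6. The nomogram model showed that the two core genes predicted the onset of pulmonary fibrosis fairly accurately, and receiver operating characteristic curve analysis showed that both signature genes had diagnostic value. COL1A2 and IL6 were associated with the cell cycle pathway, the mitogen-activated protein kinase signaling pathway, the Janus kinase-signal transducer and activator of transcription signaling pathway, and cytokine-cytokine receptor interaction. A total of 20 traditional Chinese medicines related to COL1A2 and IL6 were predicted, with efficacies mainly of clearing heat and detoxifying, and activating blood and moving qi. Cell experiments verified that COL1A2 and IL6 were highly expressed in the transforming growth factor-β1-induced fibrosis model. These results indicate that COL1A2 and IL6 may be potential diagnostic biomarkers of pulmonary fibrosis, but their specificity for pulmonary fibrosis requires further study.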
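The differential-expression and intersection step in the Methods can be illustrated with a minimal sketch. The abstract does not give the cut-offs, so the thresholds below (|log2 fold change| > 1, |t| > 2) are assumptions, and a plain Welch t-statistic stands in for limma's moderated (empirical Bayes) t-statistic and adjusted p-values. All gene names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def screen_degs(expr, case_idx, ctrl_idx, lfc_cut=1.0, t_cut=2.0):
    """Return {gene: log2 fold change} for genes passing both (assumed) cut-offs.

    expr maps gene -> expression values already on a log2 scale (assumption),
    one value per sample; case_idx / ctrl_idx index fibrosis and control samples.
    """
    degs = {}
    for gene, values in expr.items():
        case = [values[i] for i in case_idx]
        ctrl = [values[i] for i in ctrl_idx]
        lfc = mean(case) - mean(ctrl)  # difference of log2 means = log2 fold change
        if abs(lfc) > lfc_cut and abs(welch_t(case, ctrl)) > t_cut:
            degs[gene] = lfc
    return degs


# Hypothetical log2 expression matrix: 3 fibrosis samples, then 3 controls
expr = {
    "GENE_A": [8.0, 8.2, 7.9, 5.0, 5.1, 4.9],  # up in fibrosis
    "GENE_B": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],  # unchanged
    "GENE_C": [3.0, 3.1, 2.9, 5.0, 5.2, 4.8],  # down in fibrosis
}
autophagy_genes = {"GENE_A", "GENE_B", "GENE_D"}  # hypothetical GeneCards-style set

degs = screen_degs(expr, case_idx=[0, 1, 2], ctrl_idx=[3, 4, 5])
shared = sorted(set(degs) & autophagy_genes)  # candidate autophagy-related DEGs
print(sorted(degs), shared)
```

The set intersection at the end mirrors the "cross-comparison" with the autophagy gene set; only genes that are both differentially expressed and annotated as autophagy-related move on to enrichment and core-gene screening.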

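Key words: pulmonary fibrosis, autophagy, machine learning, bioinformatics, immune infiltration, least absolute shrinkage and selection operator, gene enrichment analysis, engineered tissue construction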
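The keywords name the least absolute shrinkage and selection operator (LASSO), one of the machine-learning steps used to narrow the candidate genes. The abstract does not describe the implementation; for a binary fibrosis-versus-control outcome a logistic LASSO would be the usual choice. The sketch below instead uses the simpler squared-error LASSO solved by cyclic coordinate descent, only to show how the L1 penalty drives weak coefficients exactly to zero. The data, penalty value, and gene labels are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p centered (ideally standardized) columns and
    y is centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = list(y)  # residual y - Xb, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            # Scaled inner product of column j with the partial residual (j excluded)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                b[j] = new_bj
    return b


# Hypothetical, orthogonal, centered "expression" features for 4 samples
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -1.9, -2.1]  # driven mostly by GENE_A, weakly by GENE_B

coef = lasso_cd(X, y, lam=0.5)
selected = [g for g, c in zip(genes, coef) if c != 0.0]
print(coef, selected)
```

Without the penalty, GENE_B would keep a small nonzero coefficient (0.1 here); with lam = 0.5 it is shrunk to exactly zero, which is why LASSO acts as a feature selector. In practice the penalty is usually chosen by cross-validation rather than fixed by hand.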

Abstract: BACKGROUND: Early diagnosis of pulmonary fibrosis is the foundation for timely antifibrotic drug therapy. Therefore, exploring and discovering ideal biomarkers that can be effectively used for the early diagnosis of pulmonary fibrosis is crucial for the treatment of the disease.
OBJECTIVE: To analyze key autophagy-related genes involved in pulmonary fibrosis using bioinformatics and machine learning techniques, and to investigate whether autophagy-related core genes of pulmonary fibrosis can serve as reliable biomarkers for assessing the progression of pulmonary fibrosis.
METHODS: Two pulmonary fibrosis datasets, GSE24206 and GSE110147, were downloaded from the Gene Expression Omnibus (GEO) database (a public database developed and maintained by the U.S. National Center for Biotechnology Information to store and share bioinformatics data), and the gene expression matrices of the two datasets were normalized using the “limma” package in R. An autophagy-related gene set was extracted from the GeneCards database (a knowledgebase developed by the Weizmann Institute of Science that automatically integrates gene-centric data from about 200 web sources, including genomic, transcriptomic, proteomic, genetic, clinical, and functional information). Differential expression analysis was performed on the pulmonary fibrosis datasets, and the differentially expressed genes were intersected with the autophagy gene set to identify autophagy genes that may play a role in pulmonary fibrosis. The intersecting genes were subjected to Gene Ontology and Kyoto Encyclopedia of Genes and Genomes functional enrichment analysis and to immune cell infiltration analysis. Autophagy-related core genes of pulmonary fibrosis were screened by protein-protein interaction analysis and machine learning, and gene set enrichment analysis was performed on the core genes. A diagnostic model was constructed from the core genes, and calibration curves were used to assess the predictive ability of the nomogram model. Receiver operating characteristic curve analysis in the external dataset GSE21369 was used to validate the expression profiles of the autophagy-related pulmonary fibrosis genes, and Chinese herbs associated with IL6 and COL1A2 were predicted with the Coremine database. Finally, human embryonic lung fibroblasts were cultured and treated with transforming growth factor-β1 to establish a fibrosis model, and the relative expression of IL6 and COL1A2 in the model cells was verified by qRT-PCR.
RESULTS AND CONCLUSION: (1) A total of 51 differentially expressed genes in pulmonary fibrosis were obtained, 25 of which intersected with the autophagy gene set. Gene Ontology analysis showed that the 25 intersecting genes were related to extracellular matrix organization, collagen metabolic process, collagen fibril organization, and growth factor binding, among others. Kyoto Encyclopedia of Genes and Genomes enrichment analysis indicated that they were mainly related to the phosphatidylinositol 3-kinase/protein kinase B signaling pathway and extracellular matrix-receptor interaction. (2) Immune infiltration analysis revealed that the levels of activated memory CD4+ T cells, M0 macrophages, and resting dendritic cells were significantly elevated in the pulmonary fibrosis group (P < 0.05), showing a strong correlation. (3) Two autophagy signature genes involved in the progression of pulmonary fibrosis were identified: COL1A2 and IL6. The nomogram model showed that the two core genes predicted the onset of pulmonary fibrosis fairly accurately, and receiver operating characteristic curve analysis showed that both signature genes had diagnostic value. COL1A2 and IL6 were related to the cell cycle pathway, the mitogen-activated protein kinase signaling pathway, the Janus kinase-signal transducer and activator of transcription signaling pathway, and cytokine-cytokine receptor interaction. A total of 20 Chinese herbs were predicted to be related to COL1A2 and IL6, with efficacies mainly of clearing heat and detoxifying, and invigorating blood and moving qi. Cell experiments verified that COL1A2 and IL6 were highly expressed in the transforming growth factor-β1-induced fibrosis model. In conclusion, COL1A2 and IL6 may be potential diagnostic biomarkers for pulmonary fibrosis, but their specificity for pulmonary fibrosis needs further investigation.
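The receiver operating characteristic analysis in the external dataset GSE21369 is usually summarized by the area under the curve (AUC). A minimal sketch, assuming a single gene's expression value is used directly as the classifier score: the AUC equals the probability that a randomly chosen fibrosis sample scores higher than a randomly chosen control (the Mann-Whitney formulation), with ties counted as one half. The values below are hypothetical, not data from GSE21369.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic; labels are 1 (fibrosis) or 0 (control)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # correctly ranked pair
            elif p == q:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical expression of one candidate gene in 3 fibrosis and 3 control samples
scores = [5.1, 6.3, 7.0, 4.2, 3.9, 5.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, which is fine for cohorts of this scale; a rank-based computation scales better for large datasets.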

Key words: pulmonary fibrosis, autophagy, machine learning, bioinformatics, immune infiltration, least absolute shrinkage and selection operator, gene enrichment analysis, engineered tissue construction
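GO and KEGG over-representation analysis (listed among the keywords as gene enrichment analysis) asks whether a term's genes appear among the tested genes more often than chance predicts. A minimal sketch of the one-sided hypergeometric test used by many over-representation tools; the background size, term size, and overlap are hypothetical, and real analyses also correct for multiple testing (for example with the Benjamini-Hochberg procedure), which is omitted here.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in the term, n drawn).

    n is the size of the tested gene list (e.g. the intersecting genes) and
    k is how many of them are annotated to the term.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 20,000 background genes, a term with 200 genes,
# 25 genes tested, 5 of which fall in the term (0.25 expected by chance)
p = hypergeom_sf(k=5, N=20000, K=200, n=25)
print(f"{p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids overflow and rounding issues until the final division, at the cost of speed on very large backgrounds.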

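The final validation step compares IL6 and COL1A2 expression between transforming growth factor-β1-treated and control fibroblasts by qRT-PCR. The abstract does not state the quantification method; the 2^-ΔΔCt (Livak) method is a common choice and is sketched below under that assumption. It also assumes roughly 100% amplification efficiency and a stable reference gene (GAPDH is named only as an example), and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ΔΔCt (Livak) method.

    Each argument is a list of Ct values from replicate wells.
    """
    # ΔCt: normalize the target gene to the reference gene within each group
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # ΔΔCt: compare the treated (TGF-β1) group with the control group
    dd_ct = d_ct_treated - d_ct_control
    # Assumes ~100% amplification efficiency (one doubling per cycle)
    return 2 ** (-dd_ct)


# Hypothetical Ct values, for illustration only (not data from this study)
fold = relative_expression(
    ct_target_treated=[24.0, 24.2, 23.8],  # e.g. COL1A2 after TGF-β1
    ct_ref_treated=[18.0, 18.1, 17.9],     # reference gene (e.g. GAPDH, assumed)
    ct_target_control=[26.0, 26.1, 25.9],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(f"fold change = {fold:.2f}")
```

Here ΔΔCt = (24.0 - 18.0) - (26.0 - 18.0) = -2, so the fold change is 2^2 = 4: the target is expressed about four times higher in the treated cells. If primer efficiencies differ markedly from 100%, an efficiency-corrected model is more appropriate.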