Chinese Journal of Tissue Engineering Research, 2025, Vol. 29, Issue (35): 7679-7689. doi: 10.12307/2025.971


Machine learning combined with bioinformatics screening of key genes for pulmonary fibrosis associated with cellular autophagy and experimental validation

Gong Yuehong1,2, Wang Mengjun3, Ren Hang3, Zheng Hui3, Sun Jiajia3, Liu Junpeng3, Zhang Fei3, Yang Jianhua1,2, Hu Junping4

  1 Pharmaceutical Department of the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; 2 Xinjiang Key Laboratory of Clinical Drug Research, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; 3 School of Clinical Medicine, 4 School of Pharmacy, Xinjiang Medical University, Urumqi 830017, Xinjiang Uygur Autonomous Region, China
  • Received: 2024-11-04; Accepted: 2024-12-31; Online: 2025-12-18; Published: 2025-05-07
  • Contact: Hu Junping, PhD, Professor, Doctoral supervisor, School of Pharmacy, Xinjiang Medical University, Urumqi 830017, Xinjiang Uygur Autonomous Region, China. Co-corresponding author: Yang Jianhua, PhD, Professor, Doctoral supervisor, Pharmaceutical Department of the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; Xinjiang Key Laboratory of Clinical Drug Research, Urumqi 830011, Xinjiang Uygur Autonomous Region, China
  • About the authors: Gong Yuehong, Master, Associate chief pharmacist, Pharmaceutical Department of the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China; Xinjiang Key Laboratory of Clinical Drug Research, Urumqi 830011, Xinjiang Uygur Autonomous Region, China. Wang Mengjun, School of Clinical Medicine, Xinjiang Medical University, Urumqi 830017, Xinjiang Uygur Autonomous Region, China. Gong Yuehong and Wang Mengjun contributed equally to this work.
  • Supported by:
    The “Tianshan Talents” high-level medical and health personnel training plan of the Health Commission of Xinjiang Uygur Autonomous Region, No. TSYC202301B095 (to GYH); Xinjiang Medical University Student Innovation Training Program, No. S202310760059 (to WMJ); Natural Science Foundation of Xinjiang Uygur Autonomous Region, No. 2021D01D11 (to HJP); Innovation Team Training Project of the First Affiliated Hospital of Xinjiang Medical University, No. [2023]52 (to YJH)

Abstract: BACKGROUND: Early diagnosis of pulmonary fibrosis is the foundation of timely antifibrotic drug therapy. Identifying reliable biomarkers for the early diagnosis of pulmonary fibrosis is therefore crucial for treating the disease.
OBJECTIVE: To identify key autophagy-related genes involved in pulmonary fibrosis using bioinformatics and machine learning, and to determine whether these autophagy-related core genes can serve as reliable biomarkers for assessing the progression of pulmonary fibrosis.
METHODS: Two pulmonary fibrosis datasets, GSE24206 and GSE110147, were downloaded from the Gene Expression Omnibus (GEO) database (a public repository developed and maintained by the U.S. National Center for Biotechnology Information for storing and sharing functional genomics data), and their gene expression matrices were normalized using the “limma” package in R. Autophagy-related genes were extracted from the GeneCards database (a database developed by the Weizmann Institute of Science that automatically integrates gene-centric data from about 200 web sources, including genomic, transcriptomic, proteomic, genetic, clinical, and functional information). Differential expression analysis was performed on the pulmonary fibrosis datasets, and the differentially expressed genes were intersected with the autophagy-related genes to identify autophagy genes that may play a role in pulmonary fibrosis. The intersecting genes were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses, and immune cell infiltration was also analyzed. Autophagy-related core genes of pulmonary fibrosis were screened by protein-protein interaction analysis and machine learning, and the core genes were subjected to enrichment analysis. A diagnostic nomogram model was constructed from the identified core genes, and calibration curves were used to assess its predictive ability. An external dataset, GSE21369, was used for receiver operating characteristic (ROC) curve analysis to validate the expression profiles of the autophagy-related pulmonary fibrosis genes. Chinese herbs associated with IL6 and COL1A2 were predicted using the Coremine database. Finally, human embryonic lung fibroblasts were cultured and treated with transforming growth factor-β1 (TGF-β1) to establish a fibrosis model, and the relative expression of the core genes in the model cells was verified by qRT-PCR.
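The screening step described in the Methods (differential expression followed by intersection with an autophagy gene list) can be illustrated with a minimal Python sketch. The abstract does not state the differential-expression cut-offs or the multiple-testing correction, so the |log2 fold change| > 1 and Benjamini-Hochberg adjusted P < 0.05 thresholds, the gene statistics, and the gene list below are all hypothetical. The per-gene P values are assumed to come from an earlier test (for example a moderated t-test such as limma's); the sketch only covers filtering and intersection.

```python
"""Toy sketch: filter differentially expressed genes, then intersect them with
an autophagy gene list. All gene statistics and thresholds are hypothetical."""


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(stats, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """stats maps gene -> (log2 fold change, raw p-value); cut-offs are assumptions."""
    genes = list(stats)
    adj = benjamini_hochberg([stats[g][1] for g in genes])
    return {
        g for g, q in zip(genes, adj)
        if abs(stats[g][0]) > lfc_cutoff and q < fdr_cutoff
    }


# Hypothetical per-gene results (log2FC, p) for fibrosis vs. control.
toy_stats = {
    "COL1A2": (2.1, 0.0001),
    "IL6": (1.6, 0.002),
    "GENE_A": (0.4, 0.001),   # significant but small fold change
    "GENE_B": (-1.8, 0.003),  # down-regulated gene
    "GENE_C": (2.5, 0.40),    # large fold change but not significant
}
# Hypothetical autophagy gene list standing in for a GeneCards export.
autophagy_genes = {"COL1A2", "IL6", "GENE_C", "BECN1"}

degs = select_degs(toy_stats)
candidates = sorted(degs & autophagy_genes)
print(sorted(degs))   # -> ['COL1A2', 'GENE_B', 'IL6']
print(candidates)     # -> ['COL1A2', 'IL6']
```

Requiring both a fold-change and an adjusted-P criterion excludes genes that are statistically significant but biologically small (GENE_A) as well as large but noisy changes (GENE_C).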
RESULTS AND CONCLUSION: (1) A total of 51 differentially expressed genes were identified in pulmonary fibrosis, 25 of which overlapped with autophagy-related genes. GO analysis showed that the 25 intersecting genes were related to extracellular matrix organization, collagen metabolism, collagen pro-fibroblasts, and growth factor binding, among other terms. KEGG enrichment analysis indicated that they were mainly involved in the phosphatidylinositol 3-kinase/protein kinase B (PI3K/AKT) signaling pathway and the extracellular matrix-receptor interaction pathway. (2) Immune infiltration analysis revealed that the proportions of activated memory CD4+ T cells, M0 macrophages, and resting dendritic cells were significantly elevated in the pulmonary fibrosis group (P < 0.05), showing strong correlations. (3) Two autophagy-related signature genes involved in the progression of pulmonary fibrosis were identified: COL1A2 and IL6. The nomogram model showed that the two core genes predicted the onset of pulmonary fibrosis with good accuracy, and ROC curve analysis showed that both genes had diagnostic value. COL1A2 and IL6 were associated with the cell cycle, mitogen-activated protein kinase (MAPK) signaling, Janus kinase-signal transducer and activator of transcription (JAK-STAT) signaling, and cytokine-cytokine receptor interaction pathways. A total of 20 Chinese herbs were predicted to be related to COL1A2 and IL6; their main efficacies were clearing heat and detoxifying, and activating blood and moving qi. qRT-PCR confirmed that COL1A2 and IL6 were highly expressed in the TGF-β1-induced fibroblast model. In conclusion, COL1A2 and IL6 may be potential diagnostic biomarkers for pulmonary fibrosis, but their specificity for pulmonary fibrosis requires further investigation.
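Two of the quantities reported above can be sketched in Python under stated assumptions. The ROC area under the curve (AUC), which underlies the diagnostic-value claim, is computed here with the rank-based (Mann-Whitney) identity rather than any specific package the authors may have used. qRT-PCR relative expression is computed with the 2^-ΔΔCt method, which the abstract does not name and is assumed here. All expression and Ct values are hypothetical.

```python
def roc_auc(scores_cases, scores_controls):
    """AUC as the probability that a random case scores higher than a random
    control, with ties counted as one half (equivalent to Mann-Whitney U / (n1*n0))."""
    wins = 0.0
    for c in scores_cases:
        for h in scores_controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method (an assumption here), which presumes
    roughly 100% and equal amplification efficiency for target and reference."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical normalized expression of one candidate gene per sample.
fibrosis = [8.1, 7.9, 9.2, 8.7]
control = [6.5, 7.0, 8.0, 6.8]
print(roc_auc(fibrosis, control))  # -> 0.9375

# Hypothetical mean Ct values: target gene vs. a hypothetical reference gene,
# in TGF-beta1-treated vs. untreated fibroblasts.
print(relative_expression(22.0, 18.0, 24.0, 18.0))  # -> 4.0
```

The pairwise loop is O(n1·n0), which is adequate for cohort sizes typical of GEO datasets; a lower Ct in treated cells relative to the reference gene yields a fold change above 1, i.e. up-regulation.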

Key words: pulmonary fibrosis, autophagy, machine learning, bioinformatics, immune infiltration, least absolute shrinkage and selection operator, gene enrichment analysis, engineered tissue construction
