Chinese Journal of Tissue Engineering Research ›› 2026, Vol. 30 ›› Issue (3): 770-784. doi: 10.12307/2026.009

• Implant related big data analysis •

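An artificial neural network model of genes shared by ankylosing spondylitis and psoriasis, with machine learning-based mining and validation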

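Zhao Feifan¹, Cao Yujing²

  ¹College of Bone Injury, Henan University of Traditional Chinese Medicine, Zhengzhou 450000, Henan Province, China; ²Henan Provincial Hospital of Traditional Chinese Medicine, Zhengzhou 450003, Henan Province, China
  • Received: 2024-11-06 Accepted: 2024-12-14 Published: 2026-01-28 Online: 2025-07-10
  • Corresponding author: Cao Yujing, PhD, Professor, Department of Traumatic Orthopedics, Henan Provincial Hospital of Traditional Chinese Medicine, Zhengzhou 450003, Henan Province, China
  • About the author: Zhao Feifan, male, born in 1998 in Zhoukou, Henan Province, Han nationality, master's candidate at Henan University of Traditional Chinese Medicine, mainly engaged in research on traditional Chinese medicine for the prevention and treatment of lumbar and hip joint diseases.
  • Funding:
    Henan Provincial Administration of Traditional Chinese Medicine Scientific Research Special Project (2024ZYZD06); Henan Provincial Administration of Traditional Chinese Medicine Scientific Research Special Project (2023ZY1008), principal investigator: Cao Yujing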

An artificial neural network model of ankylosing spondylitis and psoriasis shared genes and machine learning-based mining and validation

Zhao Feifan¹, Cao Yujing²

  ¹College of Bone Injury, Henan University of Traditional Chinese Medicine, Zhengzhou 450000, Henan Province, China; ²Henan Provincial Hospital of Traditional Chinese Medicine, Zhengzhou 450003, Henan Province, China
  • Received: 2024-11-06 Accepted: 2024-12-14 Published: 2026-01-28 Online: 2025-07-10
  • Contact: Cao Yujing, PhD, Professor, Henan Provincial Hospital of Traditional Chinese Medicine, Zhengzhou 450003, Henan Province, China
  • About the author: Zhao Feifan, Master's candidate, College of Bone Injury, Henan University of Traditional Chinese Medicine, Zhengzhou 450000, Henan Province, China
  • Supported by:
    Henan Provincial Administration of Traditional Chinese Medicine Scientific Research Special Projects, Nos. 2024ZYZD06 and 2023ZY1008 (to CYJ)

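Abstract:

Explanation of terms in the title

Artificial neural network model: a computational model inspired by biological neural networks. It consists of a large number of interconnected "neurons" and mimics the way the human brain processes information. Artificial neural networks are widely used in pattern recognition, classification, and regression analysis, and are an important component of machine learning and deep learning.
Mendelian randomization: a genetics-based method of causal inference that uses genetic variants as instrumental variables to assess the causal relationship between an exposure (such as a lifestyle factor or biomarker) and an outcome (such as disease risk).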
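
As a rough illustration of how genetic variants can serve as instrumental variables, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate from per-variant summary statistics. The abstract does not state which Mendelian randomization estimator the authors used, so IVW is an assumption made here for illustration; the function name `ivw_estimate` and all numbers are hypothetical and are not data from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Each list position describes one genetic variant (instrument):
    its effect on the exposure, its effect on the outcome, and the
    standard error of the outcome effect.
    """
    # Weight each variant by the precision of its outcome association.
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se


# Hypothetical summary statistics for three variants (illustration only).
bx = [0.10, 0.20, 0.15]    # variant -> exposure (e.g. gene expression)
by = [0.05, 0.10, 0.075]   # variant -> outcome (log odds of disease)
se_y = [0.01, 0.02, 0.015]

beta, se = ivw_estimate(bx, by, se_y)
odds_ratio = math.exp(beta)  # OR > 1 would suggest a risk factor
print(f"beta={beta:.3f}, se={se:.3f}, OR={odds_ratio:.3f}")
```

In this reading, an odds ratio above 1 means that genetically predicted higher exposure is associated with higher disease risk, which is the sense in which a gene is called a risk factor.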

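Abstract
BACKGROUND: The occurrence and development of ankylosing spondylitis and psoriasis are closely related, but their key genes and regulatory mechanisms remain unclear.
OBJECTIVE: To establish an artificial neural network model of the genes shared by ankylosing spondylitis and psoriasis based on the GEO database and evaluate its performance, and to use Mendelian randomization to determine whether the expression of key genes is causally related to the two diseases.
METHODS: Datasets GSE25101 (816 ankylosing spondylitis samples and 816 healthy control samples), GSE30999 (85 psoriasis samples and 85 healthy control samples), GSE73754 (52 ankylosing spondylitis samples and 20 healthy control samples), and GSE14905 (33 psoriasis samples and 49 healthy control samples) were downloaded from the GEO database. GSE25101 and GSE30999 served as the training datasets for ankylosing spondylitis and psoriasis, respectively. Differentially expressed genes were identified in each dataset by differential expression analysis to obtain the driver genes shared by the two diseases. Key core genes were then screened with two machine learning methods, random forest and support vector machine recursive feature elimination. Artificial neural network models were built from the key core genes and validated in the external datasets GSE73754 and GSE14905, and corresponding nomograms were constructed to predict disease incidence. Immune infiltration in ankylosing spondylitis and psoriasis was also analyzed. Finally, Mendelian randomization was used to assess the causal relationship between the key genes and the diseases, and the DGIdb database was used to analyze drug-gene interactions and predict drug targets.
RESULTS AND CONCLUSION: (1) A total of 61 differential genes were obtained in ankylosing spondylitis and 4,309 in psoriasis; their intersection gave 8 shared differential genes. Machine learning screening then yielded 5 key genes (DNMT1, GNG11, CDC25B, S100A8, and S100A12), which were used to build artificial neural network models for ankylosing spondylitis and psoriasis. The AUC values were 0.979 and 0.989 in the training sets GSE25101 and GSE30999, and 0.818 and 0.874 in the external validation datasets GSE73754 and GSE14905, respectively. (2) Nomograms were constructed from the 5 key genes, and the calibration curves showed that the predicted probabilities of the nomogram models were almost identical to those of the ideal model. Immune cell infiltration analysis showed that the key genes were associated with activated B cells, natural killer cells, γδ T cells, follicular helper T cells, monocytes, plasmacytoid dendritic cells, and neutrophils. Mendelian randomization showed that S100A8 is a risk factor for the onset of ankylosing spondylitis and psoriasis. Finally, DGIdb screening yielded 81 targeted drugs, of which only 16 were approved by the U.S. Food and Drug Administration: methotrexate, atogepant, ubrogepant, rimegepant, eptinezumab, azacitidine, selenium tablets, hydroxyurea, ifosfamide, floxuridine, curcumin, mitoxantrone, cisplatin, arsenic trioxide, diethylstilbestrol, and decitabine. (3) International databases and studies of European populations, particularly in genomics and disease phenotyping, have accumulated many successful cases, and this experience provides a valuable reference for the epidemiological characteristics of diseases in China, their genetic diversity, and their response to environment and lifestyle. (4) The study constructed and validated artificial neural network models of the driver genes shared by ankylosing spondylitis and psoriasis, identified a causal relationship between a key gene and the onset of both diseases, and predicted potential targeted drugs, in the hope of offering a new perspective on their pathogenesis and treatment.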



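Publication focus of the Chinese Journal of Tissue Engineering Research: artificial joints; bone implants; spine; fractures; internal fixation; digital orthopedics; tissue engineering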

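Key words: ankylosing spondylitis, psoriasis, enrichment analysis, machine learning, artificial neural network, cross-validation, nomogram, immune cell infiltration, Mendelian randomization, drug prediction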

Abstract: BACKGROUND: The occurrence and development of ankylosing spondylitis and psoriasis are closely related, but their key genes and regulatory mechanisms remain unclear.
OBJECTIVE: To establish an artificial neural network model of the genes shared by ankylosing spondylitis and psoriasis based on the GEO database and evaluate its performance, and to determine by Mendelian randomization whether the expression of key genes is causally related to the two diseases.
METHODS: Datasets GSE25101 (816 ankylosing spondylitis samples and 816 healthy control samples), GSE30999 (85 psoriasis samples and 85 healthy control samples), GSE73754 (52 ankylosing spondylitis samples and 20 healthy control samples), and GSE14905 (33 psoriasis samples and 49 healthy control samples) were downloaded from the GEO database. GSE25101 and GSE30999 served as the training datasets for ankylosing spondylitis and psoriasis, respectively. Differentially expressed genes were identified in each dataset by differential expression analysis to obtain the driver genes shared by the two diseases. Key core genes were then screened with two machine learning methods, random forest and support vector machine recursive feature elimination. Artificial neural network models were constructed from the key core genes and validated in the external datasets GSE73754 and GSE14905, and corresponding nomograms were built to predict disease incidence. Immune infiltration in ankylosing spondylitis and psoriasis was also analyzed. Finally, Mendelian randomization was used to assess causal relationships between the key genes and the diseases, and drug-gene interactions were analyzed with the DGIdb database to predict drug targets.
RESULTS AND CONCLUSION: (1) A total of 61 differential genes were obtained in ankylosing spondylitis and 4,309 in psoriasis. Their intersection gave eight shared differential genes, and five key genes (DNMT1, GNG11, CDC25B, S100A8, and S100A12) were further screened by machine learning. The key genes were used to build artificial neural network models of ankylosing spondylitis and psoriasis, with area under the curve (AUC) values of 0.979 and 0.989 in the training sets GSE25101 and GSE30999, respectively, and 0.818 and 0.874 in the external validation datasets GSE73754 and GSE14905, respectively. (2) Nomograms were constructed based on the five key genes, and the calibration curves showed that the predicted probabilities of the nomogram models were almost the same as those of the ideal model. Immune cell infiltration analysis showed that the key genes were associated with activated B cells, natural killer cells, γδ T cells, follicular helper T cells, monocytes, plasmacytoid dendritic cells, and neutrophils. Mendelian randomization showed that S100A8 was a risk factor for the occurrence of ankylosing spondylitis and psoriasis. Finally, DGIdb screening yielded 81 targeted drugs, of which only 16 were approved by the U.S. Food and Drug Administration: methotrexate, atogepant, ubrogepant, rimegepant, eptinezumab, azacitidine, selenium, hydroxyurea, ifosfamide, floxuridine, curcumin, mitoxantrone, cisplatin, arsenic trioxide, diethylstilbestrol, and decitabine. (3) International databases and studies of European populations, particularly in genomics and disease phenotyping, have accumulated many successful cases, and this experience provides a valuable reference for the epidemiological characteristics of diseases in China, their genetic diversity, and their response to environment and lifestyle. (4) Artificial neural network models of the driver genes shared by ankylosing spondylitis and psoriasis were constructed and validated, a causal relationship between a key gene and the pathogenesis of the two diseases was identified, and potential targeted drugs were predicted, which may provide a new perspective for exploring their pathogenesis and therapeutic directions.
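
The AUC values reported above can be read as the probability that a randomly chosen patient sample receives a higher model score than a randomly chosen control sample. The sketch below computes AUC directly from that definition. It is a minimal illustration, not the authors' evaluation code; the function name `auc_from_scores` and the scores are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 when the case scores higher than the control,
    0.5 when the two scores are tied, and 0 otherwise.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model outputs (illustration only, not data from the study).
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(round(auc_from_scores(cases, controls), 3))
```

A drop in AUC between a training set and an external validation set, as reported in the abstract, is the usual sign that part of the training performance does not transfer to new samples.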

Key words: ankylosing spondylitis, psoriasis, enrichment analysis, machine learning, artificial neural network, cross-validation, nomogram, immune cell infiltration, Mendelian randomization, drug prediction
