Chinese Journal of Tissue Engineering Research ›› 2011, Vol. 15 ›› Issue (17): 3191-3195. doi: 10.3969/j.issn.1673-8225.2011.17.036

• Academic discussion of the bone and joint •

Single nucleotide polymorphisms in rheumatoid disease explored with a random forest sliding window, and application to epistatic interaction analysis

Yan Lu-ying1, Hua Lin1, Yan Yan2

  1. Microport Medical (Shanghai) Co., Ltd., Shanghai 201203, China
  2. Institute of Biomedical Engineering, Capital Medical University, Beijing 100069, China
  • Received: 2010-11-04 Revised: 2011-01-20 Online: 2011-04-23 Published: 2011-04-23
  • Contact: Yan Yan, Master, Associate professor, Institute of Biomedical Engineering, Capital Medical University, Beijing 100069, China yy2703@163.com
  • About author: Yan Lu-ying, female, born in 1981 in Beijing, Han nationality, graduated from Capital Medical University in 2004. Intermediate engineer, Microport Medical (Shanghai) Co., Ltd., Shanghai 201203, China. Her main research interest is biological data mining. lyyan@microport.com
  • Supported by:

    Capital Medical University Basic-Clinical Research Cooperation Fund, No. 09JL33 (given as No. 09JL35 in the English-language record)

Abstract:

BACKGROUND: Rheumatoid arthritis (RA) is a complex polygenic disease. Many traditional genetic methods have difficulty analyzing high-throughput rheumatoid disease (RD) data, and therefore struggle to identify the genetic markers associated with the disease.
OBJECTIVE: To extract new target genes associated with RA using a data mining method.
METHODS: Single nucleotide polymorphisms (SNPs) in RD were explored with a random forest sliding-window method. First, all SNPs were ranked by Gini importance from largest to smallest. Next, a sliding-window sequential forward algorithm added one SNP at a time; at each step, the SNPs included so far were used as categorical predictors to compute the out-of-bag (OOB) classification error rate. The subset of SNPs at which the OOB classification error reached its minimum was taken as the set of feature SNPs. In addition, the polymorphism interaction analysis (PIA) algorithm was applied to explore two-way (epistatic) and three-way interactions among the feature SNPs.
RESULTS AND CONCLUSION: Many of the feature SNPs identified by this method have been reported in the literature as associated with RA. In addition, the interaction analysis of the feature SNPs may provide a theoretical basis for research on the disease.

Keywords: random forest, sliding window, single nucleotide polymorphism, rheumatoid, classification, interaction
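
The selection procedure described under METHODS (rank SNPs by Gini importance, add one SNP at a time, keep the subset with the lowest OOB error) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free it replaces a full random forest with bagged one-level trees (decision stumps), and every name in it (`gini`, `fit_stump`, `forest`, `sliding_window_select`, `toy_data`), the 0/1/2 genotype coding, the tree count, and the synthetic data are assumptions made for illustration only.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, rows, feats):
    """Best one-level split on the bootstrap rows, chosen by Gini decrease.
    Returns (decrease, feature, threshold, left_label, right_label) or None."""
    parent = gini([y[i] for i in rows])
    best = None
    for f in feats:
        for t in (0, 1):  # assumed genotype coding 0/1/2: split at <=0 or <=1
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t, majority(left), majority(right))
    return best


def forest(X, y, feats, n_trees=50, seed=0):
    """Bagged stumps restricted to the columns in `feats`.
    Returns (OOB error rate, {feature: summed Gini decrease})."""
    rng = random.Random(seed)
    n = len(y)
    votes = [Counter() for _ in range(n)]
    importance = {f: 0.0 for f in feats}
    mtry = max(1, int(len(feats) ** 0.5))  # features tried per tree
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        stump = fit_stump(X, y, rows, rng.sample(feats, mtry))
        if stump is None:
            continue
        dec, f, t, left_label, right_label = stump
        importance[f] += dec
        for i in set(range(n)) - set(rows):  # out-of-bag samples vote
            votes[i][left_label if X[i][f] <= t else right_label] += 1
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(majority(list(votes[i].elements())) != y[i] for i in scored)
    return wrong / max(1, len(scored)), importance


def sliding_window_select(X, y, n_trees=50, seed=0):
    """Rank all SNPs by Gini importance, grow the subset one SNP at a time,
    and return the prefix of the ranking with the lowest OOB error."""
    all_feats = list(range(len(X[0])))
    _, imp = forest(X, y, all_feats, n_trees, seed)
    ranked = sorted(all_feats, key=lambda f: imp[f], reverse=True)
    best_err, best_k, errors = None, 0, []
    for k in range(1, len(ranked) + 1):
        err, _ = forest(X, y, ranked[:k], n_trees, seed)
        errors.append(err)
        if best_err is None or err < best_err:  # first minimum wins ties
            best_err, best_k = err, k
    return ranked[:best_k], best_err, errors


def toy_data(n=200, n_snps=6, seed=1):
    """Synthetic genotypes; column 0 alone determines case status."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 2) for _ in range(n_snps)] for _ in range(n)]
    y = [1 if row[0] >= 1 else 0 for row in X]
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    selected, best_err, errors = sliding_window_select(X, y)
    print("feature SNPs:", selected, "OOB error:", best_err)
```

On this synthetic data, where a single column fully determines the label, the selection should collapse to that one column. Ties are resolved in favor of the smaller subset, which is one possible reading of "the subset at which the error reaches its minimum"; the abstract does not say how ties were handled.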

CLC number: