中国组织工程研究 (Chinese Journal of Tissue Engineering Research) ›› 2012, Vol. 16 ›› Issue (33): 6206-6210. doi: 10.3969/j.issn.2095-4344.2012.33.025

• Basic experiments in tissue construction •

应用普通变异位点的主成分分析法探讨群体结构

杨 铮,华 琳,刘 红   

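  1. Capital University of Medical Sciences, Beijing 100069, China
  • Received: 2011-12-12 Revised: 2012-02-23 Published: 2012-08-12 Released: 2012-08-12
  • Corresponding author: Hua Lin, Lecturer, Capital University of Medical Sciences, Beijing 100069, China. Co-corresponding author: Liu Hong, Associate Professor, Capital University of Medical Sciences, Beijing 100069, China
  • About the author: Yang Zheng★, male, born in 1986 in Beijing, Han nationality, is a master's student at Capital University of Medical Sciences whose research focuses on biophysics. yzh31350@163.com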

Exploring population structure using principal components analysis for common variants

Yang Zheng, Hua Lin, Liu Hong   

  1. Capital University of Medical Sciences, Beijing 100069, China
  • Received: 2011-12-12 Revised: 2012-02-23 Online: 2012-08-12 Published: 2012-08-12
  • Contact: Hua Lin, Lecturer, Capital University of Medical Sciences, Beijing 100069, China; Liu Hong, Associate Professor, Capital University of Medical Sciences, Beijing 100069, China
  • About the author: Yang Zheng★, studying for a master's degree, Capital University of Medical Sciences, Beijing 100069, China. yzh31350@163.com

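Abstract:

Background: Common variant loci play an important role in research on population stratification and population structure.
Objective: To explore population structure by applying principal components analysis to common variant data.
Methods: Principal components analysis was used to extract the first and second principal components from single nucleotide polymorphism data for common variant loci, and the random forest algorithm was used to classify seven different populations. In addition, KEGG pathway and Gene Ontology enrichment analyses were performed separately on the genes associated with the first and with the second principal component to explore gene function.
Results and conclusion: Combining principal components analysis with the random forest classification algorithm raised the accuracy of population stratification to as high as 99.6%, indicating that differences in allele frequency between populations can be used to identify population structure. The genes extracted through the principal components also showed a degree of functional clustering. These findings confirm that the method presented in this study can be used to guide molecular biology research and to explore gene function.

Key words: principal components analysis, single nucleotide polymorphism, gene function, random forest, common variants, population stratification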

Abstract:

BACKGROUND: Common variants play an important role in studies of population stratification and population structure.
OBJECTIVE: To explore population structure by applying principal components analysis to common variants.
METHODS: We extracted the first two principal components from single nucleotide polymorphism data on common variants and classified seven populations using the random forest algorithm. In addition, we performed KEGG pathway and Gene Ontology enrichment analyses separately on the genes showing the highest loadings on the first and on the second principal component, in order to explore gene function.
RESULTS AND CONCLUSION: Combining principal components analysis with random forest classification achieved a classification accuracy as high as 99.6%, suggesting that allele frequency differences between populations can be used to identify population structure. The genes extracted by principal components analysis also showed a degree of functional clustering, indicating that the methods of the present study can be used to guide molecular biology research and to explore gene function.
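
The principal components step described in METHODS can be illustrated with a minimal sketch. The abstract does not state how genotypes were coded or normalized, so everything here is an assumption: genotypes are coded as 0, 1 or 2 copies of an allele, each SNP is centered and scaled by sqrt(p(1 - p)), and the first two components are obtained by power iteration on the sample-by-sample similarity matrix. The data are simulated for two hypothetical populations, not taken from the study, and the random forest classification step is not shown.

```python
import random


def simulate_population(freqs, n_samples, rng):
    """Draw diploid genotypes (0, 1 or 2 copies of an allele) for each SNP."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_samples)]


def standardize(genotypes):
    """Center each SNP and scale by sqrt(p * (1 - p)); drop monomorphic SNPs.

    This normalization is an assumption, not the paper's stated method.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        p = mean / 2.0  # pooled allele frequency estimate
        if p <= 0.0 or p >= 1.0:
            continue  # a SNP with no variation carries no information
        scale = (p * (1.0 - p)) ** 0.5
        columns.append([(g - mean) / scale for g in col])
    return [[c[i] for c in columns] for i in range(n)]


def gram_matrix(x):
    """Sample-by-sample similarity matrix X X^T / (number of SNPs)."""
    m = len(x[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / m for xj in x] for xi in x]


def top_eigenpair(k, iters=200, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in k]
    for _ in range(iters):
        w = [sum(kij * vj for kij, vj in zip(row, v)) for row in k]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    kv = [sum(kij * vj for kij, vj in zip(row, v)) for row in k]
    return sum(a * b for a, b in zip(v, kv)), v


def first_two_pcs(genotypes):
    """Per-sample coordinates on the first and second principal components."""
    k = gram_matrix(standardize(genotypes))
    lam1, pc1 = top_eigenpair(k)
    # Remove the first component, then repeat to obtain the second one.
    deflated = [[kij - lam1 * pc1[i] * pc1[j] for j, kij in enumerate(row)]
                for i, row in enumerate(k)]
    _, pc2 = top_eigenpair(deflated, seed=1)
    return pc1, pc2


# Two hypothetical populations whose allele frequencies differ at every SNP.
rng = random.Random(42)
freq_a = [rng.uniform(0.35, 0.65) for _ in range(200)]
freq_b = [p + rng.choice([-0.3, 0.3]) for p in freq_a]
genotypes = (simulate_population(freq_a, 20, rng)
             + simulate_population(freq_b, 20, rng))

pc1, pc2 = first_two_pcs(genotypes)
for label, start in (("A", 0), ("B", 20)):
    mean_pc1 = sum(pc1[start:start + 20]) / 20
    print(f"population {label}: mean PC1 = {mean_pc1:+.3f}")
```

In this toy setting the two simulated groups fall on opposite sides of the first component, which is the kind of allele frequency signal the abstract refers to. The resulting coordinates could then be passed to a classifier such as a random forest.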

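
The KEGG pathway and Gene Ontology enrichment analyses mentioned in the abstract are commonly carried out with a one-sided hypergeometric test. The abstract does not say which test or tool the authors used, so the following is only a sketch under that assumption. The function names and the gene identifiers (`G0`, `G1`, ...) are hypothetical and chosen for illustration.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_background: genes in the background set
    n_in_term:    background genes annotated to the pathway or GO term
    n_selected:   genes selected (for example, high-loading genes on a component)
    n_overlap:    selected genes that are annotated to the term
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


def term_enrichment(selected, term_genes, background):
    """Enrichment p-value for one term, given gene identifier sets."""
    selected = selected & background
    term_genes = term_genes & background
    return enrichment_p_value(len(background), len(term_genes),
                              len(selected), len(selected & term_genes))


# Hypothetical example: 1000 background genes, 50 of them in one pathway,
# and 20 selected genes of which 10 fall in that pathway.
background = {f"G{i}" for i in range(1000)}
pathway = {f"G{i}" for i in range(50)}
selected = {f"G{i}" for i in range(10)} | {f"G{i}" for i in range(500, 510)}

p_value = term_enrichment(selected, pathway, background)
print(f"enrichment p-value: {p_value:.3g}")
```

When many pathways or terms are tested at once, the resulting p-values would normally be corrected for multiple testing before any term is reported as enriched.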