Chinese Journal of Tissue Engineering Research ›› 2018, Vol. 22 ›› Issue (20): 3237-3242. doi: 10.3969/j.issn.2095-4344.0302

• Vascular tissue construction

A named entity recognition model for electronic medical records based on a bidirectional LSTM neural network

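Yang Hong-mei1, Li Lin2, Yang Ri-dong1, Zhou Yi1,2

  1. 1Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province; 2Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region
  • Received: 2018-03-13 Publication date: 2018-07-18 Release date: 2018-07-18
  • Corresponding author: Zhou Yi, Doctor, Associate professor, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province; Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region
  • About the author: Yang Hong-mei, female, born in 1993 in Chuxiong, Yunnan Province, Han ethnicity, master's student at Sun Yat-sen University, mainly engaged in research on medical information extraction and medical data mining.
  • Funding:

    National Key Research and Development Program of China, Precision Medicine special fund (2016YFC0901602); NSFC-Guangdong Joint Fund for the Big Data Science Center (U1611261); Guangdong Province Frontier and Key Technology Innovation special fund (2014B010118003); Guangzhou 2017 Major Special Project for Industry-University-Research Collaborative Innovation (201604016136); Guangzhou Major Special Project for Health and Medical Collaborative Innovation (201604020016)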

Named entity recognition based on bidirectional long short-term memory combined with case report form

Yang Hong-mei1, Li Lin2, Yang Ri-dong1, Zhou Yi1, 2   

  1. 1Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China; 2Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China
  • Received:2018-03-13 Online:2018-07-18 Published:2018-07-18
  • Contact: Zhou Yi, M.D., Associate professor, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China; Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China
  • About author:Yang Hong-mei, Master candidate, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
  • Supported by:

    the National Key Research & Development Precision Medicine Project of China, No. 2016YFC0901602; the Project of NSFC-Guangdong Big Data Science Center, No. U1611261; the Advanced and Key Technology Innovation Project of Guangdong Province, No. 2014B010118003; the Major University-Industry Cooperation Innovation Project of Guangzhou in 2017, No. 201604016136; the Major Project of Health Medicine Cooperation Innovation Project of Guangzhou, No. 201604020016

Abstract:

Explanation of terms in the title:
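Named entity recognition: also called "proper name recognition", this is the task of identifying entities with a specific meaning in text. Named entity recognition is an important basic tool for information extraction and plays a key role as natural language processing technology moves toward practical use.
Language model: a language model is a model that computes the probability of a sentence (or word sequence). A sentence W of length n can be represented by the word sequence w1, w2, …, wn. The language model then computes the probability of this word sequence, P(W) = P(w1, w2, …, wn).
Electronic medical record: a record produced during clinical treatment and written by medical staff to describe a patient's medical activities, including the patient's diseases, symptoms, examinations, treatments, examination results, and the times at which they occurred. These pieces of information are interrelated, reflect the patient's physical condition and medical knowledge, and can support applications such as clinical decision support systems, precision medicine research, and disease surveillance.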
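To make the language-model definition above concrete, here is a minimal sketch that estimates P(W) with the chain rule under a bigram (first-order Markov) approximation. The bigram assumption, the unsmoothed counts, the `<s>` start marker, the function names, and the toy corpus are all illustrative assumptions; the source does not say which language model, if any, was used.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences (toy, unsmoothed)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words)  # "<s>" is a hypothetical start marker
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts


def sentence_probability(words, history_counts, bigram_counts):
    """P(W) approximated as the product of P(w_i | w_{i-1}) over the sentence."""
    prob = 1.0
    tokens = ["<s>"] + list(words)
    for prev, cur in zip(tokens, tokens[1:]):
        if history_counts[prev] == 0:
            return 0.0  # unseen history: the unsmoothed estimate is zero
        prob *= bigram_counts[(prev, cur)] / history_counts[prev]
    return prob


# Toy corpus, invented for illustration only (not from the study's data).
corpus = [
    ["patient", "has", "fever"],
    ["patient", "has", "cough"],
    ["patient", "denies", "fever"],
]
hist, bi = train_bigram(corpus)
p_seen = sentence_probability(["patient", "has", "fever"], hist, bi)
p_unseen = sentence_probability(["patient", "has", "pain"], hist, bi)
print(p_seen, p_unseen)
```

A practical model would add smoothing and an end-of-sentence marker so that unseen word pairs do not force the whole sentence probability to zero.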
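Abstract
Background: Electronic medical record (EMR) data are an important source of big data in the medical field and embody medical knowledge. EMRs record the course of a patient's care and provide important data support for applications such as clinical decision support systems, precision medicine research, and disease surveillance.
Objective: To study information extraction techniques for EMRs and extract important medical entities from Chinese EMRs, in support of knowledge discovery for hepatocellular carcinoma.
Methods: The dataset came from the EMR database of a grade-A tertiary hospital in Guangdong Province. Records of 240 patients with hepatocellular carcinoma (18 542 sentences) were collected, including admission notes and discharge summaries, and were annotated according to predefined standards. The records of 180 patients (13 839 sentences) were randomly selected for training, and the remaining 60 patient records (4 703 sentences) were held out as the test set. A bidirectional LSTM network combined with a conditional random field (CRF) was used to train the named entity recognition (NER) model. The performance of the NER system was evaluated on the test dataset, and strict-match precision, recall, and F1 were calculated.
Results and conclusion: Evaluation on the test dataset showed an F1 of 0.853 5 for entity recognition in admission notes, 0.726 5 in discharge summaries, and 0.805 2 overall. The study implemented an automatic named entity recognition model for EMR text; the next step of the research will focus on improving the accuracy of entity extraction.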
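The strict-match precision, recall, and F1 mentioned in the methods can be sketched as follows. Under strict matching, a predicted entity counts as correct only if both its boundaries and its type equal those of a gold entity. The BIO tagging scheme, the entity type names, and the helper names `extract_entities` and `strict_prf` are assumptions made for illustration; the source does not describe its tag set or evaluation code.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence (end exclusive)."""
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                entities.add((start, i, etype))
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
            else:
                start, etype = None, None  # "O" or a stray "I-" tag opens nothing
    return entities


def strict_prf(gold_sentences, pred_sentences):
    """Micro-averaged strict-match precision, recall, and F1 over all sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)  # exact agreement on boundaries and type
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: one exact match, one type error, one boundary error.
gold = [["B-Disease", "I-Disease", "O", "B-Symptom"],
        ["B-Disease", "I-Disease", "I-Disease"]]
pred = [["B-Disease", "I-Disease", "O", "B-Test"],
        ["B-Disease", "I-Disease", "O"]]
precision, recall, f1 = strict_prf(gold, pred)
print(precision, recall, f1)
```

In this toy case the second sentence's prediction overlaps the gold entity but stops one token early, so it earns no credit. That is the behaviour that separates strict matching from relaxed (overlap-based) matching.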

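Key publication topics of the Chinese Journal of Tissue Engineering Research: tissue construction; osteocytes; chondrocytes; cell culture; fibroblasts; vascular endothelial cells; osteoporosis; tissue engineering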
ORCID: 0000-0002-3254-0429 (Yang Hong-mei)

Keywords: electronic medical records, named entity recognition, BiLSTM, CRF, tissue construction

Abstract:

BACKGROUND: Electronic medical records (EMRs) are an important source of big data in medicine and embody medical knowledge. EMRs document the course of a patient's care and provide key data support for clinical decision support systems, precision medicine research, and disease surveillance.
OBJECTIVE: To study information extraction techniques for EMRs and extract important medical entities from Chinese EMRs, in support of knowledge discovery for hepatocellular carcinoma.
METHODS: The EMR database of a grade-A tertiary hospital in Guangdong Province was used. We retrieved the clinical records (18 542 sentences) of 240 patients with hepatocellular carcinoma, including admission notes and discharge summaries. The records were annotated according to predefined standards. The records of 180 patients (13 839 sentences) were randomly selected for training, and those of the remaining 60 patients (4 703 sentences) were held out for testing. A bidirectional long short-term memory (BiLSTM) network combined with a conditional random field (CRF) was used to train the named entity recognition (NER) model. The performance of the NER system was evaluated on the test dataset, and strict-match precision, recall, and F1 were calculated.
RESULTS AND CONCLUSION: Evaluation on the test dataset showed an F1 of 0.853 5 for entity recognition in admission notes, 0.726 5 in discharge summaries, and 0.805 2 overall. This study implemented an automatic NER model for EMR text; future work will focus on improving the accuracy of entity extraction.
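As a rough illustration of what the CRF layer contributes on top of the BiLSTM, the sketch below runs Viterbi decoding for a linear-chain CRF: it picks the tag sequence that maximizes the sum of per-token emission scores and tag-to-tag transition scores. The tag set, all score values, and the function name `viterbi_decode` are hypothetical; in a real BiLSTM-CRF the emission scores would come from the BiLSTM and the transition scores would be learned, and the source gives no details of either.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]   : score of tag j at token t (stand-in for BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    scores = list(emissions[0])  # best path score ending in each tag at token 0
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at token t.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    best = max(range(n_tags), key=lambda j: scores[j])
    best_score = scores[best]
    path = [best]
    for pointers in reversed(backpointers):  # walk the backpointers to token 0
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path], best_score


# Hypothetical tag set and scores for a three-token sentence.
tags = ["O", "B-Disease", "I-Disease"]
transitions = [
    [0.0, 0.0, -10.0],  # from O: jumping straight to I-Disease is penalized
    [0.0, 0.0, 1.0],    # from B-Disease
    [0.0, 0.0, 1.0],    # from I-Disease
]
emissions = [
    [1.0, 0.8, 0.0],
    [0.2, 0.0, 1.5],
    [1.0, 0.0, -0.5],
]
decoded, score = viterbi_decode(emissions, transitions, tags)
print(decoded, score)
```

With these toy scores, choosing the best tag for each token independently would give O, I-Disease, O, which is not a valid BIO sequence. Joint decoding instead returns B-Disease, I-Disease, O, because the transition scores penalize entering I-Disease directly from O.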

Key words: Medical Records Systems, Computerized; Neural Networks (Computer); Liver Neoplasms; Tissue Engineering

Chinese Library Classification (CLC) number: