Named Entity Recognition in Chinese Electronic Medical Records Based on RoBERTa-WWM

ISSN: 1006-2475
2021, Issue 2
Databases and Data Mining
ZHU Yan, ZHANG Li, WANG Yu

Electronic medical records (EMRs) contain abundant information, such as clinical symptoms, diagnostic results, and drug efficacy. Named entity recognition (NER) aims to extract named entities from unstructured text and is the first step in extracting valuable information from EMRs. This paper proposes an NER method based on RoBERTa-WWM (A Robustly Optimized BERT Pre-training Approach-Whole Word Masking), a pre-trained model used to generate semantic representations that carry prior knowledge. Compared with BERT (Bidirectional Encoder Representations from Transformers), the semantic representations generated by RoBERTa-WWM are better suited to Chinese NER tasks because the model masks whole words during pre-training. These representations are fed into a Bidirectional Long Short-Term Memory (BiLSTM) model and then a Conditional Random Field (CRF) model. Experimental results show that the method improves the F1-score on the “China Conference on Knowledge Graph and Semantic Computing 2019 (CCKS 2019)” dataset and improves NER performance on Chinese EMRs.
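The difference between character-level masking and whole word masking can be illustrated with a minimal sketch. It is not the authors' implementation or the actual RoBERTa-WWM pre-training code: the function names, the `mask_prob` parameter, and the pre-segmented example sentence are all hypothetical, and the real procedure also involves steps omitted here (for example, a word segmenter and random token replacement).

```python
import random

MASK = "[MASK]"


def char_level_mask(words, rng, mask_prob=0.15):
    """Mask each character independently, ignoring word boundaries."""
    chars = [ch for word in words for ch in word]
    return [MASK if rng.random() < mask_prob else ch for ch in chars]


def whole_word_mask(words, rng, mask_prob=0.15):
    """Mask all characters of a selected word together."""
    out = []
    for word in words:
        if rng.random() < mask_prob:
            # Every character of the chosen word is replaced.
            out.extend([MASK] * len(word))
        else:
            out.extend(word)
    return out


# Hypothetical pre-segmented sentence (assumed output of a word segmenter):
# "patient" / "presents" / "abdominal pain" / "symptom"
words = ["患者", "出现", "腹痛", "症状"]
rng = random.Random(0)

print(char_level_mask(words, rng, mask_prob=0.5))
print(whole_word_mask(words, rng, mask_prob=0.5))
```

With character-level masking, a two-character term such as 腹痛 can be left half visible, so the model may recover the hidden character from its neighbor. Whole word masking hides the term as a unit, which forces prediction from the surrounding context and is the property the abstract credits for the better fit to Chinese NER.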
