Pronoun Resolution in Spoken Dialogue

ISSN: 1000-9825
2011, Vol. 22, No. 2
Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
FEI Zhong-Chao (费仲超) 1,2, ZHOU Ya-Qian (周雅倩) 1, HUANG Xuan-Jing (黄萱菁) 1, WU Li-De (吴立德) 1
  1. School of Computer Science, Fudan University, Shanghai 200433, China
  2. Portfolio Strategy and Technology Leadership CTO Group, Alcatel-Lucent Shanghai Bell, Shanghai 201206, China

This paper presents a two-stage pronoun resolution algorithm for spoken dialogue that requires neither manual cleaning of the corpus nor manually predefined rules. In the first stage, new features and machine learning methods are used to classify pronouns into those with noun-phrase antecedents and those with non-noun-phrase antecedents (termed "non-anaphoric" pronouns here). In the second stage, the two classes are resolved separately. For pronouns with noun-phrase antecedents, the paper proposes feature extraction and representation methods suited to spoken dialogue, covering the distance between a pronoun and each candidate antecedent as well as syntactic and semantic features, and selects the antecedent by combining these features. For the non-anaphoric pronouns, the Right Frontier Rule is adapted into a form that can be extracted automatically from spoken dialogue, and the antecedent is selected according to that rule. On the corpus released by Byron in 2004, the algorithm achieves a precision of 77.0% and a recall of 66.0%. Compared with Byron's work, the method is fully automatic and also improves resolution performance.
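The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Candidate`, `Pronoun`, `has_np_antecedent`, `score_np`, and `resolve` are hypothetical, the stage-1 rule stands in for the trained classifier, the feature weights are invented, and "the most recent preceding clause" is only a crude approximation of the Right Frontier Rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str
    utterance_index: int   # index of the utterance containing the candidate
    is_noun_phrase: bool   # True for NPs, False for clauses
    number: str = "sg"     # "sg" or "pl"; only meaningful for NPs


@dataclass
class Pronoun:
    text: str
    utterance_index: int
    number: str = "sg"


def has_np_antecedent(pronoun: Pronoun) -> bool:
    """Stage 1 stand-in (assumption): the paper trains a classifier;
    this toy rule sends demonstratives to the non-NP branch."""
    return pronoun.text.lower() not in {"this", "that"}


def score_np(pronoun: Pronoun, cand: Candidate) -> float:
    """Combine a distance feature and an agreement feature.
    The weights are invented for illustration."""
    distance = pronoun.utterance_index - cand.utterance_index
    agreement = 1.0 if cand.number == pronoun.number else 0.0
    return agreement - 0.1 * distance


def resolve(pronoun: Pronoun, candidates: List[Candidate]) -> Optional[Candidate]:
    """Stage 2: resolve each pronoun class with its own strategy."""
    if has_np_antecedent(pronoun):
        # Stage 2a: pick the best-scoring NP at or before the pronoun's utterance.
        nps = [c for c in candidates
               if c.is_noun_phrase and c.utterance_index <= pronoun.utterance_index]
        return max(nps, key=lambda c: score_np(pronoun, c), default=None)
    # Stage 2b: crude right-frontier approximation, the most recent earlier clause.
    clauses = [c for c in candidates
               if not c.is_noun_phrase and c.utterance_index < pronoun.utterance_index]
    return max(clauses, key=lambda c: c.utterance_index, default=None)


# Invented toy dialogue, not taken from the evaluation corpus.
candidates = [
    Candidate("the engine", 0, True, "sg"),
    Candidate("the boxcars", 1, True, "pl"),
    Candidate("send the engine to Avon", 0, False),
    Candidate("load the boxcars", 1, False),
]
np_result = resolve(Pronoun("it", 2, "sg"), candidates)
clause_result = resolve(Pronoun("that", 2), candidates)
```

In this toy dialogue, "it" is routed to the noun-phrase branch and "that" to the clause branch, which mirrors the classify-then-resolve split; a real system would replace both the rule and the hand-set weights with learned components.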

