Research on Crop Named Entity Normalization Based on mBART

ISSN:1000-1298
2025, Vol. 56, No. 7
Agricultural Information Engineering
胡玉雪,黄仲强,王同官,苏东宇,申余丰,沙灜 HU Yuxue,HUANG Zhongqiang,WANG Tongguan,SU Dongyu,SHEN Yufeng,SHA Ying

Regional and cultural differences lead to inconsistent entity names in agricultural texts, which complicates automatic information identification and extraction and hinders the development of agricultural informatization. To improve the efficiency of agricultural information extraction, this paper proposes mJoint, an agricultural named entity normalization method based on mBART. First, drawing on the knowledge and experience of agricultural domain experts, a crop-oriented agricultural text dataset was constructed; it covers three major crop categories (legumes, cereals, and oil crops) and contains 22,440 high-quality annotated records. Second, because agricultural entity normalization involves two subproblems, the detection and the recognition of non-normalized agricultural entities, a unified generative framework based on mBART is proposed to jointly detect and recognize non-normalized entities and thereby complete the normalization task directly. To further improve normalization performance, two auxiliary tasks, non-normalized entity detection and non-normalized entity recognition, are additionally introduced into the model. Finally, extensive experiments on the constructed crop dataset show that mJoint achieves precision (P), recall (R), and F1 all above 0.99 on the agricultural named entity normalization task, the best on every metric among the compared methods. The proposed method also shows a significant advantage over large language models.
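The abstract reports P, R, and F1 but does not state the matching criterion. As a minimal sketch, assuming each prediction is scored as an exact match on a (non-normalized mention, canonical name) pair and that the scores are micro-averaged over sentences, the three values could be computed as follows. The function name `micro_prf` and the sample pairs are hypothetical illustrations and are not taken from the paper's dataset.

```python
from typing import List, Set, Tuple

# A normalization decision: (non-normalized mention, canonical name).
Pair = Tuple[str, str]


def micro_prf(gold: List[Set[Pair]], pred: List[Set[Pair]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-sentence pair sets.

    Assumption: a predicted pair counts as correct only if both the mention
    and its canonical name exactly match a gold pair in the same sentence.
    """
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: common crop aliases mapped to canonical names.
gold = [
    {("苞谷", "玉米")},                      # alias for maize (cereal)
    {("黄豆", "大豆"), ("落花生", "花生")},  # soybean (legume), peanut (oil crop)
]
pred = [
    {("苞谷", "玉米")},
    {("黄豆", "大豆"), ("落花生", "落花生")},  # mention detected, name not normalized
]

p, r, f1 = micro_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Under this pair-level criterion, an error in either detection or recognition lowers the score, which is one way a single number can reflect both subproblems named in the abstract.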
