A Named Entity Recognition Model for Medieval Latin Charters

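This paper details the implementation of a model for automatic named entity recognition in medieval Latin sources and tests its robustness on different datasets.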

Authors:

Pierre Chastang, UVSQ-Université Paris-Saclay
Sergio Torres Aguilar, UVSQ-Université Paris-Saclay
Xavier Tannier, Sorbonne Université

Reposted from: Digital Humanities Quarterly, 2021, Volume 15 Number 4, http://www.digitalhumanities.org/dhq/vol/15/4/000574/000574.html

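Named entity recognition (NER) is an advantageous technique with a growing presence in the digital humanities. In theory, the automatic detection and retrieval of named entities can provide new ways of finding unedited information in edited sources, and it allows large amounts of data to be parsed in a short time to support historical hypotheses. This paper details the implementation of a model for automatic named entity recognition in medieval Latin sources and tests its robustness on different datasets. Different models were trained on a large dataset of Burgundian diplomatic charters from the 9th to the 14th centuries, and validated by testing general and century-specific models on short sets of Parisian, English, Italian, and Spanish charters. We present the cross-validation results in each case and discuss the implications of these results for the history of medieval place names and personal names.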

Pierre Chastang 

Pierre Chastang is a full professor of medieval history at UVSQ-Université Paris-Saclay. His main fields of research are the culture of writing and written heritage in the Middle Ages. For the past ten years, he has been developing interdisciplinary research projects involving history, physical-chemical analysis of ancient materials, and text mining concerning medieval pragmatic writing.

Sergio Torres Aguilar 

Sergio Torres Aguilar is a postdoctoral research fellow at the École nationale des Chartes in Paris. He received his doctoral degree in history in 2019 from Paris-Saclay University. His research focuses on machine learning methods applied to historical sources, natural language processing for ancient languages, and handwritten text recognition for medieval manuscripts.

Xavier Tannier 

Xavier Tannier is a full professor of computer science at Sorbonne University, Paris. He conducts his research in the laboratory for medical informatics and knowledge engineering for e-health (LIMICS). His main research topics concern natural language processing and information retrieval and extraction.
