Generic HTR Models for Medieval Manuscripts. The CREMMALab Project

Ariane Pinche

doi:10.46298/jdmdh.10252

jdmdh:10252 - Journal of Data Mining & Digital Humanities, 16 October 2023, Documents historiques et reconnaissance automatique de texte - https://doi.org/10.46298/jdmdh.10252

Author: Ariane Pinche

In the Humanities, the emergence of digital methods has opened up research questions to quantitative analysis. This is why HTR (handwritten text recognition) technology is increasingly used in humanities research projects, following precursors such as the Himanis project. However, many research teams have limited resources, whether financial or in terms of expertise in artificial intelligence. It may therefore be difficult for them to integrate handwritten text recognition into their project pipeline if they need to train a model or create data from scratch. The goal here is not to explain how to build or improve an HTR engine, nor to find a way to automatically align a preexisting corpus with images in order to quickly create ground truth for training. Rather, this paper aims to help humanists develop an HTR model for medieval manuscripts more easily, and to create and gather training data with an understanding of the issues underlying their choices. It also aims to show that building consistent data is a prerequisite both for pooling datasets and for training efficient HTR models. We present an overview of our work and experiments in the CREMMALab project (2021-2022), showing first how we ensure the consistency of the data, and then how we developed a generic model for medieval French manuscripts from the 13th to the 15th century, ready to be shared (more than 94% accuracy) and/or fine-tuned by other projects.

Source: HAL:hal-03837519v4

Volume: Documents historiques et reconnaissance automatique de texte (Historical Documents and Automatic Text Recognition)

Published on: 16 October 2023

Accepted on: 29 March 2023

Submitted on: 3 November 2022

Keywords: [SHS.HIST]Humanities and Social Sciences/History, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, [en] HTR, model, dataset, medieval, text, transcription

License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
