Historical Documents and Automatic Text Recognition: Introduction

Ariane Pinche; Peter Stokes

doi:10.46298/jdmdh.13247

jdmdh:13247 - Journal of Data Mining & Digital Humanities, 19 March 2024, Documents historiques et reconnaissance automatique de texte - https://doi.org/10.46298/jdmdh.13247

With this special issue of the Journal of Data Mining and Digital Humanities (JDMDH), we bring together in a single volume several experiments, projects and reflections related to automatic text recognition applied to historical documents. More and more research projects now include automatic text acquisition in their data processing chain, and this is true not only for projects focussed on Digital or Computational Humanities but increasingly also for those that simply use existing digital tools as a means to an end. The increasing use of this technology has led to an automation of tasks that affects the role of the researcher in the textual production process. This new data-intensive practice makes it urgent to collect and harmonise the corpora needed to build training sets, and also to make them available for use. This special issue is therefore an opportunity to present articles that combine philological and technical questions in order to assess scientifically the use of automatic text recognition for ancient documents: its results, its contributions and the new practices it induces in the process of editing and exploring texts. We hope that practical aspects will be examined on this occasion, along with the methodological challenges they raise and their impact on research data.

The special issue on Automatic Text Recognition (ATR) is therefore dedicated to providing a comprehensive overview of the use of ATR in the humanities, particularly for historical documents, in the early 2020s. It combines engineering and philological perspectives and is aimed at both beginners and experienced users interested in launching projects with ATR. The collection encompasses a diverse array of approaches, covering topics such as data creation or collection for training generic models, reaching specific objectives, technical and HTR machine architecture, segmentation methods, and image processing.

Source: HAL:hal-04508874v1

Volume: Documents historiques et reconnaissance automatique de texte

Published: 19 March 2024

Accepted: 19 March 2024

Submitted: 19 March 2024

Keywords: [SHS.HIST]Humanities and Social Sciences/History, [INFO]Computer Science [cs], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, [SHS.LITT]Humanities and Social Sciences/Literature, [en] ATR, eScriptorium, Kraken, HTR-United, SegmOnto

Licence: Attribution 4.0 International (CC BY 4.0)

Funding:

Source: HAL

Biblissima+, Observatoire des cultures écrites anciennes, de l'argile à l'imprimé; Funder: French National Research Agency (ANR); Code: ANR-21-ESRE-0005

Datasets

Reference

Hodel, T., Schoch, D., & Dängeli, P. (2021). Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X (Version 1.0) [Dataset]. Zenodo. 10.5281/ZENODO.5153262 ¹

Hodel, T., Schoch, D., & Dängeli, P. (2021). Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X (Version 1.0) [Dataset]. Zenodo. 10.5281/ZENODO.5153263 ¹

Clérice, T. (2022). YALTAi: Segmonto Manuscript and Early Printed Book Dataset (Version 1.0.0) [Dataset]. Zenodo. 10.5281/ZENODO.6814769 ¹

Clérice, T. (2022). YALTAi: Segmonto Manuscript and Early Printed Book Dataset (Version 1.0.0) [Dataset]. Zenodo. 10.5281/ZENODO.6814770 ¹

Jacsont, P. (2022). Toponomasia : edition of cod. 174 of Bern Burgerbibliothek. (Version v.1.0) [Dataset]. Zenodo. 10.5281/ZENODO.7026584 ¹

Jacsont, P. (2022). Toponomasia : edition of cod. 174 of Bern Burgerbibliothek. (Version v.1.0) [Dataset]. Zenodo. 10.5281/ZENODO.7026585 ¹

Gille Levenson, M. (2023). Towards a general open dataset and model for late medieval Castilian text recognition (HTR/OCR). Datasets and scripts (Version v2) [Dataset]. Zenodo. 10.5281/ZENODO.7386489 ¹

Levenson, M. G. (2022). Towards a general open dataset and model for late medieval Castilian text recognition (HTR/OCR). Datasets and scripts (Version v1.01) [Dataset]. Zenodo. 10.5281/ZENODO.7389195 ¹

Torres Aguilar, S., & Jolivet, V. (2023). Dataset and evaluation for HTR models for Latin and French Medieval Documentary Manuscripts (Version 0.1) [Dataset]. Zenodo. 10.5281/ZENODO.7401832 ¹

Torres Aguilar, S., & Jolivet, V. (2023). Dataset and evaluation for HTR models for Latin and French Medieval Documentary Manuscripts (Version 0.1) [Dataset]. Zenodo. 10.5281/ZENODO.7401833 ¹

Torres Aguilar, S., & Jolivet, V. (2023). HTR model for Latin and French Medieval Documentary Manuscripts (12th-15th) (Version 1) [Dataset]. Zenodo. 10.5281/ZENODO.7547438 ¹

Perdiki, E. (2023). List of manuscripts containing John Chrysostom’s Homilies and the relevant manual transcriptions (Versions 1.2) [Dataset]. Zenodo. 10.5281/ZENODO.7681132 ¹

Perdiki, E. (2023). List of manuscripts containing John Chrysostom’s Homilies and the relevant manual transcriptions (Version 1) [Dataset]. Zenodo. 10.5281/ZENODO.7681133 ¹

Perdiki, E. (2023). List of manuscripts containing John Chrysostom’s Homilies and the relevant manual transcriptions (Versions 1.2) [Dataset]. Zenodo. 10.5281/ZENODO.8102662 ¹

¹ Source: ScholeXplorer
