Edited by Ariane Pinche (CIHAM, CNRS) and Peter Stokes (AOROC, École Pratique des Hautes Études – Université PSL)
With this special issue of the Journal of Data Mining and Digital Humanities (JDMDH), we bring together in a single volume several experiments, projects and reflections related to automatic text recognition applied to historical documents.
Many projects now include automatic text acquisition in their data processing chain. The integration of this technology into increasingly powerful processing chains has automated tasks in a way that changes the role of the researcher in the textual production process. This new data-intensive practice makes it urgent to collect and harmonise the corpora needed to build training sets, and also to make them available for reuse. This issue brings together articles that combine philological and technical questions in order to assess scientifically the use of automatic text recognition for ancient documents: its results, its contributions, and the new practices it induces in the process of editing and exploring texts. We hope that the articles will examine the practical aspects of the technology while also addressing the methodological challenges it raises and its impact on research data.
This special issue on Automatic Text Recognition (ATR) aims to provide a comprehensive overview of the use of ATR in the humanities, particularly for historical documents, as of the early 2020s. It combines engineering and philological perspectives and is addressed to both beginners and experienced users interested in launching projects with ATR. The collection covers a diverse range of approaches and topics, including data creation or collection for training generic models or for reaching specific objectives, the technical architecture of HTR systems, segmentation methods, and image processing.
PINCHE, Ariane, STOKES, Peter A., « Historical Documents and Automatic Text Recognition: Introduction », https://doi.org/10.46298/jdmdh.13247
COUTURE, Béatrice, VERRET, Farah, GOHIER, Maxime [et al.], « The challenges of HTR model training: Feedbacks from the project Donner le goût de l’archive à l’ère numérique », https://jdmdh.episciences.org/12556
CALVELLI, Lorenzo, BOSCHETTI, Federico and TOMMASI, Tatiana, « EpiSearch. Identifying Ancient Inscriptions in Epigraphic Manuscripts », https://doi.org/10.46298/jdmdh.10417
ROMEIN, C. Annemieke, HODEL, Tobias, GORDIJN, Femke, [et al.], « Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done », https://doi.org/10.46298/jdmdh.10403
PERDIKI, Elpida, « Preparing Big Manuscript Data for Hierarchical Clustering with Minimal HTR Training », https://doi.org/10.46298/jdmdh.10419
LEVENSON GILLE, Matthias, « Towards a general open dataset and model for late medieval Castilian text recognition (HTR/OCR). Paper », https://doi.org/10.46298/jdmdh.10416
PINCHE, Ariane, « Generic HTR Models for Medieval Manuscripts. The CREMMALab Project », https://jdmdh.episciences.org/11592
TORRES AGUILAR, Sergio and JOLIVET, Vincent, « Handwritten Text Recognition for Documentary Medieval Manuscripts », https://doi.org/10.46298/jdmdh.10484
CLÉRICE, Thibault, « You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine », https://doi.org/10.46298/jdmdh.9806
JACSONT, Pauline and LEBLANC, Elina, « Impact of Image Enhancement Methods on HTR Trainings with eScriptorium », https://doi.org/10.46298/jdmdh.10262
WEST, Graham, SWINDALL, Matthew I., KEENER, Ben, [et al.], « An Approach for Noisy, Crowdsourced Datasets Utilizing Ensemble Modeling, “Human Softmax” Distributions, and Entropic Measures of Uncertainty », https://doi.org/10.46298/jdmdh.10297
Boschetti F. episearch-htr. Published online November 23, 2022. Accessed July 27, 2023. https://github.com/vedph/episearch-htr
Clérice T. YALTAi: Segmonto Manuscript and Early Printed Book Dataset. Published online July 10, 2022. doi:10.5281/zenodo.6814770
Hodel T, Schoch D, Dängeli P. Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X. Published online August 2, 2021. doi:10.5281/zenodo.5153263
Jacsont P. Toponomasia : edition of cod. 174 of Bern Burgerbibliothek. Published online July 26, 2022. doi:10.5281/zenodo.7026585
Levenson MG. Towards a general open dataset and model for late medieval Castilian text recognition (HTR/OCR). Datasets and scripts. Published online December 1, 2022. doi:10.5281/zenodo.7389195
Perdiki E. List of manuscripts containing John Chrysostom’s Homilies and the relevant manual transcriptions. Published online February 27, 2023. doi:10.5281/zenodo.7681133
Pinche A, Gabay S, Leroy N, Christensen K. Données HTR incunables du 15e siècle. Published online March 22, 2023. Accessed July 27, 2023. https://github.com/Gallicorpora/HTR-incunable-15e-siecle
Pinche A, Gabay S, Leroy N, Christensen K. Données HTR manuscrits du 15e siècle. Published online March 22, 2023. Accessed July 27, 2023. https://github.com/Gallicorpora/HTR-MSS-15e-Siecle
Pinche A. Cremma Medieval. Published online June 2022. Accessed July 27, 2023. https://github.com/HTR-United/cremma-medieval
Torres Aguilar S, Jolivet V. Dataset and evaluation for HTR models for Latin and French Medieval Documentary Manuscripts. Published online January 10, 2023. doi:10.5281/zenodo.7401833
Torres Aguilar S, Jolivet V. HTR model for Latin and French Medieval Documentary Manuscripts (12th-15th). Published online January 18, 2023. doi:10.5281/zenodo.7547438
The Journal of Data Mining and Digital Humanities is an open-access, peer-reviewed journal: first drafts are deposited as preprints on arXiv or HAL, and peer review takes place after this initial publication.
Contact: ariane.pinche@cnrs.fr