Impact of Image Enhancement Methods on Automatic Transcription Trainings with eScriptorium

Pauline Jacsont; Elina Leblanc

doi:10.46298/jdmdh.10262


jdmdh:10262 - Journal of Data Mining & Digital Humanities, September 12, 2023, Documents historiques et reconnaissance automatique de texte - https://doi.org/10.46298/jdmdh.10262


This study stems from the Desenrollando el cordel (Untangling the cordel) project, which focuses on editing 19th-century Spanish prints. It evaluates the impact of image enhancement methods on the automatic transcription of documents that are of low quality in terms of both printing and digitisation. We compare different methods (binarisation, deblurring) and present the results obtained when training models with the Kraken tool. We demonstrate that binarisation methods give better results than the others, and that combining several techniques does not significantly improve the transcription prediction. This study shows the significance of using image enhancement methods with Kraken. It paves the way for further experiments with larger and more varied corpora to help future projects design their automatic transcription workflow.
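As an illustration of what a binarisation step does before training, here is a minimal sketch of global thresholding with Otsu's method in pure Python. Otsu's method is chosen only because it is a widely known technique; the abstract does not say which binarisation algorithms were compared, so this should not be read as the authors' method. The function names `otsu_threshold` and `binarise` are hypothetical, and the image is assumed to be a list of rows of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t (0-255) for a flat list of 8-bit values.

    Values <= t form one class (ink), values > t the other (paper).
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixel in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the dark class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor of 1 / total**2).
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(image):
    """Map a grayscale image (list of rows) to pure black (0) and white (255)."""
    flat = [v for row in image for v in row]
    t = otsu_threshold(flat)
    return [[255 if v > t else 0 for v in row] for row in image]


if __name__ == "__main__":
    # Toy 2x3 "page": dark ink values near 30-40, light paper near 200-220.
    page = [[30, 40, 200], [35, 210, 220]]
    print(otsu_threshold([v for row in page for v in row]))
    print(binarise(page))
```

In practice a page image would be loaded with an imaging library and the binarised result saved before being imported into the transcription tool; that part is left out here to keep the sketch self-contained.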


Source: HAL:hal-03831686v4

Volume: Documents historiques et reconnaissance automatique de texte

Published: September 12, 2023

Accepted: June 19, 2023

Submitted: November 7, 2022

Keywords: [INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [en] image enhancement methods, binarisation, printed documents, Spanish literature

License: Attribution 4.0 International (CC BY 4.0)
