You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine

Thibault Clérice

doi:10.46298/jdmdh.9806


jdmdh:9806 - Journal of Data Mining & Digital Humanities, December 26, 2023, Documents historiques et reconnaissance automatique de texte - https://doi.org/10.46298/jdmdh.9806


Authors: Thibault Clérice

Layout Analysis (the identification of zones and their classification) is, along with line segmentation, the first step in Optical Character Recognition and similar tasks. The ability to distinguish the main body of text from marginal text or running titles makes the difference between extracting the full text of a digitized work and producing noisy output. We show that most segmenters focus on pixel classification and that the polygonization of this output has not been used as a target in the latest competitions on historical documents (ICDAR 2017 and onwards), despite being the focus in the early 2010s. We propose to shift, for efficiency, the task from pixel classification-based polygonization to object detection using isothetic rectangles. We compare the output of Kraken and YOLOv5 in terms of segmentation and show that the latter substantially outperforms the former on small datasets (1110 samples and below). We release two datasets for training and evaluation on historical documents, as well as a new package, YALTAi, which injects YOLOv5 into the segmentation pipeline of Kraken 4.1.
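To make the proposed shift concrete, the sketch below reduces a polygonal zone to its isothetic (axis-aligned) bounding rectangle and writes it in the normalized text layout commonly used for YOLO annotations (class, x center, y center, width, height). It is an illustration under stated assumptions, not code from YALTAi or Kraken: the function names, the sample polygon, and the image size are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def polygon_to_isothetic_box(polygon: List[Point]) -> Box:
    """Return the smallest axis-aligned rectangle enclosing the polygon.

    The result is (x_min, y_min, x_max, y_max) in pixel coordinates.
    Hypothetical helper, written for illustration only.
    """
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def box_to_yolo_label(class_id: int, box: Box,
                      image_width: int, image_height: int) -> str:
    """Format a pixel box as a YOLO-style annotation line.

    Coordinates are normalized to [0, 1] by the image dimensions:
    "class x_center y_center width height".
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical zone outline (a slightly skewed text block) on a 1000x1000 page.
zone_polygon = [(100, 200), (500, 180), (520, 900), (90, 880)]
zone_box = polygon_to_isothetic_box(zone_polygon)
zone_label = box_to_yolo_label(0, zone_box, 1000, 1000)
print(zone_label)
```

The rectangle discards the exact outline of the zone, which is the trade-off the abstract describes: a coarser target in exchange for a simpler detection task.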


Source: HAL:hal-03723208v4

Volume: Documents historiques et reconnaissance automatique de texte

Published on: December 26, 2023

Accepted on: December 20, 2023

Submitted on: July 19, 2022

Keywords: Computer Science / Artificial Intelligence [cs.AI], Humanities and Social Sciences [SHS], Kraken, Layout segmentation, YOLO, HTR, OCR, Object detection, Historical document

License: Attribution 4.0 International (CC BY 4.0)

Datasets

References

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-10) [Dataset]. Zenodo. 10.5281/ZENODO.17240212 ¹

Martens, M., Willighagen, E., & Evelo, C. (2023). Adverse Outcome Pathway Wiki RDF (Version 2023-04) [Dataset]. Zenodo. 10.5281/ZENODO.8297026 ¹

Martens, M., Willighagen, E., & Evelo, C. (2023). Adverse Outcome Pathway Wiki RDF (Version 2023-04) [Dataset]. Zenodo. 10.5281/ZENODO.8297025 ¹

Clérice, T. (2022). YALTAi: Segmonto Manuscript and Early Printed Book Dataset (Version 1.0.0) [Dataset]. Zenodo. 10.5281/ZENODO.6814770 ¹

Clérice, T. (2022). YALTAi: Segmonto Manuscript and Early Printed Book Dataset (Version 1.0.0) [Dataset]. Zenodo. 10.5281/ZENODO.6814769 ¹

Martens, M., Willighagen, E., & Evelo, C. (2026). Adverse Outcome Pathway Wiki RDF (Version 2026-03) [Dataset]. Zenodo. 10.5281/ZENODO.18821622 ¹

Martens, M., Willighagen, E., & Evelo, C. (2026). Adverse Outcome Pathway Wiki RDF (Version 2026-02) [Dataset]. Zenodo. 10.5281/ZENODO.18447235 ¹

Martens, M., Willighagen, E., & Evelo, C. (2026). Adverse Outcome Pathway Wiki RDF (Version 2026-01) [Dataset]. Zenodo. 10.5281/ZENODO.18113363 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-12) [Dataset]. Zenodo. 10.5281/ZENODO.17774419 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-11) [Dataset]. Zenodo. 10.5281/ZENODO.17499488 ¹

Martens, M., Willighagen, E., & Evelo, C. (2023). Adverse Outcome Pathway Wiki RDF [Dataset]. Zenodo. 10.5281/ZENODO.*** ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-09) [Dataset]. Zenodo. 10.5281/ZENODO.17015893 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-08) [Dataset]. Zenodo. 10.5281/ZENODO.16671771 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-07) [Dataset]. Zenodo. 10.5281/ZENODO.15779889 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-06) [Dataset]. Zenodo. 10.5281/ZENODO.15568399 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-05) [Dataset]. Zenodo. 10.5281/ZENODO.15314606 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-04) [Dataset]. Zenodo. 10.5281/ZENODO.15117729 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-03) [Dataset]. Zenodo. 10.5281/ZENODO.14995605 ¹

Martens, M., Willighagen, E., & Evelo, C. (2025). Adverse Outcome Pathway Wiki RDF (Version 2025-02) [Dataset]. Zenodo. 10.5281/ZENODO.14850921 ¹

Martens, M., Willighagen, E., & Evelo, C. (2023). Adverse Outcome Pathway Wiki RDF [Dataset]. Zenodo. 10.5281/ZENODO.10057880 ¹

¹ ScholeXplorer
