Authors: Thibault Clérice; Ariane Pinche

Many digitized manuscripts available online are in fact digitized microfilms, a technology dating back to the 1930s. Given recent progress in artificial colorization, we hypothesize that microfilms could be colorized with these technologies, and we test this with InstColorization. We train a model on an ad hoc dataset of 18,788 color images that were artificially converted to grayscale for this purpose. The colorization results are promising but show clear limitations caused by the difference between artificially grayscaled images and "naturally" grayscaled microfilms. We then evaluate the impact of this artificial colorization on two downstream tasks using Kraken: layout analysis and text recognition. Unfortunately, the results show little to no improvement, which limits the value of artificial colorization of manuscripts for computer vision.
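
As an illustration of the data preparation step mentioned above, the following minimal sketch shows one way a color image could be artificially converted to grayscale to form a (grayscale input, color target) training pair. The abstract does not state which conversion was used; the ITU-R BT.601 luma weights and the function names `to_grayscale` and `make_training_pair` are assumptions made for this example only.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# ITU-R BT.601 luma weights. This choice is an assumption: the abstract
# does not name the grayscale conversion applied to the dataset.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(pixels: List[List[RGB]]) -> List[List[int]]:
    """Convert rows of 8-bit RGB pixels to 8-bit luma values."""
    r_w, g_w, b_w = LUMA_WEIGHTS
    return [
        [min(255, round(r_w * r + g_w * g + b_w * b)) for (r, g, b) in row]
        for row in pixels
    ]


def make_training_pair(pixels: List[List[RGB]]) -> Tuple[List[List[int]], List[List[RGB]]]:
    """Return (grayscale input, color target) for one image (hypothetical helper)."""
    return to_grayscale(pixels), pixels


if __name__ == "__main__":
    # A tiny 1x3 "image": white, black, pure red.
    image = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
    gray, color = make_training_pair(image)
    print(gray)
```

A conversion of this kind is deterministic and free of the noise, contrast, and aging effects of real microfilm, which is consistent with the gap the abstract reports between artificially grayscaled images and actual microfilms.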

Volume: 2022
Section: Towards a Digital Ecosystem: NLP. Corpus infrastructure. Methods for Retrieving Texts and Computing Text Similarities
Published on: April 12, 2023
Accepted on: April 12, 2023
Submitted on: September 7, 2021
Keywords: [SCCO.COMP] Cognitive science/Computer science, [SHS.LITT] Humanities and Social Sciences/Literature
Funding: The Manuscrit du Roi. Image, Text and Music; Funder: French National Research Agency (ANR); Code: ANR-18-CE27-0016

