Historical Documents and automatic text recognition

Special issue: « Historical Documents and automatic text recognition »

With this special issue of the Journal of Data Mining and Digital Humanities (JDMDH), we wish to bring together in a single volume experiments, projects and reflections related to automatic text recognition on historical documents.

Many projects now include automatic text acquisition in their data processing chain. The integration of this technology into increasingly powerful processing chains has automated tasks in a way that affects the role of the researcher in the textual production process. This new data-intensive practice makes it urgent to collect and harmonise the corpora needed to build training sets, and also to make them available for reuse. This issue is an opportunity to propose articles that combine philological and technical questions in order to make a scientific assessment of automatic text recognition for ancient documents: its results, its contributions and the new practices it induces in the process of editing and exploring texts. We hope that contributions will examine practical aspects while also addressing the methodological challenges of automatic text recognition and its impact on research data.

This special issue is the outcome of an event that took place at the Ecole Nationale des Chartes in Paris on June 23 and 24, 2022, which brought together scholars from various backgrounds to discuss the use of handwritten text recognition (HTR) and optical character recognition (OCR) in their research. During these two days, problems of engineering, machine learning and infrastructure were raised. Many technical subjects, such as segmentation or the development of models linked to philological questions, were discussed. The talks covered a wide range of documents from the 11th to the 20th century: manuscripts, archives, epigraphic materials and other documents, sometimes in languages with their own specificities, such as Hebrew, Vietnamese languages such as Cham, or ancient Greek.

This call is open not only to participants in this event, but to anyone working with HTR or OCR.

To address these issues, we propose the following three axes:

- Axis 1: Sources, constitution and sharing of training data

- Axis 2: Machine learning

- Axis 3: Feedback and data exploitation

This special issue aims to provide an overview of the use of HTR and OCR on historical documents at a time when their uses are multiplying and more and more research projects and cultural heritage institutions are taking an interest in them. Through the Journal of Data Mining and Digital Humanities, we are delighted to offer an opportunity to all those who wish to contribute to the field or to share their experience by presenting their successes, their questions and their difficulties, or even their failures. By publishing this special issue, we hope to present a state of the art of the uses of automatic handwriting recognition today.

The Journal of Data Mining and Digital Humanities is an open-access, peer-reviewed journal: a first draft is deposited as a preprint on arXiv or HAL, and peer review takes place post-publication.

Submission details and deadlines:

  • Papers are expected to be between 6 and 8 pages for short papers or between 12 and 15 pages for long papers.

  • The articles must present original and previously unpublished work.

  • All submissions must be in English.

  • All submitted articles are subject to blind peer review in accordance with the journal’s editorial policies.

  • Submission deadline: 1 November 2022.

  • To submit an article to the special issue, you should:

    • Sign up and connect to the platform of the JDMDH.

    • Register on an external repository cooperating with EPIsciences (HAL, arXiv or CWI) and upload your manuscript there.

    • Submit your manuscript to the special issue by providing the ID assigned to it upon upload.

    • After your paper is accepted, you will be invited to adjust the manuscript according to the journal’s guidelines and stylesheet (toolkits are provided for MS Word and LaTeX). For more details, see the dedicated section of this website and the official EPIsciences documentation.