French vital records data gathering and analysis through image processing and machine learning algorithms

Cyprien Plateau-Holleville; Enzo Bonnot; Franck Gechter; Laurent Heyberger

doi:10.46298/jdmdh.7327


jdmdh:7327 - Journal of Data Mining & Digital Humanities, July 15, 2021, Volume 2021 - https://doi.org/10.46298/jdmdh.7327


Authors: Cyprien Plateau-Holleville ¹; Enzo Bonnot ¹; Franck Gechter ¹,²,³; Laurent Heyberger ⁴

1 Université de Technologie de Belfort-Montbéliard
2 Connaissance et Intelligence Artificielle Distribuées [Dijon] [CIAD]
3 Proof-oriented development of computer-based systems
4 Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) [FEMTO-ST]

Vital records are rich in historical data about urban and rural inhabitants. Among other uses, these data make it possible to study past populations and reveal their social, economic, and demographic characteristics. However, such studies face a major difficulty in collecting the data: most of these records are scanned documents that must be transcribed manually before the data can be gathered and exploited from a historical point of view. This step slows down historical research and hinders a better understanding of how population habits depended on social conditions. In this paper, we therefore present a modular and self-sufficient analysis pipeline that aims to automate this data extraction process, using state-of-the-art algorithms that are largely independent of the document layout.


Source: HAL:hal-03189188v3

Volume: 2021

Published: July 15, 2021

Accepted: July 3, 2021

Submitted: April 6, 2021

Keywords: [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], [SHS.HIST]Humanities and Social Sciences/History, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Handwritten Text Recognition, Machine Learning, Optical Character Recognition, Historical Data

License: HAL authorisation v1
