C. Annemieke Romein ; Tobias Hodel ; Femke Gordijn ; Joris J. van Zundert ; Alix Chagué et al.
-
Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done
jdmdh:10403 -
Journal of Data Mining & Digital Humanities,
18 March 2024,
Historical Documents and Automatic Text Recognition
-
https://doi.org/10.46298/jdmdh.10403
Authors: Romein, C. Annemieke 1; Hodel, Tobias 2; Gordijn, Femke 3; Zundert, Joris J. van 3; Chagué, Alix 4; Lange, Milan van 5; Jensen, Helle Strandgaard 6; Stauder, Andy 7; Purcell, Jake 8; Terras, Melissa M. 9; Heuvel, Pauline van den 10; Keijzer, Carlijn 5; Rabus, Achim 11; Sitaram, Chantal 12; Bhatia, Aakriti 12; Depuydt, Katrien 13; Afolabi-Adeolu, Mary Aderonke 14; Anikina, Anastasiia 15; Bastianello, Elisa 16; Benzinger, Lukas Vincent 17; Bosse, Arno 18; Brown, David 19; Charlton, Ash 20; Dannevig, André Nilsson 21; Gelder, Klaas van 22; Go, Sabine C.P.J. 17; Goh, Marcus J.C. 17; Gstrein, Silvia 23; Hasan, Sewa 17; Heide, Stefan von der 24; Hindermann, Maximilian 25; Huff, Dorothee 26; Huysman, Ineke 3; Idris, Ali 17; Keijzer, Liesbeth 27; Kemper, Simon 27; Koenders, Sanne 17; Kuijpers, Erika 17; Rønsig Larsen, Lisette 28; Lepa, Sven 29; Link, Tommy O. 17; Nispen, Annelies van 5; Nockels, Joe 20; Noort, Laura M. van 17; Oosterhuis, Joost Johannes 30; Popken, Vivien 31; Estrella Puertollano, María 17; Puusaag, Joosep J. 17; Sheta, Ahmed 32; Stoop, Lex 33; Strutzenbladh, Ebba 34; Sijs, Nicoline van der 13; Spek, Jan Paul van der 33; Trouw, Barry Benaissa 33; Van Synghel, Geertrui 3; Vučković, Vladimir 17; Wilbrink, Heleen 35; Weiss, Sonia 7; Wrisley, David Joseph 36; Zweistra, Riet 33
1 Huygens Institute for the History and Culture of the Netherlands; Vrije Universiteit Amsterdam
2 University of Bern
3 Huygens Institute for the History and Culture of the Netherlands
4 ALMAnaCH, Inria, Paris; Université de Montréal
5 NIOD Institute for War, Holocaust, and Genocide Studies
6 Aarhus University
7 READ-COOP SCE
8 American Historical Association
9 University of Edinburgh
10 Amsterdam City Archives
11 Albert-Ludwigs-Universität: Freiburg im Breisgau
12 KNAW Humanities Cluster Amsterdam
13 Instituut voor de Nederlandse Taal
14 Bonn Center for Dependency and Slavery Studies at the University of Bonn
15 Universiteit van Amsterdam
16 Bibliotheca Hertziana – Max Planck Institute for Art History
17 Vrije Universiteit Amsterdam
18 KNAW Humanities Cluster, Amsterdam
19 Trinity College Dublin
20 University of Edinburgh; National Library of Scotland
21 National Archives of Norway
22 Vrije Universiteit Brussel; State Archives Brussels
23 University of Innsbruck; State Library of Tyrol
24 CCS Content Conversion Specialists GmbH
25 University of Basel
26 University Library of Tübingen
27 Dutch National Archives
28 Danish National Archives
29 Rahvusarhiiv Estonia
30 University of Amsterdam
31 Research Centre for Hanse and Baltic History (FGHO)
32 Friedrich Alexander Universität Erlangen-Nürnberg
33 independent citizen scientist
34 University of Aberdeen
35 Utrechts Archief
36 NYU Abu Dhabi
This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition (HTR) infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how Ground Truth data can be placed in a repository and subsequently announced to others through HTR-United. We also suggest appropriate citation methods for automatic text recognition (ATR) data, models, and contributions made by volunteers. In addition, when working with digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of the labour put into digitising, transcribing, and sharing Ground Truth HTR data. They also point to broader issues surrounding the use of machine learning in archival and library contexts, and to how the community should begin to acknowledge and record both contributions and data provenance.
Romein, C. A., Hodel, T., Gordijn, F., Zundert, J. J. van, Chagué, A., Lange, M. van, Jensen, H. S., Stauder, A., Purcell, J., Terras, M. M., Heuvel, P. van den, Keijzer, C., Rabus, A., Sitaram, C., Bhatia, A., Depuydt, K., Afolabi-Adeolu, M. A., Anikina, A., Bastianello, E., … Zweistra, R. (2024). Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done. Zenodo. 10.5281/ZENODO.10804745