Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done
Article
Authors: Romein, C. Annemieke (1); Hodel, Tobias (2); Gordijn, Femke (3); Zundert, Joris J. van (3); Chagué, Alix (4); Lange, Milan van (5); Jensen, Helle Strandgaard (6); Stauder, Andy (7); Purcell, Jake (8); Terras, Melissa M. (9); Heuvel, Pauline van den (10); Keijzer, Carlijn (5); Rabus, Achim (11); Sitaram, Chantal (12); Bhatia, Aakriti (12); Depuydt, Katrien (13); Afolabi-Adeolu, Mary Aderonke (14); Anikina, Anastasiia (15); Bastianello, Elisa (16); Benzinger, Lukas Vincent (17); Bosse, Arno (18); Brown, David (19); Charlton, Ash (20); Dannevig, André Nilsson (21); Gelder, Klaas van (22); Go, Sabine C.P.J. (17); Goh, Marcus J.C. (17); Gstrein, Silvia (23); Hasan, Sewa (17); Heide, Stefan von der (24); Hindermann, Maximilian (25); Huff, Dorothee (26); Huysman, Ineke (3); Idris, Ali (17); Keijzer, Liesbeth (27); Kemper, Simon (27); Koenders, Sanne (17); Kuijpers, Erika (17); Rønsig Larsen, Lisette (28); Lepa, Sven (29); Link, Tommy O. (17); Nispen, Annelies van (5); Nockels, Joe (20); Noort, Laura M. van (17); Oosterhuis, Joost Johannes (30); Popken, Vivien (31); Estrella Puertollano, María (17); Puusaag, Joosep J. (17); Sheta, Ahmed (32); Stoop, Lex (33); Strutzenbladh, Ebba (34); Sijs, Nicoline van der (13); Spek, Jan Paul van der (33); Trouw, Barry Benaissa (33); Van Synghel, Geertrui (3); Vučković, Vladimir (17); Wilbrink, Heleen (35); Weiss, Sonia (7); Wrisley, David Joseph (36); Zweistra, Riet (33)
- 1 Huygens Institute for the History and Culture of the Netherlands; Vrije Universiteit Amsterdam
- 2 University of Bern
- 3 Huygens Institute for the History and Culture of the Netherlands
- 4 ALMAnaCH, Inria, Paris; Université de Montréal
- 5 NIOD Institute for War, Holocaust, and Genocide Studies
- 6 Aarhus University
- 7 READ-COOP SCE
- 8 American Historical Association
- 9 University of Edinburgh
- 10 Amsterdam City Archives
- 11 Albert-Ludwigs-Universität Freiburg im Breisgau
- 12 KNAW Humanities Cluster Amsterdam
- 13 Instituut voor de Nederlandse Taal
- 14 Bonn Center for Dependency and Slavery Studies at the University of Bonn
- 15 Universiteit van Amsterdam
- 16 Bibliotheca Hertziana – Max Planck Institute for Art History
- 17 Vrije Universiteit Amsterdam
- 18 KNAW Humanities Cluster, Amsterdam
- 19 Trinity College Dublin
- 20 University of Edinburgh; National Library of Scotland
- 21 National Archives of Norway
- 22 Vrije Universiteit Brussel; State Archives Brussels
- 23 University of Innsbruck; State Library of Tyrol
- 24 CCS Content Conversion Specialists GmbH
- 25 University of Basel
- 26 University Library of Tübingen
- 27 Dutch National Archives
- 28 Danish National Archives
- 29 Rahvusarhiiv Estonia
- 30 University of Amsterdam
- 31 Research Centre for Hanse and Baltic History (FGHO)
- 32 Friedrich Alexander Universität Erlangen-Nürnberg
- 33 independent citizen scientist
- 34 University of Aberdeen
- 35 Utrechts Archief
- 36 NYU Abu Dhabi
This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition (HTR) infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how to deposit Ground Truth data in a repository and then make it discoverable to others through HTR-United. We also suggest appropriate citation methods for Automatic Text Recognition (ATR) data, models, and contributions made by volunteers. In addition, when working with digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to properly acknowledging the labour that goes into digitising, transcribing, and sharing Ground Truth data for HTR. They also point to broader issues surrounding the use of machine learning in archival and library contexts, and to how the community should begin to acknowledge and record both contributions and data provenance.
Volume: Documents historiques et reconnaissance automatique de texte
Published on: 18 March 2024
Accepted on: 8 December 2023
Submitted on: 30 November 2022
Keywords: Automatic Text Recognition, Handwritten Text Recognition, Data Publication, Open Data, Data Provenance, Data Curation, Ground Truth, Sharing