Samantha Blickhan; Coleman Krawczyk; Daniel Hanson; Amy Boyer; Andrea Simenstad et al. - Individual vs. Collaborative Methods of Crowdsourced Transcription

jdmdh:5759 - Journal of Data Mining & Digital Humanities, December 3, 2019, Special Issue on Collecting, Preserving, and Disseminating Endangered Cultural Heritage for New Understandings through Multilingual Approaches

Authors: Samantha Blickhan 1,2; Coleman Krawczyk 3; Daniel Hanson 4,5; Amy Boyer 1; Andrea Simenstad 4,5; Victoria van Hyning 6

While online crowdsourced text transcription projects have proliferated in the last decade, there is a need within the broader field to understand differences in project outcomes as they relate to task design, as well as to experiment with models of online crowdsourced transcription that have not yet been explored. The experiment discussed in this paper involves the evaluation of newly built tools on the Zooniverse crowdsourcing platform, attempting to answer the research questions: "Does the current Zooniverse methodology of multiple independent transcribers and aggregation of results render higher-quality outcomes than allowing volunteers to see previous transcriptions and/or markings by other users? How does each methodology impact the quality and depth of analysis and participation?" To answer these questions, the Zooniverse team ran an A/B experiment on the project Anti-Slavery Manuscripts at the Boston Public Library. This paper shares the results of this study, and also describes the process of designing the experiment and the metrics used to evaluate each transcription method. These include the comparison of aggregate transcription results with ground truth data; evaluation of annotation methods; the time volunteers took to transcribe each dataset; and the level of engagement with other project elements, such as posting on the message board or reading supporting documentation. Particular focus is given to the (at times) competing goals of data quality, efficiency, volunteer engagement, and user retention, all of which are of high importance for projects that focus on data from galleries, libraries, archives, and museums. Ultimately, this paper aims to provide a model for impactful, intentional design and study of online crowdsourced transcription methods, as well as to shed light on the associations between project design, methodology, and outcomes.
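One of the metrics named in the abstract is the comparison of aggregate transcription results with ground truth data. A common way to quantify such a comparison is character error rate based on edit distance. The sketch below is purely illustrative and assumes nothing about the authors' actual pipeline; the function names and sample strings are hypothetical:

```python
# Hypothetical sketch of one evaluation metric mentioned in the abstract:
# comparing an aggregated crowd transcription against ground-truth text.
# This is not the authors' pipeline; names and data are illustrative.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def char_error_rate(aggregate: str, ground_truth: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ground_truth:
        return 0.0 if not aggregate else 1.0
    return levenshtein(aggregate, ground_truth) / len(ground_truth)

# Illustrative comparison of an aggregated transcription against ground truth.
truth = "the quick brown fox"
print(char_error_rate("the quick brown fox", truth))   # 0.0 (perfect match)
print(char_error_rate("the quikc brown fax", truth))   # 3 edits / 19 chars
```

A lower character error rate for one condition (independent transcription plus aggregation vs. collaborative transcription) would indicate higher-quality output under this metric; in practice such comparisons are usually made per line or per page before averaging.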

Volume: Special Issue on Collecting, Preserving, and Disseminating Endangered Cultural Heritage for New Understandings through Multilingual Approaches
Published on: December 3, 2019
Accepted on: November 29, 2019
Submitted on: September 12, 2019
Keywords: Humanities and Social Sciences/Methods and statistics [SHS.STAT]; Humanities and Social Sciences/Cultural heritage and museology [SHS.MUSEO]; Humanities and Social Sciences/Library and information sciences [SHS.INFO]

4 Documents citing this article

Consultation statistics

This page has been seen 2568 times.
This article's PDF has been downloaded 890 times.