Collecting, Preserving, and Disseminating Endangered Cultural Heritage for New Understandings through Multilingual Approaches


Special Issue Editors 

Amel Fraisse
Univ. Lille, EA 4073 - GERiiCO - Groupement d’Etudes et de Recherche Interdisciplinaire en Information et Communication, F-59000 Lille, France.
Interests: library and information science, knowledge and language diversity, cultural heritage, multilingualism, digital humanities.

Ronald Jenn
Univ. Lille, EA 4074 - CECILLE - Centre d’Études en Civilisations Langues et Lettres Etrangères, F-59000 Lille, France.
Interests: translation studies, translated texts, digital humanities, Mark Twain.

Shelley Fisher Fishkin
Stanford University, English Department.
Interests: transnational American studies, literature, translation studies, Mark Twain.


Special Issue Information

In an increasingly globalized context, multilingualism and multiculturalism have become major concerns in preserving knowledge diversity and cultural heritage. Indeed, over time, the gap between languages of dominant nations or civilizations and other languages has been growing. This special issue features a selection of papers presenting recent research that aims at collecting, preserving, and disseminating endangered knowledge and cultural heritage in order to sustain knowledge diversity.


More than a century ago, Paul Otlet, the pioneer of Documentation Studies, envisioned a universal compilation of knowledge and the technology to make it globally available. He wrote numerous essays on how to collect and organize the world's knowledge (Otlet, 1934). The ever-growing number of digital documents, together with scientific and political interest in making them openly available all over the world, has led to the creation of new digital collections in a broad range of fields and languages. Several Registries of Open Access Repositories (ROARs), hosted by national and international organizations and universities, have been developed. For example, the Library of Congress[1] has digitized approximately 164 million items in virtually all formats, languages, subjects, and periods. These collections are broad in scope, including research materials in more than 470 languages and multiple media. The Europeana collection[2], launched in 2008 and funded by the European Commission, contains over fifteen million digitized paintings, drawings, maps, photos, books, newspapers, letters, diaries, etc., from fifteen hundred institutions. However, the language barrier is a key issue that Knowledge Organization Systems (KOS) have to address, as described by Hudon (1997, 1998) and Barát (2008). Although KOS include knowledge encoded in under-resourced languages, the use and exploration of that knowledge remain limited.


[1] https://www.loc.gov

[2] https://www.europeana.eu


1. Context

According to the Sapient Globalization Report, there are over 6,700 living languages in the world; the fifteen most popular languages are spoken by 49.5% of the world’s population, while the other 50.5% of the world’s population speak 6,600 languages. Yet, only about 6% of the world’s population speak English. Of the world’s 6,000+ languages, only a small fraction, a dozen or so, currently enjoy the benefits of modern information technologies and knowledge organization systems. A larger but still modest number, close to a hundred, have the so-called Basic LAnguage Resource Kit (BLARK): monolingual and bilingual corpora, machine-readable dictionaries, terminologies, thesauri, ontologies and the like, as described by Krauwer (2003) and Arppe et al. (2016). Preserving knowledge diversity and ensuring the right of all people to access knowledge in their mother tongue is the main goal of the Information for All Programme (IFAP) created by UNESCO. Several studies have called for cultural and linguistic diversity, e.g. Adler et al. (2016), Beghtol (2005), Dahlberg (1992), López-Huertas (2016), and Mustafa El Hadi (2015). In earlier work, Beghtol (1986, 2001) introduced the concept of cultural warrant. Fishkin (2011) introduced and described a new model for data curation and sharing by inviting colleagues around the world to collaborate on Digital Palimpsest Mapping Projects (DPMPs), or “Deep Maps”. Deep Maps, curated collaboratively by scholars in multiple locations, would put multilingual digital archives around the globe in conversation with one another, using maps as the gateway.

Moving from closed, discontinuous, and out-of-context knowledge organization models to open, continuous, and in-context ones is an approach whose effectiveness has been demonstrated by wiki platforms such as Wikipedia. It rests on collaboration and on promoting the right of all people to use information systems in their mother tongue. It consists of renouncing the idea of perfect and complete knowledge and publishing partial knowledge of variable quality, which is then improved incrementally as the information system is used. The information process is therefore ongoing, and both the quality and the quantity of knowledge grow incrementally. The best-known example is the Wikipedia community, in which contributors add and improve knowledge continuously.

Individual vs. Collaborative Methods of Crowdsourced Transcription 

Authors: Blickhan, Samantha and Krawczyk, Coleman and Hanson, Daniel and Boyer, Amy and Simenstad, Andrea and Van Hyning, Victoria

While online crowdsourced text transcription projects have proliferated in the last decade, there is a need within the broader field to understand differences in project outcomes as they relate to task design, as well as to experiment with different models of online crowdsourced transcription that have not yet been explored. The experiment discussed in this paper involves the evaluation of newly-built tools on the Zooniverse.org crowdsourcing platform, attempting to answer the research question: "Does the current Zooniverse methodology of multiple independent transcribers and aggregation of results render higher-quality outcomes than allowing volunteers to see previous transcriptions and/or markings by other users? How does each methodology impact the quality and depth of analysis and participation?" To answer these questions, the Zooniverse team ran an A/B experiment on the project Anti-Slavery Manuscripts at the Boston Public Library. This paper will share results of this study, and also describe the process of designing the experiment and the metrics used to evaluate each transcription method. These include the comparison of aggregate transcription results with ground truth data; evaluation of annotation methods; the time it took for volunteers to complete transcribing each dataset; and the level of engagement with other project elements such as posting on the message board or reading supporting documentation. Particular focus will be given to the (at times) competing goals of data quality, efficiency, volunteer engagement, and user retention, all of which are of high importance for projects that focus on data from galleries, libraries, archives and museums. Ultimately, this paper aims to provide a model for impactful, intentional design and study of online crowdsourcing transcription methods, as well as shed light on the associations between project design, methodology and outcomes.


A Collaborative Ecosystem for Digital Coptic Studies

Authors: Schroeder, Caroline T. and Zeldes, Amir

Scholarship on under-resourced languages brings with it a variety of challenges which make access to the full spectrum of source materials and their evaluation difficult. For Coptic in particular, large-scale analyses and any kind of quantitative work become difficult due to the fragmentation of manuscripts, the highly fusional nature of an incorporational morphology, and the complications of dealing with influences from Hellenistic-era Greek, among other concerns. Many of these challenges, however, can be addressed using Digital Humanities tools and standards. In this paper, we outline some of the latest developments in Coptic Scriptorium, a DH project dedicated to bringing Coptic resources online in uniform, machine-readable, and openly available formats. Collaborative web-based tools create online 'virtual departments' in which scholars dispersed sparsely across the globe can collaborate, and natural language processing tools counterbalance the scarcity of trained editors by enabling machine processing of Coptic text to produce searchable, annotated corpora.


ekdosis: Using LuaLaTeX for Producing TEI XML-Compliant Critical Editions and Highlighting Parallel Writings


Authors: Alessi, Robert


ekdosis is a LuaLaTeX package written by R. Alessi and designed for multilingual critical editions. It can be used to typeset texts and different layers of critical notes in any direction accepted by LuaTeX. Texts can be arranged in running paragraphs or on facing pages, in any number of columns, which in turn can be synchronized or not. Database-driven encoding under LaTeX allows extraction of texts entered segment by segment according to various criteria: main edited text, variant readings, translations, or annotated borrowings between texts. In addition to printed texts, ekdosis can convert .tex source files so as to produce TEI XML-compliant critical editions. It will be published under the terms of the GNU General Public License (GPL) version 3.



Manuscript Submission Information

To download the journal template, go to the JDMDH journal website and click on "About the Journal", then "Submissions".

Submission is a two-step process. First, submit your paper to an open access repository (arXiv, HAL), which will provide you with a document identifier. Then go to the journal website and click on "Submit an article". You will be asked to select the repository you chose and then to type in your document identifier.

All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. 




References

Adler, Melissa A., Joseph T. Tennis, Daniel Martínez-Ávila, José Augusto Chaves Guimarães, Jens-Erik Mai, Ole Olesen-Bagneux, and Laura Skouvig. 2016. “Global/local Knowledge Organization: Contexts and Questions”. In: Proceedings of the Association for Information Science and Technology 53(1): 1–4.

Arppe, Antti, Jordan Lachler, Trond Trosterud, Lene Antonsen, and Sjur N. Moshagen. 2016. “Basic language resource kits for endangered languages: A case study of Plains Cree”. In: Proceedings of the 2nd Workshop on Collaboration and Computing for Under-Resourced Languages: 1–8.

Barát, Ágnes H. 2008. “Knowledge Organization in the Cross-Cultural and Multicultural Society”. In: Advances in Knowledge Organization 11, Proceedings of the Tenth International ISKO Conference: 91–97.

Baron, Robert. 2012. “‘All Power to the Periphery’: The Public Folklore Thought of Alan Lomax”. In: Journal of Folklore Research, Vol. 49, No. 3. Indiana University Press: 275–317. Stable URL: https://www.jstor.org/stable/10.2979/jfolkrese.49.3.275

Beghtol, Clare. 1986. “Semantic validity: concepts of warrant in bibliographic classification systems”, Library Resources and Technical Services, Vol. 30 No. 2:109‐25.

Beghtol, Clare. 2001. “Relationships in classificatory structure and meaning”, in Bean, C.A. and Green, R. (Eds), Relationships in the Organization of Knowledge, Kluwer, Dordrecht: 99‐113.

Beghtol, Clare. 2002. “Universal Concepts, Cultural Warrant and Cultural Hospitality”. In: Challenges in Knowledge Representation and Organization for the 21st Century: Integration of Knowledge Across Boundaries, Proceedings of the Seventh International ISKO Conference: 45–49.

Beghtol, Clare. 2005. “Ethical Decision-Making for Knowledge Representation and Organization Systems for Global Use.” Journal of the American Society for Information Science and Technology 56 (9):903–12.

Dahlberg, Ingetraut. 1992. “Ethics and Knowledge Organization: In Memory of Dr. S.R. Ranganathan in His Centenary Year.” International Classification 19 (1): 1–2.

Eveleigh, Alexandra. 2014. “Crowding Out the Archivist? Locating Crowdsourcing within the Broader Landscape of Participatory Archives”. In: Crowdsourcing our Cultural Heritage, ed. Mia Ridge: 211–229.

Fishkin, Shelley Fisher. 2011. “Deep Maps: A Brief for Digital Palimpsest Mapping Projects (DPMPs, or ‘Deep Maps’)”. In: Journal of Transnational American Studies, 3(2). URL: https://escholarship.org/uc/item/92v100t0

Fraisse, Amel. 2010. “Localisation interne et en contexte des logiciels commerciaux et libres”. Ph.D. thesis, Université de Grenoble, France. URL: https://tel.archives-ouvertes.fr/tel-00995093

Fraisse, Amel, Christian Boitet, Hervé Blanchon, and Valérie Bellynck. 2009. “A solution for in context and collaborative localization of most commercial and free software”. In: Proceedings of the 4th Language and Technologies Conference, vol. 1/1: 536–540, Poznan, Poland.

Fraisse, Amel, Zheng Zhang, Alex Zhai, Ronald Jenn, Shelley Fisher Fishkin, Pierre Zweigenbaum, Laurence Favier, and Widad Mustafa El Hadi. 2019. “A Sustainable and Open Access Knowledge Organization Model to Preserve Cultural Heritage and Language Diversity”. Information, 10(10), 303.

Harvey, Todd, Andrew Peart, and Nathan Salsburg. 2017. “Alan Lomax and the ‘Grass Roots’ Idea”. In: Chicago Review, Vol. 60/61, No. 4/1: 37–45. Stable URL: https://www.jstor.org/stable/44820515

Hudon, Michèle. 1997. “Multilingual Thesaurus Construction: Integrating the Views of Different Cultures in One Gateway to Knowledge and Concepts”. In: Information Services and Use 17: 111–123.

Hudon, Michèle. 1998. “Information access in a multilingual and multicultural environment”. Congrès de l'American Society of Indexers. Seattle (WA).

Krauwer, Steven. 2003. “The Basic Language Resource Kit (BLARK) as the first milestone for the language resources roadmap”. In: Proceedings of the International Workshop Speech and Computer.

López-Huertas, María. 2016. “The Integration of Culture in Knowledge Organization Systems.” In Advances in Knowledge Organization, Vol. 15: Knowledge Organization for a Sustainable World, Proceedings of the Fourteenth International ISKO Conference, Rio de Janeiro, Brazil, 13–28. International Society for Knowledge Organization.

Mustafa El Hadi, Widad. 2015. “Cultural Interoperability and Knowledge Organization Systems.” In Organização Do Conhecimento E Diversidade Cultural, Proceedings of the 3rd Brazilian ISKO-Conference, edited by José Augusto Chaves Guimarães and Vera Dodebei: 575–606. Marília, São Paulo: Fundação para o Desenvolvimento do Ensino, Pesquisa e Extensão (FUNDEPE).

Otlet, Paul. 1934. Traité de Documentation: Le livre sur le Livre: Théorie et Pratique, Mundaneum: Bruxelles, Belgium.

Ridge, Mia. (Ed.). 2014. Crowdsourcing our Cultural Heritage. Farnham: Ashgate.

Scannell, Kevin. 2007. “The Crúbadán Project: Corpus building for under-resourced languages”. In: Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop: 5–15.

Teets, Michael and Matthew Goldner. 2013. “Libraries’ Role in Curating and Exposing Big Data”. Future Internet, 5: 429–438.

Van Hyning, Victoria, Samantha Blickhan, Chris Lintott, and Laura Trouille. 2017. “Transforming Libraries and Archives through Crowdsourcing”. In: D-Lib Magazine 23(5/6).

Van Hyning, Victoria. 2019. “Harnessing Crowdsourcing for Scholarly and GLAM Purposes”. Literature Compass, 16(3-4). Available at https://doi.org/10.1111/lic3.12507.

Williams, Alex C., John F. Wallin, Haoyu Yu, Marco Perale, Hyrum D. Carroll, Anne-Francoise Lamblin, Lucy Fortson, Dirk Obbink, Chris J. Lintott, and James H. Brusuelas. 2014. “A computational pipeline for crowdsourced transcriptions of Ancient Greek papyrus fragments”. In: Proceedings of the International Conference on Big Data: 100–105.