Atelier Digit_Hum

1. A Web Application for Watermark Recognition

Oumayma Bounou ; Tom Monnier ; Ilaria Pastrolin ; Xi SHEN ; Christine Benevent ; Marie-Françoise Limon-Bonnet ; François Bougard ; Mathieu Aubry ; Marc H. Smith ; Olivier Poncet et al.
The study of watermarks is a key step for archivists and historians, as it enables them to trace the origin of paper. Although it would be highly useful, automatic watermark recognition raises many difficulties and is still considered an unsolved challenge. Nonetheless, Shen et al. [2019] recently introduced a new approach to this task that showed promising results. Building upon this approach, this work proposes a new public web application dedicated to automatic watermark recognition, entitled Filigranes pour tous. The application not only hosts a detailed catalog of more than 17k watermarks, either collected manually from the French National Archives (Minutier central) or extracted from existing online resources (Briquet database), but also enables non-specialists to identify a watermark from a simple photograph in a few seconds. Moreover, users can easily add new watermarks, so the existing catalog can be enriched through crowdsourcing. Our web application is available at
Section: Data deluge: which skills for which data?

2. The Artist Libraries Project in the Labex Les passés dans le présent

Félicie Faizand de Maupeou ; Ségolène Le Men.
The Artist Libraries Project was sparked by the observation that artist libraries are still little known, even though many art historians are interested in this archive for what it reveals about the person behind the artist and his or her creative process. The problem is that these libraries are rarely physically preserved. To remedy this dispersion, we built an online database and a website that house this valuable source in the form of lists of books and their electronic versions. Data on Monet's library were the first to be made available, and several additional artist libraries from the 19th and 20th centuries are planned for 2019. Gathering all these bibliographical data in a central database makes it possible both to explore a single library and to compare several. This article explains how we built the database and the website, and how implementing these IT tools has raised questions about the use of this resource as an archive on the one hand, and about its value for art history on the other.
Section: Digital libraries and virtual exhibitions

3. TraduXio Project: Latest Upgrades and Feedback

Philippe Lacour ; Aurélien Bénel.
TraduXio is a web-based, free-to-use and open-source digital environment for computer-assisted multilingual translation. Its originality is threefold: whereas traditional technologies are limited to two languages (source/target), TraduXio enables the comparison of different versions of the same text in various languages; its concordancer provides relevant multilingual suggestions by classifying the source according to its history, genre and author; and it uses collaborative devices (privilege management, forums, networks, modification history, etc.) to promote collective (and distributed) translation. TraduXio is designed to encourage the diversification of language learning and to promote a reappraisal of translation as a professional skill. It can be used in many different ways, by very diverse kinds of people. In this paper, I present the recent developments of the software (version 2.1) and illustrate how specific groups (language teaching, social sciences, literature) use it on a regular basis. While presenting the technology, I concentrate on the possible uses of TraduXio, focusing on translators' feedback about their experience of working in this digital environment in a truly collaborative way.
Section: Digital humanities in languages

4. Mapping the Bentham Corpus: Concept-based Navigation

Pablo Ruiz Fabo ; Thierry Poibeau.
British philosopher and reformer Jeremy Bentham (1748-1832) left over 60,000 folios of unpublished manuscripts. The Bentham Project, at University College London, is creating a TEI version of the manuscripts through crowdsourced transcription verified by experts. We present an interface for navigating these largely unedited manuscripts, together with the language technologies used to enrich the corpus to facilitate navigation, i.e. entity linking against the DBpedia knowledge base and keyphrase extraction. The challenges of tagging a historical, domain-specific corpus with a contemporary knowledge base are discussed. The extracted concepts were used to create interactive co-occurrence networks, which serve as a map of the corpus and help users navigate it, along with a search index. These corpus representations were integrated into a user interface. The interface was evaluated by domain experts with satisfactory results; for example, they found the distributional semantics methods used here applicable to retrieving related passages for the scholarly editing of the corpus.
Section: Data deluge: which skills for which data?
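The Bentham abstract above does not say how its co-occurrence networks were computed. A minimal sketch, assuming two concepts are linked whenever they are extracted from the same passage and that edge weight is the number of such passages, could look like the following. The function names `cooccurrence_edges` and `neighbours`, the `min_weight` threshold and the example concept labels are all hypothetical, not taken from the project.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(passages, min_weight=1):
    """Count how often two concepts are extracted from the same passage.

    Hypothetical sketch: `passages` is an iterable of concept lists (for
    instance entity-linked labels or keyphrases); repeated concepts inside
    one passage are counted once. Returns a dict mapping a sorted
    (concept_a, concept_b) pair to its count, keeping only pairs whose
    count is at least `min_weight`.
    """
    counts = Counter()
    for concepts in passages:
        unique = sorted(set(concepts))  # dedupe and fix pair order
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_weight}


def neighbours(edges, concept):
    """Return (concept, weight) pairs linked to `concept`, strongest first."""
    linked = []
    for (a, b), n in edges.items():
        if a == concept:
            linked.append((b, n))
        elif b == concept:
            linked.append((a, n))
    # Sort by descending weight, then alphabetically for stable output.
    return sorted(linked, key=lambda item: (-item[1], item[0]))
```

With four invented passages tagged `["Panopticon", "prison", "architecture"]`, `["Panopticon", "prison", "inspection"]`, `["utility", "legislation"]` and `["prison", "inspection", "Panopticon"]`, a threshold of `min_weight=2` keeps only the three recurring links among "Panopticon", "prison" and "inspection". A threshold of this kind is one plausible way to keep a corpus map readable, since pairs seen only once tend to clutter the graph.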

5. Transcribing Foucault’s handwriting with Transkribus

Marie-Laure Massot ; Arianna Sforzini ; Vincent Ventresque.
The Foucault Fiches de Lecture (FFL) project aims both to explore and to make available online a large set of Michel Foucault's reading notes (organized citations, references and comments), held at the BnF since 2013. To this end, the team is digitizing, describing and enriching the reading notes that the philosopher gathered while preparing his books and lectures, thus providing a new corpus that will allow a new approach to his work. In order to release the manuscripts online and to produce the data collectively, the team is also developing a collaborative platform based on RDF technologies and designed to link archival content with bibliographic data. The project is financed by the ANR (2017-2020) and coordinated by Michel Senellart, professor of philosophy at the ENS Lyon. It benefits from partnerships with the ENS/PSL and the BnF. In addition, a collaboration with the European READ/Transkribus project has been started in order to produce automatic transcriptions of the reading notes.
Section: Digital libraries and virtual exhibitions

6. Optical Recognition Assisted Transcription with Transkribus: The Experiment concerning Eugène Wilhelm's Personal Diary (1885-1951)

Régis Schlagdenhauffen.
This article uses the Transkribus software to report on a "user experiment" in a French-speaking context. It is based on a semi-automated transcription project for the diary of the jurist Eugène Wilhelm (1866-1951). This diary presents two main challenges. The first is related to the period covered by the writing: 66 years. This leads to variations in the handwriting, which becomes increasingly "unreadable" over time. The second challenge is related to the concurrent use of two alphabets: Roman for everyday text and Greek for private matters. After presenting the project and the specific features of the tool, the experiment described in this contribution is structured around two aspects. First, I summarise the main obstacles encountered and the solutions adopted to overcome them. Second, I return to the collaborative transcription experiment carried out with students in the classroom, presenting the difficulties observed and the solutions found. In conclusion, I propose an assessment of the use of this Handwritten Text Recognition software in a French-speaking context and in a teaching situation.
Section: Digital humanities in languages

7. The renewal of the digital humanities. An overview of the transformation of professions in the humanities and social sciences

Marie-Laure Massot ; Agnès Tricoche.
This article presents a study of the French-speaking digital humanities. It is based on the experience of two research engineers from the French National Center for Scientific Research (CNRS) who have been studying these issues for the last ten years. They conducted a survey at the École Normale Supérieure (ENS-Paris), which enabled them to draw up an overview of how the profession of research engineer in the humanities and social sciences is being transformed in the context of the digital humanities. The Digit_Hum initiative, which they run in parallel with their respective activities at the ENS, also informed this overview through its role as a forum for discussion, training and the structuring of the digital humanities at the ENS and the Université Paris Sciences & Lettres (PSL).
Section: Data deluge: which skills for which data?

8. Publishing open-access bibliographical data on Ancient Greek and Latin texts: challenges, constraints, progression

Julie Giovacchini ; Laurent Capron.
We present both our methodological reflections on the specific constraints that complicate the structuring of, and access to, bibliographical data in the Sciences of Antiquity, and the solutions adopted by the IPhiS-CIRIS project to deal with these constraints. The project began in 2014 in a scientific environment that was still being standardised and structured: digital bibliographical resources in this disciplinary field were becoming increasingly numerous, but they were of uneven quality and often hard to access or private.
Section: Sciences of Antiquity and digital humanities

9. Contribution to the recent history of archaeology by using some digital humanities methods and techniques applied to field recording documents of an archaeological site excavated in the 1970s

Christophe Tuffery.
This article presents the results of research on archaeological archives. Field recording documents from the Rivaux site in France, which was excavated from the 1970s to the 1990s, were studied. After digitising a set of field notebook pages, the author developed an application called Archeotext, which makes it possible to transcribe and georeference these documents. Some of the results show new ways of exploiting this type of archive with certain methods and techniques of the digital humanities.
Section: Sciences of Antiquity and digital humanities

10. ArchEthno - a new tool for sharing research materials and a new method for archiving your own research

Florence Weber ; Carlo Zwölf ; Arnaud Trouche ; Agnès Tricoche ; José Sastre.
The archiving of ethnographic material is generally considered a blind spot in ethnographic working methods, which place more importance on fieldwork and analysis than on how archives are constructed. A team of computer scientists and ethnographers first built a tool for sharing ethnographic materials based on an SQL relational data model, which suited the first survey processed but proved difficult to transpose to other surveys. The team then developed a new tool based on dynamic vocabularies of concepts, which breaks archiving down into three stages. First, ethnographers select and contextualise their survey materials; second, they structure them in a database according to the research question that emerged during their survey; finally, they share these data with other researchers, subject to the approval of an ethics committee whose members are competent in ethnography.
Section: Data deluge: which skills for which data?

11. Being loyal to fieldwork: on building the "contract of silence"

Denisa Butnaru.
The aim of this contribution is to analyze how relations of loyalty emerge between researcher and researched during ethnographic fieldwork, and to defend a position that departs from the principle of open science. I discuss methodological issues drawing on several years of multi-sited fieldwork in various labs, research centers and medical institutions, during which I studied the design and use of exoskeletal devices. These devices are used in three fields: rehabilitation, industry and the armed forces. Their development is subject to intense economic and scientific competition. Given these constraints, I was compelled to develop "loyalty strategies", one of which I call the "contract of silence". I associate this category with an ethnographic exercise in how to address one's interlocutors during fieldwork. I conceive of this process as the result of consciously withholding information obtained from interviewees that might endanger the researcher's position in the field. Although a tacit contract with one's interlocutors during ethnographic fieldwork implies anonymity, certain sensitive fields and research situations require forms of self-censorship and control over published results. I associate these strategies with the fabrication of fieldwork secrecy.
Section: Data deluge: which skills for which data?