Atelier Digit_Hum

1. A Web Application for Watermark Recognition

Bounou, Oumayma ; Monnier, Tom ; Pastrolin, Ilaria ; Shen, Xi ; Benevent, Christine ; Limon-Bonnet, Marie-Françoise ; Bougard, François ; Aubry, Mathieu ; Smith, Marc H. ; Poncet, Olivier ; Raverdy, Pierre-Guillaume.

The study of watermarks is a key step for archivists and historians, as it enables them to determine the origin of paper. Although of great practical value, automatic watermark recognition comes with many difficulties and is still considered an unsolved challenge. Nonetheless, Shen et al. [2019] recently introduced a new approach to this specific task that showed promising results. Building upon this approach, this work proposes a new public web application dedicated to automatic watermark recognition, entitled Filigranes pour tous. The application not only hosts a detailed catalog of more than 17k watermarks, manually collected from the French National Archives (Minutier central) or extracted from existing online resources (Briquet database), but also enables non-specialists to identify a watermark from a simple photograph in a few seconds. Moreover, users can easily add new watermarks, which makes it possible to enrich the existing catalog through crowdsourcing. Our web application is available at http://filigranes.inria.fr/.

Section: Data deluge: which skills for which data?

2. The Artist Libraries Project in the Labex Les passés dans le présent

Faizand de Maupeou, Félicie ; Le Men, Ségolène.

The creation of the Artist Libraries Project was sparked by the observation that artist libraries are still little known, even though many art historians are interested in this archive for what it reveals about the person behind the artist and his or her creative process. The problem is that these libraries are rarely preserved as physical collections. To remedy this dispersion, we built an online database and a website, www.lesbibliothequesdartistes.org, that house this valuable source in the form of lists of books and their electronic versions. The first data, on Monet's library, have been made available, and several additional artist libraries from the 19th and 20th centuries are planned for 2019. Gathering all these bibliographical data in a central database makes it possible both to explore a single library and to compare several. This article explains how we built the database and the website, and how the implementation of these IT tools has raised questions about the use of this resource as an archive on the one hand, and about its value for art history on the other.

Section: Digital libraries and virtual exhibitions

3. TraduXio Project: Latest Upgrades and Feedback

Lacour, Philippe ; Bénel, Aurélien.

TraduXio is a digital environment for computer-assisted multilingual translation that is web-based, free to use and open source. Its originality is threefold. Whereas traditional technologies are limited to two languages (source/target), TraduXio enables the comparison of different versions of the same text in various languages. Its concordancer provides relevant multilingual suggestions by classifying the source according to history, genre and author. It also uses collaborative devices (privilege management, forums, networks, modification history, etc.) to promote collective (and distributed) translation. TraduXio is designed to encourage the diversification of language learning and to promote a reappraisal of translation as a professional skill. It can be used in many different ways, by very diverse kinds of people. This paper presents the recent developments of the software (version 2.1) and illustrates how specific groups (language teaching, social sciences, literature) use it on a regular basis. It covers the technology but concentrates on the possible uses of TraduXio, focusing on translators' feedback about their experience of working in this digital environment in a truly collaborative way.

Section: Digital humanities in languages

4. Mapping the Bentham Corpus: Concept-based Navigation

Ruiz Fabo, Pablo ; Poibeau, Thierry.

British philosopher and reformer Jeremy Bentham (1748-1832) left over 60,000 folios of unpublished manuscripts. The Bentham Project, at University College London, is creating a TEI version of the manuscripts through crowdsourced transcription verified by experts. We present here an interface for navigating these largely unedited manuscripts. We also present the language technologies used to enrich the corpus to facilitate navigation, namely entity linking against the DBpedia knowledge base and keyphrase extraction. The challenges of tagging a historical, domain-specific corpus with a contemporary knowledge base are discussed. The extracted concepts were used to create interactive co-occurrence networks, which serve as a map of the corpus and, along with a search index, help users navigate it. These corpus representations were integrated into a user interface. Domain experts evaluated the interface with satisfactory results. For example, they found the distributional semantics methods exploited here useful for retrieving related passages during the scholarly editing of the corpus.

Section: Data deluge: which skills for which data?

5. Transcribing Foucault's Handwriting with Transkribus

Massot, Marie-Laure ; Sforzini, Arianna ; Ventresque, Vincent.

The Foucault Fiches de Lecture (FFL) project aims both to explore and to make available online a large set of Michel Foucault’s reading notes (organized citations, references and comments), held at the BnF since 2013. To this end, the team is digitizing, describing and enriching the reading notes that the philosopher gathered while preparing his books and lectures, thus providing a new corpus that will allow a new approach to his work. In order to publish the manuscripts online and to produce the data collectively, the team is also developing a collaborative platform based on RDF technologies and designed to link archival content with bibliographic data. The project is funded by the ANR (2017-2020) and coordinated by Michel Senellart, professor of philosophy at the ENS Lyon. It benefits from partnerships with ENS/PSL and the BnF. In addition, a collaboration with the European READ/Transkribus project has been started in order to produce automatic transcriptions of the reading notes.

Section: Digital libraries and virtual exhibitions

6. Transcription Assisted by Optical Recognition with Transkribus: The Experience of Eugène Wilhelm's Diary (1885-1951)

Schlagdenhauffen, Régis.

This article offers an account of a "user experience" with the Transkribus software in a French-language context. It draws on the project to transcribe, semi-automatically, the diary of the jurist Eugène Wilhelm (1866-1951). This diary poses two main challenges. The first stems from the length of time over which it was written, 66 years. This produces variations in the handwriting, which becomes increasingly "illegible" as time passes. The second challenge stems from the concurrent use of two alphabets: Roman for everything related to daily life and Greek for private matters. The user experience reported in this contribution is organized around two aspects. First, after presenting the project and the specific features of using the tool, I summarize the main obstacles encountered and the solutions adopted to address them. Then, I return to the collaborative transcription exercise conducted with students in the classroom, presenting the difficulties observed and the solutions found to overcome them. In conclusion, I offer an assessment of the use of this HTR (Handwritten Text Recognition) software in a French-language context and in a teaching situation.

Section: Digital humanities in languages

7. Digital Humanities in Renewal: An Overview of the Transformation of Professions in the Humanities and Social Sciences

Massot, Marie-Laure ; Tricoche, Agnès.

This article is a reflection on the digital humanities in a French-language context. It draws on the experience of two engineers at the Centre National de la Recherche Scientifique who have been working on these issues for about ten years. Through the survey they conducted at the École normale supérieure (ENS-Paris), they give an overview of how the profession of engineer in the humanities and social sciences is being transformed in the context of the digital humanities. The Digit_Hum initiative, which they run alongside their respective activities at the École, also informs this account. It provides a space for discussion, training and the structuring of the digital humanities within the ENS and the Université Paris Sciences & Lettres.

Section: Data deluge: which skills for which data?

8. Publishing open-access bibliographical data on Ancient Greek and Latin texts: challenges, constraints, progression

Giovacchini, Julie ; Capron, Laurent.

We present here both some methodological reflections on the specific constraints that complicate the structuring of, and access to, bibliographical data in the Sciences of Antiquity, and the solutions adopted by the IPhiS-CIRIS project to deal with these constraints. The project began in 2014 in a general scientific environment that was still being standardised and structured. In this environment, digital bibliographical resources in the field were becoming increasingly numerous, although they were of uneven quality and often hard to access or private.

Section: Sciences of Antiquity and digital humanities

9. Contribution to the recent history of archaeology by using some digital humanities methods and techniques applied to field recording documents of an archaeological site excavated in the 1970s

Tuffery, Christophe.

This article presents the results of a research project on archaeological archives. It exploits field recording documents from the Rivaux site in France, which was excavated from the 1970s to the 1990s. After digitising a set of field notebook pages, the author developed an application called Archeotext, which enables these documents to be transcribed and georeferenced. Some of the results show new ways of exploiting this type of archive with methods and techniques from the digital humanities.

Section: Sciences of Antiquity and digital humanities

10. ArchEthno - a new tool for sharing research materials and a new method for archiving your own research

Weber, Florence ; Zwölf, Carlo ; Trouche, Arnaud ; Tricoche, Agnès ; Sastre, José.

The archiving of ethnographic material is generally considered a blind spot in ethnographic working methods, which place more importance on investigation and analysis than on how archives are constructed. A team of computer scientists and ethnographers first built a tool for sharing ethnographic materials based on an SQL relational data model. This model suited the first survey processed but proved difficult to transpose to other surveys. The team then developed a new tool based on dynamic vocabularies of concepts, which breaks archiving down into three stages. First, ethnographers select and contextualise their survey materials. Second, they structure them in a database according to the research question that emerged during their survey. Finally, they share these data with other researchers, subject to the opinion of an ethics committee whose members are competent in ethnography.

Section: Data deluge: which skills for which data?

11. Being loyal to fieldwork: on building the "contract of silence"

Butnaru, Denisa.

The aim of the present contribution is to analyze how relations of loyalty emerge between researcher and researched during ethnographic fieldwork, and to defend a perspective that runs counter to the principle of open science. I discuss methodological issues arising from several years of multi-sited fieldwork in various labs, research centers and medical institutions, during which I inquired into the design and use of exoskeletal devices. These technologies are applied in three fields: rehabilitation, industry and the armed forces. Their invention is the subject of intense economic and scientific competition. Given these constraints, I was compelled to develop "loyalty strategies", one of which I call the "contract of silence". I associate this category with an ethnographic exercise in how to address one's interlocutors during fieldwork. I conceive of this process as the result of consciously withholding information obtained from interviewees that might endanger the researcher's position in the field. A tacit contract with one's interlocutors during ethnographic fieldwork implies anonymity. Beyond this, certain sensitive fields and research situations require forms of self-censorship and control over published results. I associate these strategies with the fabrication of fieldwork secrecy.

Section: Data deluge: which skills for which data?

12. Notebook and Open science: toward more FAIR play

Le Béchec, Mariannig ; Gruson-Daniel, Célya ; Lascombes, Clémence ; Schultz, Émilien.

Notebooks are now commonly used in digital research practices. Despite their increasing ubiquity, the characteristics, roles and uses associated with notebooks have seldom been studied from a social science perspective. In this article, we present an overview of the available empirical work on notebooks. We use it to describe existing practices, the typologies crafted to grasp their diversity, and their limitations when used in data analysis workflows. This review shows that existing studies focus on interactive computational notebooks within data science rather than on research practices in academic contexts. Building on it, we discuss the role of notebooks as a vector and lever for the FAIR (Findable, Accessible, Interoperable, Reusable) principles associated with open science.