Atelier Digit_Hum

1. A Web Application for Watermark Recognition

Bounou, Oumayma ; Monnier, Tom ; Pastrolin, Ilaria ; Shen, Xi ; Benevent, Christine ; Limon-Bonnet, Marie-Françoise ; Bougard, François ; Aubry, Mathieu ; Smith, Marc H. ; Poncet, Olivier et al.
The study of watermarks is a key step for archivists and historians, as it enables them to determine the origin of paper. Although it would be of great practical value, automatic watermark recognition comes with many difficulties and is still considered an unsolved challenge. Nonetheless, Shen et al. [2019] recently introduced a new approach to this specific task which showed promising results. Building upon this approach, this work proposes a new public web application dedicated to automatic watermark recognition, called Filigranes pour tous. The application not only hosts a detailed catalog of more than 17k watermarks, manually collected from the French National Archives (Minutier central) or extracted from existing online resources (Briquet database), but also enables non-specialists to identify a watermark from a simple photograph in a few seconds. Moreover, users can easily add further watermarks, so the existing catalog can be enriched through crowdsourcing. Our Web application is available at
Section: Data deluge: which skills for which data?

2. The Artist Libraries Project in the Labex Les passés dans le présent

Faizand de Maupeou, Félicie ; Le Men, Ségolène.
The Artist Libraries Project was sparked by the observation that artist libraries are still not well known, yet many art historians are interested in this archive for the value it adds to understanding the person behind the artist and his or her creative process. The problem is that these libraries are rarely physically preserved. To remedy this dispersion, we built an online database and a website that house this valuable source in the form of lists of books and their electronic versions. The first data on Monet's library have been made available, and several additional artist libraries from the 19th and 20th centuries are on the way for 2019. Gathering all these bibliographical data in a central database makes it possible to explore a single library and to compare several. This article explains how we built the database and the website, and how implementing those IT tools has raised questions about the use of this resource as an archive on the one hand, and about its value for art history on the other.
Section: Digital libraries and virtual exhibitions

3. Mapping the Bentham Corpus: Concept-based Navigation

Ruiz Fabo, Pablo ; Poibeau, Thierry.
British philosopher and reformer Jeremy Bentham (1748-1832) left over 60,000 folios of unpublished manuscripts. The Bentham Project, at University College London, is creating a TEI version of the manuscripts, via crowdsourced transcription verified by experts. We present here an interface to navigate these largely unedited manuscripts, and the language technologies the corpus was enriched with to facilitate navigation, i.e. entity linking against the DBpedia knowledge base and keyphrase extraction. The challenges of tagging a historical domain-specific corpus with a contemporary knowledge base are discussed. The extracted concepts were used to create interactive co-occurrence networks that serve as a map of the corpus and, along with a search index, help navigate it. These corpus representations were integrated in a user interface. The interface was evaluated by domain experts with satisfactory results, e.g. they found the distributional semantics methods exploited here useful for retrieving related passages when editing the corpus for scholarly publication.
Section: Data deluge: which skills for which data?
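
As a rough illustration of the concept co-occurrence networks mentioned in the abstract above, the sketch below counts how often two concepts are annotated on the same folio and keeps each pair as a weighted edge. The abstract does not say how the authors computed their networks, so the function name `cooccurrence_edges`, the `min_weight` threshold, and the sample annotations are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs, min_weight=1):
    """Count how often each pair of concepts appears in the same document.

    `docs` is a list of concept lists (one list per document).
    Returns a dict mapping (concept_a, concept_b) -> co-occurrence count,
    keeping only pairs seen at least `min_weight` times.
    """
    counts = Counter()
    for concepts in docs:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return {pair: w for pair, w in counts.items() if w >= min_weight}


# Hypothetical concept annotations for three transcribed folios.
folios = [
    ["punishment", "utility", "legislation"],
    ["utility", "legislation"],
    ["punishment", "panopticon"],
]
edges = cooccurrence_edges(folios)
```

Raising `min_weight` prunes rare pairs, which is one simple way to keep such a network readable when it is displayed as a map of a large corpus.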

4. Transcribing Foucault’s handwriting with Transkribus

Massot, Marie-Laure ; Sforzini, Arianna ; Ventresque, Vincent.
The Foucault Fiches de Lecture (FFL) project aims both to explore and to make available online a large set of Michel Foucault’s reading notes (organized citations, references and comments) held at the BnF since 2013. To this end, the team is digitizing, describing and enriching the reading notes that the philosopher gathered while preparing his books and lectures, thus providing a new corpus that will allow a new approach to his work. In order to release the manuscripts online and to produce the data collectively, the team is also developing a collaborative platform, based on RDF technologies and designed to link together archival content and bibliographic data. This project is financed by the ANR (2017-2020) and coordinated by Michel Senellart, professor of philosophy at the ENS Lyon. It benefits from partnerships with the ENS/PSL and the BnF. In addition, a collaboration with the European READ/Transkribus project has been started in order to produce automatic transcriptions of the reading notes.
Section: Digital libraries and virtual exhibitions

5. Optical Recognition Assisted Transcription with Transkribus: The Experiment concerning Eugène Wilhelm's Personal Diary (1885-1951)

Schlagdenhauffen, Régis.
This article reports on a "user experiment" with the Transkribus software in a French-speaking context. It is based on a semi-automated transcription project using the diary of the jurist Eugène Wilhelm (1866-1951). This diary presents two main challenges. The first is related to the time covered by the writing process: 66 years. This leads to variations in the form of the writing, which becomes increasingly "unreadable" with time. The second challenge is related to the concomitant use of two alphabets: Roman for everyday text and Greek for private matters. After a presentation of the project and of the specificities of using the tool, the contribution is structured around two aspects. Firstly, I will summarise the main obstacles encountered and the solutions found to overcome them. Secondly, I will come back to the collaborative transcription experiment carried out with students in the classroom, presenting the difficulties observed and how they were addressed. In conclusion, I will propose an assessment of the use of this Handwritten Text Recognition software in a French-speaking context and in a teaching situation.
Section: Digital humanities in languages