Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre
Authors: Jean-Baptiste Camps; Simon Gabay; Paul Fièvre; Thibault Clérice; Florian Cafiero
ORCID: Jean-Baptiste Camps, 0000-0003-0385-7037; Simon Gabay, 0000-0001-9094-4475; Thibault Clérice, 0000-0003-1852-9204; Florian Cafiero, 0000-0002-1951-6942 (none listed for Paul Fièvre)
This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly on comedies in verse. The work was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. Using a recent lemmatiser based on neural networks together with a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test. The models also prove robust in out-of-domain tests, i.e. on texts ranging up to 20th-century novels.
Volume: 2021
Section: Digital humanities in languages
Published on: 14 February 2021
Accepted on: 14 February 2021
Submitted on: 18 May 2020
Keywords: Computer Science - Computation and Language