Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre
Authors: Jean-Baptiste Camps; Simon Gabay; Paul Fièvre; Thibault Clérice; Florian Cafiero
ORCID: 0000-0003-0385-7037 (Jean-Baptiste Camps); 0000-0001-9094-4475 (Simon Gabay); 0000-0003-1852-9204 (Thibault Clérice); 0000-0002-1951-6942 (Florian Cafiero)
This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. The use of a recent lemmatiser based on neural networks and a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and proves to be robust during out-of-domain tests, i.e. up to 20th-century novels.
Volume: 2021
Section: Digital humanities in languages
Published on: February 14, 2021
Accepted on: February 14, 2021
Submitted on: May 18, 2020
Keywords: Computer Science - Computation and Language