Jean-Baptiste Camps ; Simon Gabay ; Paul Fièvre ; Thibault Clérice ; Florian Cafiero
-
Corpus and Models for Lemmatisation and POS-tagging of Classical French
Theatre
This paper describes the process of building an annotated corpus and training
lemmatisation and POS-tagging models for classical French literature, with a
focus on theatre, and
particularly comedies in verse. It was originally developed as a preliminary
step to the stylometric analyses presented in Cafiero and Camps [2019]. The use
of a recent lemmatiser based on neural networks and a CRF tagger makes it
possible to achieve accuracies beyond the current state of the art on the
in-domain test, and proves robust in out-of-domain tests, i.e. on texts up to
20th-century novels.