jdmdh:6485 - Journal of Data Mining & Digital Humanities, February 14, 2021, 2021
Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Authors: Camps, Jean-Baptiste and Gabay, Simon and Fièvre, Paul and Clérice, Thibault and Cafiero, Florian

This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. Using a recent lemmatiser based on neural networks together with a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the models prove robust in out-of-domain tests, i.e. on texts up to 20th-century novels.


Volume: 2021
Section: Digital humanities in languages
Published on: February 14, 2021
Submitted on: May 18, 2020
Keywords: Computer Science - Computation and Language

