Mike Kestemont ; Jeroen De Gussem - Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

jdmdh:1398 - Journal of Data Mining & Digital Humanities, August 6, 2017, Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages - https://doi.org/10.46298/jdmdh.1398

Authors: Mike Kestemont ; Jeroen De Gussem

    In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. Both are basic yet foundational preprocessing steps in applications such as text re-use detection. Nevertheless, both are generally complicated by the considerable orthographic variation that is typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion. For example, a lexicon is used to generate all potential lemma-tag pairs for a token, after which a context-aware PoS tagger selects the most appropriate pair. Apart from the problems with out-of-lexicon items, a major downside of such approaches is error percolation: mistakes made at an early stage of the cascade propagate to the later stages. In this paper we explore the possibility of solving these tasks elegantly with a single, integrated approach. For this, we make use of a layered neural network architecture from the field of deep representation learning.
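    To make the cascaded, lexicon-dependent pipeline and its two weaknesses concrete, here is a minimal sketch. It is not the authors' method, since the paper replaces this cascade with a single neural model. The lexicon, the tag set, and the transition scores are hypothetical toy values. The scores stand in for a trained context-aware tagger, and the function name `cascaded_tag` is illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: surface form -> candidate (lemma, tag) pairs.
# Note that "est" is ambiguous: a form of "edo" (to eat) or of "sum" (to be).
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "cum": [("cum", "ADP"), ("cum", "SCONJ")],
    "mihi": [("ego", "PRON")],
    "est": [("edo", "VERB"), ("sum", "AUX")],
}

# Hypothetical (previous tag, current tag) scores standing in for a trained,
# context-aware tagger. Unlisted pairs score 0.0.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("<S>", "ADP"): 0.5,
    ("<S>", "SCONJ"): 1.0,
    ("SCONJ", "PRON"): 1.0,
    ("PRON", "VERB"): 0.2,
    ("PRON", "AUX"): 1.0,
}

Result = Tuple[str, Optional[str], Optional[str]]


def cascaded_tag(tokens: List[str]) -> List[Result]:
    """Stage 1: lexicon lookup. Stage 2: greedy context-aware selection."""
    output: List[Result] = []
    prev_tag = "<S>"
    for tok in tokens:
        candidates = LEXICON.get(tok.lower(), [])
        if not candidates:
            # Out-of-lexicon item (e.g. an orthographic variant): stage 1
            # yields nothing, so stage 2 has nothing to choose from.
            output.append((tok, None, None))
            # The missing decision degrades the context for the next token,
            # which is how the error percolates through the cascade.
            prev_tag = "<UNK>"
            continue
        # max() returns the first candidate among equally scored ones.
        lemma, tag = max(
            candidates, key=lambda c: TRANSITIONS.get((prev_tag, c[1]), 0.0)
        )
        output.append((tok, lemma, tag))
        prev_tag = tag
    return output


print(cascaded_tag(["cum", "mihi", "est"]))
print(cascaded_tag(["cum", "michi", "est"]))
```

    With the standard spelling *mihi*, the sketch selects *sum* for *est*. With *michi*, a common medieval spelling of *mihi* that the toy lexicon lacks, stage 1 returns no candidates. The tagger then has no usable context for *est* and falls back on its first candidate, the wrong lemma *edo*. A single lexicon gap thus causes two errors.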


    Volume: Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages
    Section: Towards a Digital Ecosystem: NLP. Corpus infrastructure. Methods for Retrieving Texts and Computing Text Similarities
    Published on: August 6, 2017
    Accepted on: August 5, 2017
    Submitted on: August 4, 2017
    Keywords: Computer Science - Computation and Language, Computer Science - Learning, Statistics - Machine Learning
