Marja Vierros; Erik Henriksson - Preprocessing Greek Papyri for Linguistic Annotation

jdmdh:1385 - Journal of Data Mining & Digital Humanities, 8 June 2017, Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages

Authors: Marja Vierros 1; Erik Henriksson 1

  • 1 Department of World Cultures

Greek documentary papyri are an important direct source for Ancient Greek, yet they have been exploited surprisingly little in Greek linguistics because good tools for searching linguistic structures have been lacking. This article presents a new tool and digital platform, "Sematia", which transforms the digital texts available in TEI EpiDoc XML format into a format that can be morphologically and syntactically annotated (treebanked), and where the user can add new metadata on the text type, writer and handwriting of each act of writing. An important aspect of this process is distinguishing the original surviving writing from the editors' linguistic standardizations and supplements. This is achieved by creating two layers of the same text. The platform is at an early stage of development. Ongoing and future developments, such as tagging linguistic variation phenomena and running queries within Sematia, are discussed at the end of the article.
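
To illustrate the two-layer idea, the Python sketch below derives an "original" layer and a "standardized" layer from a small EpiDoc-style fragment. It is not Sematia's code. It assumes a simplified subset of TEI EpiDoc markup: <choice> containing <orig>/<reg> or <sic>/<corr>, <supplied> for editorial supplements, and <expan>/<ex> for abbreviation expansions. The function names (render, make_layers), the CHOICE_KEEP mapping, the invented sample fragment and the decision to drop supplements entirely from the original layer are hypothetical simplifications.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified mapping: which child of an EpiDoc <choice>
# each layer keeps. Real EpiDoc texts use many more elements.
CHOICE_KEEP = {
    "original": {"orig", "sic"},   # what the scribe actually wrote
    "standard": {"reg", "corr"},   # the editor's regularized/corrected form
}

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def render(elem, layer):
    """Return the text inside elem (excluding its tail) for the given layer."""
    name = local_name(elem.tag)
    if name == "choice":
        # Keep only the alternative that belongs to this layer.
        for child in elem:
            if local_name(child.tag) in CHOICE_KEEP[layer]:
                return render(child, layer)
        return ""
    if layer == "original" and name in {"supplied", "ex"}:
        # Editorial supplements and abbreviation expansions are not part
        # of the surviving writing, so this sketch drops them here.
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, layer))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

def make_layers(xml_fragment):
    """Build both layers from one fragment and normalize whitespace."""
    root = ET.fromstring(xml_fragment)
    return {layer: " ".join(render(root, layer).split()) for layer in CHOICE_KEEP}

# Invented EpiDoc-style fragment (hypothetical markup for illustration).
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    'ἔγραψα ὑπὲρ αὐτοῦ <supplied reason="lost">μὴ</supplied> '
    '<choice><reg>εἰδότος</reg><orig>ἰδότος</orig></choice> '
    '<expan>γράμ<ex>ματα</ex></expan>'
    '</ab>'
)

for layer, text in make_layers(SAMPLE).items():
    print(f"{layer}: {text}")
```

Run as-is, it prints the original layer as "ἔγραψα ὑπὲρ αὐτοῦ ἰδότος γράμ" and the standardized layer as "ἔγραψα ὑπὲρ αὐτοῦ μὴ εἰδότος γράμματα": the supplement μὴ and the expansion -ματα appear only in the editorial layer, while the misspelling ἰδότος survives only in the original one. A fuller treatment would also have to handle gaps, unclear letters, line breaks and tokenization.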

Volume: Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages
Section: Towards a Digital Ecosystem: NLP. Corpus Infrastructure. Methods for Retrieving Texts and Computing Text Similarity
Published: 8 June 2017
Accepted: 19 April 2017
Submitted: 26 February 2016
Keywords: JavaScript, Python, MySQL, TEI EpiDoc XML, Greek, papyri, linguistic annotation, treebank, dependency grammar, [SHS.CLASS] Humanities and Social Sciences/Classical studies, [SHS.LANGUE] Humanities and Social Sciences/Linguistics, [SHS.STAT] Humanities and Social Sciences/Methods and statistics
