Intertextuality in Ancient Languages - Special Issue

Special Issue on

Computer-Aided Processing of Intertextuality in Ancient Languages

Edited by Marco BÜCHLER (Göttingen Centre for Digital Humanities, Germany) and Laurence MELLERIN (Sources Chrétiennes, HiSoMA, Lyon, France)

This special issue originates in the International workshop on computer-aided processing of intertextuality in ancient languages, held in Lyon (2nd-4th June 2014) and co-organized by HiSoMA (UMR 5189, Lyon), LIRIS (UMR 5205, Villeurbanne) and the Göttingen Centre for Digital Humanities (e-TRAP), with the support of the National Research Agency (ANR Biblindex) and the Partner University Fund (PUF).

This workshop was organized as the concluding meeting of the ANR project BIBLINDEX, which aims to establish an exhaustive inventory of the biblical references found in texts of Late Antiquity and the Middle Ages. It brought together computer scientists and digital humanists. The sessions presented the state of the art in the concepts and techniques used to process quotations and text reuse in ancient languages.

Thanks to the editorial system of the JDMDH, the proceedings of this workshop have been opened to other contributions that also deal with intertextuality, linguistic preprocessing and the preservation of scholarly research results, applied specifically to corpora in ancient languages for which few online resources exist (Ancient Greek, Latin, Hebrew, Syriac, Coptic, Arabic, Ethiopic, etc.).

Part 1: Towards a Digital Ecosystem: NLP. Corpus infrastructure. Methods for Retrieving Texts and Computing Text Similarities

  • Methods for the detection of intertexts and text reuse, manual (e.g. crowd-sourcing) or automatic (e.g. algorithms);
  • Infrastructure for the preservation of digital texts and quotations between different text passages;
  • Linguistic preprocessing and data normalisation, such as lemmatisation of historical languages, root stemming, normalisation of variants, etc.

1) Preprocessing Greek Papyri for Linguistic Annotation

Authors: Vierros, Marja and Henriksson, Erik

Greek documentary papyri form an important direct source for Ancient Greek. It has been exploited surprisingly little in Greek linguistics due to a lack of good tools for searching linguistic structures. This article presents a new tool and digital platform, "Sematia", which enables transforming the digital texts available in TEI EpiDoc XML format to a format which can be morphologically and syntactically annotated (treebanked), and where the user can add new metadata concerning the text type, writer and handwriting of each act of writing. An important aspect in this process is to take into account the original surviving writing vs. the standardization of language and supplements made by the editors. This is performed by creating two different layers of the same text. The platform is in its early development phase. Future developments, such as tagging linguistic variation phenomena as well as queries performed within Sematia, are discussed at the end of the article.
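
The two-layer idea described in this abstract can be illustrated with a minimal sketch. It is not Sematia's implementation, whose actual rules are not given here. It assumes a simplified EpiDoc-style fragment in which editorial regularisations are encoded as `<choice><orig>…</orig><reg>…</reg></choice>` and editorial supplements as `<supplied>`; the sample fragment, the `SKIP` table and the function names `layer_text` and `build_layers` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Elements left out of each layer (an assumption made for this sketch):
# the "original" layer keeps only what survives on the papyrus,
# the "standard" layer keeps the editors' regularised and supplied text.
SKIP = {
    "original": {"reg", "supplied"},
    "standard": {"orig"},
}

# Invented fragment, not taken from any edition.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    '<choice><reg>χαίρειν</reg><orig>χέριν</orig></choice>'
    ' <supplied reason="lost">καὶ</supplied> ἐρρῶσθαι'
    '</ab>'
)


def layer_text(elem, layer):
    """Collect the text content of elem for one layer, skipping excluded children."""
    parts = [elem.text or ""]
    for child in elem:
        local_name = child.tag.split("}")[-1]  # drop the TEI namespace prefix
        if local_name not in SKIP[layer]:
            parts.append(layer_text(child, layer))
        # Text following a child belongs to the parent and is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def build_layers(xml_string):
    """Return both layers of one fragment, with whitespace normalised."""
    root = ET.fromstring(xml_string)
    return {
        layer: " ".join(layer_text(root, layer).split())
        for layer in SKIP
    }


if __name__ == "__main__":
    for name, text in build_layers(SAMPLE).items():
        print(f"{name}: {text}")
```

Keeping the exclusion rules in a single table makes it easy to adjust which elements belong to which layer; a real EpiDoc file contains many more element types (gaps, unclear letters, abbreviations) that would need their own handling.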

2) From manuscript catalogues to a handbook of Syriac literature: Modeling an infrastructure for

Authors: Gibson, Nathan P. and Michelson, David A. and Schwartz, Daniel L.

Despite increasing interest in Syriac studies and growing digital availability of Syriac texts, there is currently no up-to-date infrastructure for discovering, identifying, classifying, and referencing works of Syriac literature. The standard reference work (Baumstark's Geschichte) is over ninety years old, and the perhaps 20,000 Syriac manuscripts extant worldwide can be accessed only through disparate catalogues and databases. The present article proposes a tentative data model for's New Handbook of Syriac Literature, an open-access digital publication that will serve as both an authority file for Syriac works and a guide to accessing their manuscript representations, editions, and translations. The authors hope that by publishing a draft data model they can receive feedback and incorporate suggestions into the next stage of the project.



Part 2: Managing different types of text re-uses

This part focuses on conceptual definitions, the modelling of the unstable notion of “quotation”, and the XML-TEI encoding needed to characterize it.

Part 3: Visualisation of intertextuality and text reuse

Part 4: Project presentations

1) Dealing with all types of quotations (and their parallels) in a closed corpus: The methodology of the Project The literary tradition in the third and fourth centuries CE: Grammarians, rhetoricians and sophists as sources of Graeco-Roman literature

Authors: Rodríguez-Noriega, Lucía

The Project The literary tradition in the third and fourth centuries CE: Grammarians, rhetoricians and sophists as sources of Graeco-Roman literature (FFI2014-52808-C2-1-P) aims to trace and classify all types of quotations, both explicit (with or without mention of the author and/or title) and hidden, in a corpus comprising the Greek grammarians, rhetoricians and "sophists" of the third and fourth centuries CE. At the same time, we try to detect whether or not these are first-hand quotations, and if our quoting authors (28 in all) are, in turn, secondary sources for the same citations in later authors. We also study the philological (textual) aspects of the quotations in their context, and the problems of limits they sometimes pose. Finally, we are interested in the function of the quotation in the citing work. This is the first time that such a comprehensive study of this corpus has been attempted. This paper explains our methodology, and how we store all these data in our electronic card-file.

2) Editing New Testament Arabic Manuscripts in a TEI-base: fostering close reading in Digital Humanities

Authors: Clivaz, Claire and Schulthess, Sara and Sankar, Martial

If one is convinced that "quantitative research provides data not interpretation" [Moretti, 2005, 9], close reading should thus be considered as not only the necessary bridge between big data and interpretation but also the core duty of the Humanities. To test its potential in a neglected field – the Arabic manuscripts of the Letters of Paul of Tarsus – an enhanced, digital edition has been in development as a progression of a Swiss National Fund project. This short paper presents the development of this edition and perspectives regarding a second project. Based on the Edition Visualization Technology tool, the digital edition provides a transcription of the Arabic text, a standardized and vocalized version, as well as a French translation, with all texts encoded in TEI XML. Thanks to another Swiss National Foundation subsidy, a new research project on the unique New Testament, trilingual (Greek-Latin-Arabic) manuscript, the Marciana Library Gr. Z. 11 (379), 12th century, is currently underway. This project includes new features such as "Textlink", "Hotspot" and notes: HumaReC.


As this special issue allows continuous updates, it is still possible to add a contribution if you are working on these topics.


Marco Büchler: mbuechler(at)gcdh(dot)de

Laurence Mellerin: laurence.mellerin(at)mom(dot)fr