This paper presents some computer tools and linguistic resources of the GREgORI project. These developments allow automated processing of texts written in the main languages of the Christian Middel East, such as Greek, Arabic, Syriac, Armenian and Georgian. The main goal is to provide scholars with tools (lemmatized indexes and concordances) making corpus-based linguistic information available. It focuses on the questions of text processing, lemmatization, information retrieval, and bitext alignment.
The Text Alignment Network (TAN) is a suite of XML encoding formats intended to serve anyone who wishes to encode, exchange, and study multiple versions of texts (e.g., translations, paraphrases), and annotations on those texts (e.g., quotations, word-for-word correspondences). This article focuses on TAN’s innovative intertextual pointers, which, I argue, provide an unprecedented level of readability, interoperability, and semantic context. Because TAN is a new, experimental format, this article provides a brief introduction to the format and concludes with comments on progress and future prospects.
The project is to develop a database, which is planned to include all available information on the use of the Bible in the patristic works of Migne's Patrologia Graeca. Utilization of the data will be available through a web page equipped with necessary tools for developing data mining techniques and other methods of analysis. The main aim of the project is to revive the catenae, the ancient exegetical tool for biblical interpretation.
The production of digital critical editions of texts using TEI is now a widely-adopted procedure within digital humanities. The work described in this paper extends this approach to the publication of gnomologia (anthologies of wise sayings) , which formed a widespread literary genre in many cultures of the medieval Mediterranean. These texts are challenging because they were rarely copied straightforwardly ; rather , sayings were selected , reorganised , modified or re-attributed between manuscripts , resulting in a highly interconnected corpus for which a standard approach to digital publication is insufficient. Focusing on Greek and Arabic collections , we address this challenge using semantic web techniques to create an ecosystem of texts , relationships and annotations , and consider a new model – organic , collaborative , interconnected , and open-ended – of what constitutes an edition. This semantic web-based approach allows scholars to add their own materials and annotations to the network of information and to explore the conceptual networks that arise from these interconnected sayings .
This paper describes the Quantitative Criticism Lab, a collaborative
initiative between classicists, quantitative biologists, and computer
scientists to apply ideas and methods drawn from the sciences to the study of
literature. A core goal of the project is the use of computational biology,
natural language processing, and machine learning techniques to investigate
authorial style, intertextuality, and related phenomena of literary
significance. As a case study in our approach, here we review the use of
sequence alignment, a common technique in genomics and computational
linguistics, to detect intertextuality in Latin literature. Sequence alignment
is distinguished by its ability to find inexact verbal similarities, which
makes it ideal for identifying phonetic echoes in large corpora of Latin texts.
Although especially suited to Latin, sequence alignment in principle can be
extended to many other languages.
The software programs generally used with the TLG (Thesaurus Linguae Graecae)
and the CLCLT (CETEDOC Library of Christian Latin Texts) CD-ROMs are not well
suited for finding quotations and allusions. QuotationFinder uses more
sophisticated criteria as it ranks search results based on how closely they
match the source text, listing search results with literal quotations first and
loose verbal parallels last.
The Project The literary tradition in the third and fourth centuries CE: Grammarians, rhetoricians and sophists as sources of Graeco-Roman literature (FFI2014-52808-C2-1-P) aims to trace and classify all types of quotations, both explicit (with or without mention of the author and/or title) and hidden, in a corpus comprising the Greek grammarians, rhetoricians and " sophists " of the third and fourth centuries CE. At the same time, we try to detect whether or not these are first-hand quotations, and if our quoting authors (28 in all) are, in turn, secondary sources for the same citations in later authors. We also study the philological (textual) aspects of the quotations in their context, and the problems of limits they sometimes pose. Finally, we are interested in the function of the quotation in the citing work. This is the first time that such a comprehensive study of this corpus is attempted. This paper explains our methodology, and how we store all these data in our electronic card-file.
If one is convinced that " quantitative research provides data not interpretation " [Moretti, 2005, 9], close reading should thus be considered as not only the necessary bridge between big data and interpretation but also the core duty of the Humanities. To test its potential in a neglected field – the Arabic manuscripts of the Letters of Paul of Tarsus – an enhanced, digital edition has been in development as a progression of a Swiss National Fund project. This short paper presents the development of this edition and perspectives regarding a second project. Based on the Edition Visualization Technology tool, the digital edition provides a transcription of the Arabic text, a standardized and vocalized version, as well as French translation with all texts encoded in TEI XML. Thanks to another Swiss National Foundation subsidy, a new research project on the unique New Testament, trilingual (Greek-Latin-Arabic) manuscript, the Marciana Library Gr. Z. 11 (379), 12th century, is currently underway. This project includes new features such as " Textlink " , " Hotspot " and notes: HumaReC.