2017

1. From manuscript catalogues to a handbook of Syriac literature: Modeling an infrastructure for Syriaca.org

Gibson, Nathan P. ; Michelson, David A. ; Schwartz, Daniel L..

Despite increasing interest in Syriac studies and growing digital availability of Syriac texts, there is currently no up-to-date infrastructure for discovering, identifying, classifying, and referencing works of Syriac literature. The standard reference work (Baumstark's Geschichte) is over ninety years old, and the perhaps 20,000 Syriac manuscripts extant worldwide can be accessed only through disparate catalogues and databases. The present article proposes a tentative data model for Syriaca.org's New Handbook of Syriac Literature, an open-access digital publication that will serve as both an authority file for Syriac works and a guide to accessing their manuscript representations, editions, and translations. The authors hope that by publishing a draft data model they can receive feedback and incorporate suggestions into the next stage of the project.

2. Editing New Testament Arabic Manuscripts in a TEI-base: fostering close reading in Digital Humanities

Clivaz, Claire ; Schulthess, Sara ; Sankar, Martial.

If one is convinced that " quantitative research provides data not interpretation " [Moretti, 2005, 9], close reading should thus be considered as not only the necessary bridge between big data and interpretation but also the core duty of the Humanities. To test its potential in a neglected field – the Arabic manuscripts of the Letters of Paul of Tarsus – an enhanced, digital edition has been in development as a progression of a Swiss National Fund project. This short paper presents the development of this edition and perspectives regarding a second project. Based on the Edition Visualization Technology tool, the digital edition provides a transcription of the Arabic text, a standardized and vocalized version, as well as French translation with all texts encoded in TEI XML. Thanks to another Swiss National Foundation subsidy, a new research project on the unique New Testament, trilingual (Greek-Latin-Arabic) manuscript, the Marciana Library Gr. Z. 11 (379), 12th century, is currently underway. This project includes new features such as " Textlink " , " Hotspot " and notes: HumaReC.

Rubrique : Présentations de projets

3. Dealing with all types of quotations (and their parallels) in a closed corpus: The methodology of the Project The literary tradition in the third and fourth centuries CE: Grammarians, rhetoricians and sophists as sources of Graeco-Roman literature

Rodríguez-Noriega, Lucía.

The Project The literary tradition in the third and fourth centuries CE: Grammarians, rhetoricians and sophists as sources of Graeco-Roman literature (FFI2014-52808-C2-1-P) aims to trace and classify all types of quotations, both explicit (with or without mention of the author and/or title) and hidden, in a corpus comprising the Greek grammarians, rhetoricians and " sophists " of the third and fourth centuries CE. At the same time, we try to detect whether or not these are first-hand quotations, and if our quoting authors (28 in all) are, in turn, secondary sources for the same citations in later authors. We also study the philological (textual) aspects of the quotations in their context, and the problems of limits they sometimes pose. Finally, we are interested in the function of the quotation in the citing work. This is the first time that such a comprehensive study of this corpus is attempted. This paper explains our methodology, and how we store all these data in our electronic card-file.

Rubrique : Présentations de projets

4. Version Variation Visualization (VVV): Case Studies on the Hebrew Haggadah in English

Cheesman, Tom ; Roos, Avraham.

The ‘Version Variation Visualization’ project has developed online tools to support comparative, algorithm-assisted investigations of a corpus of multiple versions of a text, e.g. variants, translations, adaptations (Cheesman, 2015, 2016; Cheesman et al., 2012, 2012-13, 2016; Thiel, 2014; links: www.tinyurl.com/vvvex). A segmenting and aligning tool allows users to 1) define arbitrary segment types, 2) define arbitrary text chunks as segments, and 3) align segments between a ‘base text’ (a version of the ‘original’ or translated text), and versions of it. The alignment tool can automatically align recurrent defined segment types in sequence.Several visual interfaces in the prototype installation enable exploratory access to parallel versions, to comparative visual representations of versions’ alignment with the base text, and to the base text visually annotated by an algorithmic analysis of variation among versions of segments. Data can be filtered, viewed and exported in diverse ways. Many more modes of access and analysis can be envisaged. The tool is language neutral. Experiments so far mostly use modern texts: German Shakespeare translations. Roos is working on a collection of approx. 100 distinct English-language translations of a Hebrew text with ancient Hebrew and Aramaic passages: the Haggadah (Roos, 2015)

Rubrique : Visualisation de l'intertextualité et de la réutilisation des textes

5. Measuring and Mapping Intergeneric Allusion in Latin Poetry using Tesserae

Burns, Patrick J..

Most intertextuality in classical poetry is unmarked, that is, it lacks objective signposts to make readers aware of the presence of references to existing texts. Intergeneric relationships can pose a particular problem as scholarship has long privileged intertextual relationships between works of the same genre. This paper treats the influence of Latin love elegy on Lucan’s epic poem, Bellum Civile, by looking at two features of unmarked intertextuality: frequency and distribution. I use the Tesserae project to generate a dataset of potential intertexts between Lucan’s epic and the elegies of Tibullus, Propertius, and Ovid, which are then aggregrated and mapped in Lucan’s text. This study draws two conclusions: 1. measurement of intertextual frequency shows that the elegists contribute fewer intertexts than, for example, another epic poem (Virgil’s Aeneid), though far more than the scholarly record on elegiac influence in Lucan would suggest; and 2. mapping the distribution of intertexts confirms previous scholarship on the influence of elegy on the Bellum Civile by showing concentrations of matches, for example, in Pompey and Cornelia’s meeting before Pharsalus (5.722-815) or during the affair between Caesar and Cleopatra (10.53-106). By looking at both frequency and proportion, we can demonstrate systematically the generic enrichment of Lucan’s Bellum Civile with respect to Latin love elegy.

Rubrique : Vers un écosystème numérique : NLP. Infrastructure de corpus. Méthodes de récupération des textes et de calcul des similarités de textes

6. QuotationFinder - Searching for Quotations and Allusions in Greek and Latin Texts and Establishing the Degree to Which a Quotation or Allusion Matches Its Source

Herren, Luc.

The software programs generally used with the TLG (Thesaurus Linguae Graecae) and the CLCLT (CETEDOC Library of Christian Latin Texts) CD-ROMs are not well suited for finding quotations and allusions. QuotationFinder uses more sophisticated criteria as it ranks search results based on how closely they match the source text, listing search results with literal quotations first and loose verbal parallels last.

Rubrique : Présentations de projets

7. Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

Kestemont, Mike ; De Gussem, Jeroen.

In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. These are both basic, yet foundational preprocessing steps in applications such as text re-use detection. Nevertheless, they are generally complicated by the considerable orthographic variation which is typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion. For example, a lexicon is used to generate all the potential lemma-tag pairs for a token, and next, a context-aware PoS-tagger is used to select the most appropriate tag-lemma pair. Apart from the problems with out-of-lexicon items, error percolation is a major downside of such approaches. In this paper we explore the possibility to elegantly solve these tasks using a single, integrated approach. For this, we make use of a layered neural network architecture from the field of deep representation learning.

Rubrique : Vers un écosystème numérique : NLP. Infrastructure de corpus. Méthodes de récupération des textes et de calcul des similarités de textes

8. Bioinformatics and Classical Literary Study

Chaudhuri, Pramit ; Dexter, Joseph P..

This paper describes the Quantitative Criticism Lab, a collaborative initiative between classicists, quantitative biologists, and computer scientists to apply ideas and methods drawn from the sciences to the study of literature. A core goal of the project is the use of computational biology, natural language processing, and machine learning techniques to investigate authorial style, intertextuality, and related phenomena of literary significance. As a case study in our approach, here we review the use of sequence alignment, a common technique in genomics and computational linguistics, to detect intertextuality in Latin literature. Sequence alignment is distinguished by its ability to find inexact verbal similarities, which makes it ideal for identifying phonetic echoes in large corpora of Latin texts. Although especially suited to Latin, sequence alignment in principle can be extended to many other languages.

Rubrique : Présentations de projets

9. Computer - Assisted Processing of Intertextuality in Ancient Languages

Hedges, Mark ; Jordanous, Anna ; Lawrence, K. Faith ; Roueché, Charlotte ; Tupman, Charlotte.

The production of digital critical editions of texts using TEI is now a widely-adopted procedure within digital humanities. The work described in this paper extends this approach to the publication of gnomologia (anthologies of wise sayings) , which formed a widespread literary genre in many cultures of the medieval Mediterranean. These texts are challenging because they were rarely copied straightforwardly ; rather , sayings were selected , reorganised , modified or re-attributed between manuscripts , resulting in a highly interconnected corpus for which a standard approach to digital publication is insufficient. Focusing on Greek and Arabic collections , we address this challenge using semantic web techniques to create an ecosystem of texts , relationships and annotations , and consider a new model – organic , collaborative , interconnected , and open-ended – of what constitutes an edition. This semantic web-based approach allows scholars to add their own materials and annotations to the network of information and to explore the conceptual networks that arise from these interconnected sayings .

Rubrique : Présentations de projets

10. Digital Greek Patristic Catena (DGPC). A brief presentation

Paparnakis, Athanasios ; Domouchtsis, Constantinos.

The project is to develop a database, which is planned to include all available information on the use of the Bible in the patristic works of Migne's Patrologia Graeca. Utilization of the data will be available through a web page equipped with necessary tools for developing data mining techniques and other methods of analysis. The main aim of the project is to revive the catenae, the ancient exegetical tool for biblical interpretation.

Rubrique : Présentations de projets

11. A Classification of Manuscripts Based on A New Quantitative Method. The Old Latin Witnesses of John's Gospel as Text Case

Pastorelli, David.

A new method for grouping manuscripts in clusters is presented with the calculation of distances between readings, then between witnesses. A classification algorithm (" Hierarchical Ascendant Clustering "), achieved through computer-aided processing, enables the construction of trees illustrating the textual taxonomy obtained. This method is applied to the Old Latin witnesses of the Gospel of John, and, in order to provide a study of a reasonable size, to a chapter as a whole (chapter 14). The result basically confirms the text-types identified by Bonatius Fischer, founder of the Vetus Latina Institute, while it invalidates the classification adopted by the current edition of the Vetus Latina of the Gospel of John.

Rubrique : Gestion de différents types de réutilisations de textes

12. TEI-encoding of text reuses in the BIBLINDEX Project

Hue-Gay, Elysabeth ; Mellerin, Laurence ; Morlock, Emmanuelle.

This paper discusses markup strategies for the identification and description of text reuses in a corpus of patristic texts related to the BIBLINDEX Project, an online index of biblical references in Early Christian Literature. In addition to the development of a database that can be queried by canonical biblical or patristic references, a sample corpus of patristic texts has been encoded following the guidelines of the TEI (Text Encoding Initiative), in order to provide direct access to quoted and quoting text passages to the users of the https://www.biblindex.info platform.

Rubrique : Gestion de différents types de réutilisations de textes

13. Interactive Tools and Tasks for the Hebrew Bible: From Language Learning to Textual Criticism

Winther-Nielsen, Nicolai.

This contribution to a special issue on “Computer-aided processing of intertextuality” in ancient texts will illustrate how using digital tools to interact with the Hebrew Bible offers new promising perspectives for visualizing the texts and for performing tasks in education and research. This contribution explores how the corpus of the Hebrew Bible created and maintained by the Eep Talstra Centre for Bible and Computer can support new methods for modern knowledge workers within the field of digital humanities and theology be applied to ancient texts, and how this can be envisioned as a new field of digital intertextuality. The article first describes how the corpus was used to develop the Bible Online Learner as a persuasive technology to enhance language learning with, in, and around a database that acts as the engine driving interactive tasks for learners. Intertextuality in this case is a matter of active exploration and ongoing practice. Furthermore, interactive corpus-technology has an important bearing on the task of textual criticism as a specialized area of research that depends increasingly on the availability of digital resources. Commercial solutions developed by software companies like Logos and Accordance offer a market-based intertextuality defined by the production of advanced digital resources for scholars and students as useful alternatives to often inaccessible and expensive printed versions. It is reasonable to expect that in the future interactive […]

Rubrique : Vers un écosystème numérique : NLP. Infrastructure de corpus. Méthodes de récupération des textes et de calcul des similarités de textes

14. Intertextual Pointers in the Text Alignment Network

Kalvesmaki, Joel.

The Text Alignment Network (TAN) is a suite of XML encoding formats intended to serve anyone who wishes to encode, exchange, and study multiple versions of texts (e.g., translations, paraphrases), and annotations on those texts (e.g., quotations, word-for-word correspondences). This article focuses on TAN’s innovative intertextual pointers, which, I argue, provide an unprecedented level of readability, interoperability, and semantic context. Because TAN is a new, experimental format, this article provides a brief introduction to the format and concludes with comments on progress and future prospects.

Rubrique : Présentations de projets

15. Encoding (inter)textual insertions in Latin "grammatical commentary"

Bureau, Bruno ; Nicolas, Christian ; Pinche, Ariane.

The ancient commentaries provide a large sample of quotations from classical or biblical texts for which Latin gramamrians developed a complex system of insertion of quoted texts. The paper examines how to encode these places using XML Tei, and focuses on difficult cases, such as inaccurate quotations, or quotations of partly or wholly lost texts.

Rubrique : Gestion de différents types de réutilisations de textes