EpiSearch. Identifying Ancient Inscriptions in Epigraphic Manuscripts

Hundreds of thousands of Greek and Latin inscriptions from the Roman world have survived until today, scattered across all continents and spanning fifteen centuries of ancient history. Epigraphic documents constitute an essential source of evidence for our knowledge of the ancient world and can be considered an authentic repository of big data. However, a significant number of inscriptions have not been preserved in their material form. In fact, their texts can only be recovered thanks to the so-called epigraphic manuscripts: crucial documents consisting of handwritten transcriptions made in post-classical times. In spite of their importance, these manuscripts have seldom received sufficient scholarly attention, and use of state-of-the-art digital tools for the study of the transcribed inscriptions is completely lacking. EpiSearch is a pilot project that explores the application of technologies deployed in the field of the Digital Humanities to recover the epigraphic evidence found in epigraphic manuscripts. Each step of the project is intended as a proof of concept, in view of a future large-scale and collaborative research plan. As a sample, we chose an epigraphic manuscript composed by the Venetian ecclesiastical antiquarian Giovanni Antonio Astori (1672-1743) and preserved in the Marciana Library in Venice: Marc. Lat.


The Epigraphic Cultures of the Ancient World, the Handwritten Tradition of Epigraphy and the EpiSearch project
Epigraphy is the science that studies communication through written media (for two different approaches to defining the term: Panciera 2012; Grossi 2016). In the ancient world, a variety of epigraphic cultures developed across the Near East and the Mediterranean for about four millennia (ca. 3,400 BCE-600 CE), using different writing systems to reproduce numerous languages, none of which is still spoken today. In particular, from the 9th century BCE onwards, the spread of the alphabetic writing promoted by the Phoenician city-states led to the development of different scripts across the Greek-speaking world, including Italy, until the Greek and Latin alphabets became predominant in Hellenistic and Roman imperial times (Ferrara 2021). Under the Roman empire, writing became an extremely common social practice, which extended over a vast territory, developing across three continents (Europe, Asia, Africa) and embracing areas that until then had not produced any written documents. Thanks to the widespread diffusion of basic mass literacy, between the late 1st century BCE and the early 3rd century CE, the tendency to write simple and complex messages on an incredible variety of supports (not only wax tablets, papyrus and parchment, but also stone, wood, clay and metal objects, as well as walls, rocks and all sorts of surfaces) became a 'global' cultural practice and a privileged medium of social communication (Boyes, Steele, Astoreca 2021). In recent years, scholars have labelled this phenomenon as the 'epigraphic habit' of the Romans (MacMullen 1982; Beltrán Lloris 2015).
Even if the majority of the epigraphic texts produced in the ancient world have been lost because of the perishable nature of the objects upon which they were written, hundreds of thousands of Greek and Latin inscriptions have survived until today (Mullen, Bowman 2021), along with a more limited number of texts written in fragmentary languages, such as Gaulish, Raetic, Celtiberian etc. (for a recent set of digital publications on these languages see http://aelaw.unizar.es/publications). This body of evidence represents a direct legacy of the writing cultures of the ancient world, a unique testimony which has come down to us without any mediation, unlike the texts of ancient literary authors, which have mostly survived thanks to copies made in the Middle Ages and later. Ancient inscriptions are now accessible through a vast collection of printed corpora and online digital resources, which offer information on a multitude of epigraphic sources that constitute a crucial category of documents for the study of ancient history, as well as an immense repository of big data (Rossi, De Santis 2018; Velázquez Soriano, Espinosa Espinosa 2021).
However, not all the inscriptions that are known to us have been preserved in their material form. In fact, the texts of many of them can only be reconstructed from transcriptions that were made in post-classical times. These texts are documented by the so-called epigraphic manuscripts, a rare kind of handwritten source that is nonetheless very rich in information, since such manuscripts offer the only record of textual sources whose original supports have not survived (Buonocore 2015; Calvelli, Cresci Marrone, Buonopane 2019). Epigraphic manuscripts are also crucial for our knowledge of the lifecycle of ancient inscribed objects and monuments, meaning that they help us reconstruct the individual story of each inscription, from the moment when it was produced to the present, or to the time when it was last documented (in case it was later destroyed or disappeared). In spite of the invaluable amount of information that they can provide, in recent decades epigraphic manuscripts have been substantially neglected by scientific research. The common assumption is that all work on handwritten sources related to epigraphy was already carried out by the founding fathers of the discipline in the second half of the 19th century, but the advent of the Digital Humanities shows that this is not the case.
The EpiSearch project explores the possibilities offered by state-of-the-art technologies to investigate epigraphic manuscripts, with the aim of creating a system able to link the data recorded in handwritten texts with those registered in the main digital repositories of Greek and Latin inscriptions that are currently available online. These include the Searchable Greek Inscriptions tool promoted by the Packard Humanities Institute (PHI: https://inscriptions.packhum.org) and the databases of the international federation of epigraphic databases named EAGLE (Electronic Archive of Greek and Latin Epigraphy), in particular the Epigraphic Database Roma (EDR: http://www.edr-edr.it), which provides texts, bibliographic citations and descriptive data for Latin and Greek inscriptions from ancient Italy (including Sicily and Sardinia), the Epigraphische Datenbank Heidelberg (EDH: https://edh.ub.uni-heidelberg.de), which contains the texts of Latin and bilingual inscriptions from the provinces of the Roman Empire, and the Epigraphic Database Bari (EDB: https://www.edb.uniba.it), devoted to inscriptions promoted by the Christian community of ancient Rome. Another important dataset is offered by the Epigraphik Datenbank Clauss-Slaby (EDCS: http://www.manfredclauss.de), which supplies texts and bibliographic citations (lemmata of editions) for nearly all published Latin inscriptions.
EpiSearch is conceived as the initial segment of a broader and more ambitious research project, of which it constitutes a proof of concept. As a working example, we identified an epigraphic manuscript written in Venice between the late 1600s and the early 1700s by a local antiquarian named Giovanni Antonio Astori. The manuscript is currently kept at the Marciana National Library in Venice, where it is shelfmarked as MS Marc. Lat. XIV, 200 (4336). The project has so far included three main steps. The first one involved the application of Handwritten Text Recognition (HTR) technologies for the automatic acquisition of the manuscript's contents. The second encompassed designing an integrated system, created by collecting data from the main online epigraphic databases; this gave us the possibility to match the inscriptions that are transcribed in the manuscript with their editions in digital resources. The last step will produce a visually annotated version of the manuscript with hyperlinks to the online databases, connecting the transcriptions of inscriptions with their current digital editions.
The EpiSearch team includes Federico Boschetti: Institute for Computational Linguistics "A. Zampolli", National Research Council of Italy (CNR-ILC), Pisa / Venice Centre for Digital and Public Humanities (VeDPH), Ca' Foscari University of Venice; Lorenzo Calvelli, PI: Department of Humanities and VeDPH, Ca' Foscari University of Venice; Franz Fischer: Department of Humanities and VeDPH, Ca' Foscari University of Venice; Daniele Fusi: VeDPH, Ca' Foscari University of Venice; Silvia Orlandi: Sapienza University of Rome; Thea Sommerschield: Department of Humanities, Ca' Foscari University of Venice; Tatiana Tommasi: Department of Humanities, Ca' Foscari University of Venice. [LC]
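The second step described in this section, matching a transcription from the manuscript with its edition in a digital resource, can be illustrated with a minimal sketch. It is not the project's actual system: the normalisation rules, the function names and the two sample records are assumptions made for illustration, and the sample texts are invented placeholders rather than real database entries.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Reduce an epigraphic text to bare uppercase letters for comparison."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters only: brackets, interpuncts and line breaks differ between editions.
    return "".join(ch for ch in stripped.upper() if ch.isalpha())


def best_match(transcription: str, records: dict[str, str]) -> tuple[str, float]:
    """Return the identifier and similarity score of the closest record."""
    target = normalize(transcription)
    scored = (
        (SequenceMatcher(None, target, normalize(text)).ratio(), record_id)
        for record_id, text in records.items()
    )
    score, record_id = max(scored)
    return record_id, score


# Hypothetical records: identifiers and texts are placeholders.
records = {
    "REC-001": "D(is) M(anibus) / L(ucio) Valerio Primo",
    "REC-002": "Iovi Optimo Maximo / sacrum",
}

if __name__ == "__main__":
    print(best_match("D M\nL VALERIO PRIMO", records))
```

A character-based similarity of this kind tolerates the differences between an antiquarian's upper-case transcription and a modern edition with expanded abbreviations, but a real system would probably need language-specific handling of abbreviations and lacunae.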

Giovanni Antonio Astori (Venice, 1672-1743) and His Epigraphic Manuscript
Giovanni Antonio Astori was a learned ecclesiastical antiquarian born in Venice in 1672. We are informed in detail about his life thanks to a highly reliable biography published by the Italian bibliographer Gian Maria Mazzuchelli in 1753 (Mazzuchelli 1753; cf. Cappelletti 2011, 261-2). Other important sources to understand Astori's interests are the letters exchanged within his cultural network; in fact, he was in contact with some of the most important intellectuals of his time and had the opportunity to exchange ideas on antiquarian subjects with them. For example, Astori was in contact with Ludovico Antonio Muratori (1672-1750), the author of the Novus thesaurus veterum inscriptionum (1739-1742). Of the epistolary exchange between Astori and Muratori, only Astori's letters have survived. They are 11 in total, dated from 1705 to 1709 and currently preserved in Modena, Biblioteca Estense Universitaria, Archivio Muratori, 37.10 (1 letter), 49.38 (1 letter), 52.1 (8 letters), 86.4c (1 letter). They were all edited by Di Campli in 1995 (Di Campli, Forlani 1995, 285-91); a digitized copy of 10 of them is also available online through the Internet Culturale web portal (https://www.internetculturale.it/). Thanks to all these sources, we know that Astori started his ecclesiastical career in the 1690s and, by that time, was already skilful at transcribing ancient inscriptions. In fact, epigraphy had already become one of his main interests. Around 1697, Astori wrote his first antiquarian articles (see, for example, Astori 1697). These works clearly show that he was interested in analysing the ancient epigraphic monuments that were then visible in Venice, paying specific attention to both the material and palaeographical aspects of ancient Greek and Latin inscriptions. These features can also be detected in Astori's epigraphic manuscript. In the following years, Astori continued to study antiquarian subjects but was not able to accomplish any significant works in this field. He died in Venice in 1743.
Astori's epigraphic manuscript was already known to Theodor Mommsen, who used it for compiling the Corpus inscriptionum Latinarum (CIL), in particular CIL V, pars I, Inscriptiones regionis Italiae decimae (Venetia et Histria), published in 1872 (as can be seen from CIL V, p. 205). The manuscript gained the attention of scholars only about one hundred years later, when it was mentioned in the studies on Venetian antiquarian collections carried out by Zorzi (1988, 90-1) and Favaretto (1990, 356-7, 384, 390). However, until now only one short, yet well-researched, article has been dedicated exclusively to Astori's epigraphic manuscript (Bodon 1996).
Astori's codex is currently kept in Venice, Marciana National Library, with the following shelfmark: Marc. Lat. XIV, 200 (4336). A summary description of it is provided by Zorzanello (1985, 273). It is a paper codex composed of IV front flyleaves + 11 leaves (with leaves 3bis, 6bis and 6ter having been added to the original quire) + II back flyleaves. The following leaves are blank: front flyleaves Ir, IIIv, IVv; 3bis v, 6bis r, 6ter v, 8r-11v; back flyleaves Ir-IIv. The original cover of the manuscript is still preserved as front flyleaf IIrv and back flyleaf Irv. In front flyleaves Ir, IIv, IIIr and IVr different hands wrote notes and bibliographical references. In particular, at front flyleaf IVr three Latin inscriptions were transcribed, two of which are ancient (CIL V 2180 and CIL V 2168), while the other is medieval (dated around the mid-14th century CE). Notes written by a hand different from Astori's are also clearly visible in some leaves of the epigraphic collection (for example at f. 2v nr. 16 and at f. 4r nr. 21); they were intended to update the locations where the inscriptions could be seen.
From the analysis of its contents, the manuscript can be dated approximately between 1706 and 1713. The terminus post quem can be fixed thanks to the transcription of CIL V 2792 (f. 1v nr. 10), a Latin inscription from Montegrotto, near Padua, discovered not long before 1706 (Breve relazione 1706, 113). The terminus ante quem is based upon the transcription of CIL V 2151 (f. 1r nr. 2), attested by Astori in the house of Bertucci Contarini. Considering that Bertucci died in 1713, it is possible to say that the transcription was produced before that year. However, considering other external sources, first of all Astori's and his contemporaries' epistolary exchanges, one may infer that Astori had already transcribed almost all of the inscriptions of the manuscript between 1700 and 1704 (cf. Zeno 1785, 90 and 222). In the following period Astori rearranged the materials collected and created the manuscript as it is still visible today. However, Astori never managed to give his epigraphic collection a definitive organisation and, therefore, it remained incomplete.
The original front cover of the codex contains the title of the work, probably written by Astori himself: Inscriptiones Graecae et Lat(in)ae quae Venetiis reperiunt(ur) aut nondum editae, aut correctius si ab aliis vulgatae s(un)t, public(atae) nunc demum. This title clearly shows that Astori wanted to collect all the Greek and Latin inscriptions which were visible at his time in Venice and had not yet been published or had already been published with some mistakes.
The contents of the manuscript are particularly interesting. In fact, Venice is a unique case study. While the city and the lagoon sites around it did not develop on top of ancient Roman settlements, many ancient monuments, including Greek and Roman inscriptions, were reused as building materials or collected for antiquarian purposes (Calvelli 2018, 87-9). Yet, in the course of the past centuries, most of these inscribed monuments were displaced or destroyed and can no longer be seen. For this reason, the information contained in epigraphic manuscripts related to the Venice area is of fundamental value. Excluding the notes added by another hand, Astori's manuscript includes the transcriptions of 56 inscriptions, 33 of which are Greek and 23 Latin. Of these inscriptions, 36 are still preserved, while 20 are lost or of unknown location. In Astori's manuscript the inscriptions generally bear a sequential number in Arabic numerals, but there are some exceptions, and the numbering of the epigraphic texts is not always consistent. In particular, two inscriptions (CIG 802 at f. 4v and CIL V 3906 at f. 6v) are not numbered; one (Prioux 2002-2003 at f. 6v) bears only a previous number; and seven (CIG 2307, a triple entry of the Corpus, and CIG 7002 at f. 6r; CIG 3239, CIL V 2191 and CIL V 2232 at f. 6v) bear two numbers, one in Arabic and the other in Roman numerals. Most of these exceptions are visible in f. 6rv, a part of the manuscript that the author rearranged; in fact, even if f. 6v is visible today, Astori tried to delete it by adding f. 6bis r. Moreover, the order in which the inscriptions are transcribed is not clear, making it difficult to understand precisely Astori's classification criteria.
Astori transcribed inscriptions belonging especially to Venetian private and public collections (43 inscriptions), but also inscribed monuments which had been reused in the city of Venice ('epigraphic spolia': at least 8 inscriptions). For each inscription, Astori first specified its location; he then transcribed the epigraphic text in upper case, respecting the division of lines and, in many cases, reproducing some palaeographical features of the letters. For Greek inscriptions, Astori also wrote a translation in Latin. Moreover, in many cases (about half of the total), Astori made an ink drawing of the monument bearing the inscription, sometimes preceded by a preparatory pencil sketch. These drawings were often made on separate leaves and then stuck to the manuscript with sealing wax.
The analysis of the characteristics of the manuscript makes it clear that we are dealing with preparatory materials. The codex can be considered as the last phase of a work in progress, which never reached the stage of a printed edition. However, the 'work in progress' nature of the manuscript helps us understand how Astori created it. Astori saw the inscriptions in person and transcribed their texts, often drawing a sketch of the monuments upon which they were carved. Finally, Astori assembled the materials that he had already collected in the codex, probably choosing all the inscriptions that he wanted to publish, but without giving them a final order.
The epigraphic codex written by Astori is of fundamental value for several reasons (cf. Calvelli 2004, 444). It allows us to better understand the state of the epigraphic studies in Venice in the early modern period. At the same time, it gives us the possibility to analyse otherwise lost phases of the life-cycles of the inscriptions. Finally, Astori's attention towards the physical monuments makes his work particularly useful for studying ancient inscriptions which are no longer preserved or are still to be identified. [TT]

Astori's Manuscript on eScriptorium
An epigraphic manuscript is an edge case for layout analysis and HTR techniques, because a) text regions are included in the image regions representing the epigraphic monuments; b) texts are multilingual and rendered in different scripts; c) alphabetical signs may be fragmented. As shown in Figure 1, the complex layout of a typical page is constituted by the following kinds of regions: location (light blue), numbering (light red), drawing of the epigraphic monument (orange), Latin inscription (magenta), Greek inscription (purple), and translation from Greek to Latin (lime). Astori's manuscript consists of only 14 leaves (17 written pages), which do not provide a sufficient amount of text to create a new training set from scratch and successfully apply it to the rest of the manuscript. Nevertheless, layout analysis and HTR techniques can be used even on a few pages already entirely transcribed by hand, at least for the following tasks: mapping the text glyph by glyph on the facsimile and testing the fine-tuning of a model with a minimal amount of data.
The detached research unit of the CNR-ILC at the VeDPH provides the scholars and students affiliated to the Centre with an instance of eScriptorium (version 0.11.0, available at https://gitlab.inria.fr/scripta/escriptorium) installed on the servers maintained by ILC4CLARIN (https://ilc4clarin.ilc.cnr.it). The digital facsimile of Astori's epigraphic manuscript was kindly provided by the Marciana Library and was uploaded on the platform (currently accessible only to the members of the project). The regions of interest have been manually identified and marked according to the SegmOnto guidelines (Gabay, Camps, Pinche, Carboni 2021). Due to the peculiarity of epigraphic manuscripts, we defined the following subtypes: CustomZone:provenance for the location; NumberingZone:inscriptionNumber for the numeric identifier of each inscription; GraphicZone:textBearingObject for the image of the epigraphic monument; CustomZone:greekInscription and CustomZone:latinInscription for the transcriptions; and CustomZone:translation for the Latin translation of Greek inscriptions. [FB]
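The region subtypes listed above can be collected in a small lookup table, for instance to check the labels found in an exported segmentation. The following is only a sketch: the Python names are hypothetical, and the pairing of each label with a Figure 1 colour is inferred from the descriptions in this section rather than taken from the project's configuration.

```python
# Zone label -> (what the region contains, colour used in Figure 1).
ZONE_TYPES = {
    "CustomZone:provenance": ("location of the monument", "light blue"),
    "NumberingZone:inscriptionNumber": ("numeric identifier of the inscription", "light red"),
    "GraphicZone:textBearingObject": ("drawing of the epigraphic monument", "orange"),
    "CustomZone:latinInscription": ("Latin inscription", "magenta"),
    "CustomZone:greekInscription": ("Greek inscription", "purple"),
    "CustomZone:translation": ("Latin translation of a Greek inscription", "lime"),
}


def split_label(label: str) -> tuple[str, str]:
    """Split a label such as 'CustomZone:provenance' into zone and subtype."""
    zone, _, subtype = label.partition(":")
    return zone, subtype


def unknown_labels(labels: list[str]) -> list[str]:
    """Return, in sorted order, the labels that are not among the defined subtypes."""
    return sorted(set(labels) - set(ZONE_TYPES))


if __name__ == "__main__":
    page_labels = ["CustomZone:provenance", "CustomZone:greekInscription", "MainZone"]
    print(unknown_labels(page_labels))
```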

Mapping the Transcription on the Facsimile
The recognition of text baselines by Kraken (https://kraken.re/master/index.html), which is the layout analyzer and HTR engine behind eScriptorium, is highly accurate (as shown in Figure 2), even if the text of the inscriptions is inside the drawing of the epigraphic monument. The manual digitization of Astori's transcriptions of the inscriptions was inserted into the ALTO-XML file downloaded from eScriptorium after the layout analysis, in order to map the text to the facsimile line by line. The fine-grained mapping, glyph by glyph, was obtained by overfitting the HTR engine. In normal conditions, overfitting must be prevented to avoid a biased recognition of samples absent from the training set. But in our case, we had at our disposal the complete manual transcription of the inscriptions (which amount to a small quantity of text) and our task was exclusively to identify the coordinates of each glyph on the facsimile. Thus, the training set entirely corresponds to the text: the recognition approaches 100% accuracy and, as the desired side effect, provides the coordinates of each glyph. Only fragmentary letters, such as the vestigia in the last line of inscription number 7 (Figure 2), must be treated by hand and represented by dotted characters or lacunae. The ALTO-XML file downloaded from the current version of eScriptorium, even if based on Kraken, contains only the coordinates of text lines and words, not of glyphs. Thus, for this operation we used Kraken through the command line on a local computer, outside the eScriptorium environment. [FB]
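The line-by-line mapping described above, in which a manual transcription is written into the ALTO-XML produced by the layout analysis, might look like the following sketch. It uses only the Python standard library and assumes an ALTO v4 namespace, one transcribed line per TextLine element in document order, and a tiny invented sample file; the function name is hypothetical and this is not the project's own script.

```python
import xml.etree.ElementTree as ET

# Assumption: the export uses the ALTO v4 namespace.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def inject_transcription(alto_xml: str, lines: list[str]) -> str:
    """Write one transcribed line into each TextLine, in document order."""
    ET.register_namespace("", ALTO_NS)  # serialise without a namespace prefix
    root = ET.fromstring(alto_xml)
    text_lines = root.findall(f".//{{{ALTO_NS}}}TextLine")
    if len(text_lines) != len(lines):
        raise ValueError(
            f"{len(text_lines)} segmented lines but {len(lines)} transcribed lines"
        )
    for text_line, content in zip(text_lines, lines):
        # Reuse the first String child if present, otherwise create one.
        string = text_line.find(f"{{{ALTO_NS}}}String")
        if string is None:
            string = ET.SubElement(text_line, f"{{{ALTO_NS}}}String")
        string.set("CONTENT", content)
    return ET.tostring(root, encoding="unicode")


# Invented two-line sample standing in for a real export.
SAMPLE = f"""<alto xmlns="{ALTO_NS}"><Layout><Page><PrintSpace><TextBlock>
<TextLine ID="l1" BASELINE="10 20 110 20"><String CONTENT=""/></TextLine>
<TextLine ID="l2" BASELINE="10 50 110 50"/>
</TextBlock></PrintSpace></Page></Layout></alto>"""

if __name__ == "__main__":
    print(inject_transcription(SAMPLE, ["D M", "L VALERIO"]))
```

The baseline coordinates already present on each TextLine are left untouched, so the transcribed text inherits the position found by the layout analysis; glyph-level coordinates are a separate matter and, as noted above, are not part of this export.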

HTR Applied to Locations and Translations
Latin and Greek inscriptions were faithfully transcribed by Astori in capital letters. As a consequence, the variation among the glyphs is rather limited. By contrast, Astori wrote his notes about the locations of the inscribed monuments and the Latin translations of Greek inscriptions in cursive script, with a considerable variety of glyphs, allographs and ligatures. Tatiana Tommasi manually digitised both the transcriptions of the inscriptions and Astori's notes and translations. As a proof of concept, we applied HTR to these parts in cursive. Due to the small amount of cursive text in Astori's epigraphic manuscript, we searched for other documents written by Astori in a similar script. For this purpose, we used Astori's letters addressed to Muratori (facsimile available online: https://bit.ly/3DOYtXC, https://bit.ly/3FAmqmW, and https://bit.ly/3SVvV2W). We acquired the transcription published by Di Campli and Forlani (1995) by Optical Character Recognition (OCR). We used Tesseract (https://github.com/tesseract-ocr/tesseract) as the OCR engine and edited its output, which reflects the interpretative printed edition of the letters, to obtain a faithful glyph-by-glyph diplomatic transcription (https://github.com/vedph/episearch-htr). The transcription, mapped on the facsimile line by line, was used for fine-tuning an existing model created by Chagué & Clérice (10.5281/zenodo.6657809) from data available on HTR-United (https://htr-united.github.io).
Figure 1. Venice, Marciana Library, Marc. Lat. XIV, 200 (4336), f. 1v; regions of interest coloured by type.
Figure 3