Text Alignment in Ancient Greek and Georgian: A Case-Study on the First Homily of Gregory of Nazianzus

This paper discusses the word-level alignment of a lemmatised bitext consisting of Oratio I of Gregory of Nazianzus in its Greek model and its Georgian translation. The study shows how the direct, empirical observations offered by an aligned text enable an accurate analysis of translation techniques and of many philological parameters of the text.


INTRODUCTION
The original Greek texts of the Homilies of Gregory of Nazianzus (329-390 AD; about this author, see [Coulie, 1995]) were translated from early times into the different languages of the Christian East [Coulie, 1994]. This paper offers some conclusions resulting from the analysis of the word-level alignment of a bitext composed of the Greek model (called ST, "source text") of the first homily of Gregory of Nazianzus and its Georgian translation (called TT, "target text"). This homily, entitled Εἰς τὸ ἅγιον Πάσχα καὶ εἰς τὴν βραδυτῆτα "On Easter and the delay" [CPG 3010], was written in 362 AD. The Georgian translation was made by Ephrem Mtsire (ეფრემ მცირე, Ephrem Mcire), also known as Ephrem the Lesser (11th century) [Doborjginidze, 2009:65-93]. This work paves the way for a broader analysis of Greek-Georgian translations, especially, but not exclusively, regarding bilingual lexical correspondences. The study of the oriental versions of this homily was already initiated in the framework of the Nazianzos Project (see http://nazianzos.fltr.ucl.ac.be), which ensured the publication of critical editions of the Arabic [Tuerlinckx, 2001], Syriac [Haelewyck, 2011], and Georgian [Metreveli et al., 1998] versions of this text, followed by articles analysing some aspects of their textual correspondence and translation techniques (for example in [Coulie, 2000]). In this context, our goals are the following:
• offering multilingual digital dictionaries (for simple words) and translation memory files (for multi-word expressions);
• offering materials based on empirical evidence rooted in corpus observations, in order to contribute to the study of the translation methods used by the authors of the Christian East.
To reach these goals, lemmatised corpora and text-alignment tools are used.

I Available linguistic data
1.1 Corpora
The lemmatised concordance of Gregory of Nazianzus' Greek texts is available through the Thesaurus Sancti Gregorii Nazianzeni published by [Mossay et al., 1990]. The computerised data of this thesaurus, recovered and updated for the needs of the GREgORI Project, have now been gathered in a corpus based on the Unicode encoding standard and on the TEI guidelines. On the Georgian side, the corpus consists of the Georgian translations of the thirteen homilies published in the Corpus Nazianzenum. Note that Gregory of Nazianzus' homilies were translated from Greek into Georgian several times by different authors. In the case of this first homily, the most important translators are the above-mentioned Ephrem Mtsire and Euthymius the Hagiorite (ეფთვიმე მთაწმიდელი Eptvime Mtac̣mideli "from the Holy Mountain") (†1028) [about this author, see Kazhdan, 1991]. For this first approach, we have deliberately chosen Ephrem's translation because of its literalness in comparison with Euthymius' free style of translation [Metreveli, 1998:XV].
The Georgian version of the first homily by Gregory of Nazianzus was published in [Metreveli, 1998:2-17] and has been lemmatised by the authors of this paper in collaboration with Professor Bernard Coulie. Table 1 lists the frequencies of the words, the lemmata and the different word-forms attested in these texts.

Lexical and morphological tagging
Both ST and TT are lemmatised. Each word is tagged with lexical (i.e. lemma) and morphosyntactic (i.e. part-of-speech) information. The lemmatisation of the Greek follows the rules described in [Kindt, 2004]; see the website of the GREgORI Project for the part-of-speech tagset. The lemmatisation principles for Georgian texts are described in [Coulie et al., 2013]. Texts are processed by lexical look-up (with the electronic dictionaries of the GREgORI Project), followed by a step of automatic or manual disambiguation for words corresponding to more than one lemma in the dictionaries; in other words, each word of the corpus receives a single lemma corresponding to its use in the context in which it appears. Lexical look-up and disambiguation are carried out with the NLP software Unitex, described in [Paumier, 2016] (about the role of this software in the project, see the contribution of [Kindt, 2017] in the present issue).
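The two-step procedure described above (dictionary look-up, then disambiguation of the forms that remain ambiguous) can be illustrated with a minimal Python sketch. It does not reproduce Unitex or the GREgORI dictionaries: the function `look_up`, the toy `LEXICON` and the competing analyses given for კეთილად are hypothetical and serve only to show how forms with exactly one analysis can be separated from forms that need a disambiguation step.

```python
from typing import Dict, List, Tuple

Analysis = Tuple[str, str]  # (lemma, part-of-speech tag)

# Toy lexicon for illustration only (not the GREgORI dictionaries).
LEXICON: Dict[str, List[Analysis]] = {
    "ἀγαθὸν": [("ἀγαθός", "A")],
    "καθαρῶς": [("καθαρῶς", "I+Adv")],
    # Hypothetical ambiguity: two competing analyses for one form.
    "კეთილად": [("კეთილი", "V+Part"), ("კეთილი", "A")],
}


def look_up(tokens: List[str], lexicon: Dict[str, List[Analysis]]):
    """Split tokens into resolved ones and ones needing disambiguation.

    A token is resolved when the lexicon offers exactly one analysis.
    Unknown forms (no analysis) and ambiguous forms (several analyses)
    are returned separately so they can be handled automatically or
    by hand in a later step.
    """
    resolved = []
    pending = []
    for position, form in enumerate(tokens):
        candidates = lexicon.get(form, [])
        if len(candidates) == 1:
            resolved.append((position, form, candidates[0]))
        else:
            pending.append((position, form, candidates))
    return resolved, pending


greek_resolved, greek_pending = look_up(["πλάστην", "ἀγαθὸν"], LEXICON)
georgian_resolved, georgian_pending = look_up(["კეთილად"], LEXICON)
```

In this sketch πλάστην is reported as pending simply because the toy lexicon does not contain it, while კეთილად is pending because two analyses compete; both cases would be passed on to the disambiguation step.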

Alignment
ST and TT are then aligned as shown in Figures 1 and 2. Each token of the texts is followed, enclosed between braces, by its lemma, a part-of-speech tag and a sequential identification number (linked in the database to the exact references of this token in the original text). Alignment is processed with the mkAlign software [Fleury, 2012]. A first alignment is done automatically; the texts are segmented into "translation units" (TU) on the basis of the punctuation marks used as sentence boundaries (Figure 1). A second alignment is done manually in order to identify more specific "translation units", as close as possible to the "lexical units" (Figure 2). This word-by-word alignment process will become increasingly automated once all the resources, such as translation memories, are exploited.
The Georgian translator Ephrem the Lesser belongs to the so-called "hellenophile" school. This literary trend adopts the principle of formally equivalent translation [Doborjginidze, 2009:65-90], almost slavishly reproducing all the particularities of the source language, so that the translation stays as close as possible to its model. Accordingly, the source and target sentences of this bitext have a very similar structure, to the point that their respective segments may be delimited along the same boundaries. Therefore, the translation units frequently link one word of the ST with one word of the TT. Deviations from this one-to-one pattern nevertheless occur:
• a word from ST is omitted in TT: the article ἡ is omitted in TT because this part of speech does not exist in Georgian;
• the word order may differ between ST and TT: here, in TT, the translator did not respect the word order of the ST and altered the sequence of the noun and the pronoun. This change explains the discrepancy between TU numbers and identification numbers.
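As an illustration of the annotation described in this section, the following Python sketch parses tokens written in the notation used in the examples of this paper, e.g. ἀγαθὸν_{ἀγαθός.A}. The names `Token`, `TOKEN_RE` and `parse_annotated` are hypothetical, and the pattern is an assumption based only on that printed notation: the sequential identification number is left out because its exact layout inside the braces is not shown here.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str   # word-form as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag, e.g. "A", "V+Part", "I+Adv"


# Assumed layout: form_{lemma.POS}. The lemma is taken to contain no dot,
# so the first dot inside the braces separates lemma and tag.
TOKEN_RE = re.compile(
    r"(?P<form>[^\s_{}]+)_\{(?P<lemma>[^.{}]+)\.(?P<pos>[^{}]+)\}"
)


def parse_annotated(text: str) -> List[Token]:
    """Return every annotated token found in the given string."""
    return [
        Token(m["form"], m["lemma"], m["pos"])
        for m in TOKEN_RE.finditer(text)
    ]


sample = "καὶ_{καί.I+Part} ἀγαθὸν_{ἀγαθός.A}"
tokens = parse_annotated(sample)
```

Once the tokens are available as structured records, source and target units can be compared by lemma or by part-of-speech tag rather than by surface form.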
The result of the alignment process is saved in a Translation Memory eXchange file (.tmx) called 'bitext'. TMX is an XML-based format. This bitext is then loaded into the database of the GREgORI Project and processed with specific software that makes it possible to edit bilingual concordances and to create the bilingual dictionary or 'translation memory'. Figure 3 shows a bilingual Greek-Georgian concordance of the verbs ἀποδίδωμι and δίδωμι.
Alignment combined with morphosyntactic annotation makes it possible to extract information more precisely than the methods commonly used in translation studies. Two difficulties have to be taken into consideration. First, the two languages are governed by quite distinct morphological and syntactic rules: Greek and Georgian belong to different language families, and Georgian has an agglutinative structure and a complex verbal morphology that make it significantly different from Greek. Second, we are dealing with an ancient and critically edited pair of texts, each based on the study of the whole manuscript tradition. Consequently, both the model and the translation are reconstructed texts. This differs from contemporary ST-TT pairs, where the strictly formal dependence of the TT on the ST is obvious, since there is an immediate filiation between the two concrete items. These two factors, namely the structural dissimilarity of Greek and Georgian and the peculiarities inherent to the state of conservation and reconstruction of ancient texts, make the automatic detection of equivalent units in ST and TT harder within an Old Greek and Georgian bitext. Moreover, few studies in the field of digital humanities are dedicated to this pair of languages. We do not have the necessary tools, databases or case studies to reuse for such research, which is only now taking its very first steps. Even more importantly, the Georgian language is still poorly served by software tools.
All this results in a need for a well-annotated bitext. Morphosyntactic tagging is, in such a case, an indispensable step: it provides exhaustive information about each unit of the bitext and therefore enables precise, well-specified queries and an accurate extraction of information.
In addition, the identification of well-discriminated, equivalent units in ST and TT is essential for our purpose, since the GREgORI Project is conceived for philologists editing ancient texts and studying ancient translation techniques. The ancient models were repeatedly translated by different translators using different translation techniques, which makes the link between ST and TT subtle and variable from case to case. These subtleties must be accurately discriminated by means of morphosyntactic tagging, since this is the main purpose of the GREgORI Project.
Summing up, morphosyntactic annotation and alignment enable the detection of related units within the bitext with the high philological precision that our study aims for. These strategies are indispensable for the accurate extraction of information when the general context of the bitext is marked by a scarcity of comparative studies, by a lack of software tools for the ancient languages, and by the use of a morphosyntactically quite distinct pair of languages.
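To make the storage of aligned units in a TMX file more concrete, here is a minimal Python sketch that builds a TMX document with the standard library. It is not the export produced by mkAlign or by the GREgORI database: the function `build_tmx`, the header values and the language codes (`grc` for Ancient Greek, `oge` for Old Georgian) are assumptions chosen for the illustration, and the sample unit reuses the phrase quoted in example (12).

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serialises as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="grc", tgt_lang="oge"):
    """Build a TMX tree from (source segment, target segment) pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    # Header attributes required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch",
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for number, (source, target) in enumerate(pairs, start=1):
        # One <tu> per translation unit, one <tuv> per language.
        tu = ET.SubElement(body, "tu", {"tuid": str(number)})
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


tmx = build_tmx([("πλάστην ἀγαθὸν", "მოქმედად კეთილად")])
xml_text = ET.tostring(tmx, encoding="unicode")
```

Each translation unit becomes one `tu` element holding a `tuv` per language, which is what allows the same file to feed both a bilingual concordance and a translation memory.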

Lexical equivalence
As noted before, the bitext offers a formally equivalent translation of a high degree of precision. This often has two consequences in TT: a very low level of terminological fluctuation and the creation of neologisms.

A very low level of terminological fluctuation
We generally observe a strict terminological correspondence between ST and TT. Usually, no fluctuation occurs when ST's terminology is translated in TT, even for frequently used terms. For example, the two occurrences of ἀνάπαυσις (N+Com) in ST (1-2) are both rendered by the word განსუენებაჲ (N+Com) in TT, although synonyms are available for this lexical unit.

Creation of neologisms
The other consequence of formally equivalent translation is the appearance of neologisms in TT. Some of them, being constructed in the same way as their Greek models, are contrived, unnatural words in Georgian. They reproduce their model slavishly, accurately reflecting Greek structures alien to genuine native usage. This is for example the case of παρέρχομαι (V), rendered by თანაწარჴდომა (V+Mas) in (5):
(5) καὶ ἡμᾶς παρῆλθεν ὁ ὀλοθρεύων [PG35, col. 397A]
და ჩუენ თანაწარგუჴდა მომსრველი
da čuen tanac̣arguqda
"l'exterminateur est passé à côté de nous" [SC, p. 74, § 3, l. 2]
"and the (Destroyer) passed us over" [Shaff, 2007:203]
Or ἀποδίδωμι (V), translated by უკუნცემა (V+Mas) in (6):
"restituons à l'image ce qui est de l'image" [SC, p. 76-77, § 4, l. 9-10]
"let us give back to the Image what is made after Image" [Shaff, 2007:203]
In the first example (5), the word თანა-წარგუჴდა begins with the element თანა- "with", which is used as a postposition in Ancient Georgian and never to build verbs. Its use as though it were a preverb, in combination with a conjugated form of a verb, is not natural. Instead, it is a slavish reproduction of the preverb παρα present in the Greek form παρῆλθεν.

Terminological fluctuation
In spite of the principle of formally equivalent translation, it must be taken into account that any translation shows at least a few instances of terminological fluctuation. For example, αἰδέσιμος "respectable, venerable", occurring twice in ST, has been translated in two different ways in TT. Its first rendering, in (7), is the adjective განსაკრთომელ gansaḳrtomel "fearful, frightening", while its second rendering, in (8), is the locution ღირს პატივთაჲ ġirs ṗaṭivtaj "worthy of respect".
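Terminological fluctuation of this kind can be detected mechanically once the bitext is aligned at the lemma level. The Python sketch below is only an illustration under that assumption: the function names are hypothetical, and the input pairs are limited to the two lexical items discussed in this section (ἀνάπαυσις and αἰδέσιμος), entered by hand.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def renderings_by_source(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Counter]:
    """Count, for each ST lemma, how often each TT rendering occurs."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for source_lemma, target_rendering in pairs:
        table[source_lemma][target_rendering] += 1
    return dict(table)


def fluctuating(table: Dict[str, Counter]) -> Dict[str, Counter]:
    """Keep only the ST lemmata rendered in more than one way."""
    return {src: tgts for src, tgts in table.items() if len(tgts) > 1}


# Aligned (ST lemma, TT rendering) pairs taken from the examples above.
aligned_pairs = [
    ("ἀνάπαυσις", "განსუენებაჲ"),
    ("ἀνάπαυσις", "განსუენებაჲ"),
    ("αἰδέσιμος", "განსაკრთომელ"),
    ("αἰδέσιμος", "ღირს პატივთაჲ"),
]

table = renderings_by_source(aligned_pairs)
unstable = fluctuating(table)
```

With these four pairs, ἀνάπαυσις has a single rendering attested twice and is therefore not flagged, whereas αἰδέσιμος is flagged with its two renderings.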

Failure of word-by-word correspondence
A correspondence between the units of ST and TT does not necessarily imply that exactly one token on one side is equivalent to exactly one token on the other. Translation equivalence is a relation between two units with the same meaning on both sides but, obviously, word-by-word correspondence is sometimes impossible to achieve, as in (9-10).

[ST] PRO+Pers + V vs V [TT]
A conjugated verb accompanied by a personal pronoun in ST often matches a conjugated verb without a personal pronoun in TT. In Georgian, pronouns are directly included in the verbal structure through an appropriate morphological marker. In (11), the morpheme მ- (m-) in მაბრალობთ m-abralobt is the equivalent of the Greek pronoun μοι. In other words, the Greek personal pronoun is not expressed as a separate word in the Georgian translation, which makes this type of asymmetry frequent.

Asymmetrical equivalences [ST] A = V+Part [TT]
We emphasise below some asymmetrical equivalences from the Greek-Georgian bitext. The correspondence [ST] A = V+Part [TT] is frequently attested and is justified on the linguistic level as well: a Georgian lexeme that is morphologically a participle often performs the function of an adjective qualifying a noun in the sentence [Coulie et al., 2013:183-184]. Some grammars categorise such words as "verbal adjectives". However, given that from a morphological point of view these units are clearly participles in Georgian, we opted to label them as "V+Part", as seen in ἀγαθὸν_{ἀγαθός.A} vs კეთილად_{კეთილი.V+Part} in (12):
(12) πλάστην ἀγαθὸν [PG35, col. 397A]
მოქმედად კეთილად
mokmedad ḳetilad
"un bon modeleur" [SC, p. 74, § 2, l. 6]
"as a good modeler" [Shaff, 2007:203]
The word კეთ-ილ-ი shows the morphology of a past participle, based on the verbal root კეთებ-ა "doing, performing, realising", with the morpheme -ილ- proper to past participles and the nominative case ending -ი (participles can be declined in Georgian). Thus, კეთ-ილ-ი has the meaning "done, performed", but the unit is used as an adjective with the extended meaning "good, well done".
Adverbs in ST are widely affected by asymmetrical renderings in TT, and the following formulae are common:

[ST] I+Adv = V+Part [TT]
[ST] I+Adv = A [TT]
Indeed, a considerable number of adverbs in Georgian are formed by inflecting adjectives and participles in the adverbial case. This applies in particular to the so-called adverbs of manner, which characterise the manner in which the action expressed by the verb is performed. They are considered "derivative" adverbs, in contrast with the "primary" ones. We tagged as "adverbs" only the "primary" forms, while the "derivatives", being adjectives or participles declined in the adverbial case, are simply treated as declined adjectives and participles and are labelled as such, e.g. καθαρῶς_{καθαρῶς.I+Adv} vs წმიდად_{წმიდაჲ.A} in (14) [Coulie et al., 2013:192-194].
The adjective წმიდაჲ c̣midaj "pure" has been put in the adverbial case (წმიდ-ად c̣mid-ad), which makes it possible to express the meaning "purely". Similarly, the participle კეთილი ḳetili discussed above, once put in the adverbial case, expresses the meaning "well, nicely, pleasantly", and its matched pair in the ST will be an adverb, such as καλῶς_{καλῶς.I+Adv}. It is thus generally true that an adverbial form of an adjective or participle in Georgian corresponds to an adverb in many other languages.

[ST] A=N+Com [TT]
Similarly, the genitive case of common nouns in TT quite often expresses the same meaning as adjectives in ST:
λιθίναις_{λίθινος.A} vs ქვისათა_{ქვაჲ.N+Com}
σαρκίναις_{σάρκινος.A} vs ჴორცთანი_{ჴორცი.N+Com}

[ST] PRO+Ref1s vs N+Com [TT]
The Georgian noun თავი tavi "head" is often used in the function of a reflexive pronoun. Since reflexive pronouns are common, this also generates a frequent asymmetrical equivalence, as in (16):
(16) μυστηρίῳ μικρὸν ὑπεχώρησα ὅσον ἐμαυτὸν ἐπισκέψασθαι [PG 35, col. 396B]
საიდუმლოსა მცირედ განვეშორე რაოდენ თავის განცდადმდე
saidumlosa mcired ganvešore raoden tavis gancdadmde
"j'ai manifesté un recul devant le mystère, le temps de m'examiner" [SC, p. 74, § 2, l. 2]
"I withdrew a little while at a Mystery, as much as was needful to examine myself" [Shaff, 2007:203]
Many other situations of asymmetry are frequent, especially with "functional" words that are classified differently in the grammars of the respective languages of ST and TT. For example, the widespread word καὶ is tagged as a particle in Greek while its Georgian equivalent, და, is considered a conjunction: καὶ_{καί.I+Part} vs და_{და.I+Conj}.
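The asymmetrical equivalences listed in this section lend themselves to a simple count over the aligned, tagged pairs. The following Python sketch is an illustration only: the function `asymmetry_counts` is hypothetical, and the input is restricted to lemma pairs and tags quoted in this paper, entered by hand rather than read from the GREgORI database.

```python
from collections import Counter
from typing import Iterable, Tuple

# (ST lemma, ST tag, TT lemma, TT tag)
TaggedPair = Tuple[str, str, str, str]


def asymmetry_counts(pairs: Iterable[TaggedPair]) -> Counter:
    """Count (ST tag, TT tag) combinations whose tags differ."""
    counts: Counter = Counter()
    for _src_lemma, src_tag, _tgt_lemma, tgt_tag in pairs:
        if src_tag != tgt_tag:
            counts[(src_tag, tgt_tag)] += 1
    return counts


# Pairs quoted in the text; the first one is symmetrical.
tagged_pairs = [
    ("ἀνάπαυσις", "N+Com", "განსუენებაჲ", "N+Com"),
    ("ἀγαθός", "A", "კეთილი", "V+Part"),
    ("καθαρῶς", "I+Adv", "წმიდაჲ", "A"),
    ("λίθινος", "A", "ქვაჲ", "N+Com"),
    ("σάρκινος", "A", "ჴორცი", "N+Com"),
    ("καί", "I+Part", "და", "I+Conj"),
]

counts = asymmetry_counts(tagged_pairs)
```

Sorting such a table by frequency over a whole corpus would show which asymmetries are systematic consequences of the two grammars or tagsets and which are isolated translation choices.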

III Conclusion
The work on word-level alignment between the Greek and Georgian texts of the first homily of Gregory of Nazianzus was carried out in the framework of the GREgORI Project. This work complements the previous studies focused on the analysis of techniques of translation from Greek into the different languages of the Christian East. This method, being based on the direct and empirical observations offered by an aligned bitext, makes it possible to systematise those previous studies. The next step is to apply the same method to the twelve other Georgian homilies already published, a corpus of 138,741 words. Of course, a growing body of bilingual data consisting of previously identified TUs will allow for an ever-increasing automation of the alignment process. Other alignment strategies, such as statistical methods, will be tested before being applied to new texts. At the same time, the GREgORI Project is beginning to apply the same methodology to the versions of the works of the Theologian translated into other languages of the Christian East.

Table 1. Number of words, lemmata and word-forms in Gregory's Homilies (Greek texts and Georgian versions).