A Classification of Manuscripts Based on A New Quantitative Method. The Old Latin Witnesses of John’s Gospel as Text Case

A new method for grouping manuscripts in clusters is presented, based on the calculation of distances between readings, then between witnesses. A classification algorithm ("Hierarchical Ascendant Clustering"), achieved through computer-aided processing, enables the construction of trees illustrating the textual taxonomy obtained. This method is applied to the Old Latin witnesses of the Gospel of John and, in order to provide a study of a reasonable size, to a chapter as a whole (chapter 14). The result basically confirms the text-types identified by Bonifatius Fischer, founder of the Vetus Latina Institute, while it invalidates the classification adopted by the current edition of the Vetus Latina of the Gospel of John.


INTRODUCTION
Statistics are useful for textual criticism of the New Testament 1 and computer-aided processing is now unavoidable. However, most of the quantitative studies made of the text of the New Testament are based on the number of agreements between manuscripts. For every variation-unit and for every pair of witnesses, one counts the agreements: 1 when they have the same reading, 0 otherwise. The processing of the data involves preparing a table of the pairwise agreements in the form of a lower triangular matrix, converting the results into percentages, then classifying them in lists where the percentages are sorted in decreasing order. The credit for this belongs to Ernest Colwell, who created the impetus with the Multiple Readings Method in the late 50s 2. Two developments ensued with the Claremont Profile Method of F. Wisse and P.R. McReynolds 3 in the 60s, and the Comprehensive Profile Method of B.D. Ehrman 4 in the 80s.
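The agreement-counting procedure described above can be sketched in a few lines. This is only an illustration: the witness names, the toy readings, the function names and the treatment of lacunae (a unit is skipped when either witness is missing) are assumptions of the sketch, not details taken from the studies cited.

```python
from itertools import combinations

# Hypothetical toy collation: one reading label per variation-unit.
# None marks a lacuna (an assumption of this sketch).
collation = {
    "A": ["x", "y", "x", "z"],
    "B": ["x", "y", "y", "z"],
    "C": ["y", "y", "y", None],
}

def agreement_percentage(readings_1, readings_2):
    """Percentage of units where both witnesses are extant and agree."""
    shared = [(r1, r2) for r1, r2 in zip(readings_1, readings_2)
              if r1 is not None and r2 is not None]
    if not shared:
        return None  # no common text, no percentage
    agreements = sum(1 for r1, r2 in shared if r1 == r2)  # 1 if same, 0 otherwise
    return 100.0 * agreements / len(shared)

def ranked_agreements(collation):
    """Pairwise percentages (one per pair, as in a lower triangular
    matrix), sorted in decreasing order."""
    pairs = []
    for w1, w2 in combinations(sorted(collation), 2):
        pct = agreement_percentage(collation[w1], collation[w2])
        if pct is not None:
            pairs.append((w1, w2, pct))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

ranking = ranked_agreements(collation)
for w1, w2, pct in ranking:
    print(f"{w1}-{w2}: {pct:.1f}%")
```

Note that every disagreement weighs the same here, whatever its nature or length.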
The counting of agreements between manuscripts is not, however, the only statistical tool. Another method is Data Analysis, of which the aim is to highlight the multidimensional character of the sample. Data Analysis is divided into two complementary lines of approach. The first is Correspondence Analysis or Multidimensional Scaling, which for qualitative data brings out a limited number of synthetic components. Developed at the end of the 60s, this approach chiefly owes its success to the geometrical presentation it employs. The goal of Correspondence Analysis is to produce a visual display of groups of information that involve qualitative variables such as manuscript variants. Several analyses of this kind have been produced in the New Testament field since the end of the 70s. Mention may be made of the three articles by C.-B. Amphoux on the Epistle of James (1978-1981) 5, the thesis of T.J. Finney on the Epistle to the Hebrews (1999) 6, and the work of W. Willker on the first five chapters of John's Gospel, published online (2008) 7.
The clustering method is the second line of approach. It seeks to reduce individual elements to homogeneous classes and for that reason is extremely useful for textual taxonomy. To our knowledge, it has been little used to date for New Testament philology 8.
It is this latter statistical tool that we will present and apply to a sample text. With that in mind, it is first necessary to define mathematical distances. Our starting point is to establish a calculation of distances between readings, then between witnesses. The advantage of the method thus construed is twofold. On the one hand, it characterizes quantitatively the relative weight of a variant. Unlike the binary counting of the agreements (1 or 0), the distance between readings shows that the variations are not all equivalent. According to the kind of variation and the length of the variation-unit, the distance between two readings ranges between 0 and 12. On the other hand, the calculation of the distance between manuscripts proves to be a robust tool: it avoids an excessive alienation of the atypical witnesses that present a large number of singular readings, and at the same time offers the necessary precision for a meaningful classification of witnesses.
This method will be applied to the Old Latin witnesses of the Gospel of John (37 to date) and, for a study of a reasonable size, to a chapter as a whole: we have selected chapter 14. The text of these manuscripts is available on the website www.iohannes.com, as part of the edition of the Vetus Latina, under the direction of D.C. Parker 9. Ten manuscripts are entirely lacunose for this chapter 10. The four manuscripts that have at the most 10% of the text (Table 1: VL 40 to VL 46) are also excluded from the present study insofar as their low attestation does not yield reliable results 11. The result is that 23 codices are used for classification: Palatinus (e, VL 2), Vercellensis (a, VL 3), Veronensis (b, VL 4), Bezae (d, VL 5), Colbertinus (c, VL 6), Sangermanensis primus (g1, VL 7), Corbeiensis (ff2, VL 8), Fossatensis (VL 9A), Brixianus (f, VL 10), Rehdigeranus (l, VL 11), Würzburg Univ. 67 (VL 11A), Monacensis (q, VL 13), Usserianus primus (r1, VL 14), Aureus (aur, VL 15), Sangallensis (n, VL 16), Sangallensis 48 (δ, VL 27), Sangermanensis secundus (g2, VL 29), Gatianus (gat, VL 30), palimpsest of a Gallican lectionary (VL 32), Carnotensis (VL 33), Book of Mulling (μ, VL 35), Sangallensis 60 (VL 47), Sangallensis 51 (VL 48). The text of the Vulgate (vg) is added, counting as one witness (edition of Stuttgart). As soon as a variation occurs within this set of witnesses, a variation-unit is constituted. Thus we obtain a relatively large amount of text, as shown in Annex 1. There, it can be seen that, taking the Vulgate as a point of reference, there are 136 variation-units for 529 words, listed according to the 31 verses of John 14.

I. THE DISTANCE BETWEEN READINGS
A variation-unit can be subject to three kinds of variation: (1) presence / absence; (2) substitution; (3) displacement 12. In an article of 1988, C.-B. Amphoux 13 proposed a quantification of the difference between two readings for each kind of variation. We take this measurement system, slightly modified:
- displacement: 1 point per displaced group (the words are the same); examples: cognovistis me / me cognovistis (L9c) = 1; parare vobis locum / locum parare vobis (L2d) = 1;
- substitution: 1 point if the substitution is made with a similar word or form, 2 points otherwise (most of the words of the variation-unit are the same except the replaced words); examples: dicit ei / dicit illi (L5a) = 1; quia / quoniam (L28a) = 1, but paracletum / advocatum (L16b) = 2;
- presence / absence: 2 points for a verb, a noun, an adjective or an adverb formed from an adjective, 1 point for the others (pronoun, conjunction, preposition, adverb, particle, etc.) (most of the words vary and the calculation is identical whether we count from the reading where the words are present, or from the reading where they are absent); examples: patrem / patrem meum (L16a) = 1; mihi dedit / mihi dedit Pater (L31e) = 2.
The identification of a displacement or a presence / absence is relatively easy. A substitution is noted when one word changes inside a phrase; however, when the substitution concerns several words, it must always be calculated word for word. Finally, two out of the three, or all three, kinds of variation can be combined. Example: accipere saeculum non potest / hic mundus non potest accipere (L17a) = 4 in sum (displacement of accipere = 1; substitution saeculum / mundus = 2; presence / absence of hic = 1). All the distances between readings are given in Annex 1.
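The scoring rules can be expressed as a small lookup table, which makes the additive character of a combined variation explicit. In this sketch the point values come from the rules stated above, but the category labels and the function name are hypothetical, and the identification of each elementary variation is assumed to be done by the philologist beforehand, not by the program.

```python
# Point values from the scoring rules in the text; labels are hypothetical.
POINTS = {
    "displacement": 1,          # per displaced group
    "substitution_similar": 1,  # similar word or form (ei / illi)
    "substitution_other": 2,    # unrelated word (paracletum / advocatum)
    "presence_major": 2,        # verb, noun, adjective, adverb from an adjective
    "presence_minor": 1,        # pronoun, conjunction, preposition, particle, etc.
}

def reading_distance(components):
    """Sum the points of the elementary variations separating two readings."""
    return sum(POINTS[kind] for kind in components)

# L17a: accipere saeculum non potest / hic mundus non potest accipere
l17a = ["displacement", "substitution_other", "presence_minor"]
# L16b: paracletum / advocatum
l16b = ["substitution_other"]
# L31e: mihi dedit / mihi dedit Pater
l31e = ["presence_major"]

print(reading_distance(l17a))  # 1 + 2 + 1 = 4, as in the example above
```

Two identical readings have an empty list of components and therefore a distance of 0.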

II. THE DISTANCE BETWEEN MANUSCRIPTS
12 Thus Fee G.D. On the Types, Classification, and Presentation of Textual Variation. Epp E.J., Fee G.D. Studies in the Theory and Method of New Testament Textual Criticism (Studies and Documents, 45). W.B. Eerdmans (Grand Rapids), 1993:63: "The kinds of variation narrow to three: (1) add/omit ... (2) substitution ... and (3) word order ... any two or all three of these may occur in combination in any set of variants".

From the distance between readings, it is possible to define a distance between two manuscripts. The safest definition from a mathematical point of view is that of the Euclidean distance. Considered within this perspective, the distance between two witnesses is equal to the square root of the sum of the squares of the distances between their readings taken two by two. By way of illustration, we may consider the first five variation-units in verse 1 and calculate the Euclidean distance between codex Palatinus (e, VL 2) and the Vulgate (vg):

Variation-units                                        L1a  L1b  L1c  L1d  L1e
Distances between the readings of VL 2 (e) and vg       0    1    3    2    0

The distance between VL 2 (e) and vg is then equal to:

\[ d(\text{VL 2}, \text{vg}) = \sqrt{0^2 + 1^2 + 3^2 + 2^2 + 0^2} = \sqrt{14} \approx 3.74 \]

This definition is now applied to the 136 variation-units for the Vulgate. We give in Table 2 the distances between vg and the 23 Old Latin manuscripts under consideration for John 14. The distances are sorted in increasing order, i.e. from the manuscript closest to vg to the farthest. Codex Palatinus (e, VL 2) is the farthest from the Vulgate. In addition to that manuscript, we find that the Old Latin manuscripts known to be the most representative of the early, pre-Vulgate text are the farthest from vg: VL 3 (a), 14 (r1), 5 (d), 13 (q), 6 (c), 8 (ff2), and 4 (b). Conversely, the Old Latin manuscripts more influenced by the Vulgate (mixed texts) are: VL 33, 7 (g1), 29 (g2), 11A, and 47. On the other hand, the manuscripts VL 32 and 16 (n) should for the moment be put to one side because of their lacunae. Indeed, if the same list of distances is drawn up for VL 32, vg is the fifth witness and four other manuscripts are closer to VL 32; likewise, for VL 16 (n), vg is in 10th place.
The distance between two manuscripts only makes sense within the context of all the 24 witnesses. Only in this network of interrelationships is a comparison possible. It is thus necessary to calculate the distances of all the witnesses two by two. For John 14, one obtains 276 values presented in a lower triangular matrix (Table 3, see end of article) 14. The majority of the quantitative studies would then show for each witness a column where the distances from the other manuscripts would be sorted in increasing order. The search for a classification on this model would prove difficult, if not impossible, because of the number of columns to be treated (23) and their identical length. In the face of this mass of data, a systematic method using the computer is preferable. The most appropriate statistical tool in our view is the clustering method.
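The Euclidean definition and the size of the resulting matrix can be checked with a short sketch. It uses the five reading distances quoted for VL 2 (e) and vg in verse 1; the function names are hypothetical, and the sketch does not address how lacunose variation-units are handled, which is not described here.

```python
import math

def witness_distance(reading_distances):
    """Euclidean distance: square root of the sum of squared reading distances."""
    return math.sqrt(sum(d * d for d in reading_distances))

def pair_count(n_witnesses):
    """Number of distinct pairs, i.e. entries of the lower triangular matrix."""
    return n_witnesses * (n_witnesses - 1) // 2

# Distances between the readings of VL 2 (e) and vg for L1a-L1e.
e_vs_vg = [0, 1, 3, 2, 0]

print(round(witness_distance(e_vs_vg), 2))  # square root of 14
print(pair_count(24))                       # 24 witnesses compared two by two
```

Because the reading distances are squared before being summed, a single heavy variation (3 points at L1c) contributes more than three light ones of 1 point each.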

III. CLASSIFICATION
After first presenting this method, we shall then go on to interpret the processing of the data of John 14.

The Clustering Method
The purpose of the clustering method is to distribute the individual elements of the sample in clusters where they are as similar as possible. At the same time, the clusters should be as dissimilar as possible. The construction of a hierarchical clustering entails, first, the formation of small clusters with very similar individual elements, then from these clusters to build others less and less homogeneous, until the sample is exhausted. This algorithm is called the "Hierarchical Ascendant Clustering" (HAC). It is based on the calculation of the distance between two distinct clusters. The rule for calculating this distance is called an "aggregation criterion". The aggregation criteria are as varied as the many possibilities of classification. Let us mention three of the most common:
- The minimum step or the criterion of the nearest neighbours (single linkage): the distance between two clusters is the smallest distance between elements of the two clusters.
- The maximum step or the criterion of the most remote neighbours (complete linkage): the distance between two clusters is the greatest distance between elements of the two clusters.
- The criterion of the average (average linkage): the distance between two clusters is the average of all the distances between elements of the two clusters.
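The three aggregation criteria differ only in how they summarize the distances between the elements of two clusters, as the following sketch shows. The witness labels W1 to W4 and their distances are invented for illustration, and the helper names are hypothetical.

```python
from itertools import product
from statistics import mean

def cross_distances(cluster_a, cluster_b, dist):
    """All distances between an element of cluster_a and one of cluster_b."""
    return [dist[frozenset((a, b))] for a, b in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b, dist):
    return min(cross_distances(cluster_a, cluster_b, dist))   # nearest neighbours

def complete_linkage(cluster_a, cluster_b, dist):
    return max(cross_distances(cluster_a, cluster_b, dist))   # most remote neighbours

def average_linkage(cluster_a, cluster_b, dist):
    return mean(cross_distances(cluster_a, cluster_b, dist))  # average of all pairs

# Invented distances between four hypothetical witnesses.
dist = {
    frozenset(("W1", "W3")): 2.0,
    frozenset(("W1", "W4")): 6.0,
    frozenset(("W2", "W3")): 4.0,
    frozenset(("W2", "W4")): 8.0,
}
a, b = ["W1", "W2"], ["W3", "W4"]

print(single_linkage(a, b, dist), complete_linkage(a, b, dist), average_linkage(a, b, dist))
```

For the same pair of clusters the three criteria give 2.0, 8.0 and 5.0 respectively, which is why the choice of criterion can change the shape of the resulting tree.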
The distance between clusters having been defined, the algorithm of the HAC proceeds in stages. At the initial stage, each witness forms a cluster. Thus, there are 24 clusters. In the first stage, the table of distances between witnesses makes it possible to bring together the two closest witnesses, which will be aggregated into a new cluster. We then obtain 23 clusters. In the second stage, a new table of distances between the 23 clusters is established. The two closest clusters are gathered together and we obtain 22 clusters. The process is repeated until we obtain only one cluster. The result of the HAC is represented by a clustering tree or "dendrogram". This encompasses all stages of aggregation. The interpreter must therefore locate a natural break in the tree, enabling him or her to identify the number of homogeneous clusters and to give them a meaning. The tree schematizes distances between witnesses, or conversely a close relationship. In no way does it reflect any filiation and it looks nothing like a stemma codicum 15.
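The stages just described amount to a simple loop, sketched below on four invented witnesses. To keep the example short it uses the average criterion, not the Ward criterion adopted in this study, and a naive search over all pairs of clusters; the names and distances are hypothetical.

```python
from itertools import combinations

def average_linkage(cluster_a, cluster_b, dist):
    """Average of all distances between elements of the two clusters."""
    total = sum(dist[frozenset((a, b))] for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def hac(witnesses, dist, linkage=average_linkage):
    """Return the merge history as (cluster_1, cluster_2, height) per stage."""
    clusters = [(w,) for w in witnesses]  # initial stage: one cluster per witness
    history = []
    while len(clusters) > 1:
        # Find the two closest clusters at this stage.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: linkage(pair[0], pair[1], dist))
        height = linkage(c1, c2, dist)
        # Replace them by their union; the number of clusters drops by one.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        history.append((c1, c2, height))
    return history

# Invented distances between four hypothetical witnesses.
dist = {frozenset(pair): d for pair, d in [
    (("W1", "W2"), 1.0), (("W3", "W4"), 2.0), (("W1", "W3"), 6.0),
    (("W1", "W4"), 7.0), (("W2", "W3"), 5.0), (("W2", "W4"), 8.0),
]}

history = hac(["W1", "W2", "W3", "W4"], dist)
for c1, c2, height in history:
    print(c1, "+", c2, "at height", height)
```

The merge history contains one entry fewer than there are witnesses, and the heights at which the merges occur are what the graduated scale of a dendrogram displays.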
For statistical processing, we used the free software R and chose as the criterion of aggregation of clusters the method of Ward, which is the standard method used for Euclidean data 16. The manuscript VL 32 (the remains of a Gallican lectionary) had to be removed from the data because it disturbed the results, having only the text of verses 13-19 of John 14. The resulting tree is given in Figure 1 (see end of article).
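For readers unfamiliar with Ward's method, its usual textbook formulation may be helpful. It is given here as background only: the notation is ours, and the formula is not quoted from the sources of this study. At each stage the two clusters A and B are merged for which the increase in within-cluster sum of squares is smallest:

```latex
\[
\Delta(A, B) \;=\; \frac{n_A \, n_B}{n_A + n_B} \, \lVert \bar{x}_A - \bar{x}_B \rVert^{2}
\]
% n_A, n_B : number of witnesses in clusters A and B
% \bar{x}_A, \bar{x}_B : centroids of A and B in the Euclidean space
```

In R, the function hclust offers two variants of this criterion, "ward.D" and "ward.D2", which differ in whether the dissimilarities are squared before the cluster update; which variant was used for Figure 1 is not specified here, so the formula should be read as a general description and not as an exact account of the computation.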

Interpretation of the Classification
The classification identifies two groups: in the lower part of the tree are the Old Latin manuscripts most representative of the pre-Vulgate text (12 witnesses), and in the upper part, the manuscripts whose text is more influenced by the Vulgate (11 witnesses). We suggest an interpretation of this classification into two groups, made in the light of the text-types masterfully synthesized by Bonifatius Fischer 17.

Group 1: the Old Latin manuscripts
It is generally agreed that the Old Latin manuscripts can be divided into two text-types. The African type is the text from around 230 and is represented primarily by codex Bobiensis (k, VL 1), then by Cyprian and e (VL 2). It gradually gives way to the European type of which b (VL 4), ff2 (VL 8) and i (VL 17) are the leaders (the Italian text of the years 350-380). Unfortunately, k (likewise i) is lacunose for John. VL 3 (a) and 16 (n) represent a first form of the European type, close to Novatian. Also belonging to the European Old Latin manuscripts group whose African stratum is still present are codex Colbertinus (c, VL 6) and the problematic codex Bezae (d, VL 5), and codex Usserianus primus (r1, VL 14), which is the main representative of a subgroup of the Italian type (Gallo-Irish group).
These Old Latin manuscripts are found in the group of witnesses located in the lower part of the tree (Group 1). The only notable exception is the presence of codex Sangallensis 48 (δ, VL 27). In our study of the text of the Gospel of John in the Latin translation of Origen's Commentary on Matthew, we noticed that VL 27 formed in itself a cluster in the tree because of the large number of conflated readings 18. The copyist seemingly had before his eyes two distinct texts, one of which preserves Old Latin readings. A classification of Group 1 without this manuscript provides better results (Figure 2).
The algorithm provides a tree which enables us to follow the construction of each node with a graduated scale (Figure 1: 0-50; Figure 2: 0-35). In reading the tree, the interpreter has to determine where to make a break that enables a number of homogeneous clusters to be identified. The first clusters formed, and thus the more homogeneous ones, are in the group of manuscripts influenced by the Vulgate (Group 2). As regards the Old Latin manuscripts group (Group 1), we have four clusters in the following order of appearance: (1) VL 6 (c), 11 (l), 8 (ff2), 16 (n), and 4 (b); (2) VL 10 (f), 13 (q), 14 (r1), and 5 (d); (3) VL 3 (a); (4) VL 2 (e). Six comments may be made:
1. The first cluster formed clearly corresponds to the European type, of which codex Veronensis (b, VL 4) and Corbeiensis (ff2, VL 8) are the best representatives. Moreover, VL 6 (c) is known to be close to VL 8 (ff2) 19.
2. VL 14 (r1), which is an excellent witness of the European text and belongs to the Gallo-Irish group, shares more than one reading with codex Monacensis (q, VL 13). The philologist will not fail to be surprised by the place of codex Rehdigeranus (l, VL 11). It is generally agreed that q and l form a subgroup of the European text-type 20. However l, which contains only verses 23-31 of John 14, opposes this view here.
3. Codex Brixianus (f, VL 10) contains a "mixed" text 21: as regards the Gospel of John, it is classified with the Old Latin manuscripts, the Vulgate influence being secondary.
4. In the construction of the clustering tree, codex Bezae (d, VL 5) precedes VL 3 (a) and 2 (e). At the philological level, the text of d for the gospels of Luke and John is between e and a. It has an African stratum while under the influence of the European text 22.
5. Codex Vercellensis (a, VL 3), which represents the oldest form of the European text, usually in the company of fragments of St. Gallen, VL 16 (n), constitutes a cluster by itself.
6. Codex Palatinus (e, VL 2) is the most difficult to classify following the formation of clusters. It is its African stratum that no doubt accounts for this fact.
Sangermanensis primus (g1, VL 7) also has without doubt an Old Latin stratum but, in this case, the influence of the Vulgate is predominant. VL 29 contains a mixed text which derives from a regional group of Vulgate manuscripts, Celtic family DELQR. VL 47 has an Irish mixed text (Old Latin in Jn 1,29-3,26). Finally, VL 11A was recently included in the list of the Vetus Latina Institut and Hugh Houghton 23 has shown that the text of Jn 1,1-5,40 and 12,34-13,10 is Old Latin.
2. Manuscripts VL 35, 30 and 48 have in common that they display an Irish mixed text for John (Celtic family DELQR). VL 9A must be added to these witnesses. The recent study of Houghton 24 prompted the inclusion of this manuscript in the list of Old Latin manuscripts. The author concludes from a survey of synonymous terms that this manuscript has an Old Latin source (the parallels with VL 2, 3, 13, 14 show the antiquity of the source text) but at the same time that there is no closeness to a particular manuscript. According to our previous study 25, however, 9A forms a cluster with codex Aureus (aur, VL 15), which contains a mixed text with Vulgate readings and Old Latin readings of the European type. This cluster VL 9A-15 was formed immediately after the cluster VL 35, 30, 48. In the present case, the classification is similar: 9A is inserted into the Celtic family, just before the inclusion of aur, which forms a cluster apart.

Classification into Text-Types
The classification into two groups that has been achieved deserves to be compared with that proposed by Philip Burton 26, insofar as discrepancies with our conclusions are apparent. Burton identifies the existence of two European text-types (not one), the second containing the Vulgate and mixed texts; the two groups are a d q r1 e and aur c f ff2 l vg. His classification is based on a review of the translation of nine Greek terms. His main result is the non-existence of a European kernel formed by b and ff2. On the one hand, b would belong to the first group for John 1-9 and to the second for 10-21. On the other hand, ff2 should be classified with a mixed text and the Vulgate. In our view, the sample used by Burton is too small, in terms of not only the number of variants but also the number of manuscripts. The sample of 136 variation-units of John 14 represents more than 520 words and involves 23 witnesses. The classification thereby obtained confirms the European kernel b-ff2 in the Old Latin witnesses (Group 1) and also refutes any exclusion of manuscripts c, l and f from this group. Furthermore, it demonstrates conclusively the separation between the group of Old Latin manuscripts (Group 1) and the group influenced by the Vulgate (Group 2).

CONCLUSION
Following this path, we can draw conclusions both on the method and on the results obtained. From a methodological point of view, the calculation of distances between readings, then between witnesses, is mainly characterized by stability and robustness of results. In terms of readings, the method allows us to take into account quantitatively the various kinds of variation on a scale ranging from 0 to 12, while the counting of the agreements standardizes the difference between readings (0 or 1). In terms of manuscripts, it makes a continuous reading of the text of John 14 possible. It thus avoids an arbitrary selection of variation-units, which could cause textual material useful for the classification to be overlooked and, above all, could unduly inflate a particular textual affinity. We have to apply a proportionality principle that results in a reliable classification of witnesses. The selection, for example, of the non-Vulgate readings of a manuscript alters the results; the same methodological error was made for Greek manuscripts of the New Testament when the percentages of agreements between two manuscripts were calculated by excluding the readings of the "Textus Receptus". In sum, attention must be paid to the network of interrelationships within which alone a comparison is possible. This is the goal of the method of "classification" ("Hierarchical Ascendant Clustering"), which analyzes hundreds of distances. In John 14, the clusters revealed by the algorithm show clearly that the text-types synthesized by Bonifatius Fischer are not to be doubted. On the contrary, textual taxonomy finds in the clustering method a solid foundation.
P.H. Burton, H.A.G. Houghton, R.F. MacLachlan, and D.C. Parker are now editing the Vetus Latina of the Gospel of John, of which the first nine chapters have already been published 27. The second Old Latin Gospel being published is that of Mark, edited by Jean-Claude Haelewyck 28. A comparison between the two projects is illuminating in this regard. The latter endeavours to reconstruct the text-types along the lines proposed by B. Fischer: K, the ancient African text (VL 1 [k]), C, the recent African [...] essentially confirms the text-types previously generated for the gospels, especially those of Mark. Patristic quotations are then used to locate in time and space these different text-types.

http://jdmdh.episciences.org ISSN 2416-5999, an open-access journal

Table 1: Old Latin Witnesses lacunose in John 14

Table 2: Distances between vg and the Old Latin Witnesses in John 14

Table 3: Distances between Manuscripts in the 136 Variation-Units of John 14

Figure 1. Classification of 23 Latin Witnesses in John 14

Figure 2. Classification of 11 Old Latin Witnesses in John 14

Annex 1: Variation-Units and Distances between Readings in John 14