Visualising pre-standard spelling practice: Understanding the interchange of ‹ch(t)› and ‹th(t)› in Older Scots

Alphabetic spelling systems rarely display perfectly consistent one-to-one relationships between graphic marks and speech sounds. This is particularly true for languages without a standard written form. Nevertheless, such non-standard spelling systems are far from anarchic, as they take on a conventional structure resulting from shared communities and histories of practice. Elucidating this structure can be a substantial challenge for researchers presented with textual evidence alone, since attested variation may represent differences in sound structure as well as differences in the grapho-phonological mapping itself. In order to tease apart these factors, we present a tool, Medusa, that allows users to create visual representations of the relationship between sounds and spellings (sound substitution sets and spelling substitution sets). Our case study for the tool deals with a longstanding issue in the historical record of mediaeval Scots, where word-final <cht>, <ch>, <tht> and <th> appear to be interchangeable, despite representing reflexes of distinct pre-Scots sounds: [x], [xt] and [θ]. Focusing on the documentary record in the Linguistic Atlas of Older Scots ([LAOS, 2013]), our exploration surveys key graphemic categories, mapping their lexical distributions and taking us through evidence from etymology, phonological typology, palaeography and historical orthography. The result is a novel reconstruction of the underlying sound values for each of the target items in the record, alongside a series of sound and spelling changes that account for the data.


INTRODUCTION
Throughout the mediaeval period, vernacular languages such as Older Scots (OSc) and Middle English (ME) lacked an orthographic standard. This meant there was no prescriptively correct way of spelling any given word, allowing scribes to draw from a range of orthographic conventions, mainly of Old English and (Anglo-)French origin, to 'sound out' their own speech. The result was a variety of ways of spelling individual words, based on a multiplicity of ways of representing individual sounds. This variation depended partly on the scribe's pronunciation, but also on lexical context (what word the sound came up in, whether the sound was at the start or end of a word, what the preceding letter was, etc.) and writing tradition (local, temporal, genre- or language-based); for seminal work on scribal spelling systems, see [McIntosh, 1956] and [Samuels, 1963]. An example of this heterogeneity of spellings is given in Figure 1, representing a sample of the different forms for the word YEAR in the Linguistic Atlas of Older Scots corpus ([LAOS, 2013]), which brings together over 1,200 Older Scots legal texts from the period between 1380 and 1500. Ultimately, each sound (or phone) in mediaeval spelling could have several spellings (a spelling substitution set) and, vice versa, each spelling unit (or grapheme) could be used to represent multiple phones (a sound substitution set; cf. [Laing, 1999], [Laing and Lass, 2003], [Kopaczyk et al., 2018]). Figure 2 gives example visualisations of such sets, based on the data in the From Inglis To Scots Corpus ([FITS, forthcoming]), a unique resource for exploring non-standard spelling practices. The FITS corpus contains the spellings of more than 100,000 word tokens from the LAOS corpus. In what follows, we give an overview of the challenges and advantages of creating and using such visualisations. In order to fully explicate these tools, we focus on the reconstruction of a particularly difficult set of overlapping sound and spelling substitution sets, those involving the spellings <ch>, <cht>, <th> and <tht>, among others.1

Visualisations such as those in Figure 2 are intended to help researchers understand the diversity of sound-spelling correlations in a corpus, as well as the relative frequency of each phone (size of the blue bubble), each grapheme (size of the mustard bubble) and the links between them (the thickness of a line represents a proportion of the total tokens). Such information, in this case for OSc, helps guide the reconstruction of the spoken language of particular periods and locations and trace linguistic change over time and space. Methodologically, these substitution sets are made possible by the careful construction of a grapho-phonologically parsed corpus, which matches reconstructed phones to graphemes (see [Kopaczyk et al., 2018] for details), linked to a custom-built visualisation tool. The latter is a weighted network graph built with the D3 JavaScript library, which we have christened Medusa (cf. [Alcorn et al., forthcoming]).
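The two kinds of substitution set described above can be illustrated with a minimal sketch. It assumes that a parsed corpus can be reduced to a list of (grapheme, phone) pairs. The toy tokens and the function names `substitution_sets` and `edge_weights` are invented for illustration and are not drawn from FITS or from the Medusa code base.

```python
from collections import Counter, defaultdict

# Invented toy data: (grapheme, phone) pairs. These are NOT FITS tokens.
TOKENS = [("ch", "x"), ("ch", "x"), ("gh", "x"),
          ("th", "θ"), ("th", "θ"), ("ch", "tʃ")]


def substitution_sets(tokens):
    """Return (spelling_sets, sound_sets) as nested count tables."""
    spelling_sets = defaultdict(Counter)  # phone -> counts of its graphemes
    sound_sets = defaultdict(Counter)     # grapheme -> counts of its phones
    for grapheme, phone in tokens:
        spelling_sets[phone][grapheme] += 1
        sound_sets[grapheme][phone] += 1
    return spelling_sets, sound_sets


def edge_weights(table):
    """Turn counts into proportions of each node's tokens (the line thickness)."""
    return {
        node: {other: n / sum(counts.values()) for other, n in counts.items()}
        for node, counts in table.items()
    }


spelling_sets, sound_sets = substitution_sets(TOKENS)
weights = edge_weights(spelling_sets)
```

In this sketch, `weights["x"]` holds the share of [x] tokens spelled with each grapheme, which is the kind of quantity a weighted network graph could map to line thickness.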

I BUILDING HISTORICAL SUBSTITUTION SETS AND RECONSTRUCTING CHANGE
In order to reconstruct the sound values behind spelling forms, we must triangulate from a number of factors. Spelling is one, of course, but the etymological source of a given word and its present-day pronunciations must also be considered, as well as the sound and spelling changes that are likely, or unlikely, to occur in the history of the language in question. For every instance of a grapheme, therefore, FITS provides a triad made up of (i) an OSc grapheme, (ii) an OSc phone and (iii) a Pre-Scots, etymological phone (usually from Old English, but see [Alcorn et al., 2017]), as in Figure 3. Where there is a discrepancy between the Pre-Scots and the OSc phones, the relevant linguistic development is reconstructed. In the example in Figure 3, the change from [i] to [ɪ] is characterised as an instance of Short Vowel Lowering (SVL), following [Aitken and Macafee, 2002: 10-11]. Ultimately, FITS can be seen as a corpus of such triads alongside a corpus of linguistic changes.

1 Throughout the article, we use the standard linguistic convention of placing graphemes in angled brackets < > and phones in square brackets [ ]. For the latter, we use the symbols of the International Phonetic Association. They are not intended to represent allophonic variation but "a reasonable transcriptional response" ([Lass and Laing, 2013: Section 2.4.2]) to the sound value aimed at by the scribe. Present-day English equivalents of words are given in small caps.

Figure 1 (sample forms): yhere, ʒer(e), ʒhere, iere, yeer, yer, yhe, yheere, yheir, yheir(is), yheir(e), yhere, yhere(is), þhere, ʒeere, ʒeir, ʒeir(e)-, ʒere, ʒerr, ʒer, ʒer(is), ʒeyr(e), ʒher, ʒ er (is) …
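The source-sound-spelling triad can be pictured as a small record type. The sketch below is hypothetical: the class name `Triad`, its field names and the consistency check are our own assumptions and do not reproduce the FITS data model. The example values follow the <y> of fysch FISH discussed in connection with Figure 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triad:
    """One grapheme token with its reconstructed OSc and Pre-Scots phones."""
    grapheme: str                 # OSc spelling unit, e.g. "y"
    osc_phone: str                # reconstructed OSc sound value
    pre_scots_phone: str          # etymological (usually Old English) sound
    change: Optional[str] = None  # label of the change linking the two, if any

    def has_discrepancy(self) -> bool:
        """True when the OSc phone differs from its etymological source."""
        return self.osc_phone != self.pre_scots_phone

    def is_consistent(self) -> bool:
        """A change label is expected exactly when the phones differ."""
        return self.has_discrepancy() == (self.change is not None)


# The <y> in fysch FISH: Pre-Scots [i] > OSc [ɪ] by Short Vowel Lowering (SVL).
fysch_y = Triad(grapheme="y", osc_phone="ɪ", pre_scots_phone="i", change="SVL")
```

A check such as `is_consistent` is one possible way of flagging triads whose phones differ but for which no change has yet been reconstructed.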

II CASE STUDY: GRAPHEMIC OVERLAP OF <CH(T)> AND <TH(T)>
A famously difficult pattern of sound-spelling mappings in the early history of Scots, and one that needed a solution for the grapho-phonological parsing of the FITS Corpus, is that of the ends of words such as burgh, thought or cloth. In such words, OSc scribes used an overlapping range of spellings for the reflexes2 of Pre-Scots [x], [xt] and [θ]. Indeed, according to [Johnston, 1997a: 101], the Older Scots spellings "<ch, cht, th, tht> appear interchangeable, leading some authorities to conclude that they are just graphical variants". In other words, here we have a number of cases where etymologically distinct sound categories (Pre-Scots [x], [xt] and [θ]) have at least partially merged spellings (<ch, cht, th, tht>). This prompts questions such as whether the phonic substance of these spellings has also merged and what changes to the sound and spelling systems need to be postulated in order to account for the spelling overlap.
The pattern is complex in that it entails three seemingly unrelated linguistic changes:
a. Final <t> being lost in many OSc forms derived from Pre-Scots [xt]

A glimpse of the complexity of these overlaps is evident from the Medusa-generated OSc spelling substitution sets for word-final [θ] and [x] (from Pre-Scots [θ] and [x]), in Figure 4.
Recall that the thickness of the blue line joining sounds to spellings gives an indication of the proportion of phone tokens represented by the relevant grapheme, so that the main variants here are clearly <ch>, <th>, <cht> and <tht> for both etymological categories. The superscript <t> (represented as <^t>) is an important variant for [θ]-words, and a minor one for [x]-words.
Finally, [x] alone is represented by <gh>, for which it is a major variant.

As a result, the <th> digraph was brought into Northern Middle English and early Scots from Anglo-Latin, where it had long been used to spell dental fricatives, first in Greek <θ>-words, and then in Old English names (cf. [Benskin, 1977] and [Molineaux et al., 2020]).4

For the expected spellings, we can safely assume that the underlying sounds are also unchanged from their etymological values, e.g. that <cht> represented [xt] in OSc, as it did in Old English. Nonetheless, a substantial minority of tokens have spellings whose sound values are not so evident. Among these are the <th> and <tht> spellings for etymological [x] (17.3%), the <th> and <tht> spellings for [xt] (32.2%), and <ch> and <cht> for [θ] (10.4%). How should these be explained? Are these simply spelling innovations and, if so, what was their motivation? Or do they signify that the original sound values had subsequently changed? As we assume scribes were not randomly choosing their spelling repertoires and were "capable of sophisticated and subtle linguistic analysis" ([Laing and Lass, 2003: 258]), we postulate that these choices encode worthwhile information, either at the level of the orthography or the sound system. In the following subsections we attempt to elucidate the likely reasons behind such choices.

The rise of <gh> and the optimisation of graphemic contrast
While both <ch> and <gh> are expected spellings for OSc [x], as in night being spelled <nicht> or <night>, and both were available in Old English scribal practices, <gh> was quite rare, so it is surprising how frequent it becomes in the FITS data.5 Given the 120-year span of our corpus, we are able to look at the development of spellings for this category, to try to identify a diachronic trend. Figure 6 shows the proportion of graphemic representations for etymological [x] over the decades of the FITS Corpus.6 While our earliest data is rather sparse, the overall trend in the later data indicates a reduction in the use of <ch> and an increase in the use of <gh> to represent the velar fricative. While both these options had been available since the Old English period, we suggest that the rise of <gh> represents an optimisation of spelling contrast. The visualisation in Figure 7 provides the motivation for this: besides representing the velar and dental fricatives, <ch> could also stand for a palato-alveolar affricate, [tʃ] (see [Bann and Corbett, 2015: 23]). On the other hand, <gh> only represents [x] and its variant before front vowels, [ç]. Ultimately, since <ch>=[x/ç] had more sound representations to compete with than <gh>=[x/ç], <gh> was, of the two, the optimal candidate for this role.

4 OSc <^t> is attested 128 times in words which had Pre-Scots [θ] and only four times in forms derived from Pre-Scots /x/ words. In general, it may be straightforwardly considered an abbreviation of <th>.

5 An important pattern that one of the reviewers rightly highlights is that of the strong lexical skew among <gh> spellings, which occur most often in the word BURGH (63%). Given that LAOS texts are predominantly administrative documents from early urban centres, this is an extremely frequent item. It is therefore a relevant question whether this pattern obtains outside of our target textual genre.

6 While our focus here is primarily on the spelling alternations in the corpus, there is also a significant phonological process that affects the frequency of etymological [x] spellings in OSc. This is the process of vocalisation of [x], usually via an intermediate process of weakening (see [CoNE, 2013: Gamma Weakening - GW, and Coda Vocalisation - CV]). The result, as a reviewer points out, is that the phonological outcome of [x] is often a vocalic realisation (cf. borrow BURGH). Our analysis, focusing on the consonantal spellings, leaves these forms aside.
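The by-decade proportions plotted in Figure 6 could be derived along the following lines. This is only a sketch under stated assumptions: each token of etymological [x] is taken to carry a year and a grapheme, the records are invented (not FITS data), and `proportions_by_decade` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Invented (year, grapheme) records for etymological [x]; NOT FITS data.
RECORDS = [(1452, "ch"), (1457, "ch"), (1459, "gh"),
           (1483, "gh"), (1488, "gh"), (1489, "ch")]


def proportions_by_decade(records):
    """Map each decade to the share of tokens spelled with each grapheme."""
    counts = defaultdict(Counter)
    for year, grapheme in records:
        decade = year // 10 * 10  # e.g. 1457 -> 1450
        counts[decade][grapheme] += 1
    return {
        decade: {g: n / sum(c.values()) for g, n in c.items()}
        for decade, c in sorted(counts.items())
    }


by_decade = proportions_by_decade(RECORDS)
```

Reporting shares rather than raw counts keeps decades with few surviving texts comparable with well-attested ones, although sparse decades remain unreliable, as noted above for the earliest data.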

Figure 7: Overlapping sound substitution sets for <gh> and <ch> across the FITS corpus

Final <t> loss and hypercorrective <t> insertion
The loss of final <t> in etymological [xt] forms and the simultaneous appearance of unetymological <t> in reflexes of [x]- and [θ]-final morphemes (see Figure 5) has been noted in the literature on Older Scots phonology. The first process has been linked to the more general, though sporadic and non-localised, phenomenon of /t/-deletion in the morpheme-final clusters [xt, st, ft, pt, kt]  this has been attributed to hypercorrection, that is, to the speakers' uncertainty as to the lexical incidence of final [t], following its sporadic loss (see also [Meurman-Solin, 1997], [Romaine, 1984], [CoNE, 2013: Final Consonant Excrescence - FCE]). Whether the hypercorrection is phonic, as claimed by [Johnston, 1997a: 101] and [Romaine, 1984], or purely orthographic (a so-called backspelling), the alternation between <ch> and <cht> for etymological [x] and [xt] (as well as between <th> and <tht>, as we shall see) can be seen as a product of [t]-deletion and consequent hypercorrection.

[xt]-dentalisation and the <cht>~<th> overlap
An additional change which shows up in the literature on Older Scots is the process of [xt] dentalisation, which is invoked as an explanation for OSc <th> and <tht> spellings for Pre-Scots [xt] in forms such as dother DAUGHTER and vortht WORTH. Evidence for the reality of this as a sound change comes from the fact that its outcome is still visible in some North-Eastern dialects of Scots today ([Macafee and Aitken 2002: §5.2 fn.87]). According to [Johnston, 1997b: 505] (see also [Grant 1931: xxxv], [Dieth, 1932: 113], [CoNE, 2013: Transposition x to Theta - TXT]), this process can be thought of as an instance of progressive place assimilation where the first element of the consonant cluster is articulated in the same place as the second element ([xt]>[θt]).7 This would explain OSc <tht> spellings for Pre-Scots [xt] as instances of 'fronting' to [θt]. In the cases where the sequence surfaces as <th>, the <t> element would be subsequently subject to [t]-deletion (cf. §2.3, above) and therefore represent an instance of [θ].8 In this account of [xt] dentalisation, the change of [x]>[θ] is only motivated for the etymological [xt] category, not for Pre-Scots [x]. However, [CoNE, 2013: Transposition x to Theta - TXT] claims that "less commonly" [x]>[θ] occurs without the following [t], word-finally. While this seems to coincide with some of the spellings in FITS (such as bourth BURGH), the lack of phonic motivation and the absence of present-day reflexes for such a change, at least in the Scots context, lead us to believe that this is merely a graphemic change, explained in §2.5, below.9

While it is likely that OSc <tht> spellings corresponding to Pre-Scots [xt] retain a [t] in pronunciation, it is also possible that in some cases the <t> is retained in the spelling alone, as part of the more general ambiguity of this spelling (as per §2.3). In the case of <tht> spellings derived from Pre-Scots [θ], final <t> is most likely only graphic, as [θt] is not a part of any Pre-Scots inventories, providing little ground for hypercorrective pronunciations. Furthermore, as [Murray, 1873: 128] points out, <tht>- and <th>-final words are happily rhymed throughout the period and "the spellings mouth, mouitht, zenyth, zenytht, with, witht, are found promiscuously on the same page of early books and MSS". Interestingly, [Meurman-Solin, 1997: 121] notes that <tht> is a "spelling practice" found predominantly in formal and legal texts associated with central administration during the fifteenth century and that it only spreads to more informal registers and the periphery beginning in the sixteenth century, disappearing altogether in the early seventeenth century. Taking all this evidence together, we consider OSc <tht>, where derived from Pre-Scots [θ], to represent unchanged OSc [θ].

Letter-shape overlap and <ch>~<th> transcription confusion
As for the use of <ch> in place of etymological [θ], as well as <th> for etymological [x], the explanation is likely not to do with the spelling system, but rather with letter shapes and their subsequent transcription by the compilers of editions. Indeed, it has been claimed that "[t]he interchange of <c, t> in this set is perhaps adequately explained by purely orthographic considerations, namely the confusability in secretary hand of <c> and <t>" ([Macafee and Aitken, 2002: fn. 87]). This is borne out in the manuscripts underlying FITS, where the two letter shapes are typically indistinguishable (Figure 8). However, such figurae or allographs (see [Benskin, 1997] and [Kopaczyk et al., 2018], respectively) are almost never transcribed as <c> in Pre-Scots

On the other hand, the literature suggests that Pre-Scots [xt] can be realised as OSc [θ(t)]. This leads transcribers to carefully consider the near-identical initial shapes in the digraphs (or trigraphs), as <ch(t)> is taken to represent the un-fronted variant [x(t)] (Figure 10a) and <th(t)> the fronted one ([θ(t)], cf. Figure 10b). This leads to a shape-based appraisal of all such graphs (i.e. the ambiguous <c~t>) when followed by <h(t)> in non-initial position. More problematically, however, it appears that transcribers take this approach not only with etymological [xt] words, but also with etymological [θ] and [x] words. Consequently, where the relevant shape is deemed more <c>-like, etymological [θ] is transcribed as <ch(t)> (Figure 10c), despite there being no proposed process of [θ]>[x(t)]. We consider these cases to be miscategorised <ch(t)> spellings which should be merged in favour of <th(t)>. Likewise, in etymological [x] cases where the relevant figura is seen as <t>-like, it is erroneously transcribed as <th(t)> as we see in Figure

A close analysis of the data suggests that the OSc scribes followed clear principles mapping spellings to sounds, even if these led to overlapping spelling substitution sets. This consistency allows us to reconstruct the sound values for each one of the spellings given in the FITS Corpus with a large degree of confidence and insert them within triads such as those in Figure 3.
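The re-categorisation of ambiguous <c~t> transcriptions proposed in this subsection can be stated as a small rule. The function below is a hypothetical sketch of that rule, not the procedure actually used in compiling FITS. It assumes that each token is labelled with its Pre-Scots category ("θ", "x" or "xt") and leaves etymological [xt] untouched, since both <ch(t)> and <th(t)> are taken to be meaningful there.

```python
def normalise_transcription(grapheme: str, pre_scots: str) -> str:
    """Re-assign <ch(t)>/<th(t)> where the transcribed first letter is taken
    to reflect an ambiguous letter shape rather than a sound value."""
    if pre_scots == "θ" and grapheme in {"ch", "cht"}:
        return "t" + grapheme[1:]  # <ch(t)> merged in favour of <th(t)>
    if pre_scots == "x" and grapheme in {"th", "tht"}:
        return "c" + grapheme[1:]  # <th(t)> read as <ch(t)>
    return grapheme  # [xt] and all other cases are left as transcribed


examples = [("cht", "θ"), ("th", "x"), ("tht", "xt"), ("gh", "x")]
normalised = [normalise_transcription(g, p) for g, p in examples]
```

Keeping the rule separate from the transcription itself means the original reading is preserved and the re-categorisation can be revisited if further palaeographic evidence comes to light.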

III CONCLUSIONS
The FITS Corpus is a unique resource for exploring the historical relationship between sounds and spellings. While it relies on traditional historical phonological practices to establish likely grapho-phonological links, these are facilitated and enhanced by visualisation tools. Among such tools, we have shown the advantages of Medusa-type graphemic and sound substitution sets (Figures 2, 4, 7, 9), as well as graphs representing the proportions of OSc spellings in a given category, either across the entire corpus (Figure 5) or diachronically (Figure 6).
While we have seen how grapho-phonologically parsed corpora, alongside key visualisation tools, can be used to tease apart overlapping sound and spelling patterns in Older Scots, their applicability is far broader. Ultimately, FITS provides a suite of tools for generating and revising likely scenarios for sound or spelling changes, including:
• A means to visualise relationships between historical sounds and spellings
• A way of quantifying these relationships and linking them to linguistic and extralinguistic factors
• A way to visualise the distribution of sounds and spellings across time and space

10 As shown in Figure 7, the FITS Corpus assumes that, adjacent to high and mid front vowels, Older Scots [x] surfaced as [ç] (in phonological terms, these two are allophones). This alternation is not represented in Table 1 for simplicity's sake, but we do indeed assume words like micht MIGHT were articulated with a palatal fricative, [ç].

11 Superscript question marks in the table represent cases where there is likely alternation between phonologically excrescent [t] and backspelled <t> in the same spelling category, and where establishing a definitive distinction between the two is impossible.

Figure 1: A sample of the 118 spelling types for the word YEAR, out of 2,243 tokens across 857 texts in LAOS (letters in parentheses are expanded abbreviation symbols)

Figure 2: FITS Corpus substitution sets for (a) the consonantal sounds associated with grapheme <y> and (b) the spellings representing the phone [k]

Figure 3: A schematic representation of source-sound-spelling triads in FITS (above) and a single triad for the grapheme <y> in the spelling fysch for OSc FISH attested in the corpus (below)

Figure 4: Overlapping OSc spelling substitution sets for word-final [θ] and [x] in the FITS Corpus

Figure 6: Proportion of OSc graphemes for tokens of etymological [x] by decade in the FITS corpus

Table 1 gives a detailed breakdown of reconstructed triads for our target categories, invoking relevant sound and spelling changes that link them together.10

Table 1: Summary of proposed OSc phones by etymological category and spelling form in the FITS corpus, with relevant sound and spelling changes.11