Some Reflections on the Interface Between Professional Machine Translation Literacy and Data Literacy

Due to the widespread use of data-driven neural machine translation, both by professional translators and layperson users, an adequate machine translation literacy on the part of the users of this technology is becoming more and more important. At the same time, the increasing datafication of both the private and the business sphere requires an adequate data literacy in modern society. The present article takes a closer look at machine translation literacy and data literacy and investigates the interface between the two concepts. This is done to lay the preliminary theoretical foundations for a didactic project aiming to develop learning resources for teaching data literacy in its machine translation-specific form to students of BA programmes in translation/specialised communication.


INTRODUCTION
Like many other industries, the translation industry has undergone an accelerating process of digitisation and datafication in recent years [Sandrini, 2017]. A high-performing translation technology based on the interplay of digitisation and datafication is the data-driven architecture of neural machine translation (NMT), which is having a disruptive effect on the translation industry [ELIS, 2021]. Given the high relevance of NMT both in the field of professional translation and in wider society, Bowker and Ciro [2019] call for developing an adequate machine translation (MT) literacy. This applies both to professional translators and to laypersons, who often employ MT as an aid to understand foreign-language content.
Of course, increasing digitisation and datafication can be observed not only in the translation industry and other professional sectors but in all layers of modern society, which requires an adequately educated citizenry. One of the many initiatives that have been established in this context is the European Union's Digital Competence Framework 2.0 [European Union, 2019; see also O'Brien and Ehrensberger-Dow, 2020, 148], which specifies key components of digital competence in five areas (1. information and data literacy; 2. communication and collaboration; 3. digital content creation; 4. safety; 5. problem-solving). Parallel to this extensive EU framework, which covers a wide range of digital competencies, there are various more targeted initiatives focusing on individual competencies. One area with particular relevance is data literacy (see area 1 of the EU's Digital Competence Framework), since, inter alia, the accumulation of large volumes of (high-quality) digital training data is one of the crucial factors contributing to recent advances in artificial intelligence (AI) research and modern societies rely heavily on data-based decision making. In this context, Ridsdale et al. [2015, 2] claim that data literacy is "an essential ability required in the global knowledge-based economy" and that "any country that does not have a technology and data-savvy citizenry will ultimately be left behind both socially and economically" [ibid., 8].
Given the concurrent rise in prominence and the high relevance of MT literacy and data literacy, the current article will take a closer look at these two concepts and explore their interface. This will serve to lay the preliminary theoretical foundations for the DataLit MT project, a publicly funded project at the Institute of Translation and Multilingual Communication at TH Köln - University of Applied Sciences, Germany. The project aims to develop didactic resources for teaching data literacy to students of BA programmes in translation and specialised communication.

I MT LITERACY
According to O'Brien and Ehrensberger-Dow [2020, 146], "MT literacy means knowing how MT works, how it can be useful in a particular context, and what the implications are of using MT for specific communicative needs". Bowker [2021a, 26] points out that MT literacy is not a static but rather a dynamic concept, which can be adapted according to the needs of individual target audiences. Recent work on MT literacy has stressed the overall societal relevance of this concept beyond professional translation [Bowker and Ciro, 2019, 35] and hence tended to focus on MT literacy requirements of layperson audiences such as the scholarly community [Bowker and Ciro, 2019] or students (and teachers) of non-translation-specific undergraduate university programmes [Bowker, 2021b]. The present article takes a step back from this wide conception of MT literacy and focuses on professional MT literacy, which is understood here as the full range of MT-related competencies professional translators (and other language professionals) may require in order to participate successfully in the various phases of the MT-assisted professional translation process (for a recent overview of this process, see Krüger [2019]).
Figure 1 below depicts a rough outline of a 5-dimensional model of professional MT literacy, which is based on the analysis of relevant MT literature in the field. The model merely serves to provide structure to the following discussion and is not intended to be a fully fledged MT literacy/competence model. The following sections provide a more detailed overview of the five proposed dimensions of professional MT literacy. Owing to space limitations, the overview is rather cursory (although references to more detailed sources are given) and it disregards the multitude of interrelations between the individual dimensions. However, it does illustrate the complexity of the concept and the range of competencies thought to be involved.

Technical dimension
The technical dimension of professional MT literacy is concerned with knowledge about:

Linguistic dimension
This dimension is concerned with knowledge about: 1. pre-editing texts for MT (including knowledge about controlled languages and translation-oriented authoring [Marzouk and Hansen-Schirra, 2019]);  [Koehn and Wiggins, 2021]; etc.

Economic dimension
This dimension is concerned with knowledge about: 1. methods for estimating/measuring the effort involved in machine translation post-editing (MTPE) [Daems et al., 2017];

Societal dimension
This dimension is concerned with knowledge about: 1. ethical aspects of MT (including knowledge about the crucial role of human translators in providing high-quality MT training data, about copyright issues, data ownership/dispossession, etc. [Moorkens et al., 2016]); 2. MT-induced changes in the public perception/industry-internal role of human translators (potential translator marginalisation/disempowerment, etc. [Moorkens et al., 2016; Sakamoto, 2019]); 3. potential biases in MT systems (gender, race, etc. [Saunders and Byrne, 2020]); etc.
Individual aspects of the MT literacy dimensions sketched above are discussed in more detail in section 3, which is concerned with the interface between professional MT literacy and data literacy.

II DATA LITERACY
Given the high societal relevance of data literacy, as discussed briefly in the introductory section, it is not surprising that a plethora of definitions and initiatives promoting the development of this literacy in modern societies has emerged (for an overview, see Misra [2021]). While there is no generally accepted definition yet, there seems to be a consensus that data literacy overlaps, to some extent, with literacies such as information literacy and statistical literacy [Ridsdale et al., 2015, 8; Misra, 2021, 5]. At the same time, the relevant literature stresses that data literacy comprises not merely a set of technical skills associated with data science, analytics or statistics but that it also involves critically thinking about and handling data in different contexts [Misra, 2021, 7].
The current discussion is based on the approach to data literacy developed by Ridsdale et al. [2015], since it has been widely adopted for a range of data literacy initiatives. The definition proposed by these authors is a synthesis of others in the literature and is both fine-grained and, at the same time, broad enough to be applicable to a wide range of contexts. According to Ridsdale et al. [2015, 11], data literacy is "the ability to collect, manage, evaluate, and apply data, in a critical manner". Based on this definition and a survey of the relevant literature on data literacy, the authors developed a Data Literacy Competencies Matrix (DLCM), which is depicted in Figure 2 below. The matrix is structured along five key abilities/knowledge areas (grey fields in Figure 2), which are subdivided into 22 competencies. These competencies are further subdivided into conceptual competencies (blue fields), core competencies (green fields) and advanced competencies (red fields). Associated with these competencies are 64 knowledge types or tasks, which are not depicted in Figure 2 but which are included in the full version of the matrix provided in [ibid., 38]. According to the authors, the matrix is "intended to form the basis of ongoing conversations about standards for assessing and evaluating levels of data literacy, and to inform the creation of learning outcomes in data literacy education" [ibid., 3]. The individual elements of the matrix (focusing on conceptual and core competencies) are discussed in more detail in the next section with respect to the interface between professional MT literacy and data literacy.

III THE INTERFACE BETWEEN PROFESSIONAL MT LITERACY AND DATA LITERACY
The conceptual framework of the DLCM includes the conceptual competency introduction to data, which is concerned with general knowledge and understanding of data and its uses and applications [Ridsdale et al., 2015, 38]. As a meta-competency permeating the other competencies of the matrix, it is difficult to associate with specific (sub)dimensions of professional MT literacy.
The first links with the professional MT literacy model outlined in section 1 can be established within the data collection knowledge area of the DLCM. The core competency data discovery and collection (including the knowledge types/tasks data exploration, identifying and collecting useful data) is covered by point 3 of the technical MT literacy dimension (MT training, and here particularly selecting MT training data). This would include, e.g., knowledge of suitable training data repositories such as the TAUS Data Marketplace or the OPUS corpus collection, or of data search services such as TAUS Matching Data. The core competency evaluating and ensuring quality of data and sources (critically assessing the trustworthiness of data sources, critically evaluating data quality) is also relevant for MT training data selection. Here, the linguistic dimension of professional MT literacy is also relevant, particularly point 5 on linguistic quality requirements for suitable MT training data. The societal dimension also plays a role here, especially point 3 concerned with potential MT biases (which are the result of biased datasets).
The competencies listed in the data management area of the DLCM also interface mostly with the technical dimension of professional MT literacy. The core competencies data organisation (assessing data organization requirements, organizing data, etc.) and data manipulation (assessing methods for data cleaning, identifying outliers/anomalies and cleaning data) are required in MT training pipelines (selecting/compiling/organising/cleaning data). Relevant data cleaning steps prior to MT training include, e.g., filtering out sentence pairs with identical source and target sides or anomalous length ratios, filtering out empty or overly long sentences, etc. [Buj et al., 2020, 331]. The advanced competencies data conversion, metadata creation/use, data curation/security/re-use and data preservation also interface primarily with the technical dimension of professional MT literacy. These advanced competencies will not be discussed in any more detail here.
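The data cleaning steps named above (removing empty segments, identical source/target pairs, overly long sentences and pairs with anomalous length ratios) can be illustrated with a minimal Python sketch. The function name, the whitespace tokenisation and both thresholds are assumptions chosen for illustration only; they are not taken from Buj et al. [2020] or from any particular MT training toolkit, and the example sentence pairs are invented.

```python
from typing import Iterable, List, Tuple

SentencePair = Tuple[str, str]


def clean_parallel_corpus(
    pairs: Iterable[SentencePair],
    max_tokens: int = 100,          # hypothetical cut-off for "overly long"
    max_length_ratio: float = 3.0,  # hypothetical cut-off for "anomalous"
) -> List[SentencePair]:
    """Keep only the sentence pairs that pass four simple filters."""
    kept: List[SentencePair] = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        # Naive whitespace tokenisation; real pipelines use proper tokenisers.
        src_len, tgt_len = len(source.split()), len(target.split())
        if src_len == 0 or tgt_len == 0:
            continue  # empty source or target side
        if source == target:
            continue  # identical source and target (probably untranslated)
        if src_len > max_tokens or tgt_len > max_tokens:
            continue  # overly long sentence
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue  # anomalous length ratio (probably misaligned)
        kept.append((source, target))
    return kept


# Invented English-German example pairs.
corpus = [
    ("The pump must be switched off.", "Die Pumpe muss ausgeschaltet werden."),
    ("ISO 9001", "ISO 9001"),
    ("", "Leerer Ausgangssatz."),
    ("Yes.", "Ja, das ist unter allen denkbaren Umständen vollkommen richtig."),
]
cleaned = clean_parallel_corpus(corpus)
```

In this toy corpus, only the first pair passes all four filters. In practice, suitable thresholds depend on the language pair and the domain, which is one reason why such cleaning decisions call for critical evaluation of the data rather than mechanical application of a script.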
In the data evaluation area, the conceptual competency data tools is concerned with knowledge of data analysis tools/techniques and selecting/applying such tools/techniques. A link can be established here with data selection as part of an MT training pipeline in that selecting such data may require their automated analysis with the aim of identifying (un)desirable linguistic patterns (such as gender bias, see societal dimension) using corpus analysis tools such as Sketch Engine. The core competencies basic data analysis (analysing data, evaluating analysis results, etc.), data interpretation (reading data representations, identifying key take-away points and discrepancies) and identifying problems using data (problems in practical workplace situations and higher-level policy, environmental and other problems) can be linked to point 5 of the technical MT literacy dimension (automatic metrics for MT quality estimation/evaluation), to point 4 of the linguistic dimension (manual MT quality evaluation) as well as to points 1 (methods for estimating/measuring MTPE effort), 2 (methods for MTPE price calculation) and 3 (feasible productivity gains in MTPE) of the economic dimension. For example, analysing translation edit rate (TER), translation time or other automatic PE effort indicators and correlating these with automatic/manual MT quality indicators in a translation agency may provide a detailed picture of how the deployed MT engines are performing in terms of quality, of how translation productivity is impacted by these engines and of whether there are any discrepancies between expected and actual engine quality/productivity gains. The ability to read data representations (e.g., performance indicators such as TER or BLEU scores published by MT vendors) can also help MT buyers to select the appropriate MT system for their translation scenarios. The core competencies data visualization (creating meaningful graphical representations of data, etc.) and presenting data (verbally) (assessing the desired outcomes of data presentation, assessing audience needs, etc.) can be linked to point 4 of the economic dimension of MT literacy (setting up/optimising business for MT integration), e.g., when potential MT productivity gains must be presented to the company management, which may have to sign off on the required process steps. The core competency data-driven decision making (prioritizing information gathered from data, converting data into actionable information, assessing/implementing possible decisions/solutions) can also be linked to this point, when data has shown that shifting to an MT-assisted translation process is indeed beneficial and the corresponding business processes have to be set up/optimised for MT. This also touches upon point 6 of the technical MT literacy dimension, which is concerned with the technical aspects of integrating NMT systems into translation workflows/systems.
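The kind of analysis mentioned in this paragraph, relating an edit-based PE effort indicator to translation time, can be sketched in a few lines of Python. Note that the sketch computes a simplified word-level edit rate (insertions, deletions and substitutions only); it is not full TER, which additionally models block shifts. All function names, segments and post-editing times below are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance over tokens (no block shifts, unlike full TER)."""
    previous = list(range(len(ref) + 1))
    for i, hyp_tok in enumerate(hyp, start=1):
        current = [i]
        for j, ref_tok in enumerate(ref, start=1):
            cost = 0 if hyp_tok == ref_tok else 1
            current.append(min(previous[j] + 1,          # delete a token
                               current[j - 1] + 1,       # insert a token
                               previous[j - 1] + cost))  # keep or substitute
        previous = current
    return previous[-1]


def simplified_edit_rate(mt_output: str, post_edited: str) -> float:
    """Number of word-level edits per token of the post-edited segment."""
    hyp, ref = mt_output.split(), post_edited.split()
    if not ref:
        raise ValueError("post-edited segment must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: (raw MT output, post-edited version, PE time in seconds).
segments = [
    ("the valve is closed", "the valve is closed", 5.0),
    ("press the button red", "press the red button", 20.0),
    ("motor start not", "the motor does not start", 30.0),
]
edit_rates = [simplified_edit_rate(mt, pe) for mt, pe, _ in segments]
pe_times = [seconds for _, _, seconds in segments]
correlation = pearson(edit_rates, pe_times)
```

With only three invented segments, the resulting coefficient has no evidential value; the point is merely to show which data an agency would need to log (raw MT output, post-edited text, time per segment) before conclusions about engine quality or productivity can be drawn.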
In the data application area, the conceptual competency critical thinking is concerned with the awareness of high-level issues and/or challenges associated with data when working with such data, and the conceptual competency data culture includes recognizing the importance of data and fostering the critical use of data. The relevance of critical thinking for both MT and data literacy was already discussed briefly in section 2. For MT, Bowker [2019, 53] points out that "it has the potential to help if used critically, but to harm if used carelessly". Here, a link can be established with points 2 and 3 of the economic MT literacy dimension in that a critical approach to data-driven MT will help arrive at a realistic picture of feasible productivity gains in MTPE as well as an adequate awareness of potential MT-induced business risks. Critical thinking is also relevant to the societal dimension of MT literacy because of the importance of being aware of ethical aspects of MT or the phenomenon of MT bias and critically reflecting on MT-induced changes in the public perception/industry-internal role of human translators. The conceptual competency data ethics includes such an awareness of legal/ethical issues associated with data and with applying/working with data in an ethical manner. Again, all three points of the societal dimension of professional MT literacy are relevant here. In this context, Moorkens [2020, 27] highlights the responsibility of translator training institutions "to introduce business ethics and to highlight contemporary work practices in order to prepare students for future roles as both translators and translation industry workers" since they may "sooner or later become gatekeepers and makers of decisions about work practices and data harvesting that will impact many other stakeholders within the translation industry". The core competency data citation (knowledge of widely accepted data citation methods and creating correct citations for secondary data sets) is less immediately relevant to professional MT literacy. The core competency data sharing (assessing methods/platforms for data sharing and sharing data legally and ethically) can be linked again to point 3 of the technical MT literacy dimension (MT training, including knowledge of suitable training data repositories). The advanced competency evaluating decisions based on data can be linked to the economic dimension of professional MT literacy, and here particularly to point 2 (when MT prices are adjusted in reaction to MT productivity data gathered over time) and to point 3 (when the expectations concerning MT productivity may have to be adjusted according to such historical productivity data).

CONCLUSION
This article has hopefully shown that exploring the interface between MT literacy and data literacy is a fruitful endeavour and that this interface may warrant a more extensive and detailed follow-up analysis. Another promising next step in MT literacy research and education may be to establish different competency levels for individual MT sub-competencies since, depending on the particular audience and MT context, the competencies subsumed under the five professional MT literacy dimensions may have to be more or less well developed. For example, the Data Literacy Competence Framework by Schüller [2020, 25] makes a distinction between basic, advanced and expert levels. In translation technology research, Abaitua [2001, 36-37] proposes six translation technology-related roles (consultant, user, instructor, evaluator, manager, developer), which could also serve as a starting point for defining MT literacy competency levels.
Concerning the appropriate timing of data literacy education, Ridsdale et al. [2015, 2] state that "[t]he best place to begin this initiative is the undergraduate curriculum in post-secondary institutions, due in part to their overarching goal of producing globally competitive, critically thinking, well-equipped graduates". Against this backdrop, the DataLit MT project at the Institute of Translation and Multilingual Communication at TH Köln - University of Applied Sciences, Germany, aims to develop didactic resources for teaching data literacy to students of BA programmes in translation/specialised communication. With regard to the core subject matter of such BA programmes, the project will create didactic resources that can be used to teach the various sub-competencies outlined in Ridsdale et al.'s Data Literacy Competencies Matrix in their translation-specific form of MT literacy. The teaching resources will be made publicly available in early 2023 in the form of Jupyter notebooks. These resources at the interface of MT literacy and data literacy can complement the teaching resources developed in the context of more extensive MT training initiatives such as the MultiTraiNMT project. Ideally, students working with these resources will develop an adequate MT literacy for their later professional careers while at the same time becoming data-savvy citizens well equipped for the modern knowledge economy.

Figure 1. Outline of a 5-dimensional professional MT literacy model.