An interactive visualization of Google Books Ngrams with R and Shiny: Exploring a(n) historical increase in onset strength in a(n) huge database

Julia Schlüter; Fabian Vetter

doi:10.46298/jdmdh.5582


jdmdh:5582 - Journal of Data Mining & Digital Humanities, 22 December 2020, Special issue on visualisations in historical linguistics - https://doi.org/10.46298/jdmdh.5582


Authors: Julia Schlüter ¹; Fabian Vetter ¹

1 University of Bamberg

Using the re-emergence of the /h/ onset from Early Modern to Present-Day English as a case study, we illustrate the making and the functions of a purpose-built web application named (an:a)-lyzer for the interactive visualization of the raw n-gram data provided by Google Books Ngrams (GBN). The database has been compiled from the full text of over 4.5 million books in English, totalling over 468 billion words and covering roughly five centuries. We focus on bigrams consisting of words beginning with graphic <h> preceded by the indefinite article allomorphs a and an, which serve as a diagnostic of the consonantal strength of the initial /h/. The sheer size of this database allows us to attain a maximal diachronic resolution, to distinguish highly specific groups of <h>-initial lexical items, and even to trace the diffusion of the observed changes across individual lexical units. The functions programmed into the app enable us to explore the data interactively by filtering, selecting and viewing them according to various parameters that were manually annotated into the data frame. We also discuss limitations of the database, of the app and of the explorative data analysis. The app is publicly accessible online at https://osf.io/ht8se/.
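The diagnostic described in the abstract can be illustrated with a small sketch: for each <h>-initial word and time period, the share of an among all a/an bigrams gives a rough indication of how weak the initial /h/ is. The Python code below is only an illustration under stated assumptions. It is not the authors' R/Shiny implementation, the record layout and the function name share_of_an are hypothetical, and the counts are invented rather than taken from GBN.

```python
from collections import defaultdict

# Hypothetical bigram records: (article, word, year, match_count).
# The counts are invented for illustration and are NOT Google Books Ngrams data.
RECORDS = [
    ("an", "historical", 1805, 90),
    ("a", "historical", 1805, 10),
    ("an", "historical", 1995, 30),
    ("a", "historical", 1995, 70),
    ("a", "huge", 1803, 40),
    ("an", "huge", 1807, 10),
]


def share_of_an(records, bin_width=10):
    """Return {(word, period_start): proportion of 'an' among a/an bigrams}."""
    totals = defaultdict(lambda: {"a": 0, "an": 0})
    for article, word, year, count in records:
        # Group years into bins of bin_width years (decades by default).
        period = year - year % bin_width
        totals[(word, period)][article] += count
    return {
        key: counts["an"] / (counts["a"] + counts["an"])
        for key, counts in totals.items()
        if counts["a"] + counts["an"] > 0
    }


shares = share_of_an(RECORDS)
for (word, period), share in sorted(shares.items()):
    print(f"{word:<12}{period}s  an-share = {share:.2f}")
```

A falling share of an over time for a given word would be consistent with a strengthening /h/ onset. Narrower bins give a finer diachronic resolution at the cost of smaller counts per bin.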


Source: HAL:hal-02149498v4

Volume: Special issue on visualisations in historical linguistics

Published on: 22 December 2020

Accepted on: 10 December 2020

Submitted on: 18 June 2019

Keywords: [SHS.LANGUE] Humanities and Social Sciences/Linguistics, R, Data visualization, Shiny, Google Books Ngrams, Google Books, corpus linguistics, historical phonology, historical linguistics, n-grams

Licence: HAL authorisation v1
