How to visualize high-dimensional data: a roadmap

Hermann Moisl

doi:10.46298/jdmdh.5594


jdmdh:5594 - Journal of Data Mining & Digital Humanities, 23 December 2020, Special issue on visualisations in historical linguistics - https://doi.org/10.46298/jdmdh.5594


Authors: Hermann Moisl ¹

1 Newcastle University [Newcastle]

Discovering the chronological or geographical distribution of collections of historical text can be more reliable when based on multivariate rather than univariate data, because multivariate data provide a more complete description. Where the data are high-dimensional, however, their complexity can defy analysis using traditional philological methods. The first step in dealing with such data is to visualize them using graphical methods in order to identify any latent structure. If found, such structure facilitates the formulation of hypotheses, which can then be tested using a range of mathematical and statistical methods. Where the dimensionality is greater than 3, however, direct graphical investigation is impossible. The present discussion offers a roadmap of how this obstacle can be overcome, in three main parts: the first presents some fundamental data concepts, the second describes an example corpus and a high-dimensional data set derived from it, and the third outlines two approaches to visualizing that data set: dimensionality reduction and cluster analysis.
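As a rough illustration of the dimensionality-reduction route named in the abstract, the sketch below projects a high-dimensional data matrix onto two dimensions so that it can be plotted. The abstract does not say which reduction method the article uses; principal component analysis (PCA) is assumed here only because it is a common choice. The function name `pca_project` and the synthetic matrix are hypothetical stand-ins and are not the article's corpus or code.

```python
import numpy as np


def pca_project(data, n_components=2):
    """Project the rows of `data` onto the first `n_components` principal components.

    Assumption: PCA is used purely as an example of dimensionality reduction.
    """
    # Center each variable (column) on its mean.
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix: the rows of `vt` are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Share of total variance carried by each retained component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, explained[:n_components]


# Synthetic stand-in for a texts-by-variables matrix: 20 "texts", 50 variables,
# drawn from two groups with different means (an assumption for illustration).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(10, 50))
group_b = rng.normal(loc=3.0, scale=1.0, size=(10, 50))
data = np.vstack([group_a, group_b])

coords, explained = pca_project(data, n_components=2)
print(coords.shape)
```

Each row of `coords` is a two-dimensional point that can be passed to any scatter-plot routine; if the data contain latent group structure, it may show up as separated clouds of points in that plot.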


Source: HAL:hal-02145440v2

Volume: Special issue on visualisations in historical linguistics

Published: 23 December 2020

Accepted: 15 December 2020

Submitted: 21 June 2019

Keywords: [SHS] Humanities and Social Sciences, Data visualization, multivariate data, high dimensionality, dimensionality reduction, cluster analysis

