Comparing Human-Perceived Cluster Characteristics through the Lens of CIPHE: Measuring Coherence beyond Keywords

Eklund, Anton; Forsman, Mona; Drewes, Frank

doi:10.46298/jdmdh.15044

jdmdh:15044 - Journal of Data Mining & Digital Humanities, March 7, 2025, NLP4DH - https://doi.org/10.46298/jdmdh.15044

Authors: Eklund, Anton ¹; Forsman, Mona ²; Drewes, Frank ³

1 Umeå University
2 Adlede AB
3 Umeå Universitet Teknisk-Naturvetenskaplig Fakultet

A frequent problem in document clustering and topic modeling is the lack of ground truth. Models are typically intended to reflect some aspect of how human readers view texts (the general theme, sentiment, emotional response, etc.), but it can be difficult to assess whether they actually do. The only real ground truth is human judgement. To enable researchers and practitioners to collect such judgements in a cost-efficient, standardized way, we have developed the crowdsourcing solution CIPHE (Cluster Interpretation and Precision from Human Exploration). CIPHE is an adaptable framework that systematically gathers and evaluates data on how humans perceive a set of document clusters, with participants reading sample texts from each cluster. In this article, we use CIPHE to study the limitations that keyword-based methods impose on coherence evaluation in topic modeling. Keyword methods, including word intrusion, are compared with the more thorough CIPHE in how they score and characterize clusters. The results show that the abstraction of keywords skews the cluster interpretation for almost half of the compared instances, meaning that many important cluster characteristics are missed. Further, we present a case study where CIPHE is used to (a) provide insights into the UK news domain and (b) find out how the evaluated clustering model should be tuned to better suit the intended application. The experiments provide evidence that CIPHE characterizes clusters in a predictable manner and has the potential to be a valuable framework for using human evaluation in the pursuit of nuanced research aims.

Source: zenodo.org:14622379

Volume: NLP4DH

Published: March 7, 2025

Accepted: February 9, 2025

Submitted: January 9, 2025

License: Attribution 4.0 International (CC BY 4.0)

Files

Name	Size
JDMDH__Comparing_Human_Perceived_Cluster_Characteristics_through_the_Lens_of_CIPHE.pdf (md5: a9a98d84f421c1f752dccc10158e923b)	1.01 MB
