Anton Eklund ; Mona Forsman ; Frank Drewes - Comparing Human-Perceived Cluster Characteristics through the Lens of CIPHE: Measuring Coherence beyond Keywords

jdmdh:15044 - Journal of Data Mining & Digital Humanities, 7 March 2025, NLP4DH - https://doi.org/10.46298/jdmdh.15044

Authors: Eklund, Anton (1); Forsman, Mona (2); Drewes, Frank (3)

  • 1 Umeå University
  • 2 Adlede AB
  • 3 Umeå Universitet Teknisk-Naturvetenskaplig Fakultet

A frequent problem in document clustering and topic modeling is the lack of ground truth. Models are typically intended to reflect some aspect of how human readers view texts (the general theme, sentiment, emotional response, etc.), but it can be difficult to assess whether they actually do. The only real ground truth is human judgement. To enable researchers and practitioners to collect such judgement in a cost-efficient, standardized way, we have developed the crowdsourcing solution CIPHE (Cluster Interpretation and Precision from Human Exploration). CIPHE is an adaptable framework that systematically gathers and evaluates data on how humans perceive a set of document clusters, with participants reading sample texts from each cluster. In this article, we use CIPHE to study the limitations that keyword-based methods impose on coherence evaluation in topic modeling. Keyword methods, including word intrusion, are compared with the more thorough CIPHE in how they score and characterize clusters. The results show that the abstraction of keywords skews the cluster interpretation for almost half of the compared instances, meaning that many important cluster characteristics are missed. Further, we present a case study in which CIPHE is used to (a) provide insights into the UK news domain and (b) find out how the evaluated clustering model should be tuned to better suit the intended application. The experiments provide evidence that CIPHE characterizes clusters in a predictable manner and has the potential to be a valuable framework for using human evaluation in the pursuit of nuanced research aims.


Volume: NLP4DH
Published on: 7 March 2025
Accepted on: 9 February 2025
Submitted on: 9 January 2025
