Anna Shadrova - Topic models do not model topics: epistemological remarks and steps towards best practices

jdmdh:7595 - Journal of Data Mining & Digital Humanities, October 27, 2021, Volume 2021 - https://doi.org/10.46298/jdmdh.7595
Topic models do not model topics: epistemological remarks and steps towards best practices

Author: Anna Shadrova

The social sciences and digital humanities have recently adopted the machine learning technique of topic modeling to address research questions in their fields. This is problematic in a number of ways, some of which have not received much attention in the debate yet. This paper adds epistemological concerns centering around the interface between topic modeling and linguistic concepts and the argumentative embedding of evidence obtained through topic modeling. It concludes that topic modeling in its present state of methodological integration does not meet the requirements of an independent research method. It operates from relevantly unrealistic assumptions, is non-deterministic, cannot effectively be validated against a reasonable number of competing models, does not lock into a well-defined linguistic interface, and does not scholarly model topics in the sense of themes or content. These features are intrinsic and make the interpretation of its results prone to apophenia (the human tendency to perceive random sets of elements as meaningful patterns) and confirmation bias (the human tendency to perceptually prefer patterns that are in alignment with pre-existing biases). While partial validation of the statistical model is possible, a conceptual validation would require an extended triangulation with other methods and human ratings, and clarification of whether statistical distinctivity of lexical co-occurrence correlates with conceptual topics in any reliable way.


Volume: 2021
Published on: October 27, 2021
Accepted on: October 4, 2021
Submitted on: June 16, 2021
Keywords: Topic modeling, digital humanities, information extraction for scientific inquiry, [SHS] Humanities and Social Sciences, [SHS.LANGUE] Humanities and Social Sciences/Linguistics, [INFO.INFO-DL] Computer Science [cs]/Digital Libraries [cs.DL]
