Anna Shadrova - Topic models do not model topics: epistemological remarks and steps towards best practices

jdmdh:7595 - Journal of Data Mining & Digital Humanities, October 27, 2021, Volume 2021 -
Topic models do not model topics: epistemological remarks and steps towards best practices

Authors: Anna Shadrova

The social sciences and digital humanities have recently adopted the machine learning technique of topic modeling to address research questions in their fields. This is problematic in a number of ways, some of which have not received much attention in the debate yet. This paper adds epistemological concerns centering around the interface between topic modeling and linguistic concepts and the argumentative embedding of evidence obtained through topic modeling. It concludes that topic modeling in its present state of methodological integration does not meet the requirements of an independent research method. It operates from relevantly unrealistic assumptions, is non-deterministic, cannot effectively be validated against a reasonable number of competing models, does not lock into a well-defined linguistic interface, and does not scholarly model topics in the sense of themes or content. These features are intrinsic and make the interpretation of its results prone to apophenia (the human tendency to perceive random sets of elements as meaningful patterns) and confirmation bias (the human tendency to perceptually prefer patterns that are in alignment with pre-existing biases). While partial validation of the statistical model is possible, a conceptual validation would require an extended triangulation with other methods and human ratings, and clarification of whether statistical distinctivity of lexical co-occurrence correlates with conceptual topics in any reliable way.

Volume: 2021
Published on: October 27, 2021
Accepted on: October 4, 2021
Submitted on: June 16, 2021
Keywords: Topic modeling, digital humanities, information extraction for scientific inquiry, [SHS]Humanities and Social Sciences, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL]

