Out of Context! Managing the Limitations of Context Windows in ChatGPT-4o Text Analyses

Mervaala, Erkki; Kousa, Ilona

doi:10.46298/jdmdh.15090

jdmdh:15090 - Journal of Data Mining & Digital Humanities, 7 March 2025, NLP4DH - https://doi.org/10.46298/jdmdh.15090

Authors: Mervaala, Erkki ¹; Kousa, Ilona ²

1 Finnish Environment Institute
2 University of Helsinki

In recent years, large language model (LLM) applications have surged in popularity, and academia has followed suit. Researchers frequently seek to automate text annotation – often a tedious task – and, to some extent, text analysis. Notably, popular LLMs such as ChatGPT have been studied as both research assistants and analysis tools, revealing several concerns regarding transparency and the nature of AI-generated content. This study assesses ChatGPT’s usability and reliability for text analysis – specifically keyword extraction and topic classification – within an “out-of-the-box” zero-shot or few-shot context, emphasizing how the size of the context window and varied text types influence the resulting analyses. Our findings indicate that text type and the order in which texts are presented both significantly affect ChatGPT’s analysis. At the same time, context-building tends to be less problematic when analyzing similar texts. However, lengthy texts and documents pose serious challenges: once the context window is exceeded, “hallucinated” results often emerge. While some of these issues stem from the core functioning of LLMs, some can be mitigated through transparent research planning.

Source: zenodo.org:14945842

Volume: NLP4DH

Published: 7 March 2025

Accepted: 9 February 2025

Submitted: 16 January 2025

Keywords: Large language models, ChatGPT, Text analysis, Green transition, Parliamentary speeches

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Funding:

Source: OpenAIRE Graph

Establishing Silicon Isotopes as Weathering Tracers for Paleoenvironmental Studies; Funder: European Commission; Code: 327768

Files

Name	Size
OutOfContext_Mervaala_Kousa.pdf md5: 3eb9025bcaf998d283fd5c060824e116	918.36 KB

Publications

isNewVersionOf

Mervaala, E., & Kousa, I. (2024). Order Up! Micromanaging Inconsistencies in ChatGPT-4o Text Analyses. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities (pp. 521-535). Association for Computational Linguistics. doi:10.18653/v1/2024.nlp4dh-1.51 ¹

1 Zenodo
