Blouin Baptiste ; Cécile Armand ; Christian Henriot - HistText: An Application for leveraging large-scale historical textbases

jdmdh:11756 - Journal of Data Mining & Digital Humanities, November 10, 2023

Authors: Blouin Baptiste 1,2,3; Cécile Armand 4; Christian Henriot 4

  • 1 Laboratoire d'Informatique et Systèmes
  • 2 Laboratoire d'Informatique et des Systèmes
  • 3 Laboratoire d'Informatique et des Systèmes (LIS) (Marseille, Toulon)
  • 4 Institut de recherches Asiatiques

This paper introduces HistText, a pioneering tool devised to facilitate large-scale data mining in historical documents, specifically targeting Chinese sources. Developed in response to the challenges posed by the massive Modern China Textual Database, HistText emerges as a solution to efficiently extract and visualize valuable insights from billions of words spread across millions of documents. With a user-friendly interface, advanced text analysis techniques, and powerful data visualization capabilities, HistText offers a robust platform for digital humanities research. This paper explores the rationale behind HistText, underscores its key features, and provides a comprehensive guide for its effective utilization, thus highlighting its potential to substantially enhance the realm of computational humanities.

Volume: 2023
Section: Project presentations
Published on: November 10, 2023
Accepted on: November 3, 2023
Submitted on: August 22, 2023
Keywords: natural language processing, data mining, text analysis, history, Chinese, document, [SHS] Humanities and Social Sciences, [INFO] Computer Science [cs]
Source: HAL
  • Elites, networks, and power in modern urban China (1830-1949); Funder: European Commission; Code: 788476
