Authors: Blouin Baptiste; Cécile Armand; Christian Henriot

This paper introduces HistText, a pioneering tool designed to facilitate large-scale data mining in historical documents, specifically targeting Chinese sources. Developed in response to the challenges posed by the massive Modern China Textual Database, HistText makes it possible to efficiently extract and visualize information from billions of words spread across millions of documents. With a user-friendly interface, advanced text analysis techniques, and powerful data visualization capabilities, it offers a robust platform for digital humanities research. The paper explains the rationale behind HistText, outlines its key features, and provides a comprehensive guide to its use, highlighting its potential to substantially advance the computational humanities.

Volume: 2023
Section: Project presentations
Published on: November 10, 2023
Accepted on: November 3, 2023
Submitted on: August 22, 2023
Keywords: natural language processing, data mining, text analysis, history, Chinese, document, [SHS] Humanities and Social Sciences, [INFO] Computer Science [cs]
Funding: Elites, networks, and power in modern urban China (1830-1949); Funder: European Commission; Code: 788476

