Blouin Baptiste ; Cécile Armand ; Christian Henriot - HistText: An Application for leveraging large-scale historical textbases

jdmdh:11756 - Journal of Data Mining & Digital Humanities, 10 November 2023, 2023 - https://doi.org/10.46298/jdmdh.11756
HistText: An Application for leveraging large-scale historical textbases

Authors: Blouin Baptiste (1, 2, 3); Cécile Armand (4); Christian Henriot (4)

  • 1 Laboratoire d'Informatique et Systèmes
  • 2 Laboratoire d'Informatique et des Systèmes
  • 3 Laboratoire d'Informatique et des Systèmes (LIS) (Marseille, Toulon)
  • 4 Institut de recherches Asiatiques

This paper introduces HistText, a pioneering tool devised to facilitate large-scale data mining in historical documents, specifically targeting Chinese sources. Developed in response to the challenges posed by the massive Modern China Textual Database, HistText emerges as a solution to efficiently extract and visualize valuable insights from billions of words spread across millions of documents. With a user-friendly interface, advanced text analysis techniques, and powerful data visualization capabilities, HistText offers a robust platform for digital humanities research. This paper explores the rationale behind HistText, underscores its key features, and provides a comprehensive guide for its effective utilization, thus highlighting its potential to substantially enhance the realm of computational humanities.


Volume: 2023
Section: Project presentations
Published on: 10 November 2023
Accepted on: 3 November 2023
Submitted on: 22 August 2023
Keywords: natural language processing, data mining, Text analysis, history, Chinese, document, [SHS]Humanities and Social Sciences, [INFO]Computer Science [cs]
Funding:
    Source: HAL
  • Elites, networks, and power in modern urban China (1830-1949); Funder: European Commission; Code: 788476
