Michael Gervers ; Gelila Tilahun - Temporal Sequencing of Documents

jdmdh:12520 - Journal of Data Mining & Digital Humanities, September 24, 2024 - https://doi.org/10.46298/jdmdh.12520
Temporal Sequencing of Documents

Authors: Michael Gervers; Gelila Tilahun

We outline an unsupervised method for the temporal rank ordering of sets of historical documents and apply it to two corpora: American State of the Union Addresses and DEEDS, a corpus of medieval English property transfer documents. Our method relies on capturing the gradual change in word usage via a bandwidth estimate for non-parametric Generalized Linear Models (Fan, Heckman, and Wand, 1995). The number of candidate rank orders to search over for bandwidth-related cost functions can be very large, even for a small set of documents. We tackle this combinatorial optimization problem with the Simulated Annealing algorithm, which allows us to obtain the optimal temporal order of the documents. Our rank ordering method significantly improved the temporal sequencing of both corpora compared to a randomly sequenced baseline. This unsupervised approach should enable the temporal ordering of undated document sets.
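
To illustrate why a heuristic search is needed: a set of n documents admits n! possible orderings, so exhaustive search is already impractical for a few dozen documents. The following is a minimal sketch of simulated annealing over document orderings, not the authors' implementation. The cost function `adjacent_cost` is a hypothetical stand-in (total distance between the word-frequency vectors of neighbouring documents) and is not the bandwidth-based cost described above. The function names, the swap move, and the geometric cooling schedule are likewise assumptions chosen for illustration.

```python
import math
import random


def adjacent_cost(order, vectors):
    """Stand-in cost: total L1 distance between neighbouring documents.

    NOTE: a hypothetical proxy for "gradual change in word usage";
    it is NOT the bandwidth-based cost function used in the paper.
    """
    total = 0.0
    for a, b in zip(order, order[1:]):
        total += sum(abs(x - y) for x, y in zip(vectors[a], vectors[b]))
    return total


def anneal_order(vectors, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for a low-cost ordering of the documents by simulated annealing."""
    rng = random.Random(seed)
    n = len(vectors)
    order = list(range(n))
    rng.shuffle(order)  # start from a random ordering
    cost = adjacent_cost(order, vectors)
    best_order, best_cost = order[:], cost

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / steps)

        # Propose a move: swap two distinct positions in the ordering.
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = adjacent_cost(order, vectors)
        delta = new_cost - cost

        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap

    return best_order, best_cost
```

Given one word-frequency vector per document, `anneal_order(vectors)` returns the best ordering visited and its cost. Because this stand-in cost is symmetric, a recovered sequence and its reverse score the same, so the direction of time would have to be fixed by other means. Simulated annealing is a stochastic heuristic, so in practice one would compare several runs with different seeds.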


Published: September 24, 2024
Accepted: September 15, 2024
Submitted: November 7, 2023
Keywords: Computer Science - Computation and Language
