Deep Learning for Period Classification of Historical Hebrew Texts

Chaya Liebeskind; Shmuel Liebeskind

doi:10.46298/jdmdh.5864


jdmdh:5864 - Journal of Data Mining & Digital Humanities, June 13, 2020, vol. 2020 - https://doi.org/10.46298/jdmdh.5864


Authors: Chaya Liebeskind ¹; Shmuel Liebeskind

1 Jerusalem College of Technology

In this study, we address the task of classifying historical texts by their assumed period of writing. This task is useful in digital humanities studies, where many texts have unidentified publication dates. For years, the typical approach to temporal text classification was supervised machine learning. These algorithms require careful feature engineering and considerable domain expertise to design a feature extractor that transforms the raw text into a feature vector from which the classifier can learn to classify any unseen valid input. Recently, deep learning has produced extremely promising results for various tasks in natural language processing (NLP). The primary advantage of deep learning is that the feature layers are not designed by human engineers; instead, the features are learned from data with a general-purpose learning procedure. We investigated deep learning models for period classification of historical texts. We compared three common deep learning models, namely paragraph vectors, convolutional neural networks (CNN), and recurrent neural networks (RNN), with conventional machine-learning methods. We demonstrate that the CNN and RNN models outperformed the paragraph vector model and the conventional supervised machine-learning algorithms. In addition, we constructed word embeddings for each time period and analyzed semantic changes of word meanings over time.
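To make the conventional pipeline described in the abstract concrete, the following is a minimal sketch of a hand-engineered feature extractor feeding a simple classifier. It is not the authors' method: the bag-of-words features, the nearest-centroid rule, the function names, the period labels, and the toy English sentences are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def extract_features(text):
    """Hand-engineered feature extractor: lowercase word counts (bag of words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_texts):
    """Sum the feature vectors of each period into one centroid per period."""
    centroids = {}
    for text, period in labeled_texts:
        centroids.setdefault(period, Counter()).update(extract_features(text))
    return centroids


def classify(text, centroids):
    """Assign the period whose centroid is most similar to the text."""
    features = extract_features(text)
    return max(centroids, key=lambda p: cosine(features, centroids[p]))


# Invented toy data: the labels and sentences are placeholders,
# not drawn from the paper's Hebrew corpus.
TRAIN = [
    ("thou shalt keep the law", "early"),
    ("thou hast spoken the law", "early"),
    ("you should follow the rules", "late"),
    ("you have read the rules", "late"),
]

centroids = train_centroids(TRAIN)
predicted = classify("thou shalt speak", centroids)
print(predicted)
```

The deep learning models compared in the study replace `extract_features` with representations learned from the data, which is the contrast the abstract draws.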


Source: HAL:hal-02324617v2

Volume: 2020

Published on: June 13, 2020

Accepted on: June 13, 2020

Submitted on: October 23, 2019

Keywords: Machine Learning, Deep Learning, Diachronic Corpus, Period Classification, [INFO] Computer Science [cs], [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]
