Pablo Ruiz Fabo ; Thierry Poibeau - Mapping the Bentham Corpus: Concept-based Navigation

jdmdh:5044 - Journal of Data Mining & Digital Humanities, March 6, 2019, Atelier Digit_Hum - https://doi.org/10.46298/jdmdh.5044
Mapping the Bentham Corpus: Concept-based Navigation

Authors: Pablo Ruiz Fabo; Thierry Poibeau 1

  • 1 Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094

British philosopher and reformer Jeremy Bentham (1748-1832) left over 60,000 folios of unpublished manuscripts. The Bentham Project, at University College London, is creating a TEI version of the manuscripts through crowdsourced transcription verified by experts. We present an interface for navigating these largely unedited manuscripts, together with the language technologies used to enrich the corpus and facilitate navigation, i.e., entity linking against the DBpedia knowledge base and keyphrase extraction. We discuss the challenges of tagging a historical, domain-specific corpus with a contemporary knowledge base. The extracted concepts were used to create interactive co-occurrence networks that serve as a map of the corpus and help navigate it, along with a search index. These corpus representations were integrated into a user interface. Domain experts evaluated the interface with satisfactory results; for example, they found the distributional semantics methods exploited here applicable for retrieving related passages to support scholarly editing of the corpus.
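As an illustration of the co-occurrence idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each document has already been annotated with a list of concept labels (for instance by entity linking or keyphrase extraction) and counts how many documents each pair of concepts shares. The function names `cooccurrence_edges` and `neighbours`, the `min_weight` parameter, and the toy labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(doc_concepts, min_weight=1):
    """Count how often each pair of concepts appears in the same document.

    doc_concepts: iterable of iterables of concept labels, one per document.
    Returns a dict mapping (concept_a, concept_b) -> number of shared
    documents, with each pair stored in sorted order so that (a, b) and
    (b, a) are merged into one edge.
    """
    counts = Counter()
    for concepts in doc_concepts:
        # Deduplicate within a document and sort so pairs are canonical.
        unique = sorted(set(concepts))
        counts.update(combinations(unique, 2))
    # Keep only edges that reach the (hypothetical) weight threshold.
    return {pair: w for pair, w in counts.items() if w >= min_weight}


def neighbours(edges, concept):
    """Return concepts linked to `concept`, strongest links first."""
    linked = []
    for (a, b), weight in edges.items():
        if a == concept:
            linked.append((b, weight))
        elif b == concept:
            linked.append((a, weight))
    # Sort by descending weight, then alphabetically for stable output.
    return sorted(linked, key=lambda item: (-item[1], item[0]))


# Hypothetical toy annotations; not taken from the Bentham corpus.
toy_docs = [
    ["Punishment", "Legislation", "Utility"],
    ["Punishment", "Utility"],
    ["Legislation", "Parliament"],
]
edges = cooccurrence_edges(toy_docs)
```

Calling `neighbours(edges, "Punishment")` on this toy input lists the concepts that share documents with "Punishment", which is the kind of lookup a network-based navigation view could build on. Raising `min_weight` prunes weak links, a common way to keep such a graph readable.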


Volume: Atelier Digit_Hum
Section: Data deluge: which skills for which data? (Déluge de données : quelles compétences pour quelles données ?)
Published on: March 6, 2019
Accepted on: March 6, 2019
Submitted on: December 18, 2018
Keywords: Jeremy Bentham, manuscripts, corpus navigation, entity linking, keyphrase extraction, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], [SHS.PHIL] Humanities and Social Sciences/Philosophy, [SHS.LANGUE] Humanities and Social Sciences/Linguistics
