Clustering and Relational Ambiguity: from Text Data to Natural Data

Nicolas Turenne

doi:10.46298/jdmdh.4

jdmdh:4 - Journal of Data Mining & Digital Humanities, 24 June 2014, 2014 - https://doi.org/10.46298/jdmdh.4

Text data is often seen as "take-away" material, with little noise and easy-to-process information. The main questions are then how to obtain the data and how to transform them into a suitable document format. But data can be sensitive to noise, often called ambiguities. Ambiguities have been recognized for a long time, mainly because polysemy is evident in language and context is required to remove uncertainty. I claim in this paper that syntactic context is not sufficient to improve interpretation. In this paper I try to explain, firstly, that noise can come from the natural data themselves, even when high technology is involved, and secondly, that texts seen as verified but meaningless can spoil the content of a corpus, which may lead to contradictions and background noise.

Source: arXiv.org:1311.5401

Volume: 2014

Published: 24 June 2014

Accepted: 24 June 2014

Submitted: 14 April 2014

Keywords: Computer Science - Computation and Language, Computer Science - Information Retrieval

License: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)