Turenne, Nicolas - Clustering and Relational Ambiguity: from Text Data to Natural Data

jdmdh:13 - Journal of Data Mining & Digital Humanities, June 24, 2014, 2014

Authors: Turenne, Nicolas

Text data is often regarded as "take-away" material: low in noise and easy to process. The main questions are then how to obtain the data and how to transform them into a suitable document format. But data can be sensitive to noise, often called ambiguity. Ambiguities have long been recognized, mainly because polysemy is obvious in language and context is required to remove uncertainty. I claim in this paper that syntactic context is not sufficient to improve interpretation. I try to explain, first, that noise can come from natural data themselves, even when high technology is involved, and second, that texts regarded as verified but meaningless can spoil the content of a corpus, which may lead to contradictions and background noise.


Source: oai:arXiv.org:1311.5401
Volume: 2014
Published on: June 24, 2014
Submitted on: April 14, 2014
Keywords: Computer Science - Computation and Language, Computer Science - Information Retrieval

