Clustering and Relational Ambiguity: from Text Data to Natural Data

Text data are often seen as "take-away" material: little noise and information that is easy to process. The main questions are how to obtain the data and how to transform them into a good document format. But data can be sensitive to noise, often called ambiguity. Ambiguities have been recognized for a long time, mainly because polysemy is obvious in language and context is required to remove uncertainty. I claim in this paper that syntactic context is not sufficient to improve interpretation. I try to show, first, that noise can come from natural data themselves, even when high technology is involved, and second, that texts regarded as verified but in fact meaningless can spoil the content of a corpus, leading to contradictions and background noise.


INTRODUCTION
Human cognition involves a diversity of concepts such as memory and brain anatomy, inference and reasoning, motivation, time and space, classification and clustering. Inference tries to identify correct relations or properties associated with an object. In this sense, it is also possible to test the validity or consistency of a relation. Consider the proposition P = "a cat is a stone": it is false, or contradictory, because a cat is a living organism whereas a stone is not. P can be called paradoxical or contradictory. Societies sometimes live with contradictions, such as tolerating many deaths on roads or in wars while not tolerating deaths from disease. In this paper we focus more specifically on sources of potential contradictions that could spoil the computation of information extraction. Formal semantics is devoted to validating relations between the objects of a set. Our focus is not only on managing the complexity of a logical proposition and computing whether it is true or false, but also, given a text consisting of a set of sentences, on extracting relations and checking whether they can be asserted as non-contradictory with respect to relations extracted from other texts. Texts are therefore the primary material of discussion. Chapter 1 presents relational ambiguities found in texts. We start with a typology of logical relations. Given a type of relation, we explain how to extract such relations from texts using markers. But markers are not sufficient to detect a contradiction: a specialized language, such as that of a molecular biology corpus, gives examples of ambiguous (contradictory) relations that cannot be detected with markers. We then show that a global overview of word collocations in a corpus can give a good signal about its structure.
In our new "publish or publish" era of research and development, the production of literature is high, but a non-negligible percentage of papers turn out to be false over time. It is possible to compile from the PubMed website 4,800 papers that were accepted in good faith but later officially retracted. Such a collection is an original source for building a corpus of real texts, written in a real natural language with the intention of presenting arguments and content to readers, but whose content is known to have been invalidated ex post by readers. Building a random text is easy, but its grammar and arguments will not follow the way argumentation is usually conducted by "normal writers and experts", nor will its language conform to the standard grammars and texts normally used to build corpora. In this sense, we can consider retracted texts as "well-constructed", since they are found in nature (in official databases), and yet purely noisy.
Chapter 2 presents a source of ambiguity coming from the human interpretation of natural data. Scientific and technological texts are supposed to be founded on validated experimental devices producing experimental data. A device can be technological, such as a telescope in astronomy, or a questionnaire-based test in psychology. I present an overview of ambiguous interpretations across several sciences, which may affect the conclusions of practitioners and the way they report results in documents. This is not the birth of a controversy, but rather the ambiguous interpretability of data from specific results that require validation by other techniques. A controversy often arises when several techniques lead to opposite conclusions, since a protocol is considered scientific when it guarantees the reproducibility of its results.
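The cat/stone example above can be made concrete with a toy consistency check. This is a minimal sketch under our own assumptions, not the method of this paper: the dictionary `PROPERTIES` and its property names are hypothetical, and an assertion "X is a Y" is flagged as contradictory when X and Y give opposite values to a property that both define.

```python
# Hypothetical toy knowledge base: concept -> {property: truth value}.
PROPERTIES = {
    "cat": {"living": True, "animal": True},
    "stone": {"living": False, "mineral": True},
    "animal": {"living": True},
}


def conflicting_properties(subject: str, category: str) -> list[str]:
    """Return the properties on which 'subject is a category' would be contradictory."""
    subj = PROPERTIES.get(subject, {})
    cat = PROPERTIES.get(category, {})
    # A conflict is a property defined for both concepts with opposite values.
    return sorted(p for p in subj.keys() & cat.keys() if subj[p] != cat[p])


def is_consistent(subject: str, category: str) -> bool:
    """An 'is-a' assertion is consistent when no shared property conflicts."""
    return not conflicting_properties(subject, category)


print(conflicting_properties("cat", "stone"))  # ['living']
print(is_consistent("cat", "animal"))  # True
```

Such a check presupposes that the relevant properties are already known; it does not extract them from text.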

Antinomy and paradox in mathematics
In philosophy and logic, the paradox has been attributed to Greek rhetoric of the 7th century BC. The first paradox was the liar paradox of Epimenides: "A man says that he is lying. Is what he says true or false?" It can be reformulated as follows: Epimenides, a Cretan, says "All Cretans are liars." Ancient philosophers considered this a paradox. Indeed, if Epimenides tells the truth, then he lies (because he is a Cretan), so his statement is false (because all Cretans lie). If, on the contrary, Epimenides lies by saying this, then his statement is false: there is at least one Cretan who tells the truth, which is not contradictory, and this is the solution of the paradox. In modern mathematics, the logician Russell described in 1902 the following paradox, formulated as a question: "Is the class of all classes that are not elements of themselves an element of itself?" In 1919 he reformulated the statement in vernacular language: "The barber of a given village shaves exactly those people who do not shave themselves. Question: does this barber shave himself?" If we look for a solution in a predicative analysis framework, the reasoning leads to a contradiction. Let R = {x | x ∉ x}. If R ∈ R, then R satisfies "x ∉ x", hence R ∉ R: contradiction. If R ∉ R, then R satisfies the defining condition "x ∉ x", hence R ∈ R: contradiction again. Axiomatic set theory escapes the contradiction, notably because a set cannot be an element of itself.
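Both arguments can be checked mechanically by enumerating truth values. The sketch below is only an illustration (the function names are ours, and the Epimenides model simplifies each Cretan to being either a truth-teller or a liar): no truth value is consistent for the barber, while the consistent Epimenides scenarios are exactly those in which he lies and at least one other Cretan tells the truth.

```python
from itertools import product


def barber_assignments() -> list[bool]:
    """Truth values of s = 'the barber shaves himself' compatible with the rule.

    Rule: the barber shaves x exactly when x does not shave himself.
    Applied to x = the barber, this requires s == (not s).
    """
    return [s for s in (True, False) if s == (not s)]


def epimenides_models(n_other_cretans: int = 2) -> list[tuple[bool, ...]]:
    """Consistent truth-teller assignments for Epimenides plus other Cretans.

    Each tuple gives, for each Cretan (Epimenides first), True if that person
    tells the truth. Epimenides' statement is 'all Cretans are liars', and he
    tells the truth exactly when his statement is true.
    """
    models = []
    for world in product((True, False), repeat=1 + n_other_cretans):
        statement = not any(world)  # every Cretan is a liar
        if world[0] == statement:
            models.append(world)
    return models


print(barber_assignments())  # []
print(epimenides_models(2))  # [(False, True, True), (False, True, False), (False, False, True)]
```

With no other Cretan (`n_other_cretans=0`), no assignment survives: this is the liar paradox proper, whose only way out is the existence of a truthful Cretan.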

Antinomy in linguistics
Poetry is, and has long been, a playground for using words not usually found in the same context, as in the French "Dans un temps proche et très lointain" ("In a time near and very distant") or "Je suis et je ne suis plus" ("I am and I am no longer"). More radically, in any language we can find pairs of words with contrary meanings. Most of them are verbs and adjectives, such as: to move back / to move forward, to begin / to stop, to increase / to decrease, black / white, elitist / popular, fast / slow, big / small, wet / dry. We can also find so-called quasi-antonyms, such as the French bon/terrible (roughly "good"/"terrible"). According to Antoine Culioli [Culioli, 1987], contrary pairs are an illusion of language, which rather tends to construct complementary pairs in the sense of mathematical logic, such as "white" and "non-white", the latter meaning any colour except white. For Culioli, fuzzy sets could be an interesting framework, but this formalism is too weak for a fine-grained description.
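To make the contrast concrete, the following sketch compares a crisp complement, where "non-white" is every colour except white, with the standard fuzzy complement μ_non-white(x) = 1 - μ_white(x). The colour universe and the membership degrees are invented for illustration only.

```python
# Illustrative universe of colours (hypothetical).
COLOURS = {"white", "ivory", "light grey", "grey", "black", "red"}

# Crisp complementary pair: "non-white" is everything except "white".
white_crisp = {"white"}
non_white_crisp = COLOURS - white_crisp

# Hypothetical fuzzy membership degrees for "white".
mu_white = {"white": 1.0, "ivory": 0.8, "light grey": 0.6,
            "grey": 0.3, "black": 0.0, "red": 0.0}
# Standard fuzzy complement: mu_non_white(x) = 1 - mu_white(x).
mu_non_white = {c: round(1.0 - m, 2) for c, m in mu_white.items()}

# In the crisp case the two sets share nothing and cover the universe.
assert white_crisp.isdisjoint(non_white_crisp)
assert white_crisp | non_white_crisp == COLOURS

# In the fuzzy case, intermediate colours belong partly to both.
both = sorted(c for c in COLOURS if 0 < mu_white[c] < 1)
print(both)  # ['grey', 'ivory', 'light grey']
print(mu_non_white["ivory"])  # 0.2
```

Whether such numerical degrees are fine enough to describe the linguistic phenomenon is precisely what Culioli questions.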
Let us recall the families of linguistic antonymy [Herrmann et al, 1986]. Two lexical items are linked by an antonymy relationship if their semantic features can be drawn symmetrically with respect to an axis. Symmetry can be defined in different ways, depending on the nature of the support. We observe several supports, each defining a different type of antonymy:
- complementary antonymy concerns the application (or non-application) of a property ('applicable' / 'non-applicable', 'presence' / 'absence'): for instance, 'shapeless' is the antonym of anything having a shape, and the same holds for 'tasteless', 'colorless', 'odorless', etc. with respect to anything that should have a taste, colour, smell, … In classical logic, the definition is $B(x) \Leftrightarrow \neg A(x)$;
- scalar antonymy concerns a property associated with a scalable value (high value, low value): for instance, 'hot' and 'cold' are symmetrical values of temperature. It is explained by the existence of a "neutral value" with respect to which the other values are positioned. In classical logic, it can be expressed as $A(x) \Leftrightarrow R(x) > r_0$ and $B(x) \Leftrightarrow R(x) < r_0$, where R is the property having a reference value $r_0$ (neutral or median);
- dual antonymy concerns a property or an element considered symmetrical by usage (for instance 'sun' / 'moon'), or by natural or physical properties of the objects studied (for instance 'male' / 'female', 'head' / 'foot', …).
Textual resources such as corpora were used in psychology in 1989 in the studies of Charles and Miller [Charles and Miller, 1989], who used the Brown Corpus to test Deese's hypothesis that two adjectives with opposite meanings are antonyms when they are interchangeable in most of their contexts [Deese, 1965]. Somewhat later, [Justeson and Katz, 1991][Fellbaum, 1995][Willners, 2001][Jones, 2002] defined sets of morpho-syntactic patterns to automatically detect antonym candidates (see table 1). Such patterns can also be defined for other languages, such as French [Amsili, 2003].
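A minimal sketch of pattern-based candidate detection is shown below. The coordination frames it uses ("either X or Y", "neither X nor Y", "both X and Y", "from X to Y", "X versus Y") are our assumption of typical patterns, not a transcription of table 1; a real system would also restrict X and Y to adjectives and filter candidates by corpus frequency.

```python
import re

# Hypothetical coordination patterns; not the exact list of table 1.
PATTERNS = [
    r"\beither (\w+) or (\w+)\b",
    r"\bneither (\w+) nor (\w+)\b",
    r"\bboth (\w+) and (\w+)\b",
    r"\bfrom (\w+) to (\w+)\b",
    r"\b(\w+) versus (\w+)\b",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def antonym_candidates(text: str) -> list[tuple[str, str]]:
    """Return (X, Y) word pairs found in the coordination patterns, in pattern order."""
    pairs = []
    for pattern in COMPILED:
        for x, y in pattern.findall(text):
            pairs.append((x.lower(), y.lower()))
    return pairs


sample = "The water was neither hot nor cold. Prices went from high to low."
print(antonym_candidates(sample))  # [('hot', 'cold'), ('high', 'low')]
```

Patterns such as "from X to Y" also match non-antonymous pairs ("from Paris to London"), which is one reason why markers alone cannot settle whether a relation is oppositive.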

Relational ambiguity in a specialized domain
Table 2 presents what opposition may look like in texts, according to linguistic markers and rhetorical expressions. In a specialized discourse, however, opposition can be expressed differently. [Reinitz et al, 1998] studied the fruit fly in molecular biology and showed an ambiguity in the role of the SmaI-BglI DNA fragment in producing stripe 6 in the fly body.
According to [Howard and Struhl, 1990]: "Further deletion analysis of this region (particularly constructs ET44,30 and 31) provides clear evidence that an 600 bp region of DNA (from position -8.4 to -9.0; ET31) contains all of the elements necessary and sufficient for a relatively normal stripe 6 response (Fig. 3B). However, we note that this response seems to be displaced slightly posterior to the location of the endogenous stripe 6 at this stage." But according to [Langeland et al., 1994]: "The 526 bp SmaI-BglI reporter construct (6(526)lacZ) gives rise to strong lacZ stripe expression corresponding to h stripe 6." Another example of contradiction is the one pointed out by [Giles and Wren, 2008], who report an uncertainty about the behaviour of the c-jun and c-myc genes.
According to [Davidson et al, 1993]: "17β-Estradiol had little effect on expression of c-jun, jun B, jun D, or c-fos mRNA by MCF-7 cells over 12 h, although it stimulated c-myc expression 4-fold within 30 min." But [Bhalla et al, 1993] reported differently: "In addition, intracellularly, mitoxantrone-induced PCD was associated with a marked induction of c-jun and significant repression of c-myc and BCL-2 oncogenes."
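The two quotations can be hand-coded as (source, gene, effect) statements, where the effect is +1 for stimulation or induction, -1 for repression and 0 for little effect. The sketch below (the encoding and the function name are ours) flags genes whose reported effects differ across sources. It also shows why such flags are only candidates: the two studies concern different treatments (17β-estradiol versus mitoxantrone) in different settings, so a difference in sign is not in itself a contradiction.

```python
from collections import defaultdict

# Hand-coded from the two quotations above (our own encoding):
# +1 = stimulated/induced, -1 = repressed, 0 = little effect.
STATEMENTS = [
    ("Davidson et al, 1993", "c-jun", 0),
    ("Davidson et al, 1993", "c-myc", +1),
    ("Bhalla et al, 1993", "c-jun", +1),
    ("Bhalla et al, 1993", "c-myc", -1),
]


def conflicting_genes(statements):
    """Map each gene with more than one distinct reported effect to its reports."""
    reports = defaultdict(list)
    for source, gene, effect in statements:
        reports[gene].append((source, effect))
    return {
        gene: items
        for gene, items in reports.items()
        if len({effect for _, effect in items}) > 1
    }


for gene, items in sorted(conflicting_genes(STATEMENTS).items()):
    print(gene, items)
```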

Comparison of real and artificial corpora
We compare the lexical distributions and associations of an artificial corpus and a real corpus of about the same size.
Definition 1. A corpus is a collection of texts.
Definition 2. A specialized corpus is written in a human vernacular language. It has to cover all discussions of a technical field, from past to present.
Following Definition 2, a specialized corpus covers everything people can say about the field.
A specialized corpus contains all the relations of a given field. Suppose that two corpora C1 and C2 are specialized in the same field. If a relation is contained in C1 but not in C2, then C2 does not cover the field: to build a real corpus of the field, C1 and C2 have to be merged, or else C2 can be called a sub-corpus of the field. Consequently, we cannot compare a specialized corpus with another one from the same field. But we can create artificial corpora. An artificial corpus is shaped by its lexical composition and by the grammar it uses.
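The coverage argument can be phrased with relation sets. In the sketch below, each corpus is represented only by the set of relations it contains (the triples are invented placeholders, and extracting them from text is outside its scope): if one set strictly contains the other, the smaller corpus is a sub-corpus; otherwise neither covers the field, and merging them gives a larger candidate corpus.

```python
Relation = tuple[str, str, str]  # (item, relation, item)


def compare_corpora(c1: set[Relation], c2: set[Relation]) -> str:
    """Classify two specialized corpora of the same field by relation coverage."""
    if c1 == c2:
        return "same relations"
    if c2 < c1:  # strict subset
        return "C2 is a sub-corpus"
    if c1 < c2:
        return "C1 is a sub-corpus"
    return "neither covers the field; merge them"


# Invented example relations, for illustration only.
C1 = {("gene", "regulates", "protein"), ("species", "lives_in", "habitat")}
C2 = {("gene", "regulates", "protein")}

print(compare_corpora(C1, C2))  # C2 is a sub-corpus
print(len(C1 | C2))  # 2: size of the merged relation set
```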
Our hypothesis here is that lexical distribution and grammar can lead to different densities of relations. Hence we expect a specialized corpus to produce a relational structure similar only to that of another specialized corpus.
We used ten different corpora and one specialized corpus. The ten corpora were built to match the specialized corpus in size (number of documents or words). Among the ten corpora, four are artificial; they are generated with a mixture model (lexical distribution and grammar):
- "Corpus BD" is a real corpus specialized in the biodiversity domain. From this list we select a subset of 1,000 words to generate sentences.
- "Corpus BL" is a corpus of 6,500 generated abstracts of 150 words each, made of randomly generated sequences in the same way as "corpus SL", but with an extended lexical dictionary of about 50,000 words.
- "Corpus NG" is a corpus of 3,971 of the 18,846 documents of the 20 Newsgroups collection of web forum exchanges.
- "Corpus RT" is a corpus of Reuters news: 6,025 news items were kept among the 21,578 of the collection.
- "Corpus TW" is a corpus of 50,000 tweets in English, each about 15 words long, taken from the Twitter database.
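The mixture model used for the artificial corpora is not detailed here. As a rough, hypothetical sketch of the lexical-distribution part only, an artificial document can be produced by drawing words from a lexicon with Zipf-like weights (weight proportional to 1/rank). The lexicon, the weighting and the absence of any grammar are simplifying assumptions, not the generator actually used.

```python
import random


def zipf_weights(n: int) -> list[float]:
    """Weights proportional to 1/rank for ranks 1..n."""
    return [1.0 / rank for rank in range(1, n + 1)]


def generate_corpus(lexicon: list[str], n_docs: int, doc_len: int,
                    seed: int = 0) -> list[list[str]]:
    """Generate n_docs random documents of doc_len words each (no grammar)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    weights = zipf_weights(len(lexicon))
    return [rng.choices(lexicon, weights=weights, k=doc_len)
            for _ in range(n_docs)]


# Hypothetical 1,000-word lexicon and 150-word abstracts.
lexicon = [f"word{i}" for i in range(1000)]
corpus = generate_corpus(lexicon, n_docs=10, doc_len=150)
print(len(corpus), len(corpus[0]))  # 10 150
```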
Distributional study of frequent words captures a strong informational signal through the most significant occurrences, under the hypothesis that they repeat. The pioneering work of George Zipf showed a typical distribution x · y = constant, where x is the rank of a lexical item sorted by decreasing frequency and y is its number of occurrences in a long text or a set of texts in a given language [Zipf, 1935]. Frequency is defined as the number of occurrences of a lexical item in the corpus. We used the R platform [R Core Team, 2013], especially the tm package [Feinerer et al, 2008] and basic matrix functions, to split the corpora into elementary lexical items and to sort frequent items. Punctuation, digits and words shorter than 3 characters were removed. When stemming the raw text, we keep only the root form of each word, and the text becomes less dense, as seen in table 3. Regardless of the kind of corpus, with or without stemming, words occurring once represent between 41.9 and 54.4 % of all features, and words occurring 2 or 3 times represent between 38.7 and 47.7 % of the features occurring more than once, that is, between 19.5 and 21.8 % of the whole set of items (see Figure 1). Given the large number of items and their distribution, frequency can offer an anchor for catching a strong lexical semantic signal about the content. For relevant feature extraction, a basic process relies on selecting frequent items. We set a threshold in order to compare itemset extractions. A reasonable figure is 5% of the number of documents, taken as the minimal frequency of a relevant frequent item. We call this number S_f, knowing that an item can occur several times in the same document, and we call S_d the frequency threshold expressed strictly as the number of documents in which a term must appear. Table 3 gives results with S_f = 240. The number of frequent words is low, about 1-2 % of all lexical items, so the list can be read quickly. It is quite powerful for getting a rough idea of the content of a corpus.
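The preprocessing and the two thresholds can be sketched in a few lines. This is a Python analogue written for illustration, not the tm pipeline actually used: it applies no stemming and no stop-word removal, and it computes the S_f list (total occurrences) and the S_d list (number of documents) separately; the three sample documents are invented.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic runs only, drop words under 3 characters."""
    # Keeping only runs of letters removes punctuation and digits.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3]


def frequency_profile(docs: list[str], s_f: int, s_d: int):
    """Return (term counts, share of hapaxes, items with >= s_f occurrences,
    items present in >= s_d documents)."""
    tokens_per_doc = [tokenize(d) for d in docs]
    freq = Counter(w for toks in tokens_per_doc for w in toks)
    doc_freq = Counter(w for toks in tokens_per_doc for w in set(toks))
    hapax_share = sum(1 for c in freq.values() if c == 1) / len(freq)
    by_occurrences = sorted(w for w, c in freq.items() if c >= s_f)
    by_documents = sorted(w for w, c in doc_freq.items() if c >= s_d)
    return freq, hapax_share, by_occurrences, by_documents


docs = [
    "Species richness and species diversity in forests.",
    "Forest species and habitat loss in 2010.",
    "Habitat loss reduces diversity.",
]
freq, hapax, by_f, by_d = frequency_profile(docs, s_f=2, s_d=2)
# Zipf check: on a large corpus, rank * count should stay roughly constant.
ranked = [(rank, count) for rank, (_, count) in enumerate(freq.most_common(), 1)]
print(by_f)  # ['and', 'diversity', 'habitat', 'loss', 'species']
```

On a real corpus, S_f would be set to 5% of the number of documents, as described above.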
We can see (table 3 and table 4) that, for the "true" corpus, the frequent terms give a good signal for interpreting the biodiversity domain. For the "false" corpus, however, they give only a weak signal, indicating merely that the majority of documents deal with cell biology and medicine.
Let us now turn to a macroscopic analysis of the corpora. Many clustering algorithms summarize similarities between bags of words; one of them, the k-nearest-neighbour algorithm (KNN), reveals items that are close in context through their collocations. It was introduced by [Cover and Hart, 1967] and gives good results on different kinds of data. We can argue that the frequent itemset extraction method of [Agrawal and Srikant, 1994], called Apriori, is a variant of KNN. An interesting property of this kind of algorithm is its low time complexity. It is also efficient with sparse data such as text data. To visualize a large global clustering, we used the igraph package, implemented for large network analysis and visualization [Csardi and Nepusz, 2013]. Thirteen layout algorithms are available. We especially used the Fruchterman-Reingold layout, which is force-based, combining attractive forces between adjacent vertices and repulsive forces between all vertices [Fruchterman and Reingold, 1991]. We also used the DrL layout (Distributed Recursive Layout), also force-based, which uses the VxOrd routine and offers a multi-level recursive version to obtain a better layout on big graphs, as well as the ability to add new nodes to a graph already displayed [Martin et al, 2011].
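The graph-building step can be illustrated with a small, self-contained k-nearest-neighbour sketch. Each word is a row of a term-document matrix, similarity is cosine similarity (our choice; the measure is not specified above), and each word is linked to its k most similar words. The resulting edge list is the kind of input a force-directed layout such as Fruchterman-Reingold or DrL would then draw; the layout itself is left to a graph library and is not reproduced here. The matrix values are invented.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_edges(words: list[str], rows: list[list[float]], k: int) -> set[tuple[str, str]]:
    """Undirected k-nearest-neighbour graph over the word rows of a term-document matrix."""
    edges = set()
    for i, w in enumerate(words):
        sims = [(cosine(rows[i], rows[j]), words[j])
                for j in range(len(words)) if j != i]
        sims.sort(key=lambda t: t[0], reverse=True)  # stable sort on similarity only
        for _, neighbour in sims[:k]:
            edges.add(tuple(sorted((w, neighbour))))
    return edges


# Invented counts: 4 words (rows) in 4 documents (columns).
words = ["species", "habitat", "gene", "protein"]
rows = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
print(sorted(knn_edges(words, rows, k=1)))  # [('gene', 'protein'), ('habitat', 'species')]
```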

Definition 3. Data Structure
Let $M = (M_{ij})$, with $1 \le i \le n$ and $1 \le j \le m$, be a data matrix where line i represents word i and column j represents document j; n is the number of words and m the number of documents.
Definition 4. Neighbourhood
Two items $w_i$ and $w_k$ are neighbours if there exists a document j such that $M_{ij} > 0$ and $M_{kj} > 0$.
We discussed previously that very frequent words are worth extracting. We now want to look not only at a set of items but also at their relationships, and especially, as a first step, at how this global set of relationships is structured. Visualization is a good tool for this task, because thousands of relationships are involved and no primary criterion allows us to select a pool of specific or more relevant relationships. If, for instance, we try to visualize the symmetric data matrix of the most frequent terms against each other, we get a ball of links without structural specificity, each item having the whole set of items as nearest neighbours.
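Assuming, as a reading of the neighbourhood definition, that two items are neighbours when they both occur in at least one common document, the "ball of links" effect is easy to reproduce: very frequent terms appear in almost every document, so almost every pair is linked. The sketch below computes the density of this co-occurrence graph on two invented matrices.

```python
from itertools import combinations


def are_neighbours(M: list[list[int]], i: int, k: int) -> bool:
    """True if words i and k both occur in at least one common document."""
    return any(a > 0 and b > 0 for a, b in zip(M[i], M[k]))


def neighbour_density(M: list[list[int]]) -> float:
    """Share of word pairs that are neighbours (1.0 means a complete graph)."""
    pairs = list(combinations(range(len(M)), 2))
    linked = sum(are_neighbours(M, i, k) for i, k in pairs)
    return linked / len(pairs)


# Invented counts: rows are words, columns are documents.
frequent_terms = [[2, 1, 3, 1], [1, 2, 1, 1], [1, 1, 1, 2]]
rarer_terms = [[2, 0, 0, 0], [0, 3, 0, 0], [0, 0, 1, 1]]

print(neighbour_density(frequent_terms))  # 1.0
print(neighbour_density(rarer_terms))  # 0.0
```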
To improve clustering efficiency, we need to perform a data reduction. The algorithm below shows a reduction by the weighted margin mean. The incidence matrix is computed by a simple reduction: for each line, the mean of the non-null values of that line is subtracted from each matrix value of the same line.

Definition 5. Data Reduction
Where $\langle M \rangle_l$ is the mean vector of line l of M. A coefficient acts as a regulation factor on the rate of nearest neighbours; in fact, the number of nearest neighbours is not defined explicitly. The algorithm then computes the mean number of links per node, Nb_mean_link = mean(rowSums(TD)), and generates the DrL layout for display with TD as the adjacency matrix.
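A minimal sketch of the reduction described above, under explicit assumptions: for each line (word) of M we subtract alpha times the mean of its non-null values, and keep a link to document j only where the result is positive, which yields a binary incidence matrix TD. The factor name `alpha` and this binarization rule are our assumptions, since the exact formula is not reproduced here; the last function mirrors the R expression mean(rowSums(TD)).

```python
def reduce_matrix(M: list[list[float]], alpha: float = 1.0) -> list[list[int]]:
    """Binary incidence matrix: keep M[i][j] only where it exceeds alpha times
    the mean of the non-null values of line i (hypothetical reduction rule)."""
    TD = []
    for row in M:
        non_null = [v for v in row if v != 0]
        mean = sum(non_null) / len(non_null) if non_null else 0.0
        TD.append([1 if v - alpha * mean > 0 else 0 for v in row])
    return TD


def mean_links_per_node(TD: list[list[int]]) -> float:
    """Python equivalent of mean(rowSums(TD)) in R."""
    return sum(sum(row) for row in TD) / len(TD)


M = [[4, 1, 0, 1],
     [2, 2, 2, 0],
     [0, 5, 1, 0]]
TD = reduce_matrix(M, alpha=1.0)
print(TD)  # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
# A smaller alpha lowers the threshold and therefore keeps more links.
print(mean_links_per_node(reduce_matrix(M, alpha=0.5)) > mean_links_per_node(TD))  # True
```

In this reading, alpha plays the role of the regulation factor: the number of neighbours is never fixed directly, only influenced through the threshold.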
We used Fisher's Iris dataset to validate the clustering approach. The dataset consists of 150 individuals described by 4 features and forming 3 classes.
Through visualization, we aim to compare different ranges of word frequency and to distinguish their impact on the global classification. On the one hand, we could simply guess that lexical items each contribute equally to clustering; moreover, we could suppose that more frequent words are more clustered than less frequent ones. On the other hand, we could also expect "true" data (i.e. corpus BD) to be much more clustered than "false" data (i.e. corpus PM). As the Zipf distribution shows (Figure 1), frequency range can be considered a good parameter for numerically categorizing the lexical space. It is possible to define a partition into contiguous ranges, determined by the first two ranges, each containing almost the same number of contexts.

Definition 6. Context
A context of a lexical item is a text area in which an occurrence of that item can be seen. Let $w_1$ and $w_2$ be two lexical items. If $f_1$ and $f_2$ are their respective frequencies, $C = f_1 + f_2$ is the total number of contexts.
For instance, 3 lexical items each with frequency 2 generate 6 contexts. For corpus BD, table 6 shows that a series of frequency ranges whose first two are [2-5] and [6-12] produces 23 ranges, with 33,554 contexts on average. Let K be a granularity factor (the number of ranges) and N_c the average number of contexts per range; we observe that the global visualization changes when we select a set of lexical items from different ranges.
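The context count and the range partition can be sketched as follows. Since each occurrence of an item is one context, a frequency range contributes the sum of the frequencies of its items. The greedy rule below, which closes a range as soon as it reaches a target number of contexts, is our own simplification of "contiguous ranges containing almost the same number of contexts"; the target value and the item frequencies are hypothetical.

```python
from collections import Counter


def contexts(frequencies: list[int]) -> int:
    """Total number of contexts generated by items with these frequencies."""
    return sum(frequencies)


def partition_ranges(freq_of_items: list[int], target: int) -> list[tuple[int, int, int]]:
    """Greedily split the observed frequencies into contiguous ranges.

    Returns (low, high, n_contexts) triples; a range is closed as soon as it
    holds at least `target` contexts (the last range may hold fewer).
    """
    items_per_freq = Counter(freq_of_items)  # frequency -> number of items
    ranges, low, acc = [], None, 0
    for f in sorted(items_per_freq):
        if low is None:
            low = f
        acc += f * items_per_freq[f]  # each item of frequency f gives f contexts
        if acc >= target:
            ranges.append((low, f, acc))
            low, acc = None, 0
    if low is not None:
        ranges.append((low, max(items_per_freq), acc))
    return ranges


print(contexts([2, 2, 2]))  # 6, as in the example above
print(partition_ranges([2, 2, 2, 3, 5, 6, 8, 12], target=12))
# [(2, 5, 14), (6, 8, 14), (12, 12, 12)]
```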

Discussion
Some sciences, such as chemistry and sociology, study close associations between components; others, such as economics and astrophysics, study more global structures. Computational linguistics and lexical statistics are able to take an overview of a whole set of relationships as well as to focus on specific relationships. In this chapter we try to show some results giving an overall view of close relationships within the same short documents, highlighted by some items involved in specific argumentative relationships.
First, we try to explain how contradictions can occur explicitly in texts as specific relationships. Second, two corpora have been studied to extract global information. They share common properties: short documents, a technical domain, the English language, a natural distribution of lexical items, and corpus size. Nevertheless, one corpus covers a domain studied by many people as an active scientific field (i.e. biodiversity), while the other consists of "hoax" documents written by people as if they were true. Surprisingly, there is a striking similarity in global clustering visualization between hoax documents and true documents. The texts convey factual information in natural language, interpreted by humans. They originate from experimental data that were generated or preprocessed before leading to published interpretations. In the next chapter, we try to highlight obstacles to the interpretation of evidence in several areas. Beyond syntactic associations, humans choose their words based on their understanding, which may not be stable and can give rise to divergent views, or even to paradox or contradiction.

Medical Observation
Another important impediment is the ambiguous interpretation of therapeutic effects, in particular when the pretreatment stage shows the delayed relaxation pattern usually observed in diabetic patients [Fang et al, 2004]. In schizophrenia, if dysphoria or psychotic symptoms improve at the same time as negative symptoms, it is not clear that there has been a direct effect on negative symptoms [Kirkpatrick et al, 2006]. In clinical examination, various signs and functional tests are used, sometimes with a lack of quantification and with problems of interpretation [Pagenstert and Bachman, 2008]. The analysis of multiexponential decays often leads to difficult interpretation, confirming the limited diagnostic value of relaxation times [Perea et al, 2007]. The presence of multiple positive peaks before and after averaged jerks led to an ambiguous interpretation of the coupling between EEG transients and EMG potentials [Canafoglia et al, 2006]. Tympanograms can be subject to ambiguous interpretation related to admittance [Palmu, et al, 2005]. [Kaźmierczak et al, 2008] found that symptoms in female patients with DM give rise to ambiguous interpretation of the electrocardiogram (ECG). The interpretation of X-ray images is sometimes difficult because of object boundary points at different focal distances, inconsistency between anatomical sections and X-ray projections, and multiplicity of shades [Blinov et al, 2011]. According to [Lazyuk et al, 1996], the method of digital thermography, in the version developed, cannot be used to estimate the functional state of the myocardium and pulmonary circulation because of problems in interpreting the results obtained and their great variability.

Psychology drawing
Mental images can be ambiguous with respect to geometric direction, such as the top/bottom or front/back of the image [Peterson et al, 1992]. Dynamic patterns induce a vivid sense of rotation in depth, but with doubt as to whether the rotation is leftward or rightward about a vertical axis (corresponding to clockwise or counter-clockwise rotation) [Grossmann and Dobbins, 2005].
Luminance can have an impact: an edge may be due to a difference in illumination, a difference in reflectance, or both. Observers can vary the luminance of a small test [Schirillo and Shevell, 1997]. Plotting a statistical test can induce false impressions: for instance, the use of the scree plot produced an ambiguous interpretation, with a possible 'elbow' appearing after eigenvalues [Maltby et al, 2008]. According to [Cashera et al, 2007], multimodal systems support people with different needs and different features during the interaction process; however, naturalness can often produce ambiguous interpretation. [Robertson, 2000] describes Minimum Description Length agent negotiation: image interpretation is fundamentally ambiguous, interpretation involves finding the most probable interpretation, and what we "see" is the most probable interpretation. [Rakoczi and Pohl, 2012] argue that the reliability of existing eye-tracking studies (within both and economic settings) may be impaired by ambiguous interpretation. [Kawabata, 1993] assessed the rate of correct interpretation of a picture when fixating a target. [Kim et al, 2003] emphasize the memory associated with the priming stimulus, while the ensuing information subject to ambiguous interpretation is referred to as target information. [Drogemuller, 200] notes that ambiguous interpretations of geometric information, as in standards for civil engineers, quickly come to the fore.

Cognitive imagery
The low sensitivity of experiments with the currently available techniques has resulted in much conflicting data [Carlen et al, 2006]. When a gesture precedes the coexpressive word by a relatively large margin, the upcoming speech cannot influence the interpretation of the gesture; thus, an ambiguous interpretation of the gesture is finalized and stabilized before the word onset [Habets et al, 2011]. [Leznik et al, 2002] remarked that in most imaging studies the interpretation of the imaging pattern is based on subjective criteria that are open to ambiguous interpretation. [Eagleman, 2011] reports that competition among groups of neurons typically appears only in very specific contexts, in which sensory information lends itself to ambiguous interpretation (e.g., binocular rivalry). [Grasman, 2004] describes an interpretation problem with the Laplacian computation regarding the origin of neuro-electromagnetic signals in spatially high-pass filtered topographies. [Coq et al, 2009] noticed that a deterioration of neuronal properties would likely result in ambiguous interpretation of tactile cues and undoubtedly contributed to a decline in grasp control, ultimately resulting in failed and repeated grasp attempts, as well as increased reach and grasp times. [Kreher et al, 2008] analyzed the distribution of cortical fibres, showing that the inherent limitations of the spatial resolution of diffusion tensor images, the limited sharpness of the orientation density function, and the ambiguous interpretation of the anisotropy of diffusivity with respect to cortical fibre direction may lead to false positive connections. [Mayer et al, 2007] describe changes in brain chemistry as most often expressed as metabolite ratios, which, although useful, can lead to ambiguous interpretation of the data. [Jitsev, 2010] recalls that the contextual support provided by learning such high-order relations is in general of crucial importance for the correct interpretation of visual stimuli embedded in a larger context (e.g., an object or scene). Their local appearance is usually highly ambiguous and can be correctly interpreted only by consulting additional contextual cues mediated by the connectivity formed during previous experience with the visual input. Regarding pain modulation, [Wilfer-Smith, 2011] says that the majority of research has concentrated on inhibition, which has led to an ambiguous interpretation of brain imaging data in visceral pain.

Physical systems
A number of flow regime classification models reported in the literature are based on subjective and variable visual observations, such as the Mandhane flow regime map [Cai et al, 1994]. In astrophysical imagery, parameter values can induce misunderstanding [McIntosh et al, 2004]. Radar imaging has the advantage of allowing changes on the Earth to be observed independently of weather conditions; however, the recognition of some features, such as roads, is more difficult [Chibani, 2003]. In coronary applications, the position of the catheter changes a lot because of the curved shape of the arteries. This gives images that do not correspond to the expected cross-section of the stent and leads to ambiguous interpretation by physicians [Brusseau et al, 1999]. [Honnicke et al, 2005] report that the superposition of the details arising from those three main sources of contrast can result in ambiguous interpretation of the image, although mathematical image processing, such as diffraction-enhanced imaging, has been widely used to solve this problem. According to [Valiullin et al, 2003], NMR relaxometry is apparently a more suitable method for probing a length scale, but it is often hampered by the complicated interpretation of the experimental data. In their patent, [Kaye and Gordon, 1998] explain that imaging microparticles should be patterned so as to ensure that ambiguous pattern interpretation cannot occur under a 90, 180 or 270 degree rotation from the intended viewing orientation. Regarding solid-state imaging, [November and Wilkins, 1992] indicate that measurements from a single spectral point are subject to ambiguous interpretation, since the magnetic field can be confused with velocity and line strength. In fact, high-voltage modulators are difficult to maintain and control reliably. Visual observations can lead to misinterpretations, as with the auroral substorm observations described by [Feldstein et al, 2011]. [Forrester et al, 2000] present Laser Doppler Imaging as an established technique for the two-dimensional measurement of tissue perfusion, but the uncertainty of the photon penetration depth leads to ambiguous interpretation of what fraction of the tissue microcirculation is being sampled. Hydrogen is very difficult to detect through X-ray diffraction, and the Fourier transform infrared spectra of hydrous ringwoodite are very broad, with ambiguous interpretation through frequency-to-distance relationships [Panero, 2010].

Biological systems
As a consequence of ambiguous morphological similarities, many species have been moved between genera or even families since the earliest exhaustive classifications of liverworts [Hentschel et al, 2007]. According to [Carette and Ferguson, 1992], both the programmed cell death theory and, in particular, the epithelial-mesenchymal transformation theory of seam degeneration rely on the potentially ambiguous interpretation of a dynamic event from a series of static images. In phylogenetics, [Yu et al, 2010] pointed out an ambiguous interpretation concerning inference for the entire cladogram. [Palomares-Ruis et al, 2010] advocate the study of phylogenetic relationships within plant-parasitic nematodes such as Longidoridae, especially in cases where morphological characters may lead to ambiguous interpretation. [Gantchev et al, 1992] advocate the spin-labelling technique but recall that, in studying the dynamic behaviour of biological membranes, an unambiguous interpretation of the spectral data is difficult. [Ivanov, 2004] wrote about the echolocation of dolphins by imagery and describes how, if the animal changes the spectral-time structure of its echolocation pulses on purpose, statistical processing yields an ambiguous interpretation of data on the acoustic behaviour of a dolphin during the detection and identification of targets. [Shin and Pierce, 2004] warn about the difficult interpretation of the fluorescence signal caused by fluorescence resonance energy transfer between dyes. [DaSilva and Oliveira, 2008] criticize the ERIC-PCR technique devoted to the identification of strain groups, because of interpretation limitations leading to low reproducibility between laboratories. [Gorbatyuk and Andronati, 1996] point out that 1H nuclear magnetic resonance (NMR) spectra were assigned incorrectly because of a rather ambiguous interpretation of the spectra in the absence of complementary 13C NMR spectra. [Wood and Napel, 1992] discuss radiological imagery interpretation problems concerning the surface orientation of reconstructed objects, though this problem can be avoided by using multiple light sources.

Chemical measure
Without detailed knowledge of the composition of reaction products, coulometric reduction can lead to different explanations [Lenglet et al, 1995]. Room-temperature Mössbauer spectra, complemented with powder X-ray diffraction analysis of relatively iron-rich soil samples and of their particle-size fractions (sand, silt, and clay), are compared to demonstrate the ambiguous interpretation of iron oxide mineralogy [Pizarro et al, 2000]. Many hypotheses have been advanced to account for the absence of organics and for the possible chemicals and reactions that could explain the ambiguous biology experiments; although more reliable, each of the electrochemical techniques by itself […] [Kounaves, 2003]. [Kerridge and Kaltsoyannis, 2003] studied cerocene, thorocene and protactinocene, and found that in the case of cerocene, strong hybridization between the metal f δ and ligand π(e₂ᵤ) levels can lead to an ambiguous interpretation of the degree of 'f¹' character in the ground-state wavefunction. [Bufle and Filetalla, 1995] argue that any model can be fitted to titration curves; consequently, any a priori model currently used leads to an ambiguous interpretation of the data. [Tumanova et al, 2005] sought the active sites of the particulate membrane-bound methane hydroxylase pMMOH; the data seem ambiguous because the preparations used for crystallization were inactive. […] glasses formed in this system with so-called 'boron […] i.e. alkaline earth metal cations. [Mechinskas, 2002] mentions that the analysis of certain equivalent electric circuits tends to be subjective and can lead to an ambiguous interpretation of the results. To make the analysis of pulsed measurements more objective, it is suggested that the data be transferred from the time domain into the frequency domain. The crystalline structure is often taken as a reference to establish the dominant interaction pathway, which forms the basis for modeling the magnetic behavior. However, this approach can lead to an ambiguous interpretation of the magnetic data, mainly for systems with at least one possible pathway for weak magnetic exchange interaction [Florincio et al, 2012]. Whether low-lying valence excitations in condensed-phase water are molecular or supramolecular in nature is unclear, which leads to an ambiguous interpretation of the absorption spectrum and an unclear picture of the microscopic details that underlie the peak position [Cabral Do Couto and Chipman, 2012]. NMR suffers from the presence of paramagnetic species, which may entail ambiguous interpretation; some caution has to be exercised before NMR measurements, in particular regarding membrane cleaning [Xu et al, 2012].
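The time-to-frequency transformation suggested for pulsed measurements can be illustrated, in a generic way, with a discrete Fourier transform. The sketch below is not the procedure of [Mechinskas, 2002]; it is a minimal pure-Python DFT, and the function names `dft` and `dominant_bin` as well as the synthetic sinusoid are hypothetical. It only shows that a periodic component spread over the whole time record collapses into a single frequency bin, which is easier to read than the raw time series.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a sequence of samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_bin(samples):
    """Index of the strongest positive-frequency bin, ignoring the DC bin 0."""
    spectrum = dft(samples)
    half = len(samples) // 2
    return max(range(1, half + 1), key=lambda k: abs(spectrum[k]))


# Hypothetical time-domain record: a constant offset plus a sinusoid
# completing exactly 5 cycles over 64 samples.
n = 64
signal = [1.0 + math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
print(dominant_bin(signal))  # 5: the oscillation maps to frequency bin 5
```

A naive O(n²) transform keeps the example dependency-free; for real data a fast Fourier transform (for instance `numpy.fft.fft`) would be the usual choice.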

Biological measure
The use of multiple molecular markers as aids in genetic selection programs can be spoiled by collinearity [Gianola et al, 2006]. 16S rRNA sequencing can give ambiguous results in species harbouring multiple copies of the 16S rRNA gene, as demonstrated between the different operons in E. coli [Mollet et al, 1997]. The importance of unequivocal annotation of microarray experiments is evident: the different probe and gene IDs corresponding to two annotation releases generate uncertainties [Noth and Benecke, 2005]. PCR methods can sometimes be controversial, and a post-PCR control has often been shown to be essential to confirm a sequence identity in case of ambiguous recognition of specific targets [Peano et al, 2005]. In some biological approaches, ionophores were used to demonstrate the electrogenic properties of the enzyme, which could lead to a problem in interpreting electrogenicity [Eisenrauch and Bamberg, 1990]. [Kloczkowski et al, 2002] recall that hydrogen bond placement can differ because of the ambiguous interpretation of imperfect geometries inherent in experimental structures. Diagnosis relies on techniques, one of which is serology; in spite of their high sensitivity, routine serological tests provide results of ambiguous interpretation [Kompalic-Cristo, 2004]. Occasionally, unwanted nonspecific PCR products, often in the size range of the expected product, are obtained during amplification; this can lead to ambiguous interpretation of results in ethidium bromide-stained gel analyses [Battles, 1995]. [Moskovets et al, 2003] reported that the weak fragmentation of singly charged precursors in MALDI TOF/TOF-MS (compared with collision-induced fragmentation of doubly charged precursors in ESI-MS) often provides only a few fragment peaks, resulting in ambiguous interpretation. Typical and conventional methods to detect E. coli are cultivation of the organism in selective media and identification by morphological, biochemical, and immunological characteristics. [Won and Min, 2010] point out that these cultivation methods suffer from ambiguous interpretation of the results, long detection times from initiation to readout, and relatively low detection limits. To study epidermal UV absorption of leaves from chlorophyll fluorescence measurements, [Ounis et al, 2001] explain that fluorescence emission ratios (Blue/Red or Blue/Far-Red) have a limitation: they depend on two variables that can vary independently, which leads to ambiguous interpretation.
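The limitation of emission ratios noted by [Ounis et al, 2001] is a non-identifiability problem: a ratio of two independently varying quantities maps many different (Blue, Red) pairs to the same value. The toy sketch below makes this concrete with hypothetical intensities in arbitrary units; the helper names `emission_ratio` and `pairs_with_ratio` are illustrative only and are not part of any published method.

```python
from fractions import Fraction


def emission_ratio(blue, red):
    """Blue/Red fluorescence emission ratio, kept exact with Fraction."""
    return Fraction(blue, red)


def pairs_with_ratio(target, values):
    """All (blue, red) pairs drawn from `values` that yield the same ratio."""
    return [(b, r) for b in values for r in values if Fraction(b, r) == target]


# Two hypothetical leaves with different emissions (arbitrary units).
leaf_a = emission_ratio(20, 40)
leaf_b = emission_ratio(50, 100)
print(leaf_a == leaf_b)  # True: the ratio alone cannot separate them
print(pairs_with_ratio(Fraction(1, 2), [10, 20, 30, 40, 50, 60]))
# [(10, 20), (20, 40), (30, 60)]
```

Exact fractions are used so that equality of ratios is not blurred by floating-point rounding.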

Anthropological measure
Differences in scores (intraclass correlation or internal consistency) between interviewers result partially from different interpretations of the item and/or the explanation [Wassenberg et al, 2003]. In social and work contexts, behavioral cues of flirting did not appear to be confined to flirting interactions [Keyton and Rhodes, 1997]. Not least, this has been because of the difficulty of defining ecological limits given a knowledge base that is usually imperfect and liable to ambiguous interpretation [Crean and Wisher, 2000]. An analysis of the main elements of both regulations shows that there is an ambiguous interpretation of the so-called Preservation Zone (Suelo de Conservacion), a territory subject to preservation given its ecological value in terms of climate regulation, water recharge, forest communities, agricultural cultivation, and hilly landscape; this situation favors illegal land-use occupation [Aguilar, 2008]. According to [Wicklund, 1995], some cultures are unlikely to analyze others by reference to terms that stand for fixed behavior patterns, implying a more complex, but perhaps also more ambiguous, interpretation and description of action. Nurses' professional role has traditionally included managing and coordinating patient care, and nursing education programs routinely stress nurses' care management role; [Cudney and VanTuyle, 2001] discovered that the scope of this role is ambiguous and that its interpretation differs among nurses. Ambiguity is also found in contracts between people: [Posner, 1998] mentions that a court will refuse to use evidence of the parties' prior negotiations to interpret a written contract unless the writing is (1) incomplete, (2) ambiguous, or (3) the product of fraud or mistake. The most reliable evidence of a food crisis in Africa is its rising food imports. To explain the food crisis in Africa, [Jaeger, 1992] criticized the principal data sources, which are either unreliable, open to ambiguous interpretation (such as the influence of migration policy), or at odds with what is commonly reported. Certain legal clauses can cause an ambiguous understanding of statutes by courts, as described by [Romero, 1994] for the Racketeer Influenced and Corrupt Organizations Act, a federal criminal statute. [Babrow et al, 1994] studied the behavior of smokers; tests of the effects of smoking rates on self-reported behavior also support the claim that smoking behavior is unstable, and the authors draw attention to the ambiguous interpretation of the survey analysis.

Psychological measure
Early envisioning-machine design focused on epistemic fidelity (the consistency between the physical representation of some phenomena and the expert's mental representation of these phenomena). However, because mapping physical and mental representations is an inherently ambiguous interpretation process, users did not read representations as experts did [Dillenbourg, 1996]. [Allegro, 1990] studied linguistic form to understand therapeutic effect, and used an inkblot test when confronted with problems of interpretation. [Charash and McKay, 2009] tested sixty participants, divided into several groups, who completed a masked emotional Stroop test, an implicit memory task, and an ambiguous interpretation task, to examine whether individuals with elevated contamination fear show biases of attention or memory. Behavioral cues can be sources of confusion [Charlton, 2000]; hence the interpretation of a given cue becomes dependent upon inferences concerning intentions, dispositions, and relationships. [Meichi, 2003] studied soccer players: players were encouraged to ask questions when they did not understand the content of the questionnaire, and the investigators were present throughout to answer them, to avoid confusion caused by ambiguous interpretation of terms. [Bressler et al, 2006] designed a questionnaire examining the categorization of others' sense of humor; after eliminating items deemed open to ambiguous interpretation, the final questionnaire contained 14 statements. When measuring educational development, [Stevension and Evans, 1994] faced ambiguous student answers such as "I ask questions to check my results". [Kuckertz et al, 2012] describe a measure of ambiguous interpretation administered to participants with body dysmorphic disorder (BDD), to participants with obsessive-compulsive disorder (OCD), and to healthy controls; the measure examines the interpretation of ambiguous information related to different forms of anxiety.
According to [Soni, 2011], the concept of happiness for a person depends on context. By giving determinate form to the relation with the other, the relation of sympathy specifically (and sentimentalism more generally) makes possible an ambiguous interpretation of happiness, as both an ineffable affect and a judgment based on the complexity and heterogeneity of a narrative situation. User behavior with an interface is not always clear: [Rodden et al, 2010] describe, for example, how a rise in page views for a particular feature may occur because the feature is genuinely popular, or because a confusing interface leads users to get lost in it, clicking around to figure out how to escape.

Geological measure
In geoscientific disciplines, interpretation difficulties occur especially when the data to be interpreted are of arbitrary dimension and are, for instance, compared pairwise [Klose, 2006]. A paleomagnetic test of the Patagonian Orocline shows an ambiguous interpretation of the declination anomaly in the absence of paleohorizontal control [Rapalini, 2007]. A correlated 2D cross-section can give an ambiguous interpretation of an anomaly that may be an archaeological body [El-Qady et al, 1999]. [Liu and Liu, 2008] consider that 2-D seismic data with a low signal-to-noise ratio led to ambiguous interpretation. [Poudjom Djomani et al, 2003] explain that the extent of terranes depends on the ambiguous interpretation of magnetic anomalies; for instance, the Birekte terrane is recognized entirely on geophysical grounds, as its basement is overlain by up to 10 km of Riphean. [Van Tuyll CI and Van de Wal, 2003], studying the Cenozoic era, mention the ambiguous interpretation of the global mean benthic oxygen isotope curve. [Degtyarev et al, 2008] claim that the correlation of geological events in different structural-formational zones is uncertain and leads to an ambiguous interpretation of the Early Paleozoic evolution of Northern Kazakhstan. [Metelkin et al, 2007] note that ambiguity concerning the position of Siberia relative to the other cratons in the Late Neoproterozoic prevents estimating the dynamics of formation of Late Precambrian oceanic basins. According to [Schnadt et al, 1998], outgoing longwave radiation is ambiguous in composites of recurring tropical cyclones, since these cyclones usually undergo a transition from a tropical storm to an extratropical cyclone in the vicinity of the east. Strata onlapping the fold limbs provide evidence for coeval sedimentation and contraction; in the cores of major anticlines, structural complexity and lack of seismic resolution make interpretation difficult [Cobbold et al, 2004].

Medical measure
Evaluation of markers has not been treated as a universally accepted criterion, which can occasionally lead to uncertainties due to variations of these parameters [Casas Pina, 1999].
Observations from inter-rater experiment 2 were used to adjust and improve the tool; only a few criteria were found to contribute to the heterogeneity of rater results by causing misunderstandings [Schneider et al, 2009]. [Venturin et al, 2004] used published reports and recruited patients to build a common data structure in which to tabulate the information. For each patient, they added any new clinical sign that had not been included previously, thus obtaining a relational database with 103 fields. The presence of a specific sign was recorded only when it was explicitly reported, and was formalised in binary fashion (present or not present). When a field could not be completed because of lack of information or an ambiguous interpretation, it was defined as null and was not counted. [Burek, 2005] noticed that free HCVAg (HCV core antigen) could enable the diagnosis of acute HCV infection, but some clinical situations make HBV and HCV markers difficult to interpret because of "unusual" constellations. According to [Aubin and Humbert, 1995], the serologic evaluation of hepatitis B is difficult because of the sometimes ambiguous interpretation of the available tests. All the studies reviewed by [Iannelli et al, 2006] report that the diagnosis of internal hernia may be difficult because symptoms and signs may be very vague or masked by the body habitus of the obese patient, and physical examination […]. Interpretation of case-control studies of myocardial infarction may be difficult because in a case-control study the relative risk cannot be calculated directly; the odds ratio is used as a surrogate, which is justified only when the disease is rare [Arora et al, 1999]. The term hypotony has been used in many different contexts, often leading to ambiguous interpretation of its clinical significance for visual function [Leen and Mills, 1999]. [Xing et al, 2011] mention that raters introduce errors, generate ambiguous interpretations of structures, and make careless mistakes; performance level assessment is an important aspect of interpreting reported structures. [Wassenberg-Severijnen et al, 2003] criticized the fact that different scores between interviewers resulted partially from ambiguous interpretation of the item and/or the explanation.
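The caveat about case-control studies [Arora et al, 1999] can be checked with a small worked example. The sketch computes the relative risk and the odds ratio from the same 2×2 cohort table; the counts are invented for illustration and the function names are hypothetical. The odds ratio stays close to the relative risk when the outcome is rare, but drifts away from it when the outcome is common.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 cohort table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical counts: 1% vs 0.5% risk (rare) and 40% vs 20% risk (common).
rare = (10, 990, 5, 995)
common = (400, 600, 200, 800)

for label, table in [("rare", rare), ("common", common)]:
    print(label, round(relative_risk(*table), 3), round(odds_ratio(*table), 3))
# rare 2.0 2.01
# common 2.0 2.667
```

In an actual case-control design only the odds ratio can be estimated, because the proportion of cases is fixed by sampling; the cohort table is used here only to show what the odds ratio stands in for.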

Economical measure
Comparative measures such as cost-income ratios may be uncorrelated with most of the other measures, which suggests that they are unreliable indicators of competition and inefficiency as a consequence of their ambiguous interpretation [Ozdincer and Ozyildirim, 2011]. The proportion of sample loans recorded as secured with collateral is a characteristic used to compare borrowers across countries, but the interpretation of collateral as a risk variable is especially ambiguous [Smith, 2003]. [Starczak and Jakubiec, 2003] assert that some firms appreciate the significance of unambiguous documentation and define the details concerning the measurement strategy, but ambiguous interpretation of design requirements still exists too often in practice. In one test, subjects were asked about their price-quality beliefs for twenty-eight product categories; however, after a pre-test, items for the threshold and price-level variables were dropped due to ambiguous interpretation by subjects [Smith and Natesan, 1999]. [Holcombe, 1992] points out the ambiguous interpretation of "the general welfare" in the US Constitution. [Durand, 2007] suggests correcting the prices of inputs in produced units for quality changes, thereby concealing both product and process innovations in the measure of inputs; if a measure of multifactor productivity incorporated product innovations into the measure of inputs but left out process innovations, the productivity residual would have an ambiguous interpretation. [Stanley, 2000] notes that some researchers admit that economic models may correctly capture underlying economic relations at some point in time, but that these relations are themselves sensitive to policy changes; the interpretation of models estimated on data over time could therefore be unstable. The capital-to-assets ratio reflects, on the one hand, regulatory costs which banks try to shift onto customers and, on the other, purports to measure credit risk. The resulting positive relationship between this ratio and the interest margin lends itself to ambiguous interpretation. Take, for example, two banks with the same capital/assets ratio [Gischer and Juttner, 2001]. [Methanuntakul, 2010] mentions barriers for high-street fashion brands to build customer value and to differentiate the core values of their brands from competitors, because of imbalanced implementation of strategic communication, particularly in the encoding process, and ambiguous interpretation of target audience behaviour as a key disseminator of brand messages. [Lonkani et al, 2012] mention that communication leads to misunderstanding of effects: the effect of announcements on stock prices suffers from distortion and ambiguous interpretation, and the distortion of the stock price when an announcement is made may be due to a discretionary process of interpreting relevant information.

Structuration ambiguities
Statistics: [Edwards, 1994] [Huang et al, 1991] [Cacciola, et al, 2003] [Thompson and Geyer, 2007] [Le Duy et al, 2011] [Garcia et al, 1998] [Hanges et al, 2005] [McClelland et al, 2000] [Mair, 2007] [Pons, 2006]
Computer science: [Tedeschi, 2006] [Niehaus and Terry, 1993] [Jukic and Vrbsky, 1997] [Meyer-Delius, 2009] [Shapiro et al, 1993] [Faconti et al, 2000] [Lucas et al, 2009] [Stojanovic, 2005] [Elfe et al, 1998] [Vanackere, 2001]
Linguistics: [Gal et al, 2005] [Rosen, 1991] [Tungsteth, 2003] [Obrębski and Stolarski, 2006] [Mayberry and Miikulainen, 1994] [Hagen, 2002] [Chung, 1998] [Andrews et al, 2011] [Bouma and Hopp, 2006] [Fu et al, 2000]

Computation
When statistical assumptions are not satisfied, null hypothesis tests can give ambiguous results depending on the scatter of the data [Tedeschi, 2006]. [Niehaus and Terry, 1993] find that the regression coefficients of lagged surplus variables on premiums have opposite signs for one and two periods. Sometimes data (tuples in databases) must be compared with other tuples to make their meaning precise [Jukic and Vrbsky, 1997] [Lucas et al, 2009]. In situation recognition for vehicular traffic scenarios, dealing with temporal contexts induces imprecise interpretations [Meyer-Delius, 2009]. [Shapiro et al, 1993] point out that IDEF (ICAM DEFinition for Function Modeling) may be inadequate. In IDEF, arrows may join, and a join represents fan-in or merging. The relatively unrestricted branch and join structure of arrows, combined with their ambiguous interpretation, is the major obstacle to using IDEF to describe the behavior of a system. [Faconti et al, 2000] specified inter-sensory interaction to avoid ambiguous interpretation of scenes within virtual environments. Because a keyword query is imprecise, [Stojanovic, 2005] points out that the user has to do additional processing of the result list in order to find useful results. In telephony (and, by extension, in collaborative work), calls involving multiple callers and features are susceptible to certain types of interaction inducing ambiguous signal interpretation or mistaken caller roles. The reason for these types of interaction is the variety of contexts created by the activation of each new feature and by the inclusion of each new caller in the call [Elfe et al, 1998]. [Vanackere, 2001] devised an ambiguity-adaptive logic for the creation of new collective theories. At an early stage of the construction of a theory, domain-specific terms are unavoidably vague or ambiguous; still, the creation of the theory will never take off if the scholars (who belong to one group) do not assume that all of them use these terms in a common way.

Statistics
Difference scores raise validity problems: they confound the effects of their component measures and fail to explain variance beyond those component measures [Edwards, 1994]. Confusion in the specification of endogenous switching regression models can cause problems of interpretation [Huang et al, 1991]. Inspecting the vibrational response of a beam with an edge non-propagating crack by means of stochastic analysis leads to uncertainty: numerical applications reveal that when the crack is small, the analysis seems unable to give any information on the position of the crack [Cacciola, 2003]. [Thompson and Geyer, 2007] underline that conventional p-values have an ambiguous interpretation unless they are extreme. Two major risk areas of any stochastic estimating process are the unreliability of the estimates and their ambiguous interpretation [Pons, 2006]. Striving for nonredundant predictor variables or using orthogonal contrasts greatly reduces the need for larger sample sizes to achieve adequate statistical power; the reduced redundancy also allows a less ambiguous, though still ambiguous, interpretation of a variable's effect [McClelland et al, 2000]. [Mair, 2007] explains that in the family of nonstandard log-linear models an ambiguous interpretation of parameters can arise. [Hanges et al, 2005] describe the ambiguity of intraclass correlation coefficients (ICCs) as an index of homogeneity: ICCs can increase either through increases in within-group homogeneity or through increases in between-group differences. It was found that the observation of ST deviations in the standard ECG may lead to ambiguous interpretation, and that limiting observation to ST-T patterns alone, instead of also including QRS changes, further hampers correct identification of the occluded vessel [Garcia et al, 1998]. According to [Le Duy et al, 2011], in PRA for NNP there is still an ambiguous interpretation of the point estimate used to represent a statistical quantity of a log-normal distribution.
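The ambiguity described by [Hanges et al, 2005] can be reproduced numerically with the usual one-way ANOVA form of ICC(1), (MSB − MSW) / (MSB + (k − 1)·MSW) for groups of equal size k. In the hypothetical data below, shrinking the within-group spread and pulling the group means apart raise the ICC to exactly the same value, so the index alone cannot tell which mechanism was at work. The function name `icc1` and the data are illustrative assumptions, and the sketch only handles equal group sizes.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) from a one-way ANOVA on equal-sized groups:
    (MSB - MSW) / (MSB + (k - 1) * MSW), with k the group size."""
    k = len(groups[0])
    assert all(len(g) == k for g in groups), "this sketch assumes equal group sizes"
    n_groups = len(groups)
    grand = mean(x for g in groups for x in g)
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


baseline = [[4, 5, 6], [6, 7, 8]]
more_homogeneous = [[4.5, 5, 5.5], [6.5, 7, 7.5]]  # within-group spread shrinks
more_separated = [[3, 4, 5], [7, 8, 9]]            # group means move apart

for label, data in [("baseline", baseline),
                    ("homogeneous", more_homogeneous),
                    ("separated", more_separated)]:
    print(label, round(icc1(data), 3))
# baseline 0.625
# homogeneous 0.885
# separated 0.885
```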

Linguistic ambiguities
The diversity of concepts describing the meaning of data in data sources (for example, database schemata, or extensible markup language [XML] document-type definitions [DTDs]) is commonly known as semantic heterogeneity, a well-known obstacle to data source integration [Gal et al, 2005]. In Catalan, for instance, in the heavy construction, clause verbs may or may not denote independent motion [Rosen, 1991]. In Norwegian, since instrumentals can either precede or follow directional prepositional phrases (PPs) but must precede locative PPs, an instrumental PP preceding the i-PP (with the locative preposition i) should result in an ambiguous interpretation of the sentence [Tungsteth, 2003]. Ambiguous annotation may result from ambiguous segmentation [Obrębski and Stolarski, 2006]. [Mayberry and Miikulainen, 1994] noticed that a frequency-based mechanism alone is insufficient to explain all of lexical disambiguation; rather, it suggests how disambiguation might occur at its most basic, subconscious level. This process should be distinguished from what can be called pragmatic disambiguation, which requires higher-level inferencing. [Hagen, 2002] shows that pattern descriptions are imprecise and therefore ambiguous, and advocates adopting a standard notation to specify patterns. [Chung, 1998] noticed that in Korean, in general, the greater the likelihood of ambiguous interpretation, the more difficult it is to switch the word order of two NPs. [Andrews et al, 2011] point out the polysemy issue in annotation tagging. For instance, the tag "Java" may be used to describe a resource about the island of Java or a resource about the Java programming language; thus, users looking for resources related to the programming language may also get irrelevant resources related to the island, which reduces precision. [Bouma and Hopp, 2006] describe an ambiguity problem in anaphoric interpretation: the pronoun sie can be ambiguous, and either interpretation is readily available. In other words, preferences for either interpretation are not categorical; rather, they reflect tendencies potentially based on factors like grammatical function or linear order. [Fu et al, 2000] describe Chinese lexical ambiguities as (1) word segmentation ambiguity, (2) part-of-speech ambiguity, and (3) pronunciation ambiguity (viz. the problem of polyphonic words).
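The precision loss caused by a polysemous tag, as described by [Andrews et al, 2011], is easy to quantify on a toy index. The resources, tags, and topic labels below are hypothetical, as are the helper names; precision is computed as the fraction of retrieved resources that belong to the intended topic.

```python
# Hypothetical tagged resources: id -> (set of tags, true topic).
resources = {
    "r1": ({"java", "jvm"}, "programming"),
    "r2": ({"java", "generics"}, "programming"),
    "r3": ({"java", "indonesia"}, "island"),
    "r4": ({"java", "volcano"}, "island"),
    "r5": ({"python"}, "programming"),
}


def search_by_tag(tag, index):
    """Ids of resources carrying `tag`, in sorted order."""
    return sorted(rid for rid, (tags, _topic) in index.items() if tag in tags)


def precision(retrieved, wanted_topic, index):
    """Fraction of retrieved resources whose true topic is `wanted_topic`."""
    if not retrieved:
        return 0.0
    relevant = sum(1 for rid in retrieved if index[rid][1] == wanted_topic)
    return relevant / len(retrieved)


hits = search_by_tag("java", resources)
print(hits)  # ['r1', 'r2', 'r3', 'r4']
print(precision(hits, "programming", resources))  # 0.5
```

Half of the results concern the island, so a user interested in the programming language sees precision drop to 0.5 even though every result literally matches the tag.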

Discussion
In this chapter, we follow the idea of the first chapter and identify where ambiguity and sources of confusion lie in data. In many sciences, the information and data produced may put an expert in a confusing position when offering a precise interpretation. For each science, we cited a pool of studies that is representative, but not exhaustive, of the problems that occur. For ambiguities in texts, the two effects of misinterpretation and erroneous results can play an important role.

Conclusion
In this paper we present two sources of contradictions occurring in text data. The first is purely syntactic and related to the writer's intention in the article. The second is related to factual data obtained from experiments, which form the elementary basis of interpretation. The goal of a paper influences how it is written. When the writer's intention is ethically tied to the scientific concern for truth, ambiguity arises only from the difficulty of obtaining unambiguous factual data from experiments tainted by noise, or from the lack of up-to-date categories to help interpretation. When the writer's intention is motivated by career and reputation, data no longer play the central role; the writer's rhetorical discourse does, leading to improper relations that are presented in the same way as real facts.
In this study we tried to point out sources of uncertainty in interpreting relationships in data. We did not propose a controlled process to subtract noise from data; the bad intentions of a writer and the uncertainty of data, both of which lead to contradictory interpretations, were left aside. Making a corpus cleaner for concept and named entity extraction, and for the extraction of their relationships, should be considered a serious issue.
Figure 3 shows the clustering drawing with range [2-5], Figure 4 with range [2-3], Figure 5 with range [2-20], and Figure 6 with range [2-2]. Choosing the range [2-9], the equipartition series gives 8 ranges. Observing the visualizations for different ranges (Figures 3, 4, 5, and 6), we argue that cluster density evolves closely with the size of the frequency range and with the number of ranges in the associated contexts (Table 6). We also observe that the density of clustered high-frequency words differs from that of clustered low-frequency words: more frequent words seem to reduce the class density of less frequent ones. From studies of argumentation schemes linking lexical items such as verbs, connectors, nouns, or adjectives in corpus BD, we obtained a list of verbs useful for technical argumentation in scientific discourse. This set consists of 291 verbs and 705 distinct tokens (gerund, past…). In the figures, points associated with a verb from this list are colored red. Several tens of verbal forms belong to clustered areas, in corpus PM as well as in corpus BD, which means that some verbs are specifically useful for argumentation in a given technical context. But some red points can also be seen near dense areas, which clearly indicates the polysemy of verbs playing roles in different contexts.
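The construction of the equipartition series is not spelled out in this passage. One reading consistent with the statement that the range [2-9] gives 8 ranges is one sub-range per integer frequency value. The sketch below assumes that reading, with an optional `width` parameter for coarser partitions; the function names `equipartition` and `bucket_words` and the toy token list are hypothetical, and this is not presented as the paper's clustering procedure.

```python
from collections import Counter


def equipartition(lo, hi, width=1):
    """Split the integer frequency interval [lo, hi] into consecutive
    sub-ranges of `width` values; the last one may be shorter."""
    return [(start, min(start + width - 1, hi)) for start in range(lo, hi + 1, width)]


def bucket_words(freqs, ranges):
    """Map each sub-range to the words whose frequency falls inside it.
    Words whose frequency lies outside every sub-range are ignored."""
    buckets = {r: [] for r in ranges}
    for word, f in freqs.items():
        for low, high in ranges:
            if low <= f <= high:
                buckets[(low, high)].append(word)
                break
    return buckets


print(len(equipartition(2, 9)))  # 8 sub-ranges: 2, 3, ..., 9

# Hypothetical toy corpus, only to exercise the functions.
tokens = "cell gene cell protein gene cell binding assay".split()
freqs = Counter(tokens)
print(bucket_words(freqs, equipartition(2, 3)))
# {(2, 2): ['gene'], (3, 3): ['cell']}
```

Under this reading, words seen only once fall outside every range starting at 2, which matches the use of lower bound 2 in all the ranges listed above.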

Table 1 .
Morpho-syntactic schemes to extract antonyms in English.

Table 2 .
Morpho-syntactic schemes to extract antonyms in French.

Table 3 .
Part of a text from corpus BD in raw form (left) and stemmed form (right). Table 1 shows word-count statistics over all the corpora used.

Table 1 .
Distribution over corpora of word count, lemmatized word count, words occurring three times, words occurring two times, and words occurring more than once. The grey line represents the reference corpus about biodiversity.

Table 3 .
Number of frequent stemmed and unstemmed words with frequency greater than 5% of #documents.

Table 4 .
Itemsets of very frequent words with threshold S_f = 50% of the corpus documents (S_f = 2400). Left: corpus BD; right: corpus PM. Sets of crude terms are in italics.

As shown in Table 5, when focusing on two classes (versicolor and virginica), only one feature (Petal.width) allows a fine-grained discriminant classification; using the mean value of the features cannot capture this difference. Hence the algorithm presented above can only discriminate two classes, as seen in Figure 2.

Table 5 .
Mean values of the Iris dataset for each class.

Table 6 .
Number of frequency ranges depending on the context size of the first two ones (upper table, Corpus PM; bottom table, corpus BD).