1. Clustering and Relational Ambiguity: from Text Data to Natural Data

Turenne, Nicolas.
Text data is often seen as "take-away" material with little noise and easy-to-process information. The main questions are how to obtain the data and transform them into a good document format. But data can be sensitive to noise, often called ambiguities. Ambiguities have long been recognized, mainly because polysemy is obvious in language and context is required to remove uncertainty. I claim in this paper that syntactic context is not sufficient to improve interpretation. I try to explain that, firstly, noise can come from natural data themselves, even when high technology is involved; secondly, texts seen as verified but meaningless can spoil the content of a corpus, which may lead to contradictions and background noise.

2. A Survey of Data Mining Techniques for Social Media Analysis

Adedoyin-Olowe, Mariam ; Gaber, Mohamed Medhat ; Stahl, Frederic.
Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in and reliant on social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data […]

3. Exploring Regional Development of Digital Humanities Research: A Case Study for Taiwan

Chen, Kuang-hua ; Hsueh, Bi-Shin.
This study analyzed references and source papers of the Proceedings of 2009-2012 International Conference of Digital Archives and Digital Humanities (DADH), which was held annually in Taiwan. A total of 59 sources and 1,104 references were investigated, based on descriptive analysis and subject analysis of library practices on cataloguing. Preliminary results showed historical materials, events, bureaucracies, and people of Taiwan and China in the Qing Dynasty were the major subjects in the tempo-spatial dimensions. The subject-date figure depicted a long-low head and short-high tail curve, which demonstrated both characteristics of research of humanities and application of technology in digital humanities. The dates of publication of the references spanned over 360 years, which shows a long time span in research materials. A majority of the papers (61.41%) were single-authored, which is in line with the common research practice in the humanities. Books published by general publishers […]


Baider, Fabienne.
In this research we suggest that working on a journalistic corpus with specific software can help in studying linguistic patterns and choices that are made on the basis of political affiliation or gender stereotypes. The software SEMY, for instance, gives semantic profiles semi-automatically, ANTCONC gives useful KWIC abstracts, and TERMOSTAT works on discourse specificities. Using all of these tools, we found convergent, striking asymmetries between female and male candidates in journalistic discourse (however conditionally) as far as our corpus dedicated to the 2007 and 2012 presidential campaigns is concerned. 'Social gender' (i.e. stereotypical expectations about who will be a typical member of a given category) and/or political favoritism affect the representation of leadership in discourse and may in turn affect the readership, hence the electorate.

5. A New Approach to Reporting Archaeological Surveys: Connecting Rough Cilicia, Visible Past and Open Context through loose coupling and 3d codes

Matei, Sorin Adam ; Rauh, Nicholas K. ; Kansa, Eric.
The project presents the strategy adopted by the Rough Cilicia Archaeological Survey team for publishing its primary data and reports via three potentially transformative strategies for digital humanities: (1) loose coupling of digital data curation and publishing platforms (in loosely coupled systems, components share only a limited set of simple assumptions, which enables systems to evolve dynamically); (2) collaborative creation of map-based narrative content; and (3) connecting print scholarship (books, reports, articles) to online resources via two-dimensional barcodes (2D codes) that can be printed on paper and can call up hyperlinks when scanned with a smartphone. The three strategies are made possible by loosely coupling two autonomous services: Visible Past, dedicated to web collaboration and digital-print publishing, and Open Context, a geo-historical data archiving and publishing service. The Rough Cilicia Archaeological Survey, Visible Past, and Open Context work together […]

6. Data Mining on Social Interaction Networks

Atzmueller, Martin.
Social media and social networks have already woven themselves into the very fabric of everyday life. This results in a dramatic increase in social data capturing various relations between the users and their associated artifacts, both in online networks and in the real world using ubiquitous devices. In this work, we consider social interaction networks from a data mining perspective, with a special focus on real-world face-to-face contact networks: we combine data mining and social network analysis techniques for examining the networks in order to improve our understanding of the data, the modeled behavior, and its underlying emergent processes. Furthermore, we adapt, extend and apply known predictive data mining algorithms to social interaction networks. Additionally, we present novel methods for descriptive data mining for uncovering and extracting relations and patterns for hypothesis generation and exploration, in order to provide characteristic information about the data and […]