2014

1. Data Mining on Social Interaction Networks

Atzmueller, Martin.

Social media and social networks have already woven themselves into the very fabric of everyday life. This results in a dramatic increase in social data capturing various relations between users and their associated artifacts, both in online networks and in the real world through ubiquitous devices. In this work, we consider social interaction networks from a data mining perspective, with a special focus on real-world face-to-face contact networks: we combine data mining and social network analysis techniques to examine the networks in order to improve our understanding of the data, the modeled behavior, and its underlying emergent processes. Furthermore, we adapt, extend and apply known predictive data mining algorithms to social interaction networks. Additionally, we present novel descriptive data mining methods for uncovering and extracting relations and patterns for hypothesis generation and exploration, in order to provide characteristic information about the data and networks. The presented approaches and methods aim at extracting valuable knowledge for enhancing the understanding of the respective data and for supporting the users of the respective systems. We consider data from several social systems, such as the social bookmarking system BibSonomy, the social resource sharing system flickr, and ubiquitous social systems: specifically, we focus on data from the social conference guidance system Conferator and the social group interaction system MyGroup. This […]

2. A New Approach to Reporting Archaeological Surveys: Connecting Rough Cilicia, Visible Past and Open Context through loose coupling and 3d codes

Matei, Sorin Adam ; Rauh, Nicholas K. ; Kansa, Eric.

The project presents the strategy adopted by the Rough Cilicia Archaeological Survey team for publishing its primary data and reports via three potentially transformative strategies for digital humanities: (1) loose coupling of digital data curation and publishing platforms, where components share only a limited set of simple assumptions, which enables systems to evolve dynamically; (2) collaborative creation of map-based narrative content; and (3) connecting print scholarship (books, reports, articles) to online resources via two-dimensional barcodes (2D codes) that can be printed on paper and call up hyperlinks when scanned with a smartphone. The three strategies are made possible by loosely coupling two autonomous services: Visible Past, dedicated to web collaboration and digital-print publishing, and Open Context, a geo-historical data archiving and publishing service. The Rough Cilicia Archaeological Survey, Visible Past, and Open Context work together to illustrate a new genre of scholarship that combines qualitative narratives with quantitative representations of space and social phenomena. The project provides tools for the collaborative creation of rich, spatially located scholarly narratives and for connecting print publications to the digital realm. More broadly, the project serves as a case study for how ancient historians can use the three new strategies to create and publish spatial humanities scholarship.

3. Clustering and Relational Ambiguity: from Text Data to Natural Data

Turenne, Nicolas.

Text data is often seen as "take-away" material with little noise and easy-to-process information. The main questions are how to obtain the data and transform it into a good document format. But data can be sensitive to noise, often called ambiguity. Ambiguity has been recognized for a long time, mainly because polysemy is obvious in language and context is required to remove uncertainty. I claim in this paper that syntactic context is not sufficient to improve interpretation. I try to explain, first, that noise can come from the natural data themselves, even when high technology is involved, and second, that texts seen as verified but meaningless can spoil the content of a corpus, leading to contradictions and background noise.

4. Analysing Journalistic Discourse and Finding Opinions Semi-Automatically? A Case Study of the 2007 and 2012 Presidential French Campaigns

Baider, Fabienne.

This research study tested three different NLP technologies to analyze representative journalistic discourse used in the 2007 and 2012 presidential campaigns in France. The analysis focused on the discourse in relation to the candidates' gender and/or political party. Our findings suggest that using specific software to examine a journalistic corpus can reveal linguistic patterns and choices made on the basis of political affiliation and/or gender stereotypes. These conclusions are drawn from quantitative and qualitative analysis carried out with three different software programs: SEMY, which semi-automatically provides semantic profiles; ANTCONC, which provides Keywords in Context (KWIC) concordances or abstracts of texts, as well as collocations; and TERMOSTAT, which reveals discourse specificities, frequencies and the most common morpho-syntactic patterns. Analysis of our data points to convergent asymmetries between female and male candidates in journalistic discourse (albeit conditionally) for the 2007 and 2012 French presidential campaigns. We conclude that social gender (i.e., stereotypical expectations of who will be a typical member of a given category) and/or political favoritism may affect the representation of leadership in discourse, which, in turn, may influence the readership and hence the electorate. The study therefore recommends the use of corpus linguistic tools for the semi-automatic investigation of political texts.
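To illustrate what a Keywords in Context (KWIC) view shows, here is a minimal sketch of a generic concordance builder. It is not ANTCONC's implementation: the tokenization (runs of word characters), case-insensitive matching, the `width` context size, and the sample sentences are all illustrative assumptions, and the function names `kwic` and `print_concordance` are hypothetical.

```python
import re
from typing import List, Tuple


def kwic(text: str, keyword: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each keyword occurrence.

    Assumptions: tokens are runs of word characters, matching ignores case,
    and `width` tokens are kept on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def print_concordance(hits: List[Tuple[str, str, str]], pad: int = 30) -> None:
    # Right-align the left context so every keyword lines up in one column,
    # which is how concordance lines are usually displayed.
    for left, kw, right in hits:
        print(f"{left:>{pad}}  [{kw}]  {right}")


if __name__ == "__main__":
    # Invented example sentences, not taken from the study's corpus.
    sample = ("The candidate presented her programme. "
              "Critics said the candidate lacked experience.")
    print_concordance(kwic(sample, "candidate", width=2))
```

Aligning the keyword in a single column is what makes recurring collocates (for example, adjectives attached to a candidate) easy to spot when scanning many concordance lines.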

5. Exploring Regional Development of Digital Humanities Research: A Case Study for Taiwan

Chen, Kuang-hua ; Hsueh, Bi-Shin.

This study analyzed the references and source papers of the Proceedings of the 2009-2012 International Conference of Digital Archives and Digital Humanities (DADH), held annually in Taiwan. A total of 59 source papers and 1,104 references were investigated using descriptive analysis and subject analysis based on library cataloguing practices. Preliminary results showed that historical materials, events, bureaucracies, and people of Taiwan and China in the Qing Dynasty were the major subjects in the temporal and spatial dimensions. The subject-date figure depicted a curve with a long, low head and a short, high tail, reflecting both the characteristics of humanities research and the application of technology in digital humanities. The publication dates of the references spanned more than 360 years, indicating a long time span in research materials. A majority of the papers (61.41%) were single-authored, which is in line with common research practice in the humanities. Books published by general publishers were the major type of reference, as is also the case in established humanities research. The next step of this study will compare the characteristics of the sources and references of international journals with those reported in this article.

6. A Survey of Data Mining Techniques for Social Media Analysis

Adedoyin-Olowe, Mariam ; Gaber, Mohamed Medhat ; Stahl, Frederic.

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and more reliant on, social networks for information, news and the opinions of other users on diverse subject matters. This heavy reliance causes social network sites to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data too complex to analyse manually, which makes computational means of analysis necessary. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, in massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques involve data pre-processing, data analysis and data interpretation stages. This survey discusses different data mining techniques used over the decades to mine diverse aspects of social networks, from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, together with the tools employed and the names of their authors.