1. Data Mining on Social Interaction Networks

Martin Atzmueller.
Social media and social networks have already woven themselves into the very fabric of everyday life. This results in a dramatic increase of social data capturing various relations between the users and their associated artifacts, both in online networks and the real world using ubiquitous devices. In this work, we consider social interaction networks from a data mining perspective, also with a special focus on real-world face-to-face contact networks: We combine data mining and social network analysis techniques for examining the networks in order to improve our understanding of the data, the modeled behavior, and its underlying emergent processes. Furthermore, we adapt, extend and apply known predictive data mining algorithms on social interaction networks. Additionally, we present novel methods for descriptive data mining for uncovering and extracting relations and patterns for hypothesis generation and exploration, in order to provide characteristic information about the data and networks. The presented approaches and methods aim at extracting valuable knowledge for enhancing the understanding of the respective data, and for supporting the users of the respective systems. We consider data from several social systems, like the social bookmarking system BibSonomy, the social resource sharing system flickr, and ubiquitous social systems: Specifically, we focus on data from the social conference guidance system Conferator and the social group interaction system MyGroup. This work first gives a […]

2. A New Approach to Reporting Archaeological Surveys: Connecting Rough Cilicia, Visible Past and Open Context through loose coupling and 3d codes

Sorin Adam Matei ; Nicholas K. Rauh ; Eric Kansa.
The project presents the approach adopted by the Rough Cilicia Archaeological Survey team for publishing its primary data and reports via three potentially transformative strategies for digital humanities: (1) loose coupling of digital data curation and publishing platforms (in loosely coupled systems, components share only a limited set of simple assumptions, which enables systems to evolve dynamically); (2) collaborative creation of map-based narrative content; and (3) connecting print scholarship (books, reports, articles) to online resources via two-dimensional barcodes (2D codes) that can be printed on paper and can call up hyperlinks when scanned with a smartphone. The three strategies are made possible by loosely coupling two autonomous services: Visible Past, dedicated to web collaboration and digital-print publishing, and Open Context, which is a geo-historical data archiving and publishing service. The Rough Cilicia Archaeological Survey, Visible Past, and Open Context work together to illustrate a new genre of scholarship, which combines qualitative narratives and quantitative representations of space and social phenomena. The project provides tools for collaborative creation of rich scholarly narratives that are spatially located and for connecting print publications to the digital realm. The project is a case study for utilizing the three new strategies for creating and publishing spatial humanities scholarship more broadly for ancient historians.

3. Clustering and Relational Ambiguity: from Text Data to Natural Data

Nicolas Turenne.
Text data is often seen as "take-away" material with little noise and easy-to-process information. The main questions are how to get the data and how to transform them into a good document format. But data can be sensitive to noise, often called ambiguities. Ambiguities have been known for a long time, mainly because polysemy is obvious in language and context is required to remove uncertainty. I claim in this paper that syntactic context is not sufficient to improve interpretation. I try to explain, first, that noise can come from natural data themselves, even when high technology is involved, and second, that texts seen as verified but meaningless can spoil the content of a corpus, which may lead to contradictions and background noise.


Fabienne Baider.
In this research we suggest that working on a journalistic corpus with specialized software can help in studying linguistic patterns and choices that are made on the basis of political affiliation or gender stereotypes. The software SEMY, for instance, gives semantic profiles semi-automatically, ANTCONC gives useful KWIC abstracts, and TERMOSTAT works on discourse specificities. Using all of these tools, we found convergent, striking asymmetries between female and male candidates in journalistic discourse (however conditionally) as far as our corpus dedicated to the 2007 and the 2012 presidential campaigns is concerned. 'Social gender' (i.e. stereotypical expectations about who will be a typical member of a given category) and/or political favoritism affect the representation of leadership in discourse and may in turn affect the readership, hence the electorate.

5. Exploring Regional Development of Digital Humanities Research: A Case Study for Taiwan

Kuang-hua Chen ; Bi-Shin Hsueh.
This study analyzed references and source papers of the Proceedings of the 2009-2012 International Conference of Digital Archives and Digital Humanities (DADH), which was held annually in Taiwan. A total of 59 sources and 1,104 references were investigated, based on descriptive analysis and subject analysis of library practices on cataloguing. Preliminary results showed that historical materials, events, bureaucracies, and people of Taiwan and China in the Qing Dynasty were the major subjects in the tempo-spatial dimensions. The subject-date figure depicted a long-low head and short-high tail curve, which demonstrated both characteristics of research of humanities and application of technology in digital humanities. The dates of publication of the references spanned over 360 years, which shows a long time span in research materials. A majority of the papers (61.41%) were single-authored, which is in line with the common research practice in the humanities. Books published by general publishers were the major type of references, and this was the same as that of established humanities research. The next step of this study will focus on the comparison of characteristics of both sources and references of international journals with those reported in this article.

6. A Survey of Data Mining Techniques for Social Media Analysis

Mariam Adedoyin-Olowe ; Mohamed Medhat Gaber ; Frederic Stahl.
Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and web 2.0 technologies has become more affordable. People are becoming more interested in and reliant on social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, like trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over the decades, going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.