Social media and social networks have already woven themselves into the very
fabric of everyday life. This has led to a dramatic increase in social data
capturing various relations between users and their associated artifacts,
both in online networks and, through ubiquitous devices, in the real world. In
this work, we consider social interaction networks from a data mining
perspective, with a special focus on real-world face-to-face contact networks. We
combine data mining and social network analysis techniques for examining the
networks in order to improve our understanding of the data, the modeled
behavior, and its underlying emergent processes. Furthermore, we adapt, extend,
and apply known predictive data mining algorithms to social interaction
networks. Additionally, we present novel descriptive data mining methods
that uncover and extract relations and patterns for hypothesis generation
and exploration, in order to provide characteristic information about the data
and networks. The presented approaches and methods aim at extracting valuable
knowledge for enhancing the understanding of the respective data, and for
supporting the users of the respective systems. We consider data from several
social systems, such as the social bookmarking system BibSonomy, the social
resource sharing system Flickr, and ubiquitous social systems. Specifically, we
focus on data from the social conference guidance system Conferator and the
social group interaction system MyGroup. This […]
The project presents the approach adopted by the Rough Cilicia Archaeological
Survey team for publishing its primary data and reports via three potentially
transformative strategies for digital humanities: (1) loose coupling of digital
data curation and publishing platforms (in loosely coupled systems, components
share only a limited set of simple assumptions, which enables systems to evolve
dynamically); (2) collaborative creation of map-based narrative content; and
(3) connecting print scholarship (books, reports, articles) to online resources
via two-dimensional barcodes (2D codes) that can be printed on paper and that
open hyperlinks when scanned with a smartphone. The three strategies are made
possible by loosely coupling two autonomous services: Visible Past, dedicated
to web collaboration and digital-print publishing, and Open Context, a
geo-historical data archiving and publishing service. The Rough Cilicia
Archaeological Survey, Visible Past, and Open Context work together to
illustrate a new genre of scholarship, which combines qualitative narratives and
quantitative representations of space and social phenomena. The project
provides tools for collaborative creation of rich scholarly narratives that are
spatially located and for connecting print publications to the digital realm.
The project serves as a case study in using the three new strategies to create
and publish spatial humanities scholarship for ancient historians more
broadly.
Text data are often seen as "take-away" material with little noise and
easy-to-process information. The main questions are how to obtain the data and
transform them into a good document format. But data can be sensitive to noise,
often called ambiguity. Ambiguities have long been recognized, mainly because
polysemy is obvious in language and context is required to remove uncertainty.
I claim in this paper that syntactic context is not sufficient to improve
interpretation. I try to explain, first, that noise can come from the natural
data themselves, even when high technology is involved, and, second, that texts
seen as verified but meaningless can spoil the content of a corpus, which may
lead to contradictions and background noise.
In this research we suggest that working on a journalistic corpus with specific software tools can help in studying linguistic patterns and choices that are made on the basis of political affiliation or gender stereotypes. The software SEMY, for instance, produces semantic profiles semi-automatically, ANTCONC gives useful KWIC (keyword-in-context) excerpts, and TERMOSTAT works on discourse specificities. Using all of these tools, we found convergent and striking, albeit conditional, asymmetries between female and male candidates in journalistic discourse as far as our corpus dedicated to the 2007 and 2012 presidential campaigns is concerned. 'Social gender' (i.e. stereotypical expectations about who will be a typical member of a given category) and/or political favoritism affect the representation of leadership in discourse and may in turn affect the readership, and hence the electorate.
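As an illustration of what a KWIC (keyword-in-context) listing is, the following is a minimal sketch in Python. It is not how ANTCONC itself is implemented. The function name `kwic`, the `window` parameter, and the sample sentence are all hypothetical and chosen only for this example; the sample text is not drawn from the corpus described above.

```python
import re


def kwic(text, keyword, window=4):
    """Return (left context, keyword, right context) for each keyword hit.

    Assumption: tokens are lower-cased runs of word characters, which is
    much cruder than the tokenisation a real concordancer would use.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


# Invented sample text, for illustration only.
sample = ("The candidate spoke calmly. Critics said the candidate "
          "lacked authority, but the candidate insisted otherwise.")

for left, kw, right in kwic(sample, "candidate", window=3):
    # Right-align the left context so the keyword column lines up.
    print(f"{left:>25} | {kw} | {right}")
```

Aligning every occurrence of a keyword in one column makes recurring neighbouring words easy to compare across occurrences, which is the kind of pattern inspection the abstract refers to.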
This study analyzed the source papers and references of the proceedings of the
2009-2012 International Conference of Digital Archives and Digital Humanities
(DADH), held annually in Taiwan. A total of 59 sources and 1,104
references were investigated using descriptive analysis and subject
analysis based on library cataloguing practices. Preliminary results showed that
historical materials, events, bureaucracies, and people of Taiwan and China in
the Qing Dynasty were the major subjects in the temporal and spatial dimensions. The
subject-date figure depicted a curve with a long, low head and a short, high
tail, which reflected characteristics of both humanities research and the
application of technology in digital humanities. The publication dates of the
references spanned more than 360 years, indicating a long time span in research
materials. A majority of the papers (61.41%) were single-authored, which is in
line with common research practice in the humanities. Books published by general
publishers were the major type of reference, which is consistent with
established humanities research. The next step of this study will focus on the
comparison of characteristics of both sources and references of international
journals with those reported in this article.
Social networks have gained remarkable attention in the last decade. Accessing
social network sites such as Twitter, Facebook, LinkedIn and Google+ through the
internet and Web 2.0 technologies has become more affordable. People are
becoming more interested in and reliant on social networks for information, news
and the opinions of other users on diverse subject matters. This heavy reliance
on social network sites causes them to generate massive data characterised by
three computational issues, namely size, noise and dynamism. These issues often
make social network data very complex to analyse manually, which calls for
computational means of analysing them. Data mining provides a
wide range of techniques for detecting useful knowledge, such as trends,
patterns and rules, in massive datasets [44]. Data mining techniques are used for
information retrieval, statistical modelling and machine learning. These
techniques employ data pre-processing, data analysis, and data interpretation
processes. This survey discusses the different data mining techniques used to
mine diverse aspects of social networks over the decades, from historical
techniques to up-to-date models, including our novel technique named TRCM. All
the techniques covered in this survey are listed in Table 1, together with the
tools employed and the names of their authors.