1. Computer Analysis of Architecture Using Automatic Image Understanding

Fan Wei ; Yuan Li ; Lior Shamir.
In the past few years, computer vision and pattern recognition systems have become increasingly powerful, expanding the range of tasks enabled by machine vision. Here we show that computer analysis of building images can quantify architecture and measure the similarities between the architectural styles of different cities. Images of buildings from 18 cities in three countries were acquired using Google StreetView, and were used to train a machine vision system to identify the location of an imaged building from the image's visual content. Experimental results show that the system can automatically identify the geographical location of a StreetView image. More importantly, the algorithm was able to group the cities and countries and provide a phylogeny of the similarities between architectural styles as captured by StreetView images. These results demonstrate that computer vision and pattern recognition algorithms can perform the complex cognitive task of analyzing images of buildings, and can be used to measure and quantify visual similarities and differences between architectural styles. This experiment provides a new paradigm for studying architecture, based on a quantitative approach that can enhance traditional manual observation and analysis. The source code used for the analysis is open and publicly available.

2. How the Taiwanese Do China Studies: Applications of Text Mining

Hsuan-Lei Shao ; Sieh-Chuen Huang ; Yun-Cheng Tsai.
With the rapid evolution of the cross-strait situation, "Mainland China" as a subject of social science study has recently evoked calls to "rethink China studies" among the intelligentsia. This essay applies an automatic content analysis tool (CATAR) to the journal "Mainland China Studies" (1998-2015) in order to observe research trends, based on clustering the text of the title and abstract of each paper in the journal. The results showed that the 473 articles published by the journal clustered into seven salient topics. Tracking each topic's publications over time (both the volume and the percentage of publications) reveals two dominant topics in this journal, while the other topics varied widely over time. The contributions of this study include: 1. Each "independent" study could be grouped into a meaningful topic, and a small-scale experiment verified that this topic clustering is feasible. 2. The essay reveals the salient research topics and their trends for the Taiwanese journal "Mainland China Studies". 3. Various topical keywords were identified, providing easy access to past studies. 4. The yearly trends of the identified topics can be viewed as signals of future research directions.

3. Smarter Round Robin Scheduling Algorithm for Cloud Computing and Big Data

Cloud Computing and Big Data are emerging Information Technology (IT) computing models. These groundbreaking paradigms are leading IT toward a new set of rules that aims to change how computing resources are delivered and exploited, creating a novel business market that is growing exponentially and attracting ever more investment from both providers and end users who look to profit from these innovative computing models. In the same context, researchers and investigators are racing to develop, test and optimize Cloud Computing and Big Data platforms, and several ongoing studies aim to determine and enhance the essential aspects of these computing models, especially compute resource allocation. Processing-power scheduling is crucial in Cloud Computing and Big Data because the data growth, management and delivery designs proposed by these new computing models require faster responses from platforms and applications. Hence the importance of developing highly efficient scheduling algorithms that comply with the platform and infrastructure requirements of these computing models.
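The abstract does not spell out the proposed scheduling variant, so as a point of reference, here is a minimal Python sketch of the classic round-robin algorithm that such work builds on; the function name, task list, and time-quantum value are illustrative, not taken from the paper.

```python
from collections import deque

def round_robin(tasks, quantum):
    """Simulate classic round-robin CPU scheduling.

    tasks: list of (name, burst_time) pairs.
    Returns the order in which tasks finish."""
    queue = deque(tasks)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        if remaining > quantum:
            # Task not done: consume one quantum, requeue the remainder.
            queue.append((name, remaining - quantum))
        else:
            finished.append(name)
    return finished

print(round_robin([("A", 5), ("B", 3), ("C", 8)], quantum=4))  # → ['B', 'A', 'C']
```

A "smarter" variant of the kind the title suggests would typically adjust the quantum or the queue order dynamically rather than treating all tasks uniformly.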

4. Cursive Arabic Handwriting Recognition System Without Explicit Segmentation Based on Hidden Markov Models

Mouhcine Rabi ; Mustapha Amrouch ; Zouhair Mahani.
In this paper we present a system for offline recognition of cursive Arabic handwritten text; the system is analytical, operates without explicit segmentation, and is based on Hidden Markov Models (HMMs). Feature extraction, preceded by baseline estimation, yields statistical and geometric features that capture both the peculiarities of the text and the pixel distribution characteristics of the word image. These features are modelled using hidden Markov models. The HMM-based classifier contains a training module and a recognition module. The training module estimates the parameters of each character HMM using the Baum-Welch algorithm. In the recognition phase, feature vectors extracted from an image are passed to a network of word lexicon entries formed of character models. The character sequence providing the maximum likelihood identifies the recognized entry. If required, the recognition can generate the N best output hypotheses rather than just the single best one. To determine the best output hypotheses, the Viterbi algorithm is used. Experiments on images of the benchmark IFN/ENIT database show that the proposed system improves recognition.
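As an illustration of the decoding step the abstract describes, here is a generic Viterbi implementation in Python over a toy two-state HMM; it is a textbook sketch, not the IFN/ENIT system's actual character models, and all parameter values are invented.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence for an observation sequence.

    obs: list of observation indices
    start_p: (S,) initial state probabilities
    trans_p: (S, S) transition probabilities, row = from-state
    emit_p: (S, O) emission probabilities
    """
    S, T = len(start_p), len(obs)
    # delta[t, s]: best log-probability of any path ending in state s at time t
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)  # best predecessor of state s at time t
    delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + np.log(trans_p[:, s])
            back[t, s] = int(np.argmax(scores))
            delta[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    # Backtrack from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy HMM: state 0 favors symbol 0, state 1 favors symbol 1
start = np.array([0.9, 0.1])
trans = np.array([[0.7, 0.3], [0.3, 0.7]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1, 1], start, trans, emit))  # → [0, 0, 1, 1]
```

In the system described above, the states would be character models chained through a lexicon network rather than this toy pair of states.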

5. A novel approach based on segmentation for securing medical image processing over cloud

Mbarek Marwan ; Ali Kartit ; Hassan Ouahmane.
Healthcare professionals require advanced image processing software to enhance the quality of clinical decisions. However, any investment in sophisticated local applications would dramatically increase healthcare costs. To address this issue, medical providers are interested in adopting cloud technology. In spite of its multiple advantages, outsourcing computations to an external provider raises several challenges. In fact, security is the major factor hindering the widespread acceptance of this new concept. Recently, various solutions have been suggested to fulfill healthcare demands. However, ensuring privacy and high performance still requires improvements to meet the healthcare sector's requirements. To this end, we propose a framework based on a segmentation approach to secure cloud-based medical image processing in the healthcare system.

6. A Secured Data Processing Technique for Effective Utilization of Cloud Computing

Mbarek Marwan ; Ali Kartit ; Hassan Ouahmane.
Digital humanities require IT infrastructure and sophisticated analytical tools, including data visualization, data mining, statistics, text mining and information retrieval. Regarding funding, building a local data center would necessitate substantial investments. Fortunately, there is another option that helps researchers take advantage of these IT services to access, use and share information easily. Cloud services ideally offer on-demand software and resources over the Internet to read and analyze ancient documents. More interestingly, the billing system is completely flexible, based on resource usage and Quality of Service (QoS) level. In spite of its multiple advantages, outsourcing computations to an external provider raises several challenges. Specifically, security is the major factor hindering the widespread acceptance of this new concept. As a case study, we review the use of cloud computing to process digital images safely. Recently, various solutions have been suggested to secure data processing in cloud environments. However, ensuring privacy and high performance needs further improvements to protect an organization's most sensitive data. To this end, we propose a framework based on segmentation and watermarking techniques to ensure data privacy. In this respect, the segmentation algorithm is used to protect the client's data against unauthorized access, while the watermarking method determines and maintains ownership. Consequently, this framework will increase the speed of […]

7. Applying ontologies to data integration systems for bank credit risk management

Jalil Elhassouni ; Mehdi Bazzi ; Abderrahim Qadi ; Mohamed Haziti.
This paper proposes an ontological integration model for credit risk management. It is based on three ontologies: a global ontology describing the credit risk management process, and two local ontologies, the first describing the credit granting process and the second presenting the concepts necessary for monitoring the credit system. The paper also presents the technique used for matching the global ontology with the local ontologies.

8. Text Alignment in Ancient Greek and Georgian: A Case-Study on the First Homily of Gregory of Nazianzus

Tamara Pataridze ; Bastien Kindt.
This paper discusses the word-level alignment of a lemmatised bitext consisting of the Oratio I of Gregory of Nazianzus in its Greek model and its Georgian translation. The study shows how the direct and empirical observations offered by an aligned text enable an accurate analysis of translation techniques and of many philological parameters of the text.

9. Recurrent Pattern Modelling in a Corpus of Armenian Manuscript Colophons

Emmanuel Van Elverdinghe.
Colophons of Armenian manuscripts are replete with as yet untapped riches. Formulae are not the least among them: these recurrent stereotypical patterns conceal many clues as to the schools and networks of production and diffusion of books in Armenian communities. This paper proposes a methodology for exploiting these sources, elaborated in the framework of a PhD research project on Armenian colophon formulae. Firstly, the reader is briefly introduced to the corpus of Armenian colophons and then to the purposes of our project. Thirdly, we describe our methodology, which relies on lemmatization and on modelling patterns as automata. Fourthly and finally, the whole process is illustrated by a basic case study, which serves to outline the kind of results that can be achieved by combining this methodology with a philologico-historical approach to colophons.
Section: Towards a Digital Ecosystem: NLP. Corpus infrastructure. Methods for Retrieving Texts and Computing Text Similarities

10. Processing Tools for Greek and Other Languages of the Christian Middle East

Bastien Kindt.
This paper presents some of the computer tools and linguistic resources of the GREgORI project. These developments allow automated processing of texts written in the main languages of the Christian Middle East, such as Greek, Arabic, Syriac, Armenian and Georgian. The main goal is to provide scholars with tools (lemmatized indexes and concordances) making corpus-based linguistic information available. The paper focuses on questions of text processing, lemmatization, information retrieval, and bitext alignment.
Section: Project presentations

11. Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus

Avi Shmidman ; Moshe Koppel ; Ely Porat.
We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the corpus by its two most infrequent letters, finding matched pairs of strings of four or five words that differ by at most one word and then identifying clusters of such matched pairs. Using this method, over 4600 parallel pairs of passages were identified in the Babylonian Talmud, a Hebrew-Aramaic corpus of over 1.8 million words, in just over 30 seconds. Empirical comparisons on sample data indicate that the coverage obtained by our method is essentially the same as that obtained using slow exhaustive methods.
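The matching idea described above can be sketched in Python as follows. This is a simplified illustration, not the authors' code: the function names and alphabetical tie-breaking are ours, letter frequencies are computed over the two input texts rather than a full corpus, and the final clustering of matched pairs is omitted.

```python
from collections import Counter, defaultdict

def signatures(words, freq):
    """Reduce each word to its two rarest letters (rarity from the shared
    frequency table), preserving their order of appearance in the word."""
    sigs = []
    for w in words:
        rarest = set(sorted(set(w), key=lambda c: (freq[c], c))[:2])
        sigs.append("".join(c for c in w if c in rarest)[:2])
    return sigs

def near_matches(words_a, words_b, n=4):
    """Positions (i, j) where the n-word signature strings of the two
    texts differ in at most one word.

    Indexing trick: store each n-gram plus every variant with one
    position wildcarded; two n-grams that differ in at most one word
    are guaranteed to share at least one variant."""
    freq = Counter(ch for w in words_a + words_b for ch in w)
    sa, sb = signatures(words_a, freq), signatures(words_b, freq)

    def variants(gram):
        yield gram  # exact form
        for k in range(n):
            yield gram[:k] + ("*",) + gram[k + 1:]  # position k wildcarded

    index = defaultdict(list)
    for j in range(len(sb) - n + 1):
        for v in variants(tuple(sb[j:j + n])):
            index[v].append(j)

    hits = set()
    for i in range(len(sa) - n + 1):
        for v in variants(tuple(sa[i:i + n])):
            hits.update((i, j) for j in index[v])
    return sorted(hits)
```

Because each word is collapsed to a two-letter signature, minor orthographic variation within a word often leaves the signature unchanged, which is what lets near-identical passages match cheaply.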
Section: Towards a Digital Ecosystem: NLP. Corpus infrastructure. Methods for Retrieving Texts and Computing Text Similarities

12. Visualizing linguistic variation in a network of Latin documents and scribes

Timo Korkiakangas ; Matti Lassila.
This article explores whether and how network visualization can benefit philological and historical-linguistic study. This is illustrated with a corpus-based investigation of scribes' language use in a lemmatized and morphologically annotated corpus of documentary Latin (Late Latin Charter Treebank, LLCT2). We extract four continuous linguistic variables from LLCT2 and utilize a gradient colour palette in Gephi to visualize the variable values as node attributes in a trimodal network consisting of the documents, writers, and writing locations underlying the corpus. We call this network the "LLCT2 network". The geographical coordinates of the location nodes form an approximate map, which allows geographical conclusions to be drawn. The linguistic variables are examined both separately and as a sum variable, and the visualizations are presented as static images and as interactive Sigma.js visualizations. The variables represent different domains of language competence of scribes who learnt written Latin practically as a second language. The results show that network visualization of linguistic features helps in observing patterns which support linguistic-philological argumentation and which risk passing unnoticed with traditional methods. However, the approach is subject to the same limitations as all visualization techniques: the human eye can only perceive a relatively small amount of information at a time.
Section: Visualisation of intertextuality and text reuse

13. Prosopographical data analysis. Application to the Angevin officers (XIII–XV centuries)

Anne Tchounikine ; Maryvonne Miquel ; Thierry Pécout ; Jean-Luc Bonnaud.
The EUROPANGE project, involving both medievalists and computer scientists, aims to study the emergence of a corps of administrators in the Angevin-controlled territories in the XIII–XV centuries. Our project attempts to analyze the officers' careers, shared relation networks and strategies based on the study of individual biographies. In this paper, we describe the methods and tools designed to analyze these prosopographical data. These include OLAP analyses and network analyses associated with cartographic and chronological visualization tools.