2018

1. Computer Analysis of Architecture Using Automatic Image Understanding

Fan Wei ; Yuan Li ; Lior Shamir.

In the past few years, computer vision and pattern recognition systems have been becoming increasingly more powerful, expanding the range of automatic tasks enabled by machine vision. Here we show that computer analysis of building images can perform quantitative analysis of architecture, and quantify similarities between city architectural styles in a quantitative fashion. Images of buildings from 18 cities and three countries were acquired using Google StreetView, and were used to train a machine vision system to automatically identify the location of the imaged building based on the image visual content. Experimental results show that the automatic computer analysis can automatically identify the geographical location of the StreetView image. More importantly, the algorithm was able to group the cities and countries and provide a phylogeny of the similarities between architectural styles as captured by StreetView images. These results demonstrate that computer vision and pattern recognition algorithms can perform the complex cognitive task of analyzing images of buildings, and can be used to measure and quantify visual similarities and differences between different styles of architectures. This experiment provides a new paradigm for studying architecture, based on a quantitative approach that can enhance the traditional manual observation and analysis. The source code used for the analysis is open and publicly available.

2. How the Taiwanese Do China Studies: Applications of Text Mining

Hsuan-Lei Shao ; Sieh-Chuen Huang ; Yun-Cheng Tsai.

With the rapid evolution of cross-strait situation, "Mainland China" as a subject of social science study has evoked the voice of "Rethinking China Study" among intelligentsia recently. This essay tried to apply an automatic content analysis tool (CATAR) to the journal "Mainland China Studies" (1998-2015) in order to observe the research trends based on the clustering of text from the title and abstract of each paper in the journal. The results showed that the 473 articles published by the journal were clustered into seven salient topics. From the publication number of each topic over time (including "volume of publications", "percentage of publications"), there are two major topics of this journal while other topics varied over time widely. The contribution of this study includes: 1. We could group each "independent" study into a meaningful topic, as a small scale experiment verified that this topic clustering is feasible. 2. This essay reveals the salient research topics and their trends for the Taiwan journal "Mainland China Studies". 3. Various topical keywords were identified, providing easy access to the past study. 4. The yearly trends of the identified topics could be viewed as signature of future research directions.

3. Algorithme de planification intelligent Round Robin pour le Cloud Computing et Big Data

Hicham Gibet Tani ; Chaker El Amrani.

Cloud Computing et Big Data sont les prochains modèles informatiques. Ces paradigmes révolutionnaires conduisent l'informatique à un nouveau jeu de règles qui vise à changer la livraison des ressources informatiques et le modèle d'exploitation, créant ainsi un monde d'affaires nouveau qui croît de façon exponentielle et attire de plus en plus d'investissements des fournisseurs et des utilisateurs finaux qui attendent Amener profit de ces modèles innovants de l'informatique. Dans le même contexte, les chercheurs combattent pour développer, tester et optimiser les plates-formes Cloud Computing et Big Data, alors que plusieurs études sont en cours pour déterminer et améliorer les aspects essentiels de ces modèles informatiques, en particulier l'allocation des ressources. La planification de la puissance de traitement est cruciale quand il s'agit de Cloud Computing et Big Data en raison de la gestion de la croissance des données et la conception de livraison proposée par ces nouveaux modèles informatiques, qui nécessite des réponses plus rapides des plates-formes et des applications. D'où l'origine de l'importance de développer des algorithmes d'ordonnancement efficaces qui sont conformes à ces plates-formes de modèles informatiques et aux exigences d'infrastructure.

4. Cursive Arabic Handwriting Recognition System Without Explicit Segmentation Based on Hidden Markov Models

Mouhcine Rabi ; Mustapha Amrouch ; Zouhair Mahani.

In this paper we present a system for offline recognition cursive Arabic handwritten text which is analytical without explicit segmentation based on Hidden Markov Models (HMMs). Extraction features preceded by baseline estimation are statistical and geometric to integrate both the peculiarities of the text and the pixel distribution characteristics in the word image. These features are modelled using hidden Markov models. The HMM-based classifiercontains a training module and a recognition module. The training module estimates the parameters of each of the character HMMs uses the Baum-Welchalgorithm. In the recognition phase, feature vectors extracted from an image are passed to a network of word lexicon entries formed of character models. The character sequence providing the maximumlikelihood identifies the recognized entry. If required, the recognition can generate N best output hypotheses rather than just the single best one. To determine the best output hypotheses, the Viterbi algorithm is used.The experiments on images of the benchmark IFN/ENIT database show that the proposed system improves recognition.

5. A Secured Data Processing Technique for Effective Utilization of Cloud Computing

Mbarek Marwan ; Ali Kartit ; Hassan Ouahmane.

Digital humanities require IT Infrastructure and sophisticated analytical tools, including datavisualization, data mining, statistics, text mining and information retrieval. Regarding funding, tobuild a local data center will necessitate substantial investments. Fortunately, there is another optionthat will help researchers take advantage of these IT services to access, use and share informationeasily. Cloud services ideally offer on-demand software and resources over the Internet to read andanalyze ancient documents. More interestingly, billing system is completely flexible and based onresource usage and Quality of Service (QoS) level. In spite of its multiple advantages, outsourcingcomputations to an external provider arises several challenges. Specifically, security is the majorfactor hindering the widespread acceptance of this new concept. As a case study, we review the use ofcloud computing to process digital images safely. Recently, various solutions have been suggested tosecure data processing in cloud environement. Though, ensuring privacy and high performance needsmore improvements to protect the organization's most sensitive data. To this end, we propose aframework based on segmentation and watermarking techniques to ensure data privacy. In this respect,segementation algorithm is used to to protect client's data against untauhorized access, whilewatermarking method determines and maintains ownership. Consequentely, this framework willincrease the speed of […]

6. A Secured Data Processing Technique for Effective Utilization of Cloud Computing

Mbarek Marwan ; Ali Kartit ; Hassan Ouahmane.

Digital humanities require IT Infrastructure and sophisticated analytical tools, including datavisualization, data mining, statistics, text mining and information retrieval. Regarding funding, tobuild a local data center will necessitate substantial investments. Fortunately, there is another optionthat will help researchers take advantage of these IT services to access, use and share informationeasily. Cloud services ideally offer on-demand software and resources over the Internet to read andanalyze ancient documents. More interestingly, billing system is completely flexible and based onresource usage and Quality of Service (QoS) level. In spite of its multiple advantages, outsourcingcomputations to an external provider arises several challenges. Specifically, security is the majorfactor hindering the widespread acceptance of this new concept. As a case study, we review the use ofcloud computing to process digital images safely. Recently, various solutions have been suggested tosecure data processing in cloud environement. Though, ensuring privacy and high performance needsmore improvements to protect the organization's most sensitive data. To this end, we propose aframework based on segmentation and watermarking techniques to ensure data privacy. In this respect,segementation algorithm is used to to protect client's data against untauhorized access, whilewatermarking method determines and maintains ownership. Consequentely, this framework willincrease the speed of […]

7. Applying ontologies to data integration systems for bank credit risk management

Jalil Elhassouni ; Mehdi Bazzi ; Abderrahim Qadi ; Mohamed Haziti.

This paper proposes an ontological integration model for credit risk management. It is based on three ontologies; one is global describing credit risk management process and two other locals, the first, describes the credit granting process, and the second presents the concepts necessary for the monitoring of credit system. This paper also presents the technique used for matching between global ontology and local ontologies.

8. Text Alignment in Ancient Greek and Georgian: A Case-Study on the First Homily of Gregory of Nazianzus

Tamara Pataridze ; Bastien Kindt.

This paper discusses the word level alignment of lemmatised bitext consisting of the Oratio I of Gregory of Nazianzus in its Greek model and Georgian translation. This study shows how the direct and empirical observations offered by an aligned text enable an accurate analysis of techniques of translation and many philological parameters of the text.

9. Recurrent Pattern Modelling in a Corpus of Armenian Manuscript Colophons

Emmanuel van Elverdinghe.

Colophons of Armenian manuscripts are replete with yet untapped riches. Formulae are not the least among them: these recurrent stereotypical patterns conceal many clues as to the schools and networks of production and diffusion of books in Armenian communities. This paper proposes a methodology for exploiting these sources, as elaborated in the framework of a PhD research project about Armenian colophon formulae. Firstly, the reader is briefly introduced to the corpus of Armenian colophons and then, to the purposes of our project. In the third place, we describe our methodology, relying on lemmatization and modelling of patterns into automata. Fourthly and finally, the whole process is illustrated by a basic case study, the occasion of which is taken to outline the kind of results that can be achieved by combining this methodology with a philologico-historical approach to colophons.

Rubrique : Vers un écosystème numérique : NLP. Infrastructure de corpus. Méthodes de récupération des textes et de calcul des similarités de textes

10. Processing Tools for Greek and Other Languages of the Christian Middle East

Bastien Kindt.

This paper presents some computer tools and linguistic resources of the GREgORI project. These developments allow automated processing of texts written in the main languages of the Christian Middel East, such as Greek, Arabic, Syriac, Armenian and Georgian. The main goal is to provide scholars with tools (lemmatized indexes and concordances) making corpus-based linguistic information available. It focuses on the questions of text processing, lemmatization, information retrieval, and bitext alignment.

Rubrique : Présentations de projets

11. Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus

Avi Shmidman ; Moshe Koppel ; Ely Porat.

We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the corpus by its two most infrequent letters, finding matched pairs of strings of four or five words that differ by at most one word and then identifying clusters of such matched pairs. Using this method, over 4600 parallel pairs of passages were identified in the Babylonian Talmud, a Hebrew-Aramaic corpus of over 1.8 million words, in just over 30 seconds. Empirical comparisons on sample data indicate that the coverage obtained by our method is essentially the same as that obtained using slow exhaustive methods.

Rubrique : Vers un écosystème numérique : NLP. Infrastructure de corpus. Méthodes de récupération des textes et de calcul des similarités de textes

12. Visualizing linguistic variation in a network of Latin documents and scribes

Timo Korkiakangas ; Matti Lassila.

This article explores whether and how network visualization can benefit philological and historical-linguistic study. This is illustrated with a corpus-based investigation of scribes' language use in a lemmatized and morphologically annotated corpus of documentary Latin (Late Latin Charter Treebank, LLCT2). We extract four continuous linguistic variables from LLCT2 and utilize a gradient colour palette in Gephi to visualize the variable values as node attributes in a trimodal network which consists of the documents, writers, and writing locations underlying the same corpus. We call this network the "LLCT2 network". The geographical coordinates of the location nodes form an approximate map, which allows for drawing geographical conclusions. The linguistic variables are examined both separately and as a sum variable, and the visualizations presented as static images and as interactive Sigma.js visualizations. The variables represent different domains of language competence of scribes who learnt written Latin practically as a second-language. The results show that the network visualization of linguistic features helps in observing patterns which support linguistic-philological argumentation and which risk passing unnoticed with traditional methods. However, the approach is subject to the same limitations as all visualization techniques: the human eye can only perceive a certain, relatively small amount of information at a time.

Rubrique : Visualisation de l'intertextualité et de la réutilisation des textes

13. Prosopographical data analysis. Application to the Angevin officers (XIII–XV centuries)

Anne Tchounikine ; Maryvonne Miquel ; Thierry Pécout ; Jean-Luc Bonnaud.

The EUROPANGE project, involving both medievalists and computer scientists, aims to study the emergence of a corps of administrators in the Angevin controlled territories in the XIII–XV centuries. Our project attempts to analyze the officers' careers, shared relation networks and strategies based on the study of individual biographies. In this paper, we describe methods and tools designed to analyze these prosopographical data. These include OLAP analyzes and network analyzes associated with cartographic and chronological visualization tools.