2024


1. ArchEthno - a new tool for sharing research materials and a new method for archiving your own research

Florence Weber ; Carlo Zwölf ; Arnaud Trouche ; Agnès Tricoche ; José Sastre.
The archiving of ethnographic material is generally considered a blind spot in ethnographic working methods, which place more importance on the actual investigation and analysis than on how archives are constructed. A team of computer scientists and ethnographers built an initial tool for sharing ethnographic materials, based on an SQL relational data model that suited the first survey processed but proved difficult to transpose to other surveys. The team then developed a new tool based on dynamic vocabularies of concepts, which breaks archiving down into three stages. Firstly, ethnographers select and contextualise their survey materials; secondly, they structure them in a database according to the research question discovered during their survey; finally, they share these data with other researchers, subject to the opinion of an ethics committee whose members are competent in ethnography.
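As a rough illustration of the contrast between a fixed relational schema and a dynamic vocabulary of concepts, the Python sketch below is purely hypothetical: the class names, fields, and example values are assumptions for illustration, not the actual ArchEthno data model.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a fixed relational schema hard-codes columns per survey,
# whereas a dynamic vocabulary tags each material with concepts the ethnographer
# defines around the research question discovered during fieldwork.

@dataclass
class Concept:
    label: str          # e.g. "household", "informal credit" (assumed examples)
    definition: str     # contextual definition given by the ethnographer

@dataclass
class Material:
    identifier: str                 # stage 1: selected and contextualised material
    description: str
    concepts: list[Concept] = field(default_factory=list)   # stage 2: structuring

# Stage 2: link a material to the vocabulary built for this survey.
credit = Concept("informal credit", "loans between neighbours recorded in notebooks")
note = Material("fieldnotes-001", "interview transcript, anonymised")
note.concepts.append(credit)

# Stage 3 (sharing) would add access control, subject to the ethics committee's opinion.
```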

2. Toward Automatic Typography Analysis: Serif Classification and Font Similarities

Syed Talal Wasim ; Romain Collaud ; Lara Défayes ; Nicolas Henchoz ; Mathieu Salzmann ; Delphine Ribes Lemay.
Whether a document is of historical or contemporary significance, typography plays a crucial role in its composition. From the early days of modern printing, typographic techniques have evolved and transformed, resulting in changes to the features of typography. By analyzing these features, we can gain insights into specific time periods, geographical locations, and messages conveyed through typography. Therefore, in this paper, we aim to investigate the feasibility of training a model to classify serif types without knowledge of the font and character. We also investigate how to train a vectorial-based image model able to group together fonts with similar features. Specifically, we compare the use of state-of-the-art image classification methods, such as the EfficientNet-B2 and the Vision Transformer Base model with different patch sizes, and the state-of-the-art fine-grained image classification method, TransFG, on the serif classification task. We also evaluate the use of the DeepSVG model to learn to group fonts with similar features. Our investigation reveals that fine-grained image classification methods are better suited for the serif classification task and that leveraging the character labels helps to learn more meaningful font similarities. This repository contains: - the paper published in the Journal of Data Mining and Digital Humanities: WasimEtAl_Toward_Automatic_Typography_Analysis__Serif_Classification_and_Font_Similarities.pdf - two datasets: the first […]
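The sketch below shows one plausible setup for such a serif classifier; the backbone, class count, and hyperparameters are assumptions for illustration (the paper compares EfficientNet-B2, ViT-Base at several patch sizes, and TransFG), not the authors' released training code.

```python
import timm
import torch
from torch import nn

# Illustrative sketch only: label set and optimizer settings are assumptions.
NUM_SERIF_CLASSES = 4   # assumed classes, e.g. serif / sans-serif / slab / other

# A ViT-Base backbone stands in here for the models compared in the paper.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=NUM_SERIF_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def training_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised step on cropped glyph images of shape (B, 3, 224, 224)."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```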

3. Incorporating Crowdsourced Annotator Distributions into Ensemble Modeling to Improve Classification Trustworthiness for Ancient Greek Papyri

Graham West ; Matthew I. Swindall ; Ben Keener ; Timothy Player ; Alex C. Williams ; James H. Brusuelas ; John F. Wallin.
Performing classification on noisy, crowdsourced image datasets can prove challenging even for the best neural networks. Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling. The AL-ALL and AL-PUB datasets - consisting of tightly cropped, individual characters from images of ancient Greek papyri - are strongly affected by both issues. The application of ensemble modeling to such datasets can help identify images where the ground-truth is questionable and quantify the trustworthiness of those samples. As such, we apply stacked generalization consisting of nearly identical ResNets with different loss functions: one utilizing sparse cross-entropy (CXE) and the other Kullback-Leibler Divergence (KLD). Both networks use labels drawn from a crowd-sourced consensus. This consensus is derived from a Normalized Distribution of Annotations (NDA) based on all annotations for a given character in the dataset. For the second network, the KLD is calculated with respect to the NDA. For our ensemble model, we apply a k-nearest neighbors model to the outputs of the CXE and KLD networks. Individually, the ResNet models have approximately 93% accuracy, while the ensemble model achieves an accuracy of > 95%, increasing the classification trustworthiness. We also perform an analysis of the Shannon entropy of the various models' output distributions to measure classification uncertainty. Our results suggest that entropy is useful for predicting […]
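To make the stacked-generalization structure concrete, the sketch below shows one plausible arrangement of the two ResNets and the k-NN stacker; the backbone depth, class count, and neighbour count are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50
from sklearn.neighbors import KNeighborsClassifier

NUM_CLASSES = 24   # assumed label set (Greek alphabet); the actual datasets may differ

cxe_net = resnet50(num_classes=NUM_CLASSES)   # trained with sparse cross-entropy
kld_net = resnet50(num_classes=NUM_CLASSES)   # trained against the NDA with KL divergence

def kld_loss(logits: torch.Tensor, nda: torch.Tensor) -> torch.Tensor:
    """KL divergence between the predicted distribution and the crowd consensus (NDA)."""
    return F.kl_div(F.log_softmax(logits, dim=1), nda, reduction="batchmean")

@torch.no_grad()
def ensemble_features(images: torch.Tensor) -> torch.Tensor:
    """Concatenate both networks' softmax outputs as input features for the k-NN stacker."""
    cxe_net.eval(); kld_net.eval()
    p1 = F.softmax(cxe_net(images), dim=1)
    p2 = F.softmax(kld_net(images), dim=1)
    return torch.cat([p1, p2], dim=1)

# Stacked generalization: fit the k-NN on held-out features, then predict on new glyphs.
knn = KNeighborsClassifier(n_neighbors=5)
# knn.fit(ensemble_features(val_images).numpy(), val_labels.numpy())
# preds = knn.predict(ensemble_features(test_images).numpy())
```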