Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Raphaël Barman; Maud Ehrmann; Simon Clematide; Sofia Ares Oliveira; Frédéric Kaplan

doi:10.46298/jdmdh.6107

jdmdh:6107 - Journal of Data Mining & Digital Humanities, January 19, 2021, HistoInformatique - https://doi.org/10.46298/jdmdh.6107

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research seeking to automatically process facsimiles and extract information from them is multiplying, with document layout analysis as an essential first step. Although the identification and categorization of segments of interest in document images have seen significant progress in recent years thanks to deep learning techniques, many challenges remain, among them the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only and ignore the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to high material variance.

Source: arXiv.org:2002.06144

Volume: HistoInformatique

Section: HistoInformatique

Published on: January 19, 2021

Accepted on: July 3, 2020

Submitted on: February 17, 2020

Keywords: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning

License: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Funding:

Source: OpenAIRE Graph

Media Monitoring of the Past; Funder: Swiss National Science Foundation; Code: 173719

Datasets

Reference

Ridge, M., Colavizza, G., Brake, L., Ehrmann, M., Moreux, J.-P., & Prescott, A. (2019). The Past, Present and Future of Digital Scholarship with Newspaper Collections (Version 2.0) [Dataset]. DataverseNL. 10.34894/6G9YB8 ¹

Barman, R., Ehrmann, M., Clematide, S., & . (2021). Datasets and Models for Historical Newspaper Article Segmentation (Version 0.1) [Dataset]. Zenodo. 10.5281/ZENODO.3706862 ¹

Barman, R., Ehrmann, M., Clematide, S., & . (2021). Datasets and Models for Historical Newspaper Article Segmentation (Version 0.1) [Dataset]. Zenodo. 10.5281/ZENODO.3706863 ¹

¹ ScholeXplorer

