Sumiko Teng - Ambiguity in Crisis: A Multimodal and Synthetic Data Approach to Classification

jdmdh:16262 - Journal of Data Mining & Digital Humanities, October 14, 2025, NLP4DH - https://doi.org/10.46298/jdmdh.16262
Ambiguity in Crisis: A Multimodal and Synthetic Data Approach to Classification
Article

Authors: Teng, Sumiko

Social media platforms such as Twitter (now X) play a crucial role during crises by enabling real-time information sharing. However, their multimodal data can be ambiguous, with labels misaligned across modalities. Classifying tweets as informative or not informative can support crisis response, yet such tweets are often ambiguous and the classes are unbalanced in datasets, which impairs model performance. This study explores the effectiveness of multimodal learning approaches for classifying crisis-related tweets regardless of ambiguity, and addresses class imbalance through synthetic data augmentation using generative artificial intelligence (AI). Experimental results demonstrate that multimodal models consistently outperform unimodal ones, particularly on ambiguous tweets where label misalignment between modalities is prevalent. Furthermore, the addition of synthetic data significantly boosts macro F1 scores, indicating improved performance on the minority class.
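To illustrate why macro F1 is a useful signal for minority-class performance, the following minimal sketch compares accuracy and macro F1 on a toy imbalanced label set. The 9-to-1 split, the label names, and the helper functions are hypothetical and chosen only for illustration. They are not taken from the article's dataset or code, and which class is the minority in the actual data is not stated here.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 9 tweets of one class, 1 of the other (assumed split).
y_true = ["informative"] * 9 + ["not_informative"]

# A degenerate classifier that always predicts the majority class.
y_majority = ["informative"] * 10

# A classifier that also recovers the single minority example.
y_balanced = ["informative"] * 9 + ["not_informative"]

acc_majority = accuracy(y_true, y_majority)
f1_majority = macro_f1(y_true, y_majority)
f1_balanced = macro_f1(y_true, y_balanced)

print(f"majority-only: accuracy={acc_majority:.3f}, macro F1={f1_majority:.3f}")
print(f"balanced:      macro F1={f1_balanced:.3f}")
```

In this toy case the majority-only classifier keeps a high accuracy while its macro F1 drops sharply, because the minority class contributes an F1 of zero to the unweighted average. This is why an increase in macro F1 is commonly read as a sign of better minority-class performance.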


Volume: NLP4DH
Published on: October 14, 2025
Accepted on: August 30, 2025
Submitted on: July 31, 2025

Files

Name: nlp4dh_journal.pdf
Size: 1.24 MB
md5: a3bd8a7f7d895727e6112640d5abadaa