Transformer-based models for multimodal irony detection

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/128661
Item information
Title: Transformer-based models for multimodal irony detection
Author(s): Tomás, David | Ortega-Bueno, Reynier | Zhang, Guobiao | Rosso, Paolo | Schifanella, Rossano
Research group(s): Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Centre, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Irony detection | Transformer | Multimodality | Image text fusion
Publication date: 19-Oct-2022
Publisher: Springer Nature
Bibliographic citation: Journal of Ambient Intelligence and Humanized Computing. 2023, 14: 7399-7410. https://doi.org/10.1007/s12652-022-04447-y
Abstract: Irony is nowadays a pervasive phenomenon in social networks. The multimodal functionalities of these platforms (i.e., the possibility to attach audio, video, and images to textual information) are increasingly leading their users to combine information in different formats to express their ironic thoughts. The present work focuses on the study of irony detection in social media posts involving image and text. To this end, a transformer architecture for the fusion of textual and image information is proposed. The model leverages disentangled text attention with visual transformers, improving F1-score by up to 9% over previous work in the field and current state-of-the-art visio-linguistic transformers. The proposed architecture was evaluated on three multimodal datasets gathered from Twitter and Tumblr. The results revealed that, in many situations, the text-only version of the architecture was able to capture the ironic nature of the message without using visual information. This phenomenon was further analysed, leading to the identification of linguistic patterns that could provide the context necessary for irony detection without the need for additional visual information.
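The fusion described in the abstract (textual representations combined with visual-transformer patch embeddings) can be illustrated with a minimal cross-attention sketch. This is a hypothetical NumPy toy, not the authors' implementation: the function name `cross_attention_fusion`, the embedding sizes, and the residual combination are all illustrative assumptions about how text tokens might attend over image patches before classification.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(text_emb, image_emb):
    """Toy fusion: text tokens (queries) attend over image patches (keys/values).

    Hypothetical sketch; the paper's actual architecture may differ.
    """
    d_k = text_emb.shape[-1]
    scores = text_emb @ image_emb.T / np.sqrt(d_k)  # (n_tokens, n_patches)
    attn = softmax(scores, axis=-1)                 # attention weights per token
    visual_context = attn @ image_emb               # (n_tokens, d)
    return text_emb + visual_context                # residual combination

rng = np.random.default_rng(0)
text = rng.normal(size=(12, 64))   # 12 token embeddings from a text transformer
image = rng.normal(size=(49, 64))  # 49 patch embeddings from a vision transformer
fused = cross_attention_fusion(text, image)
pooled = fused.mean(axis=0)        # pooled representation fed to an irony classifier
```

A binary classifier head (e.g. a linear layer with a sigmoid) over `pooled` would then produce the ironic/non-ironic prediction; in the text-only variant discussed in the abstract, the visual branch is simply omitted.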
Sponsor(s): Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was partially supported by the Spanish Ministry of Science and Innovation and Fondo Europeo de Desarrollo Regional (FEDER) in the framework of project “Technological Resources for Intelligent VIral AnaLysis through NLP (TRIVIAL)” (PID2021-122263OB-C22).
URI: http://hdl.handle.net/10045/128661
ISSN: 1868-5137 (Print) | 1868-5145 (Online)
DOI: 10.1007/s12652-022-04447-y
Language: eng
Type: info:eu-repo/semantics/article
Rights: © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Peer reviewed: yes
Publisher's version: https://doi.org/10.1007/s12652-022-04447-y
Appears in collections: INV - GPLSI - Artículos de Revistas

Files in this item:
File | Description | Size | Format
Tomas_etal_2023_JAmbientIntellHumanComput.pdf | | 896,27 kB | Adobe PDF


This item is licensed under a Creative Commons License