Detecting Misleading Headlines Through the Automatic Recognition of Contradiction in Spanish

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/136242
Item information
Title: Detecting Misleading Headlines Through the Automatic Recognition of Contradiction in Spanish
Authors: Sepúlveda-Torres, Robiert | Bonet-Jover, Alba | Saquete Boró, Estela
Research Group/s: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Annotation Guideline | Contradiction Detection | Dataset Annotation | Deep Learning Techniques | Disinformation Detection | Human Language Technologies | Natural Language Processing
Issue Date: 14-Jul-2023
Publisher: IEEE
Citation: IEEE Access. 2023, 11: 72007-72026. https://doi.org/10.1109/ACCESS.2023.3295781
Abstract: Misleading headlines are part of the disinformation problem. Headlines should give a concise summary of the news story helping the reader to decide whether to read the body text of the article, which is why headline accuracy is a crucial element of a news story. This work focuses on detecting misleading headlines through the automatic identification of contradiction between the headline and body text of a news item. When the contradiction is detected, the reader is alerted to the lack of precision or trustworthiness of the headline in relation to the body text. To facilitate the automatic detection of misleading headlines, a new Spanish dataset is created (ES_Headline_Contradiction) for the purpose of identifying contradictory information between a headline and its body text. This dataset annotates the semantic relationship between headlines and body text by categorising the relation between texts as compatible, contradictory and unrelated. Furthermore, another novel aspect of this dataset is that it distinguishes between different types of contradictions, thereby enabling a more fine-grained identification of them. The dataset was built via a novel semi-automatic methodology, which resulted in a more cost-efficient development process. The results of the experiments show that pre-trained language models can be fine-tuned with this dataset, producing very encouraging results for detecting incongruency or non-relation between headline and body text.
Sponsor: This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). Also funded by Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177.
URI: http://hdl.handle.net/10045/136242
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3295781
Language: eng
Type: info:eu-repo/semantics/article
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Peer Review: yes
Publisher version: https://doi.org/10.1109/ACCESS.2023.3295781
Appears in Collections: INV - GPLSI - Artículos de Revistas

Files in This Item:
File | Description | Size | Format
Sepulveda-Torres_etal_2023_IEEEAccess.pdf | | 1.24 MB | Adobe PDF


This item is licensed under a Creative Commons License.