Applying Human-in-the-Loop to construct a dataset for determining content reliability to combat fake news

Bonet-Jover, Alba; Sepúlveda-Torres, Robiert; Saquete Boró, Estela; Martínez-Barco, Patricio; Piad-Morffis, Alejandro; Estévez-Velarde, Suilan

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/137336

Item information
Title:	Applying Human-in-the-Loop to construct a dataset for determining content reliability to combat fake news
Authors:	Bonet-Jover, Alba | Sepúlveda-Torres, Robiert | Saquete Boró, Estela | Martínez-Barco, Patricio | Piad-Morffis, Alejandro | Estévez-Velarde, Suilan
Research Group/s:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords:	Natural language processing | Fake news detection | Assisted annotation | Dataset construction | Human-in-the-Loop Artificial Intelligence | Active learning
Issue Date:	20-Sep-2023
Publisher:	Elsevier
Citation:	Engineering Applications of Artificial Intelligence. 2023, 126(Part D): 107152. https://doi.org/10.1016/j.engappai.2023.107152
Abstract:	Annotated corpora are indispensable for training computational models in Natural Language Processing. However, for complex semantic annotation tasks, building them is costly, arduous, and time-consuming, which leads to a shortage of resources for training Machine Learning and Deep Learning algorithms. To address this, this work proposes a methodology based on the human-in-the-loop paradigm for the semi-automatic annotation of complex tasks. The methodology is applied to build a reliability dataset of Spanish news in order to combat disinformation and fake news. Implementing the proposed semi-automatic annotation methodology yields a high-quality resource while increasing annotator efficacy and speed with fewer examples. The methodology consists of three incremental phases and results in the construction of the RUN dataset. The resource was evaluated along three dimensions: time reduction (annotation time was reduced by almost 64% with respect to fully manual annotation), annotation quality (consistency of annotation and inter-annotator agreement), and the performance of a model trained on the semi-automatically annotated RUN dataset (95% accuracy and 95% F1), validating the suitability of the proposal.
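Two of the evaluation criteria named in the abstract, time reduction and inter-annotator agreement, can be illustrated with a minimal sketch. This record does not state which agreement coefficient the authors used; the code below assumes Cohen's kappa for two annotators. The function names `time_reduction` and `cohens_kappa`, and the example labels, are hypothetical and are not taken from the paper or the RUN dataset.

```python
from collections import Counter


def time_reduction(manual_time: float, assisted_time: float) -> float:
    """Percentage of annotation time saved relative to fully manual annotation."""
    if manual_time <= 0:
        raise ValueError("manual_time must be positive")
    return 100.0 * (manual_time - assisted_time) / manual_time


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: agreement is measured with Cohen's kappa; the record
    does not specify the coefficient actually used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical usage: "R" = reliable, "U" = unreliable (illustrative labels only).
print(time_reduction(100, 36))  # 64.0
print(cohens_kappa(["R", "R", "U", "U"], ["R", "U", "U", "U"]))  # 0.5
```

Kappa is used here rather than raw percentage agreement because it discounts the agreement two annotators would reach by chance given their label distributions.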
Sponsor:	This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). It is also funded by Generalitat Valenciana, Spain through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177.
URI:	http://hdl.handle.net/10045/137336
ISSN:	0952-1976 (Print) | 1873-6769 (Online)
DOI:	10.1016/j.engappai.2023.107152
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer Review:	yes
Publisher version:	https://doi.org/10.1016/j.engappai.2023.107152
Appears in Collections:	INV - GPLSI - Artículos de Revistas

Files in This Item:
File	Description	Size	Format
Bonet-Jover_etal_2023_EngApplArtifIntellig.pdf		2.08 MB	Adobe PDF

This item is licensed under a Creative Commons License