Analysing the Problem of Automatic Evaluation of Language Generation Systems

Martínez-Murillo, Iván; Moreda, Paloma; Lloret, Elena

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/142187

Item information
Title:	Analysing the Problem of Automatic Evaluation of Language Generation Systems
Alternative title (Spanish):	Analizando el Problema de la Evaluación Automática de los Sistemas de Generación de Lenguaje
Authors:	Martínez-Murillo, Iván; Moreda, Paloma; Lloret, Elena
Research group(s) or GITE:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Centre, department or service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Natural Language Generation; Evaluation metrics; NLG architectures; Language models
Publication date:	March 2024
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Citation:	Procesamiento del Lenguaje Natural. 2024, 72: 123-136. https://doi.org/10.26342/2024-72-9
Abstract:	Automatic text evaluation metrics are widely used to measure the performance of Natural Language Generation (NLG) systems. However, these metrics have several limitations. This article empirically analyses the problems with current evaluation metrics, such as their inability to measure the semantic quality of a text and their strong dependence on the texts against which they are compared. Additionally, traditional NLG systems are compared with more recent systems based on neural networks. Finally, an experiment with GPT-4 is proposed to determine whether it is a reliable source for evaluating the validity of a text. The results show that, according to current automatic metrics, the improvement of neural systems over traditional ones is not very significant. However, this improvement does become apparent when the qualitative aspects of the generated texts are analysed.
Sponsors:	This research is part of the R&D projects “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00), funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”; “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European Union NextGenerationEU/PRTR”; and “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation” (grant reference CIPROM/2021/21), funded by the Generalitat Valenciana. It has also been partially funded by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231).
URI:	http://hdl.handle.net/10045/142187
ISSN:	1135-5948
DOI:	10.26342/2024-72-9
Language:	English
Type:	info:eu-repo/semantics/article
Rights:	© Sociedad Española para el Procesamiento del Lenguaje Natural. Distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 licence.
Peer reviewed:	Yes
Publisher's version:	https://doi.org/10.26342/2024-72-9
Appears in collections:	INV - GPLSI - Artículos de Revistas
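
The abstract's two criticisms of automatic metrics (blindness to meaning and strong dependence on the comparison text) can be illustrated with a small sketch. It assumes a simplified, BLEU-style clipped n-gram precision with whitespace tokenisation; it is not the exact set of metrics, data, or code used in the article, and the example sentences are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Simplified BLEU-style modified n-gram precision.

    Each candidate n-gram is credited at most as many times as it occurs
    in the reference. Tokenisation is naive whitespace splitting (an
    assumption made to keep the sketch short).
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    paraphrase = "a cat was sitting on the rug"  # same meaning, different words
    reversed_meaning = "the mat sat on the cat"  # same words, wrong meaning

    for label, text in [("paraphrase", paraphrase), ("reversed", reversed_meaning)]:
        p1 = clipped_precision(text, reference, 1)
        p2 = clipped_precision(text, reference, 2)
        print(f"{label}: unigram={p1:.2f} bigram={p2:.2f}")
    # paraphrase: unigram=0.43 bigram=0.17
    # reversed: unigram=1.00 bigram=0.80
```

The faithful paraphrase scores far lower than the sentence whose meaning is reversed, and the paraphrase would score 1.0 if it happened to be the reference itself, so the score reflects surface overlap with one particular text rather than semantic quality.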

Files in this item:
File	Description	Size	Format
Martinez-Murillo_etal_2024_PLN.pdf		1.71 MB	Adobe PDF