Analysing the Problem of Automatic Evaluation of Language Generation Systems

Martínez-Murillo, Iván; Moreda, Paloma; Lloret, Elena

Always use this identifier to cite or link to this item: http://hdl.handle.net/10045/142187

Item information
Title:	Analysing the Problem of Automatic Evaluation of Language Generation Systems
Alternative title:	Analizando el Problema de la Evaluación Automática de los Sistemas de Generación de Lenguaje
Authors:	Martínez-Murillo, Iván | Moreda, Paloma | Lloret, Elena
Research groups or GITE:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Centre, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Natural Language Generation | Evaluation metrics | NLG architectures | Language models
Publication date:	March 2024
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Bibliographic citation:	Procesamiento del Lenguaje Natural. 2024, 72: 123-136. https://doi.org/10.26342/2024-72-9
Abstract:	Automatic text evaluation metrics are widely used to measure the performance of a Natural Language Generation (NLG) system. However, these metrics have several limitations. This article empirically analyses the problems with current evaluation metrics, such as their inability to measure the semantic quality of a text and their high dependence on the texts they are compared against. Additionally, traditional NLG systems are compared against more recent systems based on neural networks. Finally, an experiment with GPT-4 is proposed to determine whether it is a reliable source for evaluating the validity of a text. From the results obtained, it can be concluded that, according to current automatic metrics, the improvement of neural systems over traditional ones is not particularly significant. However, when the qualitative aspects of the generated texts are analysed, this improvement does become apparent.
Sponsors:	The research work conducted is part of the R&D projects “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00), funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”; “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European Union NextGenerationEU/PRTR”; and the project “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation” (grant reference CIPROM/2021/21), funded by the Generalitat Valenciana. Moreover, it has also been partially funded by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231).
URI:	http://hdl.handle.net/10045/142187
ISSN:	1135-5948
DOI:	10.26342/2024-72-9
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© Sociedad Española para el Procesamiento del Lenguaje Natural. Distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License
Peer reviewed:	yes
Publisher's version:	https://doi.org/10.26342/2024-72-9
Appears in collection:	INV - GPLSI - Artículos de Revistas

Files for this item:
File	Description	Size	Format
Martinez-Murillo_etal_2024_PLN.pdf		1.71 MB	Adobe PDF

All documents deposited in RUA are protected by copyright. Some rights reserved.