Optimizing Data-Driven Models for Summarization as Parallel Tasks

Zamuda, Aleš; Lloret, Elena

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/106331

Item information
Title:	Optimizing Data-Driven Models for Summarization as Parallel Tasks
Author(s):	Zamuda, Aleš \| Lloret, Elena
Research group(s) or GITE:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Text Summarization \| Discrete Optimization \| Distributed Computing \| Data-Driven Model \| Differential Evolution
Knowledge area(s):	Lenguajes y Sistemas Informáticos
Publication date:	Apr-2020
Publisher:	Elsevier
Citation:	Journal of Computational Science. 2020, 42: 101101. doi:10.1016/j.jocs.2020.101101
Abstract:	This paper tackles a hard optimization problem in computational linguistics, namely automatic multi-document text summarization, using grid computing. The main challenge of multi-document summarization is to extract the most relevant and unique information effectively and efficiently from a set of topic-related documents, constrained to a specified length. In the Big Data/Text era, where the volume of information increases exponentially, optimization becomes essential for selecting the most representative sentences when generating the best summaries. Therefore, a data-driven summarization model is proposed and optimized during a run of Differential Evolution (DE). Different DE runs are distributed to a grid in parallel as optimization tasks, seeking high processing throughput despite the demanding complexity of the linguistic model, especially on longer multi-documents, where DE improves results given more iterations. Specifically, parallelization and the grid enable running several independent DE runs at the same time within a fixed real-time budget. This approach improves a Document Understanding Conference (DUC) benchmark recall metric over a previous setting.
Sponsor(s):	This paper is based upon work from COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications (cHiPSet), supported by COST (European Cooperation in Science and Technology). This paper is also based upon work from COST Actions CA15140 “Improving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO)”, and CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”, both supported by COST. The author AZ acknowledges the financial support from the Slovenian Research Agency (Research Core Funding No. P2-0041). AZ also acknowledges EU support under Project No. 5442-24/2017/6 (HPC – RIVR). AZ also acknowledges the EU Interreg Alpine Space project SmartVillages and Erasmus TSM grant. The author EL acknowledges the financial support by the Generalitat Valenciana through the Research Project PROMETEU/2018/089, and by the Spanish Government through the INTEGER project (RTI2018-094649-B-I00), and network RED iGLN (TIN2017-90773-REDT).
URI:	http://hdl.handle.net/10045/106331
ISSN:	1877-7503 (Print) \| 1877-7511 (Online)
DOI:	10.1016/j.jocs.2020.101101
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© 2020 Elsevier B.V.
Peer reviewed:	yes
Publisher version:	https://doi.org/10.1016/j.jocs.2020.101101
Appears in collections:	INV - GPLSI - Artículos de Revistas

Files in this item:
File	Description	Size	Format
Zamuda_Lloret_2020_JComputSci_final.pdf	Final version (restricted access)	6.38 MB	Adobe PDF
Zamuda_Lloret_2020_JComputSci_accepted.pdf	Accepted Manuscript (open access)	3.68 MB	Adobe PDF
