Optimizing Data-Driven Models for Summarization as Parallel Tasks

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/106331
Item information
Title: Optimizing Data-Driven Models for Summarization as Parallel Tasks
Author(s): Zamuda, Aleš | Lloret, Elena
Research group(s) or GITE: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Text Summarization | Discrete Optimization | Distributed Computing | Data-Driven Model | Differential Evolution
Knowledge area(s): Lenguajes y Sistemas Informáticos
Publication date: Apr-2020
Publisher: Elsevier
Citation: Journal of Computational Science. 2020, 42: 101101. doi:10.1016/j.jocs.2020.101101
Abstract: This paper tackles a hard optimization problem in computational linguistics, specifically automatic multi-document text summarization, using grid computing. The main challenge of multi-document summarization is to extract the most relevant and unique information effectively and efficiently from a set of topic-related documents, constrained to a specified length. In the Big Data/Text era, where the amount of information increases exponentially, optimization becomes essential in selecting the most representative sentences for generating the best summaries. Therefore, a data-driven summarization model is proposed and optimized during a run of Differential Evolution (DE). Different DE runs are distributed to a grid in parallel as optimization tasks, seeking high processing throughput despite the demanding complexity of the linguistic model, especially on longer multi-documents where DE improves results given more iterations. Namely, parallelization and the grid enable running several independent DE runs at the same time within a fixed real-time budget. This approach improves a Document Understanding Conference (DUC) benchmark recall metric over a previous setting.
Sponsor(s): This paper is based upon work from COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications (cHiPSet), supported by COST (European Cooperation in Science and Technology). This paper is also based upon work from COST Actions CA15140 “Improving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO)”, and CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”, both supported by COST. The author AZ acknowledges the financial support from the Slovenian Research Agency (Research Core Funding No. P2-0041). AZ also acknowledges EU support under Project No. 5442-24/2017/6 (HPC – RIVR). AZ also acknowledges the EU Interreg Alpine Space project SmartVillages and Erasmus TSM grant. The author EL acknowledges the financial support by the Generalitat Valenciana through the Research Project PROMETEU/2018/089, and by the Spanish Government through the INTEGER project (RTI2018-094649-B-I00), and network RED iGLN (TIN2017-90773-REDT).
URI: http://hdl.handle.net/10045/106331
ISSN: 1877-7503 (Print) | 1877-7511 (Online)
DOI: 10.1016/j.jocs.2020.101101
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2020 Elsevier B.V.
Peer reviewed: yes
Publisher's version: https://doi.org/10.1016/j.jocs.2020.101101
Appears in collections: INV - GPLSI - Artículos de Revistas
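
The abstract describes running several independent DE runs at the same time and keeping the best summary found. The following is a minimal sketch of that idea only, not the authors' model or grid setup: the toy sentence data, the concept-coverage fitness, the greedy decoding, the DE/rand/1/bin variant, and all names (SENTENCES, MAX_WORDS, decode, fitness, de_run, best_of_parallel_runs) are assumptions made for illustration. A thread pool stands in for the grid; on a real grid each seed would presumably be submitted as a separate job.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy input: (length in words, set of concept ids covered).
SENTENCES = [
    (12, {0, 1, 2}),
    (8, {2, 3}),
    (15, {4, 5, 6, 7}),
    (6, {0, 7}),
    (10, {1, 8}),
    (9, {3, 9}),
]
MAX_WORDS = 30  # assumed summary length limit


def decode(vector):
    """Turn a real-valued vector into a sentence subset.

    Sentences are visited in descending priority (gene value) and added
    whenever they still fit within the length limit.
    """
    chosen, words = [], 0
    for i in sorted(range(len(vector)), key=lambda i: vector[i], reverse=True):
        length = SENTENCES[i][0]
        if words + length <= MAX_WORDS:
            chosen.append(i)
            words += length
    return sorted(chosen)


def fitness(vector):
    """Assumed stand-in objective: number of distinct concepts covered."""
    covered = set()
    for i in decode(vector):
        covered |= SENTENCES[i][1]
    return len(covered)


def de_run(seed, pop_size=10, generations=30, f=0.5, cr=0.9):
    """One independent DE/rand/1/bin run with its own random generator."""
    rng = random.Random(seed)
    dim = len(SENTENCES)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated gene
            trial = [
                pop[a][j] + f * (pop[b][j] - pop[c][j])
                if (rng.random() < cr or j == j_rand)
                else pop[i][j]
                for j in range(dim)
            ]
            score = fitness(trial)
            if score >= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, score
    best = max(range(pop_size), key=lambda i: scores[i])
    return scores[best], decode(pop[best])


def best_of_parallel_runs(seeds):
    """Launch one DE run per seed concurrently and keep the best result."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        results = list(pool.map(de_run, seeds))
    return max(results, key=lambda r: r[0])


if __name__ == "__main__":
    score, chosen = best_of_parallel_runs([1, 2, 3, 4])
    print("concepts covered:", score, "sentences:", chosen)
```

Because each run owns its random generator, the runs are independent and reproducible per seed, which is what allows them to be dispatched as separate tasks. Python threads do not speed up CPU-bound work, so the pool here only illustrates the task structure, not the throughput gain.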

Files in this item:
File | Description | Size | Format
Zamuda_Lloret_2020_JComputSci_final.pdf | Final version (restricted access) | 6.38 MB | Adobe PDF
Zamuda_Lloret_2020_JComputSci_accepted.pdf | Accepted Manuscript (open access) | 3.68 MB | Adobe PDF


All documents in RUA are protected by copyright. Some rights reserved.