Boosting bitext compression
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/27537
Title: | Boosting bitext compression |
---|---|
Author(s): | Adiego Rodríguez, Joaquín | Martínez Prieto, Miguel Ángel | Hoyos Torío, Javier E. | Sánchez-Martínez, Felipe |
Research group(s) or GITE: | Transducens |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Compression boosting | Bitext compression |
Knowledge area(s): | Lenguajes y Sistemas Informáticos |
Publication date: | 2011 |
Publisher: | Springer Berlin / Heidelberg |
Bibliographic citation: | ADIEGO, Joaquín, et al. "Boosting bitext compression". In: Trends in Practical Applications of Agents and Multiagent Systems : 9th International Conference on Practical Applications of Agents and Multiagent Systems. Berlin : Springer, 2011. (Advances in Intelligent and Soft Computing; 90). ISBN 978-3-642-19930-1, pp. 109-116 |
Abstract: | Bilingual parallel corpora, also known as bitexts, convey the same information in two different languages. This implies that, when modelling bitexts, one can take advantage of the fact that a relation exists between the two texts; the text alignment task allows this relationship to be established. In this paper we propose different approaches that use words and biwords (pairs made of two words, each one from a different text) as symbolic representation units. The properties of these approaches are analysed from a statistical point of view and tested as a preprocessing step to general-purpose compressors. The results obtained suggest interesting conclusions concerning the use of both words and biwords. When encoded models are used as compression boosters, we achieve compression ratios that improve on state-of-the-art compressors by up to 6.5 percentage points, while being up to 40% faster. |
Sponsor(s): | Work supported by the Spanish Government through projects TIN2009-14009-C02-01 and TIN2009-14009-C02-02; and by the Millennium Institute for Cell Dynamics and Biotechnology (ICDB) (Grant ICM P05-001-F). |
URI: | http://hdl.handle.net/10045/27537 |
ISBN: | 978-3-642-19930-1 |
ISSN: | 1867-5662 (Print) | 1867-5670 (Online) |
DOI: | 10.1007/978-3-642-19931-8_14 |
Language: | eng |
Type: | info:eu-repo/semantics/conferenceObject |
Rights: | The original publication is available at www.springerlink.com |
Peer reviewed: | yes |
Publisher's version: | http://dx.doi.org/10.1007/978-3-642-19931-8_14 |
Appears in collections: | INV - TRANSDUCENS - Comunicaciones a Congresos, Conferencias, etc. |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
adiego11a.pdf | Revised version (open access) | 118.78 kB | Adobe PDF | |
adiego11a_final.pdf | Final version (restricted access) | 168.55 kB | Adobe PDF | |
All documents in RUA are protected by copyright. Some rights reserved.