Sentence selection for improving the tuning process of a statistical machine translation system

López Ludeña, Verónica; San Segundo Hernández, Rubén; Montero Martínez, Juan Manuel; Lorenzo Trueba, Jaime

Please always use this identifier to cite or link to this item: http://hdl.handle.net/10045/22030

Item information
Title:	Sentence selection for improving the tuning process of a statistical machine translation system
Alternative title:	Selección de frases para la mejora del proceso de ajuste de un sistema de traducción estadística (Spanish: "Sentence selection for improving the tuning process of a statistical translation system")
Authors:	López Ludeña, Verónica; San Segundo Hernández, Rubén; Montero Martínez, Juan Manuel; Lorenzo Trueba, Jaime
Keywords:	Statistical machine translation; Corpus selection; Phrase-based translation; Spanish into English translation; Weight tuning
Knowledge area:	Lenguajes y Sistemas Informáticos (Computer Languages and Systems)
Publication date:	March 2012
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Bibliographic citation:	LÓPEZ-LUDEÑA, Verónica, et al. “Sentence selection for improving the tuning process of a statistical machine translation system”. Procesamiento del Lenguaje Natural. N. 48 (2012). ISSN 1135-5948, pp. 51-56
Abstract:	This paper describes a sentence selection strategy for tuning a Moses-based statistical machine translation system that translates Spanish into English. It proposes two techniques for selecting the source sentences of the development corpus that are most similar to the sentences to be translated (the source test sentences). Tuning on this selection yields better model weights for the subsequent translation process and, therefore, better translation results. In particular, with the similarity-based selection method proposed in the paper, BLEU improves from 27.17% when tuning on the complete development set to 27.27% when tuning on the selected sentences. This result approaches that of the ORACLE experiment, which uses the test set itself to tune the system weights and reaches a BLEU of 27.51%.
Sponsors:	The work leading to these results has received funding from the European Union under grant agreement n° 287678. It has also been supported by the following national projects: TIMPANO (TIN2011-28169-C05-03), ITALIHA (CAM-UPM), INAPRA (MICINN, DPI2010-21247-C02-02), SD-TEAM (MEC, TIN2008-06856-C05-03) and MA2VICMR (Comunidad Autónoma de Madrid, S2009/TIC-1542).
URI:	http://hdl.handle.net/10045/22030
ISSN:	1135-5948
Language:	English
Type:	info:eu-repo/semantics/article
Peer reviewed:	yes
Appears in collection:	Procesamiento del Lenguaje Natural - Nº 48 (2012)
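The abstract describes selecting the development sentences most similar to the test sentences, but does not say how similarity is measured. The following is a minimal sketch of the general idea under a clearly stated assumption: similarity is taken here to be the Dice coefficient over lower-cased word unigrams and bigrams. This is a stand-in chosen for illustration, not the measure proposed by the authors. Each development (source, reference) pair is scored by its best match against any test source sentence, and the k highest-scoring pairs are kept as the tuning set. The function names (`features`, `dice`, `select_tuning_sentences`) and the parameters `k` and `max_n` are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def features(sentence, max_n=2):
    """Hypothetical feature set: lower-cased word n-grams for n = 1..max_n."""
    tokens = sentence.lower().split()
    feats = set()
    for n in range(1, max_n + 1):
        feats |= ngrams(tokens, n)
    return feats


def dice(a, b):
    """Dice coefficient between two feature sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def select_tuning_sentences(dev_pairs, test_sources, k, max_n=2):
    """Keep the k (source, reference) dev pairs whose source side has the
    highest Dice similarity to any test source sentence (assumed measure)."""
    test_feats = [features(t, max_n) for t in test_sources]
    scored = []
    for idx, (src, _ref) in enumerate(dev_pairs):
        f = features(src, max_n)
        best = max((dice(f, tf) for tf in test_feats), default=0.0)
        scored.append((best, idx))
    # Highest score first; ties broken by original position for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    # Return the selected pairs in their original corpus order.
    keep = sorted(idx for _score, idx in scored[:k])
    return [dev_pairs[i] for i in keep]
```

For example, with the development pairs ("el gato come pescado", "the cat eats fish"), ("la casa es grande", "the house is big") and ("el perro come carne", "the dog eats meat") and the single test sentence "el gato come carne", selecting k=2 keeps the first and third pairs and discards the unrelated second one. In this setup the selected pairs would replace the full development set as input to Moses' weight tuning (for example MERT); a smaller k stays closer to the test data but leaves fewer sentences for estimating the weights.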

Files in this item:
File	Description	Size	Format
PLN_48_06.pdf		677.62 kB	Adobe PDF

All documents deposited in RUA are protected by copyright. Some rights reserved.