Cooperative unsupervised training of the part-of-speech taggers in a bidirectional machine translation system

Sánchez-Martínez, Felipe; Pérez-Ortiz, Juan Antonio; Forcada, Mikel L.

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/27522

Item information
Title:	Cooperative unsupervised training of the part-of-speech taggers in a bidirectional machine translation system
Author(s):	Sánchez-Martínez, Felipe | Pérez-Ortiz, Juan Antonio | Forcada, Mikel L.
Research group(s) or GITE:	Transducens
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Machine translation | Part-of-speech taggers | Unsupervised training
Knowledge area(s):	Lenguajes y Sistemas Informáticos
Publication date:	Oct-2004
Publisher:	International Conference on Theoretical and Methodological Issues in Machine Translation
Bibliographic citation:	SÁNCHEZ-MARTÍNEZ, Felipe; PÉREZ-ORTIZ, Juan Antonio; FORCADA, Mikel L. "Cooperative unsupervised training of the part-of-speech taggers in a bidirectional machine translation system". In: Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation [Electronic resource]: Baltimore, MD, October 4-6, 2004, pp. 135-144
Abstract:	When building a machine translation system, the embedded part-of-speech (PoS) tagger deserves special attention, since PoS ambiguities are one of the main sources of mistranslations, especially when related languages are involved. The standard statistical approach to PoS tagging is to use hidden Markov models (HMMs) trained by collecting statistics from source-language texts. In bidirectional machine translation systems, this kind of training is often performed individually on each PoS tagger without taking into account the other language, that is, the corresponding target language. But target-language information may help to improve performance. In this paper, a new method is proposed which trains both PoS taggers simultaneously through mutual interaction: at every iteration, the parameters of the HMM corresponding to one of the languages are refined using the statistical data supplied by the current HMM for the other language. Both models bootstrap by learning cooperatively in an unsupervised manner and require only monolingual texts; no aligned texts are needed. Preliminary results are promising and surpass those of traditional unsupervised approaches.
URI:	http://hdl.handle.net/10045/27522
Language:	eng
Type:	info:eu-repo/semantics/conferenceObject
Peer reviewed:	yes
Appears in collections:	INV - TRANSDUCENS - Comunicaciones a Congresos, Conferencias, etc.

Files in this item:

File	Description	Size	Format
sanchez04a.pdf		76.04 kB	Adobe PDF
