Chunk and clause identification for Basque by Filtering and Ranking with Perceptrons

Alegría Loinaz, Iñaki; Arrieta Cortajarena, Bertol; Carreras Pérez, Xavier; Díaz de Ilarraza Sánchez, Arantza; Uria Garin, Larraitz

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/8054

Full metadata record
DC field	Value	Language
dc.contributor.author	Alegría Loinaz, Iñaki	-
dc.contributor.author	Arrieta Cortajarena, Bertol	-
dc.contributor.author	Carreras Pérez, Xavier	-
dc.contributor.author	Díaz de Ilarraza Sánchez, Arantza	-
dc.contributor.author	Uria Garin, Larraitz	-
dc.date.accessioned	2008-10-13T07:22:42Z	-
dc.date.available	2008-10-13T07:22:42Z	-
dc.date.issued	2008-09	-
dc.identifier.citation	ALEGRÍA LOINAZ, Iñaki, et al. “Chunk and clause identification for Basque by Filtering and Ranking with Perceptrons”. Procesamiento del lenguaje natural. N. 41 (sept. 2008). ISSN 1135-5948, pp. 5-12	en
dc.identifier.issn	1135-5948	-
dc.identifier.uri	http://hdl.handle.net/10045/8054	-
dc.description.abstract	This article presents chunk and clause identification systems for Basque that combine rule-based grammars with machine-learning techniques. More specifically, it uses the Filtering and Ranking with Perceptron model (Carreras, Màrquez and Castro, 2005), a learning model that identifies partial syntactic structures in a sentence and has achieved the best results for these tasks in English. The model can incorporate new features, which makes it possible to use information from different sources. We exploited this to add linguistic information to the learning algorithms. As a result, the chunk identifier improved considerably, compensating for the relatively small training corpus available for Basque. As for clause identification, the first results are not very good, probably because of the free word order of Basque and the small corpus currently available.	en
dc.description.abstract	This paper presents systems for syntactic chunking and clause identification for Basque, combining rule-based grammars with machine-learning techniques. Specifically, we used Filtering-Ranking with Perceptrons (Carreras, Màrquez and Castro, 2005), a learning model that recognizes partial syntactic structures in sentences and obtains state-of-the-art performance for these tasks in English. This model allows a rich set of features to represent syntactic phrases, making it possible to use information from different sources. We used this property to include more linguistic features in the learning model, and the chunking results improved greatly. In this way, we have made up for the relatively small training data available for Basque to learn a chunking model. In the case of clause identification, our preliminary results are low, which we attribute to the free word order of Basque and to the small corpus available.	en
dc.description.sponsorship	Research partly funded by the Basque Government (Department of Education, University and Research, IT-397-07), the Spanish Ministry of Education and Science (TIN2007-63173) and the ETORTEK-ANHITZ project from the Basque Government (Department of Culture and Industry, IE06-185).	en
dc.language	eng	en
dc.publisher	Sociedad Española para el Procesamiento del Lenguaje Natural	en
dc.relation.ispartof	Procesamiento del lenguaje natural. N. 41 (September 2008); pp. 5-12	en
dc.subject	Basque language	en
dc.subject	Shallow parsing	en
dc.subject	Chunking	en
dc.subject	Clause identification	en
dc.subject	Machine learning	en
dc.subject	Discriminative learning	en
dc.subject	Perceptron	en
dc.title	Chunk and clause identification for Basque by Filtering and Ranking with Perceptrons	en
dc.title.alternative	Clause and chunk identification for Basque, using Filtering and Ranking with the Perceptron	en
dc.type	info:eu-repo/semantics/article	en
dc.rights.accessRights	info:eu-repo/semantics/openAccess	-
Appears in collections:	Procesamiento del Lenguaje Natural - No. 41 (September 2008)

Files in this item:
File	Description	Size	Format
PLN_41_01.pdf		222.99 kB	Adobe PDF