Improving an automatically extracted corpus for UMLS Metathesaurus word sense disambiguation

Jimeno-Yepes, Antonio; Aronson, Alan R.

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/85169

Item information
Title:	Improving an automatically extracted corpus for UMLS Metathesaurus word sense disambiguation
Other Titles:	Mejora de un corpus extraído automáticamente para desambiguar términos del UMLS Metathesaurus
Authors:	Jimeno-Yepes, Antonio \| Aronson, Alan R.
Keywords:	Word Sense Disambiguation \| Term Extraction \| Biomedical Domain \| Corpus statistics \| Semantic Categorization
Knowledge Area:	Lenguajes y Sistemas Informáticos (Computer Languages and Systems)
Issue Date:	Oct-2010
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Citation:	Jimeno-Yepes, Antonio; Aronson, Alan R. “Improving an automatically extracted corpus for UMLS Metathesaurus word sense disambiguation”. Procesamiento del Lenguaje Natural. N. 45 (2010). ISSN 1135-5948
Abstract:	Manually annotated data is expensive, so manually covering a large terminological resource like the UMLS Metathesaurus is infeasible. In this paper, we evaluate two approaches for improving the quality of an automatically extracted corpus used to train statistical learners to perform WSD. The first approach contributes more specific terms, while the second filters out false positives. Using both approaches, we obtained an improvement over the original automatically extracted corpus of approximately 6% in F-measure and 8% in recall.
URI:	http://hdl.handle.net/10045/85169
ISSN:	1135-5948
Language:	eng
Type:	info:eu-repo/semantics/article
Peer Review:	yes
Appears in Collections:	Procesamiento del Lenguaje Natural - Nº 45 (2010)

Files in This Item:
File	Description	Size	Format
PLN_45_239-242.pdf		606.14 kB	Adobe PDF
