Creación de un corpus de noticias de gran tamaño en español para el análisis diacrónico y diatópico del uso del lenguaje

Razgovorov, Pavel; Tomás, David

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/89930

Item information
Title:	Creación de un corpus de noticias de gran tamaño en español para el análisis diacrónico y diatópico del uso del lenguaje
Other Titles:	Creation of a large news corpus in Spanish for the diachronic and diatopic analysis of the use of language
Authors:	Razgovorov, Pavel | Tomás, David
Research Group/s:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Corpus | Text mining | Diachronic analysis | Diatopic analysis
Knowledge Area:	Lenguajes y Sistemas Informáticos
Issue Date:	Mar-2019
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Citation:	Procesamiento del Lenguaje Natural. 2019, 62: 29-36. doi:10.26342/2019-62-3
Abstract:	This article describes the process carried out to develop a large corpus of news stories in Spanish. All the collected texts are located both temporally and geographically. This makes the corpus a useful resource for work in linguistics, sociology and data journalism, allowing both the diachronic and diatopic study of language use and the tracking of how specific events evolve. The corpus can be freely downloaded using the software developed as part of this work. The article concludes with a statistical analysis of the corpus and two case studies that show its potential for analyzing events.
URI:	http://hdl.handle.net/10045/89930
ISSN:	1135-5948
DOI:	10.26342/2019-62-3
Language:	spa
Type:	info:eu-repo/semantics/article
Rights:	© Sociedad Española para el Procesamiento del Lenguaje Natural
Peer Review:	yes
Publisher version:	https://doi.org/10.26342/2019-62-3
Appears in Collections:	Procesamiento del Lenguaje Natural - Nº 62 (2019) | INV - GPLSI - Artículos de Revistas

Files in This Item:

File	Description	Size	Format
PLN_62_03.pdf		1.21 MB	Adobe PDF
