Uso de técnicas basadas en one-shot learning para la identificación del locutor

Chica, Juan; Salamea Palacios, Christian

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/104718

Item information
Title:	Uso de técnicas basadas en one-shot learning para la identificación del locutor
Alternative title:	Speaker Identification using techniques based on one-shot learning
Author(s):	Chica, Juan \| Salamea Palacios, Christian
Keywords:	Speaker identification \| Text independent \| Meta learning \| N-way classification \| One-shot learning \| Siamese neural network \| VoxCeleb1
Knowledge area(s):	Computer Languages and Systems
Publication date:	Mar-2020
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Bibliographic citation:	Procesamiento del Lenguaje Natural. 2020, 64: 101-108. doi:10.26342/2020-64-12
Abstract:	To be effective, a speaker identification system requires a large number of audio samples from each speaker, which are not always accessible or easy to collect. In contrast, systems based on meta-learning (learning to learn), such as one-shot learning, use a single sample to differentiate between classes. This work evaluates the potential of a meta-learning system for text-independent speaker identification. The experiments use mel spectrograms, i-vectors, and resampling (downsampling) to process the audio signal and obtain a feature vector. This feature vector is the input to a siamese neural network that performs the identification. The best result, an accuracy of 0.9, was obtained when differentiating between 4 speakers. The results show that techniques based on one-shot learning have great potential for speaker identification and, because of their versatility, could be very useful in real-world settings such as biometrics or forensics.
URI:	http://hdl.handle.net/10045/104718
ISSN:	1135-5948
DOI:	10.26342/2020-64-12
Language:	spa
Type:	info:eu-repo/semantics/article
Rights:	© Sociedad Española para el Procesamiento del Lenguaje Natural
Peer reviewed:	yes
Publisher version:	https://doi.org/10.26342/2020-64-12
Appears in collections:	Procesamiento del Lenguaje Natural - No. 64 (2020)
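
The abstract describes an N-way one-shot setup: a query utterance is compared against a single reference sample per candidate speaker, and the closest match is chosen. The following is a minimal sketch of that decision rule only, not the authors' implementation. It assumes the feature vectors are already computed and uses plain Euclidean distance as a stand-in for the similarity that a siamese network would learn. All function names (`one_shot_identify`, `n_way_accuracy`, `demo`) and the synthetic data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_identify(query, support, distance=euclidean):
    """Return the label whose single support vector is closest to the query.

    `support` maps each candidate speaker label to exactly one feature
    vector (the "one shot"). In a siamese setup, `distance` would be
    replaced by the learned similarity; Euclidean distance is an assumption.
    """
    return min(support, key=lambda label: distance(query, support[label]))


def n_way_accuracy(trials, distance=euclidean):
    """Fraction of trials in which the query is assigned its true label.

    Each trial is a tuple (query_vector, support_dict, true_label).
    """
    correct = sum(
        one_shot_identify(query, support, distance) == true_label
        for query, support, true_label in trials
    )
    return correct / len(trials)


def demo(seed=0, n_speakers=4, n_trials=20):
    """Run 4-way one-shot trials on synthetic, well-separated vectors."""
    rng = random.Random(seed)
    # Hypothetical "speakers": one cluster center per speaker.
    centers = {
        f"speaker_{i}": [10.0 if j == i else 0.0 for j in range(n_speakers)]
        for i in range(n_speakers)
    }

    def sample(label):
        # A noisy observation of the speaker's center.
        return [c + rng.gauss(0.0, 0.1) for c in centers[label]]

    trials = []
    for _ in range(n_trials):
        support = {label: sample(label) for label in centers}
        true_label = rng.choice(list(centers))
        trials.append((sample(true_label), support, true_label))
    return n_way_accuracy(trials)


if __name__ == "__main__":
    print("4-way one-shot accuracy on synthetic data:", demo())
```

The synthetic clusters are deliberately easy to separate, so the accuracy printed here says nothing about real speech. With real audio, the vectors would come from the feature extraction step named in the abstract, and the distance would come from the trained network.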

Files in this item:
File	Description	Size	Format
PLN_64_12.pdf		1.05 MB	Adobe PDF
