Action segmentation and understanding in RGB videos with convolutional neural networks

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/76997
Item information
Title: Action segmentation and understanding in RGB videos with convolutional neural networks
Author(s): Ivorra-Piqueres, David
Research supervisor(s): Garcia-Rodriguez, Jose | Garcia-Garcia, Alberto
Center, Department or Service: Universidad de Alicante. Departamento de Tecnología Informática y Computación
Keywords: Action | Segmentation | Convolutional | CNN | Deep learning | Neural network | Machine learning | Video | RGB | Recognition
Knowledge area(s): Computer Architecture and Technology
Publication date: 2-Jul-2018
Defense date: 19-Jun-2018
Abstract: In this work, we propose three techniques for accelerating a modern action recognition pipeline. To reach this point, we carried out an extensive study of the current state of action recognition. First, traditional action recognition methods based on handcrafted features were reviewed, along with those focused on machine learning as well as those that use deep learning techniques. Valuable insights were extracted from them, as were the difficulties that the task of action recognition entails. Subsequently, we explored numerous video datasets available for properly training deep models in the task of understanding human actions. Then, several video action recognition works from the past three years were thoroughly studied. Returning to the three proposed techniques, we first selected two of the reviewed deep learning works. Specifically: (1) Temporal Segment Networks (TSN), a Convolutional Neural Network (CNN) framework that uses a small number of video frames to obtain robust predictions, which allowed it to win first place in the 2016 ActivityNet challenge; (2) MotionNet, a network capable of inferring optical flow from Red, Green and Blue (RGB) frames by means of simple and transposed convolutions. As the third proposal we selected NVIDIA Video Loader (NVVL), a new Graphics Processing Unit (GPU) video decoding library developed by NVIDIA, useful for efficiently reducing the data transfer and storage bottleneck of video deep learning applications. Finally, we combined the first and third technologies: we trained the RGB stream of the TSN network on videos loaded with NVVL, using a subset of daily actions from the University of Central Florida 101 (UCF101) dataset, thus proving their validity for integration into a deep learning action recognition system that is both fast and effective.
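The abstract's key idea behind TSN, sampling a small number of frames by dividing a video into equal-length segments and drawing one snippet from each, can be illustrated with a minimal sketch. This helper is illustrative only and is not taken from the thesis; the function name and parameters are assumptions.

```python
import random

def sample_segment_indices(num_frames, num_segments=3, seed=None):
    """TSN-style sparse sampling (illustrative sketch): split a video of
    `num_frames` frames into `num_segments` equal segments and randomly
    pick one frame index from each, so a few frames cover the whole clip."""
    rng = random.Random(seed)
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(seg_len * k)
        # Inclusive end of this segment; guard against tiny videos
        # where a segment spans less than one frame.
        end = max(start, int(seg_len * (k + 1)) - 1)
        indices.append(rng.randint(start, end))
    return indices

# Example: a 90-frame clip with 3 segments yields one index per
# 30-frame segment, e.g. one from [0, 29], one from [30, 59], one from [60, 89].
```

At training time, the frames at these indices would be decoded (this is where a GPU loader such as NVVL reduces the I/O bottleneck) and each snippet's prediction averaged into a video-level score.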
URI: http://hdl.handle.net/10045/76997
Language: eng
Type: info:eu-repo/semantics/bachelorThesis
Rights: Creative Commons Attribution-ShareAlike 4.0 License
Appears in collections: Grado en Ingeniería Informática - Trabajos Fin de Grado

Files in this item:
File | Size | Format
Action_segmentation_and_understanding_in_RGB_videos_wi_Ivorra_Piqueres_David.pdf | 16,4 MB | Adobe PDF


All documents in RUA are protected by copyright. Some rights reserved.