Efficient gesture recognition for the assistance of visually impaired people using multi-head neural networks

Please always use this identifier to cite or link to this item: http://hdl.handle.net/10045/125123
Item information
Title: Efficient gesture recognition for the assistance of visually impaired people using multi-head neural networks
Authors: Alashhab, Samer | Gallego, Antonio-Javier | Lozano, Miguel Angel
Research groups or GITE: Reconocimiento de Formas e Inteligencia Artificial | Laboratorio de Investigación en Visión Móvil (MVRLab)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Departamento de Ciencia de la Computación e Inteligencia Artificial
Keywords: Multi-head architectures | Hand gesture detection | Visual impairments | Deep Neural Networks
Publication date: 13 July 2022
Publisher: Elsevier
Bibliographic citation: Engineering Applications of Artificial Intelligence. 2022, 114: 105188. https://doi.org/10.1016/j.engappai.2022.105188
Abstract: Existing research on assistance for visually impaired people mainly focuses on solving a single task (such as reading a text or detecting an obstacle), forcing the user to switch applications to perform other actions. This paper proposes an interactive system for mobile devices controlled by hand gestures that allows the user to operate the device and use several assistance tools by making simple static and dynamic hand gestures (e.g., pointing a finger at an object shows a description of it). The system is based on a multi-head neural network that first detects and classifies the gesture and then, depending on the gesture detected, performs a second stage that carries out the corresponding action. This architecture optimizes the resources required to perform different tasks, since it reuses the information obtained by an initial backbone to perform the different second-stage processes. To train and evaluate the system, a dataset of about 40k images was manually compiled and labeled, covering different types of hand gestures, backgrounds (indoor and outdoor), lighting conditions, etc. The dataset contains both synthetic gestures (used to pre-train the system and improve its results) and real images captured with different mobile phones. A comparison with nearly 50 state-of-the-art methods shows competitive results across the different actions performed by the system, such as the accuracy of gesture classification and localization, or the generation of descriptions for objects and scenes.
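The multi-head idea described in the abstract — one shared backbone whose features are reused by several task-specific heads — can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the layer sizes, the two heads (classification and box regression), and all names are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class MultiHeadNet:
    """Toy multi-head network: shared backbone + two task heads."""

    def __init__(self, in_dim=64, feat_dim=32, n_gestures=5):
        # Shared backbone weights: computed once per input frame.
        self.W_backbone = rng.normal(scale=0.1, size=(in_dim, feat_dim))
        # Head 1: gesture classification logits (hypothetical 5 classes).
        self.W_cls = rng.normal(scale=0.1, size=(feat_dim, n_gestures))
        # Head 2: bounding-box regression (x, y, w, h) for hand localization.
        self.W_box = rng.normal(scale=0.1, size=(feat_dim, 4))

    def forward(self, x):
        feats = relu(x @ self.W_backbone)  # backbone runs only once
        logits = feats @ self.W_cls        # classification head
        box = feats @ self.W_box           # localization head reuses feats
        return logits, box

net = MultiHeadNet()
x = rng.normal(size=(1, 64))               # stand-in for extracted image features
logits, box = net.forward(x)
print(logits.shape, box.shape)             # (1, 5) (1, 4)
```

The resource saving the paper refers to comes from running the (expensive) backbone once and dispatching only to the cheap head that the detected gesture requires.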
URI: http://hdl.handle.net/10045/125123
ISSN: 0952-1976 (Print) | 1873-6769 (Online)
DOI: 10.1016/j.engappai.2022.105188
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer reviewed: yes
Publisher's version: https://doi.org/10.1016/j.engappai.2022.105188
Appears in collections: INV - MVRLab - Artículos de Revistas
INV - GRFIA - Artículos de Revistas

Files in this item:
File: Alashhab_etal_2022_EngApplArtificialIntelligence.pdf | Size: 3.48 MB | Format: Adobe PDF


This item is licensed under a Creative Commons License.