Decoupling music notation to improve end-to-end Optical Music Recognition

Alfaro-Contreras, María; Ríos-Vila, Antonio; Valero-Mas, Jose J.; Iñesta, José M.; Calvo-Zaragoza, Jorge

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/123164

Item information
Title:	Decoupling music notation to improve end-to-end Optical Music Recognition
Authors:	Alfaro-Contreras, María | Ríos-Vila, Antonio | Valero-Mas, Jose J. | Iñesta, José M. | Calvo-Zaragoza, Jorge
Research Group/s:	Reconocimiento de Formas e Inteligencia Artificial
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords:	Optical Music Recognition | Deep Learning | Connectionist Temporal Classification | Sequence Labeling
Knowledge Area:	Lenguajes y Sistemas Informáticos
Issue Date:	26-Apr-2022
Publisher:	Elsevier
Citation:	Pattern Recognition Letters. 2022, 158: 157-163. https://doi.org/10.1016/j.patrec.2022.04.032
Abstract:	Inspired by the Text Recognition field, end-to-end schemes based on Convolutional Recurrent Neural Networks (CRNN) trained with the Connectionist Temporal Classification (CTC) loss function are considered one of the current state-of-the-art techniques for staff-level Optical Music Recognition (OMR). Unlike text symbols, music-notation elements may be defined as a combination of (i) a shape primitive located in (ii) a certain position in a staff. However, this double nature is generally neglected in the learning process, as each combination is treated as a single token. In this work, we study whether exploiting such particularity of music notation actually benefits the recognition performance and, if so, which approach is the most appropriate. For that, we thoroughly review existing specific approaches that explore this premise and propose different combinations of them. Furthermore, considering the limitations observed in such approaches, a novel decoding strategy specifically designed for OMR is proposed. The results obtained with four different corpora of historical manuscripts show the relevance of leveraging this double nature of music notation since it outperforms the standard approaches where it is ignored. In addition, the proposed decoding leads to significant reductions in the error rates with respect to the other cases.
Sponsor:	This paper is part of the project I+D+i PID2020-118447RA-I00 (MultiScore), funded by MCIN/AEI/10.13039/501100011033. The first author is supported by grant FPU19/04957 from the Spanish Ministerio de Universidades. The second author is supported by grant ACIF/2021/356 from “Programa I+D+i de la Generalitat Valenciana”. The third author is supported by grant APOSTD/2020/256 from “Programa I+D+i de la Generalitat Valenciana”.
URI:	http://hdl.handle.net/10045/123164
ISSN:	0167-8655 (Print) | 1872-7344 (Online)
DOI:	10.1016/j.patrec.2022.04.032
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer Review:	yes
Publisher version:	https://doi.org/10.1016/j.patrec.2022.04.032
Appears in Collections:	INV - GRFIA - Artículos de Revistas
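
The abstract describes each music-notation element as a combination of a shape primitive and a staff position, which standard approaches treat as a single token. The short Python sketch below illustrates that idea only; it is not the authors' code. The "shape:position" token format, the example labels, and the helper names `decouple` and `vocabulary_sizes` are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical token format "shape:position" (e.g. "note.quarter:L2").
# This encoding is an assumption for illustration, not the paper's own.


def decouple(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Split combined tokens into parallel shape and position sequences."""
    shapes: List[str] = []
    positions: List[str] = []
    for token in tokens:
        shape, position = token.split(":", 1)
        shapes.append(shape)
        positions.append(position)
    return shapes, positions


def vocabulary_sizes(sequences: Iterable[List[str]]) -> Tuple[int, int, int]:
    """Return (combined, shape, position) vocabulary sizes for a corpus."""
    combined, shapes, positions = set(), set(), set()
    for sequence in sequences:
        combined.update(sequence)
        seq_shapes, seq_positions = decouple(sequence)
        shapes.update(seq_shapes)
        positions.update(seq_positions)
    return len(combined), len(shapes), len(positions)


# A made-up staff transcription using the hypothetical token format.
staff = [
    "clef.G:L2",
    "note.quarter:L2",
    "note.quarter:S2",
    "note.half:L2",
    "rest.quarter:L3",
]

shapes, positions = decouple(staff)
print(shapes)
print(positions)
print(vocabulary_sizes([staff]))
```

In this toy example the combined vocabulary has one entry per distinct shape-position pair, while the decoupled view keeps two smaller vocabularies. How those two sequences are then predicted and decoded is the subject of the paper itself and is not reproduced here.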

Files in This Item:
File	Description	Size	Format
Alfaro-Contreras_etal_2022_PatternRecognLett.pdf		1.07 MB	Adobe PDF

This item is licensed under a Creative Commons License