End-to-End Neural Optical Music Recognition of Monophonic Scores
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/74947
Title: | End-to-End Neural Optical Music Recognition of Monophonic Scores |
---|---|
Authors: | Calvo-Zaragoza, Jorge | Rizo, David |
Research group(s) or GITE: | Reconocimiento de Formas e Inteligencia Artificial |
Centre, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Optical Music Recognition | End-to-end recognition | Deep Learning | Music score images |
Knowledge areas: | Lenguajes y Sistemas Informáticos |
Publication date: | 11 April 2018 |
Publisher: | MDPI |
Bibliographic citation: | Calvo-Zaragoza J, Rizo D. End-to-End Neural Optical Music Recognition of Monophonic Scores. Applied Sciences. 2018; 8(4):606. doi:10.3390/app8040606 |
Abstract: | Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved by using a neural model that combines the capabilities of convolutional neural networks, which work on the input image, and recurrent neural networks, which deal with the sequential nature of the problem. Thanks to the use of the so-called Connectionist Temporal Classification loss function, these models can be directly trained from input images accompanied by their corresponding transcripts into music symbol sequences. We also present the Printed Music Scores dataset, containing more than 80,000 monodic single-staff real scores in common western notation, that is used to train and evaluate the neural approach. In our experiments, it is demonstrated that this formulation can be carried out successfully. Additionally, we study several considerations about the codification of the output musical sequences, the convergence and scalability of the neural models, as well as the ability of this approach to locate symbols in the input score. |
Sponsors: | This work was supported by the Social Sciences and Humanities Research Council of Canada, and the Spanish Ministerio de Economía y Competitividad through Project HISPAMUS Ref. No. TIN2017-86576-R (supported by UE FEDER funds). |
URI: | http://hdl.handle.net/10045/74947 |
ISSN: | 2076-3417 |
DOI: | 10.3390/app8040606 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
Peer reviewed: | yes |
Publisher's version: | https://doi.org/10.3390/app8040606 |
Appears in collection: | INV - GRFIA - Artículos de Revistas |
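
The abstract above mentions that Connectionist Temporal Classification (CTC) lets the model be trained on whole staff images paired with symbol sequences, without frame-level alignment. As a rough illustration only (not the authors' implementation), the sketch below shows the standard CTC collapsing rule used at decoding time: merge consecutive repeated labels, then drop the blank label. The function name `ctc_greedy_decode`, the `<blank>` token, and the symbol names in the example are hypothetical and chosen for illustration.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical name for the CTC blank label


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse per-frame labels into a symbol sequence.

    Standard CTC collapsing: merge runs of identical consecutive
    labels, then remove every blank label.
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Hypothetical per-frame best labels for a short staff image.
# The blank between the two identical notes keeps them separate.
frames = [
    "<blank>", "clef-G2", "clef-G2", "<blank>",
    "note-C4_quarter", "note-C4_quarter", "<blank>",
    "note-C4_quarter", "<blank>",
]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

Because each output symbol comes from a run of frames, the frame positions of those runs give a coarse horizontal location for the symbol, which is one plausible reading of the abstract's remark about locating symbols in the input score.
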
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
2018_Calvo_Rizo_ApplSci.pdf | | 3.68 MB | Adobe PDF | |
This item is licensed under a Creative Commons License.