Bilingual dictionary generation and enrichment via graph exploration

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/127200
Item information
Title: Bilingual dictionary generation and enrichment via graph exploration
Author(s): Goel, Shashwata | Gracia, Jorge | Forcada, Mikel L.
Research group(s) or GITE: Transducens
Centre, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Bilingual dictionaries | RDF | Apertium | Graph | Linguistic linked data | Evaluation methods | Polysemy
Publication date: 7 Sep 2022
Publisher: IOS Press
Citation: Semantic Web. 2022, 13(6): 1103-1132. https://doi.org/10.3233/SW-222899
Abstract: In recent years, we have witnessed a steady growth of linguistic information represented and exposed as linked data on the Web. Such linguistic linked data have stimulated the development and use of openly available linguistic knowledge graphs, as is the case with the Apertium RDF, a collection of interconnected bilingual dictionaries represented and accessible through Semantic Web standards. In this work, we explore techniques that exploit the graph nature of bilingual dictionaries to automatically infer new links (translations). We build upon a cycle density based method: partitioning the graph into biconnected components for a speed-up, and simplifying the pipeline through a careful structural analysis that reduces hyperparameter tuning requirements. We also analyse the shortcomings of traditional evaluation metrics used for translation inference and propose to complement them with new ones, both-word precision (BWP) and both-word recall (BWR), aimed at being more informative of algorithmic improvements. Over twenty-seven language pairs, our algorithm produces dictionaries about 70% the size of existing Apertium RDF dictionaries at a high BWP of 85% from scratch within a minute. Human evaluation shows that 78% of the additional translations generated for dictionary enrichment are correct as well. We further describe an interesting use-case: inferring synonyms within a single language, on which our initial human-based evaluation shows an average accuracy of 84%. We release our tool as free/open-source software which can not only be applied to RDF data and Apertium dictionaries, but is also easily usable for other formats and communities.
Sponsor(s): This work was partially funded by the Prêt-à-LLOD project within the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 825182. This article is also based upon work from COST Action CA18209 NexusLinguarum, “European network for Web-centred linguistic data science”, supported by COST (European Cooperation in Science and Technology). It has also been partially supported by the Spanish projects TIN2016-78011-C4-3-R and PID2020-113903RB-I00 (AEI/FEDER, UE), by DGA/FEDER, and by the Agencia Estatal de Investigación of the Spanish Ministry of Economy and Competitiveness and the European Social Fund through the “Ramón y Cajal” program (RYC2019-028112-I).
URI: http://hdl.handle.net/10045/127200
ISSN: 1570-0844 (Print) | 2210-4968 (Online)
DOI: 10.3233/SW-222899
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2022 – The authors. Published by IOS Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0).
Peer reviewed: yes
Publisher's version: https://doi.org/10.3233/SW-222899
Appears in collections: Investigaciones financiadas por la UE
INV - TRANSDUCENS - Artículos de Revistas
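
Illustrative note on the abstract: the abstract mentions partitioning the translation graph into biconnected components before searching for cycles. The sketch below is not the authors' released tool; it is a minimal, standard-library illustration of that partitioning step, using the classic depth-first search with an edge stack. The function name `biconnected_components` and the toy word labels are assumptions made for this example only.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Return the biconnected components of a simple undirected graph.

    Each component is returned as a set of nodes. The recursive DFS is
    fine for a toy graph; a large graph would need an iterative version.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    depth = {}        # DFS discovery depth of each visited node
    low = {}          # smallest depth reachable from the node's subtree
    edge_stack = []   # edges visited since the last emitted component
    components = []

    def dfs(node, parent, d):
        depth[node] = low[node] = d
        for nxt in sorted(adj[node]):
            if nxt == parent:
                continue
            if nxt not in depth:
                # Tree edge: recurse, then check whether `node` separates
                # the subtree under `nxt` from the rest of the graph.
                edge_stack.append((node, nxt))
                dfs(nxt, node, d + 1)
                low[node] = min(low[node], low[nxt])
                if low[nxt] >= depth[node]:
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (node, nxt):
                            break
                    components.append(comp)
            elif depth[nxt] < depth[node]:
                # Back edge to an ancestor closes a cycle.
                edge_stack.append((node, nxt))
                low[node] = min(low[node], depth[nxt])

    for start in sorted(adj):
        if start not in depth:
            dfs(start, None, 0)
    return components


# Hypothetical toy graph: "es:banco" is polysemous (bank / bench), so it
# joins two otherwise separate translation cycles.
edges = [
    ("en:bank", "es:banco"), ("es:banco", "fr:banque"), ("fr:banque", "en:bank"),
    ("es:banco", "en:bench"), ("en:bench", "fr:banc"), ("fr:banc", "es:banco"),
]
components = biconnected_components(edges)
```

In this toy graph the polysemous node is an articulation point, so the two senses fall into separate components that share only that node. Since every cycle lies entirely within one biconnected component, a cycle search can be run on each component independently, which is one plausible reading of where the speed-up mentioned in the abstract comes from.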

Files in this item:
File: Goel_etal_2022_SemanticWeb.pdf | Size: 691.7 kB | Format: Adobe PDF


This item is licensed under a Creative Commons License