Contextual word embeddings for tabular data search and integration
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/130001
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) | es_ES |
dc.contributor | Web and Knowledge (WaKe) | es_ES |
dc.contributor.author | Pilaluisa, José | - |
dc.contributor.author | Tomás, David | - |
dc.contributor.author | Navarro Colorado, Borja | - |
dc.contributor.author | Mazón, Jose-Norberto | - |
dc.contributor.other | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | es_ES |
dc.date.accessioned | 2022-12-01T07:53:16Z | - |
dc.date.available | 2022-12-01T07:53:16Z | - |
dc.date.issued | 2022-11-30 | - |
dc.identifier.citation | Neural Computing and Applications. 2023, 35: 9319-9333. https://doi.org/10.1007/s00521-022-08066-8 | es_ES |
dc.identifier.issn | 0941-0643 (Print) | - |
dc.identifier.issn | 1433-3058 (Online) | - |
dc.identifier.uri | http://hdl.handle.net/10045/130001 | - |
dc.description.abstract | This paper presents a new approach to retrieve and further integrate tabular datasets (collections of rows and columns) using union and join operations. In this work, both processes were carried out using a similarity measure based on contextual word embeddings, which makes it possible to find semantically similar tables and overcome the recall problem of lexical approaches based on string similarity. This work is the first attempt to use contextual word embeddings in the whole pipeline of table search and integration, including for the first time their use in the join operation. A comprehensive analysis of their performance was carried out on both retrieving and integrating tabular datasets, comparing them with context-free models. Column headings and cell values were used as contextual information and their impact on each task was evaluated. The results revealed that contextual models significantly outperform context-free models and a traditional weighting schema in ad hoc table retrieval. In the data integration task, contextual models also improved the results on the union operation compared to context-free approaches. | es_ES |
dc.description.sponsorship | Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research has been partially funded by project “Desarrollo de un ecosistema de datos abiertos para transformar el sector turístico” (GVA-COVID19/2021/103) funded by Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital de la Generalitat Valenciana (Spain); and by projects “CHAN-TWIN” (TED2021-130890B-C21), “COnscious natuRal TEXt generation (CORTEX)” (PID2021-123956OB-I00) and “Technological Resources for Intelligent VIral AnaLysis through NLP (TRIVIAL)” (PID2021-122263OB-C22), funded by MCIN/AEI/ 10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. | es_ES |
dc.language | eng | es_ES |
dc.publisher | Springer Nature | es_ES |
dc.rights | © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. | es_ES |
dc.subject | Tabular data | es_ES |
dc.subject | Contextual word embedding | es_ES |
dc.subject | Information search | es_ES |
dc.subject | Data integration | es_ES |
dc.subject | Open data | es_ES |
dc.title | Contextual word embeddings for tabular data search and integration | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.peerreviewed | yes | es_ES |
dc.identifier.doi | 10.1007/s00521-022-08066-8 | - |
dc.relation.publisherversion | https://doi.org/10.1007/s00521-022-08066-8 | es_ES |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/TED2021-130890B-C21 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2021-123956OB-I00 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2021-122263OB-C22 | es_ES |
Appears in collections: | INV - GPLSI - Artículos de Revistas; INV - WaKe - Artículos de Revistas |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
Pilaluisa_etal_2023_NeuralComputApplic.pdf | | 549.63 kB | Adobe PDF | |
This item is licensed under a Creative Commons License