Contextual word embeddings for tabular data search and integration

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/130001
Full metadata record
DC Field | Value | Language
dc.contributor | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) | es_ES
dc.contributor | Web and Knowledge (WaKe) | es_ES
dc.contributor.author | Pilaluisa, José | -
dc.contributor.author | Tomás, David | -
dc.contributor.author | Navarro Colorado, Borja | -
dc.contributor.author | Mazón, Jose-Norberto | -
dc.contributor.other | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | es_ES
dc.date.accessioned | 2022-12-01T07:53:16Z | -
dc.date.available | 2022-12-01T07:53:16Z | -
dc.date.issued | 2022-11-30 | -
dc.identifier.citation | Neural Computing and Applications. 2023, 35: 9319-9333. https://doi.org/10.1007/s00521-022-08066-8 | es_ES
dc.identifier.issn | 0941-0643 (Print) | -
dc.identifier.issn | 1433-3058 (Online) | -
dc.identifier.uri | http://hdl.handle.net/10045/130001 | -
dc.description.abstract | This paper presents a new approach to retrieve and further integrate tabular datasets (collections of rows and columns) using union and join operations. In this work, both processes were carried out using a similarity measure based on contextual word embeddings, which allows finding semantically similar tables and overcoming the recall problem of lexical approaches based on string similarity. This work is the first attempt to use contextual word embeddings in the whole pipeline of table search and integration, including for the first time their use in the join operation. A comprehensive analysis of their performance was carried out on both retrieving and integrating tabular datasets, comparing them with context-free models. Column headings and cell values were used as contextual information and their impact on each task was evaluated. The results revealed that contextual models significantly outperform context-free models and a traditional weighting schema in ad hoc table retrieval. In the data integration task, contextual models also improved the results on the union operation compared to context-free approaches. | es_ES
dc.description.sponsorship | Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research has been partially funded by project “Desarrollo de un ecosistema de datos abiertos para transformar el sector turístico” (GVA-COVID19/2021/103) funded by Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital de la Generalitat Valenciana (Spain); and by projects “CHAN-TWIN” (TED2021-130890B-C21), “COnscious natuRal TEXt generation (CORTEX)” (PID2021-123956OB-I00) and “Technological Resources for Intelligent VIral AnaLysis through NLP (TRIVIAL)” (PID2021-122263OB-C22), funded by MCIN/AEI/ 10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. | es_ES
dc.language | eng | es_ES
dc.publisher | Springer Nature | es_ES
dc.rights | © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. | es_ES
dc.subject | Tabular data | es_ES
dc.subject | Contextual word embedding | es_ES
dc.subject | Information search | es_ES
dc.subject | Data integration | es_ES
dc.subject | Open data | es_ES
dc.title | Contextual word embeddings for tabular data search and integration | es_ES
dc.type | info:eu-repo/semantics/article | es_ES
dc.peerreviewed | si | es_ES
dc.identifier.doi | 10.1007/s00521-022-08066-8 | -
dc.relation.publisherversion | https://doi.org/10.1007/s00521-022-08066-8 | es_ES
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es_ES
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/TED2021-130890B-C21 | es_ES
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2021-123956OB-I00 | es_ES
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2021-122263OB-C22 | es_ES
Appears in collections: INV - GPLSI - Artículos de Revistas
INV - WaKe - Artículos de Revistas

Files in this item:
File | Description | Size | Format
Pilaluisa_etal_2023_NeuralComputApplic.pdf | | 549.63 kB | Adobe PDF


This item is licensed under a Creative Commons License