Tabular open government data search for data spaces based on word embeddings

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/133987
Full metadata record
DC Field | Value | Language
dc.contributor | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) | es_ES
dc.contributor | Web and Knowledge (WaKe) | es_ES
dc.contributor.author | Berenguer, Alberto | -
dc.contributor.author | Tomás, David | -
dc.contributor.author | Mazón, Jose-Norberto | -
dc.contributor.other | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | es_ES
dc.date.accessioned | 2023-05-02T09:12:01Z | -
dc.date.available | 2023-05-02T09:12:01Z | -
dc.date.issued | 2023-04-04 | -
dc.identifier.citation | Proceedings of the 25th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP), co-located with the 26th International Conference on Extending Database Technology and the 26th International Conference on Database Theory (EDBT/ICDT 2023), Ioannina, Greece, March 28, 2023. CEUR Workshop Proceedings, Vol-3369, 61-70 | es_ES
dc.identifier.issn | 1613-0073 | -
dc.identifier.uri | http://hdl.handle.net/10045/133987 | -
dc.description.abstract | Nowadays, data spaces are envisioned as a prominent mechanism for data sharing, boosting growth and creating value. Open government data providers should be considered as important participants in data space reference infrastructures, since open data portal initiatives are adopted by most of the governments to supply their public sector information. However, open data is mostly published in the form of tabular data such as spreadsheets or CSV files. Therefore, reusing open data in data space is challenging due to the friction that may occur when combining the use of data shared in data spaces and the use of tabular data published in open government portals. To alleviate this situation, tabular open data search engines can be a promising solution. Actually, most open data portals allow reusers to retrieve and federate tabular open data by means of a keyword-based search engine over metadata. Unfortunately, these search engines rely on the (not so often good enough) metadata quality, which must be complete, descriptive, and representative of the content. Moreover, keyword-based search is not always an adequate solution for retrieving open data, since it does not consider their tabular nature and search results can be useless for reusers (e.g., when they attempt to find data useful for extending rows or columns of a given tabular dataset). To overcome these problems, this paper presents an approach that uses word embeddings for tabular open data search based on unionability and joinability. Our approach could be seamlessly integrated in a data space infrastructure. A prototype of our approach has been developed. Finally, both, an intrinsic and an extrinsic evaluation with end users, have been carried out. | es_ES
dc.description.sponsorship | This work is part of the project TED2021-130890B-C21, funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. Also, this work is partially funded by GVA-COVID19/2021/103 project from “Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital de la Generalitat Valenciana”. Alberto Berenguer has a contract for predoctoral training with the “Generalitat Valenciana” and the European Social Fund, funded by the grant ACIF/2021/507. | es_ES
dc.language | eng | es_ES
dc.publisher | CEUR | es_ES
dc.rights | © 2023 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). | es_ES
dc.subject | Open government data | es_ES
dc.subject | Tabular data search | es_ES
dc.subject | Data spaces | es_ES
dc.subject | Word embeddings | es_ES
dc.title | Tabular open government data search for data spaces based on word embeddings | es_ES
dc.type | info:eu-repo/semantics/conferenceObject | es_ES
dc.peerreviewed | si | es_ES
dc.relation.publisherversion | https://ceur-ws.org/Vol-3369/ | es_ES
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es_ES
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/TED2021-130890B-C21 | es_ES
Appears in collections: INV - WaKe - Comunicaciones a Congresos, Conferencias, etc.
INV - GPLSI - Comunicaciones a Congresos, Conferencias, etc.

Files in this item:
File | Description | Size | Format
Berenguer_etal_2023_CEUR.pdf | - | 1.02 MB | Adobe PDF


This item is licensed under a Creative Commons License