A Data Analytics Methodology to Visually Analyze the Impact of Bias and Rebalancing

Always use this identifier to cite or link to this item: http://hdl.handle.net/10045/134700
Item information
Title: A Data Analytics Methodology to Visually Analyze the Impact of Bias and Rebalancing
Authors: Lavalle, Ana | Maté, Alejandro | Trujillo, Juan | Teruel, Miguel A.
Research groups or GITE: Lucentia
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Data Analytics | Data Bias | Data Visualization | Model-driven development | Requirements Engineering | Artificial Intelligence
Publication date: 24 May 2023
Publisher: IEEE
Bibliographic citation: IEEE Access. 2023, 11: 56691-56702. https://doi.org/10.1109/ACCESS.2023.3279732
Abstract: Data Analytics has become a key component of many business processes that influence several aspects of our daily life. Any misinterpretation or flaw in Data Analytics results can cause significant damage, especially when dealing with one of the most often overlooked issues, namely the unaware use of biased data. When data bias goes unnoticed, it warps the meaning of the data, with a devastating effect on Data Analytics results. Although it is widely argued that the most common way to deal with data bias is to rebalance biased datasets, rebalancing is not a neutral transformation: it leads to several potentially undesired side effects that will probably harm the results of data analyses. Therefore, in order to analyze the underlying bias in datasets, in this work we present (i) a comprehensive methodology based on visualization techniques, which assists users in defining their analytical requirements in order to detect and visually represent data bias automatically, helping them decide whether it is appropriate to artificially rebalance their dataset; (ii) a novel metamodel for visually representing bias; (iii) a motivating real-world running example used to analyze the impact of bias in Data Analytics; and (iv) an assessment of the improvements introduced by our proposal through a complete real-world case study using a Fire Department Calls for Service dataset, demonstrating that rebalancing datasets is not always the best option. It is crucial to study the context in which decisions are going to be made. Moreover, it is also important to perform a pre-analysis in order to understand the nature of the datasets and how biased they are.
Sponsors: This work has been co-funded by the AETHER-UA project (PID2020-112540RB-C43), funded by the Spanish Ministry of Science and Innovation, and the BALLADEER project (PROMETEO/2021/088), funded by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital (Generalitat Valenciana).
URI: http://hdl.handle.net/10045/134700
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3279732
Language: eng
Type: info:eu-repo/semantics/article
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Peer reviewed: yes
Publisher's version: https://doi.org/10.1109/ACCESS.2023.3279732
Appears in collection: INV - LUCENTIA - Artículos de Revistas

Files in this item:
File | Description | Size | Format
Lavalle_etal_2023_IEEEAccess.pdf | | 1.89 MB | Adobe PDF

