A Data Analytics Methodology to Visually Analyze the impact of Bias and Rebalancing
Please always use this identifier to cite or link to this item:
http://hdl.handle.net/10045/134700
Title: | A Data Analytics Methodology to Visually Analyze the impact of Bias and Rebalancing |
---|---|
Authors: | Lavalle, Ana | Maté, Alejandro | Trujillo, Juan | Teruel, Miguel A. |
Research groups or GITE: | Lucentia |
Centre, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Data Analytics | Data Bias | Data Visualization | Model-driven development | Requirements Engineering | Artificial Intelligence |
Publication date: | 24-May-2023 |
Publisher: | IEEE |
Bibliographic citation: | IEEE Access. 2023, 11: 56691-56702. https://doi.org/10.1109/ACCESS.2023.3279732 |
Abstract: | Data Analytics has become a key component of many business processes that influence several aspects of our daily lives. Indeed, any misinterpretation or flaw in the outputs of Data Analytics can cause significant damage, especially when dealing with one of the most often overlooked issues: the unaware use of biased data. When data bias goes unnoticed, it warps the meaning of the data, with a devastating effect on Data Analytics results. Although it is widely argued that the most common way to deal with data bias is to rebalance biased datasets, rebalancing is not a neutral transformation: it can lead to several undesired side effects that will probably harm the results of data analyses. Therefore, in order to analyze the underlying bias in datasets, in this work we present (i) a comprehensive methodology based on visualization techniques, which assists users in defining their analytical requirements in order to automatically detect and visually represent data bias, helping them determine whether or not it is appropriate to artificially rebalance their dataset; (ii) a novel metamodel for visually representing bias; (iii) a motivating real-world running example used to analyze the impact of bias in Data Analytics; and (iv) an assessment of the improvements introduced by our proposal through a complete real-world case study using a Fire Department Calls for Service dataset, demonstrating that rebalancing datasets is not always the best option. It is crucial to study the context in which decisions are going to be made. Moreover, it is also important to perform a pre-analysis in order to understand the nature of the datasets and how biased they are. |
Sponsors: | This work has been co-funded by the AETHER-UA project (PID2020-112540RB-C43), funded by the Spanish Ministry of Science and Innovation, and the BALLADEER project (PROMETEO/2021/088), funded by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital (Generalitat Valenciana). |
URI: | http://hdl.handle.net/10045/134700 |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2023.3279732 |
Language: | English |
Type: | info:eu-repo/semantics/article |
Rights: | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ |
Peer reviewed: | yes |
Publisher's version: | https://doi.org/10.1109/ACCESS.2023.3279732 |
Appears in collection: | INV - LUCENTIA - Artículos de Revistas |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
Lavalle_etal_2023_IEEEAccess.pdf | | 1.89 MB | Adobe PDF |
This item is licensed under a Creative Commons License.