The behaviour of classifier performance measures when dealing with class imbalance and instance hardness / Vanacore, Amalia; Ciardiello, Armando. - (2023), p. 38. (International Conference on Data Science ICDS 2023 - Multidimensional Perspectives: From Statistical Learning to Data Science Applications, Santiago, Chile, November 8-10, 2023).
The behaviour of classifier performance measures when dealing with class imbalance and instance hardness.
Amalia Vanacore; Armando Ciardiello
2023
Abstract
In this study, the behaviour of several classifier predictive performance measures is investigated under different conditions of class imbalance and classification hardness. The investigation was conducted through an extensive comparative analysis in which several classification algorithms (8 algorithm-level methods and 4 hybrid methods) were applied to artificial data sets generated for multi-class classification problems in a multi-dimensional space, covering a wide range of class imbalance and instance hardness levels. Specifically, the data generation process was controlled through a set of properties defining the characteristics of the generated data: number of attributes, p = 3, 5, 7; number of classes, k = 2, 3, 5; class frequency distributions, representing 6 increasing levels of Imbalance Ratio; and instance type frequency distributions, representing 4 increasing levels of instance hardness. The results show that, although the investigated performance measures largely agree on easy classification tasks (i.e., balanced data sets containing only easy-to-classify instances), their behaviour differs significantly on difficult classification tasks (i.e., with increasing class imbalance and instance hardness), which are the rule in many real-world classification problems.
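The abstract does not name the performance measures that were compared, nor the exact Imbalance Ratio and hardness values used. The Python sketch below is therefore only illustrative and rests on several assumptions: it treats the stated factors as a full factorial crossing, uses the common majority-to-minority definition of the Imbalance Ratio, and picks five widely used measures (accuracy, balanced accuracy, macro-averaged F1, Cohen's kappa, and multiclass MCC) as stand-ins. The function names `design_grid`, `imbalance_ratio`, and `measures` are hypothetical. The sketch shows how measures that coincide on a balanced, easy task can diverge sharply once the minority class is ignored.

```python
import math
from itertools import product

# Factors listed in the abstract. The concrete IR and hardness values are not
# reported, so those levels are represented by indices only (assumption).
N_ATTRIBUTES = (3, 5, 7)              # p
N_CLASSES = (2, 3, 5)                 # k
IR_LEVELS = tuple(range(1, 7))        # 6 increasing Imbalance Ratio levels
HARDNESS_LEVELS = tuple(range(1, 5))  # 4 increasing instance-hardness levels


def design_grid():
    """All factor combinations, assuming a full factorial crossing (not stated in the source)."""
    return list(product(N_ATTRIBUTES, N_CLASSES, IR_LEVELS, HARDNESS_LEVELS))


def imbalance_ratio(class_counts):
    """Common definition: majority-class size divided by minority-class size."""
    return max(class_counts) / min(class_counts)


def measures(cm):
    """Compute several performance measures from a k x k confusion matrix.

    cm[i][j] counts instances of true class i predicted as class j.
    Undefined ratios (0/0) are set to 0, a common convention.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    true_counts = [sum(cm[i]) for i in range(k)]                       # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums
    correct = sum(cm[i][i] for i in range(k))

    def safe_div(a, b):
        return a / b if b else 0.0

    recalls = [safe_div(cm[i][i], true_counts[i]) for i in range(k)]
    precisions = [safe_div(cm[i][i], pred_counts[i]) for i in range(k)]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    # Observed and chance-expected agreement (used by Cohen's kappa).
    p_obs = correct / n
    p_exp = sum(t * p for t, p in zip(true_counts, pred_counts)) / n ** 2

    # Multiclass MCC (Gorodkin's R_K statistic).
    cov_xy = correct * n - sum(t * p for t, p in zip(true_counts, pred_counts))
    cov_pp = n ** 2 - sum(p * p for p in pred_counts)
    cov_tt = n ** 2 - sum(t * t for t in true_counts)

    return {
        "accuracy": p_obs,
        "balanced_accuracy": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
        "cohen_kappa": safe_div(p_obs - p_exp, 1 - p_exp),
        "mcc": safe_div(cov_xy, math.sqrt(cov_pp * cov_tt)),
    }


if __name__ == "__main__":
    print(len(design_grid()))    # 3 * 3 * 6 * 4 = 216 combinations
    easy = [[50, 0], [0, 50]]    # balanced, perfectly classified
    hard = [[95, 0], [5, 0]]     # IR = 19, minority class never predicted
    print(measures(easy))
    print(imbalance_ratio([95, 5]), measures(hard))
```

With these toy matrices, every measure equals 1 for the balanced, perfectly classified case. In the imbalanced case, accuracy stays at 0.95 even though balanced accuracy drops to 0.5, macro-F1 falls below 0.5, and both kappa and MCC fall to 0, which is the kind of disagreement the abstract reports for difficult tasks.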


