Clustering mixed-type data using a probabilistic distance algorithm / Tortora, Cristina; Palumbo, Francesco. - In: APPLIED SOFT COMPUTING. - ISSN 1568-4946. - 130:(2022). [10.1016/j.asoc.2022.109704]
Clustering mixed-type data using a probabilistic distance algorithm
Tortora, Cristina; Palumbo, Francesco (second author)
2022
Abstract
Cluster analysis is a widely used unsupervised data analysis technique for finding groups of homogeneous units in a data set. Probabilistic distance clustering adjusted for cluster size (PDQ), discussed in this contribution, belongs to the broad category of clustering methods initially developed for continuous data; its advantages are fuzzy membership and robustness. However, a common issue in clustering is the treatment of mixed-type data, in which continuous and categorical variables, among the most common types of data, appear together. This paper extends PDQ to mixed-type data by using different dissimilarities for different kinds of variables. First, PDQ for mixed-type data is defined; then a simulation design is used to show its advantages compared to some state-of-the-art techniques. Finally, the method is applied to a real data set. The conclusion outlines some future developments.

| File | Size | Format |
|---|---|---|
| 1-s2.0-S1568494622007530-main.pdf (open access; type: publisher's version (PDF); license: publisher's copyright) | 1.31 MB | Adobe PDF |