Clustering mixed-type data using a probabilistic distance algorithm

Palumbo, Francesco
Second author
2022

Abstract

Cluster analysis is a widely used unsupervised data analysis technique for finding groups of homogeneous units in a data set. Probabilistic distance clustering adjusted for cluster size (PDQ), discussed in this contribution, belongs to the broad category of clustering methods initially developed for continuous data; it has the advantages of fuzzy membership and robustness. However, a common challenge in clustering is handling mixed-type data, that is, data sets that contain both continuous and categorical variables, which are among the most common types of data. This paper extends PDQ to mixed-type data by using different dissimilarities for different kinds of variables. First, PDQ for mixed-type data is defined; a simulation study then shows its advantages compared to some state-of-the-art techniques. Finally, the method is applied to a real data set. The conclusion outlines some future developments.
Clustering mixed-type data using a probabilistic distance algorithm / Tortora, Cristina; Palumbo, Francesco. - In: APPLIED SOFT COMPUTING. - ISSN 1568-4946. - 130:(2022). [10.1016/j.asoc.2022.109704]
Files in this item:
File: 1-s2.0-S1568494622007530-main.pdf
Access: open access
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 1.31 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/988472