A new dissimilarity measure for cluster analysis is presented and used in the context of probabilistic distance (PD) clustering. The basic assumption of PD clustering is that, for each unit, the product of the probability that the unit belongs to a cluster and the distance between the unit and that cluster is constant. This constant is a measure of the classifiability of the point, and the sum of the constants over all units is called the joint distance function (JDF). The parameters that minimize the JDF maximize the classifiability of the units. The new dissimilarity measure is based on symmetric density functions and allows the method to find clusters characterized by different variances and correlations among variables. The multivariate Gaussian and multivariate Student-t distributions have been used, outperforming classical PD clustering and its variant, PD clustering adjusted for cluster size, on simulated and real datasets.
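
The basic assumption stated in the abstract can be illustrated with a minimal sketch. If p_ik * d_ik = F_i for every cluster k and the probabilities of unit i sum to one, then p_ik = F_i / d_ik and F_i = 1 / sum_k (1 / d_ik). The function names `pd_memberships` and `joint_distance_function` are hypothetical, and the distances are taken as given inputs: this sketch does not implement the density-based dissimilarity proposed in the paper, nor the estimation of cluster parameters.

```python
import numpy as np


def pd_memberships(dist):
    """Membership probabilities and per-unit constants under the PD assumption.

    dist: (n, K) array of strictly positive unit-to-cluster distances
    (assumed to be precomputed; how they are obtained is outside this sketch).
    Returns (p, f), where p[i, k] * dist[i, k] == f[i] for every k.
    """
    dist = np.asarray(dist, dtype=float)
    if np.any(dist <= 0):
        raise ValueError("all distances must be strictly positive")
    inv = 1.0 / dist
    row_sum = inv.sum(axis=1)
    p = inv / row_sum[:, None]   # p_ik is proportional to 1 / d_ik
    f = 1.0 / row_sum            # the constant product p_ik * d_ik for unit i
    return p, f


def joint_distance_function(dist):
    """Sum of the per-unit constants over all units (the JDF)."""
    _, f = pd_memberships(dist)
    return float(f.sum())


# Illustrative distances for two units and two clusters (made-up numbers).
dist = np.array([[1.0, 3.0], [2.0, 2.0]])
p, f = pd_memberships(dist)
assert np.allclose(p.sum(axis=1), 1.0)     # probabilities sum to one per unit
assert np.allclose(p * dist, f[:, None])   # the product is constant across clusters
```

In this sketch a unit that lies close to one cluster and far from the others receives a small constant, which is why minimizing the sum of the constants corresponds to maximizing classifiability.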

A Probabilistic Distance Clustering Algorithm Using Gaussian and Student-t Multivariate Density Distributions / Tortora, Cristina; McNicholas, Paul D.; Palumbo, Francesco. - In: SN COMPUTER SCIENCE. - ISSN 2661-8907. - 1:2(2020), pp. 64-85. [10.1007/s42979-020-0067-z]

A Probabilistic Distance Clustering Algorithm Using Gaussian and Student-t Multivariate Density Distributions

Palumbo, Francesco
Last author
Methodology
2020

Files in this item:
File: A Probabilistic Distance Clustering Algorithm Using Gaussian_Proof.pdf
Access: open access
Type: Pre-print document
License: Private/restricted access
Size: 4.98 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/847121