Factor clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of data and a clustering of the transformed data, optimizing a common criterion. Probabilistic distance (PD)-clustering is an iterative, distribution free, probabilistic clustering method. Factor PD-clustering (FPDC) is based on PD-clustering and involves a linear transformation of the original variables into a reduced number of orthogonal ones using a common criterion with PD-clustering. This paper demonstrates that Tucker3 decomposition can be used to accomplish this transformation. Factor PD-clustering alternatingly exploits Tucker3 decomposition and PD-clustering on transformed data until convergence is achieved. This method can significantly improve the PD-clustering algorithm performance; large data sets can thus be partitioned into clusters with increasing stability and robustness of the results. Real and simulated data sets are used to compare FPDC with its main competitors, where it performs equally well when clusters are elliptically shaped but outperforms its competitors with non-Gaussian shaped clusters or noisy data.
Factor probabilistic distance clustering (FPDC): a new clustering method / Tortora, Cristina; Mireille, Gettler Summa; Marino, Marina; Palumbo, Francesco. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - 10:4(2016), pp. 441-464. [10.1007/s11634-015-0219-5]
Factor probabilistic distance clustering (FPDC): a new clustering method
MARINO, MARINA;PALUMBO, FRANCESCO
2016
Abstract
Factor clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of data and a clustering of the transformed data, optimizing a common criterion. Probabilistic distance (PD)-clustering is an iterative, distribution free, probabilistic clustering method. Factor PD-clustering (FPDC) is based on PD-clustering and involves a linear transformation of the original variables into a reduced number of orthogonal ones using a common criterion with PD-clustering. This paper demonstrates that Tucker3 decomposition can be used to accomplish this transformation. Factor PD-clustering alternatingly exploits Tucker3 decomposition and PD-clustering on transformed data until convergence is achieved. This method can significantly improve the PD-clustering algorithm performance; large data sets can thus be partitioned into clusters with increasing stability and robustness of the results. Real and simulated data sets are used to compare FPDC with its main competitors, where it performs equally well when clusters are elliptically shaped but outperforms its competitors with non-Gaussian shaped clusters or noisy data.File | Dimensione | Formato | |
---|---|---|---|
10.1007_s11634-015-0219-5(1).pdf
solo utenti autorizzati
Tipologia:
Documento in Post-print
Licenza:
Accesso privato/ristretto
Dimensione
1.28 MB
Formato
Adobe PDF
|
1.28 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.