Factor PD-Clustering

Tortora, Cristina; Gettler Summa, Mireille; Palumbo, Francesco

doi:10.1007/978-3-319-00035-0_11

Probabilistic Distance (PD) Clustering is a non-parametric probabilistic method for finding homogeneous groups in multivariate datasets with J variables and n units. PD-Clustering runs an iterative algorithm that looks for a set of K group centers maximising the empirical probabilities that the n statistical units belong to a cluster. As J becomes large, the solution tends to become unstable. This paper extends PD-Clustering to the context of factorial clustering methods and shows that the Tucker3 decomposition is a consistent transformation for projecting the original data onto a subspace defined according to the same PD-Clustering criterion. The method consists of a two-step iterative procedure: a linear transformation of the initial data, followed by PD-Clustering on the transformed data. Integrating PD-Clustering with the Tucker3 factorial step makes the clustering more stable, allows datasets with large J to be handled, and makes the method usable when clusters are not elliptical in shape.
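The PD-Clustering step described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distances and the update rules commonly used for PD-Clustering, where each unit's membership probability is proportional to the inverse of its distance from a center (so the product of probability and distance is constant across clusters) and each center is re-estimated as a weighted mean with weights p²/d. The Tucker3 factor step is not implemented here, and the names `pd_clustering`, `_distances`, the initialisation scheme, and the tolerance values are hypothetical choices made for the example.

```python
import numpy as np


def _distances(X, C):
    """Euclidean distance of every unit (rows of X) from every center (rows of C)."""
    return np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)


def pd_clustering(X, K, n_iter=200, tol=1e-8, seed=0):
    """Sketch of PD-Clustering (assumed Euclidean distances; no factor step)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)

    # Hypothetical initialisation (not taken from the paper): farthest-point
    # seeds, then one nearest-seed averaging step, so that no center starts
    # exactly on a unit (d = 0 would give that unit an unbounded weight).
    seeds = [X[rng.integers(n)]]
    for _ in range(1, K):
        gap = _distances(X, np.array(seeds)).min(axis=1)
        seeds.append(X[np.argmax(gap)])
    nearest = _distances(X, np.array(seeds)).argmin(axis=1)
    centers = np.array([X[nearest == k].mean(axis=0) for k in range(K)])

    for _ in range(n_iter):
        d = np.maximum(_distances(X, centers), 1e-12)
        # Membership probabilities: p_ik proportional to 1 / d_ik, so that
        # p_ik * d_ik takes the same value for every cluster k.
        inv = 1.0 / d
        p = inv / inv.sum(axis=1, keepdims=True)
        # Center update: weighted mean of the units with weights p_ik^2 / d_ik.
        u = p ** 2 / d
        new_centers = (u.T @ X) / u.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break

    # Final probabilities and the joint distance function p_ik * d_ik (any k).
    d = np.maximum(_distances(X, centers), 1e-12)
    inv = 1.0 / d
    p = inv / inv.sum(axis=1, keepdims=True)
    jdf = 1.0 / inv.sum(axis=1)
    return centers, p, jdf


# Usage: two well-separated synthetic groups in two variables.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(5.0, 0.5, size=(100, 2))])
centers, p, jdf = pd_clustering(X, K=2)
```

With K = 1 every probability equals 1 and the weights reduce to 1/d, so the center update becomes Weiszfeld's iteration for the geometric median; with several clusters, the p² factor lets each unit pull mainly on the centers it is most likely to belong to. In the factorial version described in the abstract, a loop of this kind would alternate with a linear projection of the data, which this sketch leaves out.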

Factor PD-Clustering / Tortora, C., Gettler Summa, M., Palumbo, F. (2013), pp. 115-123. [10.1007/978-3-319-00035-0_11]