Issues in Joint Dimension Reduction and Clustering / van de Velden, Michel; IODICE D'ENZA, Alfonso; Markos, Angelos. - (2018), pp. 1-6.

Issues in Joint Dimension Reduction and Clustering

Iodice D'Enza Alfonso
2018

Abstract

Joint data reduction (JDR) methods combine well-established unsupervised techniques such as dimension reduction and clustering. Distance-based clustering of high-dimensional data sets can be problematic because of the well-known curse of dimensionality. To tackle this issue, practitioners first use a principal component method to reduce the dimensionality of the data, and then apply a clustering procedure to the resulting factor scores. JDR methods have been shown to outperform such sequential (tandem) approaches, for both continuous and categorical data sets. Over time, several JDR methods, followed by extensions, generalizations and modifications, have been proposed and appraised both theoretically and empirically. Some aspects, however, are still worth further investigation, such as i) the presence of mixed continuous and categorical variables; ii) outliers undermining the identification of the clustering structure. In this paper, we propose a JDR method for mixed data, built upon existing continuous-only and categorical-only JDR methods. We also appraise the sensitivity of the proposed method to the presence of outliers.
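For readers unfamiliar with the sequential (tandem) approach mentioned in the abstract, the following is a minimal sketch of it, not of the JDR method the paper proposes. It assumes NumPy is available; the function names (`pca_scores`, `farthest_first_init`, `kmeans`, `tandem`) and the toy data are hypothetical and chosen only for illustration. The principal component step is done with an SVD, and the clustering step is a plain k-means with a deterministic farthest-first initialization, which is one simple choice among many.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project column-centered data onto its leading principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def farthest_first_init(Z, n_clusters):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [Z[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centroids], axis=0)
        centroids.append(Z[d.argmax()])
    return np.array(centroids)


def kmeans(Z, n_clusters, n_iter=100):
    """Lloyd's algorithm; returns cluster labels and centroids."""
    centroids = farthest_first_init(Z, n_clusters)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        new_centroids = np.array([
            Z[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def tandem(X, n_components, n_clusters):
    """Step 1: reduce dimensionality. Step 2: cluster the factor scores."""
    scores = pca_scores(X, n_components)
    labels, _ = kmeans(scores, n_clusters)
    return scores, labels


# Toy data (made up): two well-separated groups in 10 dimensions.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 10))
group_b = rng.normal(loc=5.0, scale=1.0, size=(50, 10))
X = np.vstack([group_a, group_b])

scores, labels = tandem(X, n_components=2, n_clusters=2)
```

In this sketch the two steps optimize separate criteria: the components are chosen to explain variance, without regard to the cluster structure. That separation is the feature of tandem approaches that joint methods are designed to avoid.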
ISBN: 9788891910233
Files in this item:
File: issues_in_JDR.pdf
Access: open access
Type: Publisher's version (PDF)
License: Not specified
Size: 735.72 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/741478