Dynamic Visualization of Changes in Association Patterns

PALUMBO, FRANCESCO;A. IODICE D'ENZA
2011

Abstract

The present proposal deals with high-dimensional binary data collected on different occasions in time or space. When studying the associations in such data, a primary aim is to detect changes in the association structure from one occasion to another. A suitable exploratory technique for analysing multiple associations in high-dimensional data is multiple correspondence analysis (MCA; Greenacre, 2007). However, comparing MCA factorial displays that refer to different occasions is meaningless. A possible way to link the association structures of different data batches is to start from an MCA display of a reference batch and incrementally update the solution with further batches (Iodice D'Enza and Greenacre, 2010). This approach, however, does not take into account the presence of a cluster structure in the set of statistical units. This contribution presents an approach that combines clustering and factorial techniques to visualize the evolution of the association structure of binary attributes over different data batches. The proposal is to introduce a latent categorical variable that is determined and updated at each incoming batch; in other words, this variable is determined according to the association structure and represents the 'link' among the solutions. The latent categorical variable is determined endogenously by the procedure; in particular, it refers to the cluster structure characterizing the data set in question. A starting solution is updated incrementally as new data sets are analysed. The factorial display describes the patterns of change in the multiple associations when the analysis shifts from one occasion to the next.

Procedures combining clustering with factorial analysis techniques have been proposed in the literature. Vichi and Kiers (2001) combine principal component analysis (PCA) with the k-means clustering method. In the framework of categorical data, another interesting approach combining clustering and MCA is proposed by Hwang et al. (2006). Similarly, yet dealing with binary data, Palumbo and Iodice D'Enza (2010) propose a suitable dimension reduction and clustering procedure. The present proposal extends the latter approach to the comparative analysis of multiple batches.
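
As a rough illustration of the two building blocks named in the abstract (an MCA of binary attributes and a clustering of the statistical units), the sketch below runs a simple "tandem" analysis on a single batch: MCA first, then k-means on the unit coordinates. This is not the authors' procedure, which determines the latent cluster variable jointly with the factorial solution and updates it batch by batch. The function names `mca_binary` and `kmeans`, the toy data, and the choice of two clusters are all assumptions made for the example.

```python
import numpy as np


def mca_binary(X, n_components=2):
    """MCA of an n x p binary matrix via CA of its indicator matrix."""
    X = np.asarray(X, dtype=float)
    # Code each binary attribute as two categories: present / absent.
    Z = np.hstack([X, 1.0 - X])
    if np.any(Z.sum(axis=0) == 0):
        raise ValueError("constant attribute: a category has zero mass")
    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row (unit) masses
    c = P.sum(axis=0)          # column (category) masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    F = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]   # unit principal coordinates
    G = (Vt[:k].T * sv[:k]) / np.sqrt(c)[:, None]   # category principal coordinates
    return F, G, sv


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns the labels of the last assignment step."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), n_clusters, replace=False)
    centers = points[start].copy()
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers


# Hypothetical toy batch: 60 units, 6 binary attributes, two attribute profiles.
rng = np.random.default_rng(42)
profile_a = np.array([0.9, 0.9, 0.9, 0.1, 0.1, 0.1])
probs = np.vstack([np.tile(profile_a, (30, 1)), np.tile(1 - profile_a, (30, 1))])
X = (rng.random(probs.shape) < probs).astype(int)

F, G, sv = mca_binary(X, n_components=2)
labels, centers = kmeans(F, n_clusters=2)
```

Here `F` holds the unit coordinates and `G` the coordinates of the 12 attribute categories (presence and absence of each attribute) on the first two factorial axes. Comparing batches would additionally require a common reference display, which this single-batch sketch does not attempt.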
Files in this item:
File: 950873(1).pdf
Access: open access
Type: Post-print document
License: Public domain
Size: 140.39 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/423272