In modern applications, such as text mining and signal processing, large amounts of categorical data are produced at a high rate and are characterized by association structures changing over time. Multiple correspondence analysis (MCA) is a well established dimension reduction method to explore the associations within a set of categorical variables. A critical step of the MCA algorithm is a singular value decomposition (SVD) or an eigenvalue decomposition (EVD) of a suitably transformed matrix. The high computational and memory requirements of ordinary SVD and EVD make their application impractical on massive or sequential data sets. Several enhanced SVD/EVD approaches have been recently introduced in an effort to overcome these issues. The aim of the present contribution is twofold: (1) to extend MCA to a split-apply-combine framework, that leads to an exact and parallel MCA implementation; (2) to allow for incremental updates (downdates) of existing MCA solutions, which lead to an approximate yet highly accurate solution. For this purpose, two incremental EVD and SVD approaches with desirable properties are revised and embedded in the context of MCA.

Low-dimensional tracking of association structures in categorical data / IODICE D'ENZA, Alfonso; Markos, Angelos. - In: STATISTICS AND COMPUTING. - ISSN 1573-1375. - 25:5(2015), pp. 1009-1022. [10.1007/s11222-014-9470-4]

Low-dimensional tracking of association structures in categorical data

Iodice D'Enza Alfonso
;
2015

Abstract

In modern applications, such as text mining and signal processing, large amounts of categorical data are produced at a high rate and are characterized by association structures changing over time. Multiple correspondence analysis (MCA) is a well established dimension reduction method to explore the associations within a set of categorical variables. A critical step of the MCA algorithm is a singular value decomposition (SVD) or an eigenvalue decomposition (EVD) of a suitably transformed matrix. The high computational and memory requirements of ordinary SVD and EVD make their application impractical on massive or sequential data sets. Several enhanced SVD/EVD approaches have been recently introduced in an effort to overcome these issues. The aim of the present contribution is twofold: (1) to extend MCA to a split-apply-combine framework, that leads to an exact and parallel MCA implementation; (2) to allow for incremental updates (downdates) of existing MCA solutions, which lead to an approximate yet highly accurate solution. For this purpose, two incremental EVD and SVD approaches with desirable properties are revised and embedded in the context of MCA.
2015
Low-dimensional tracking of association structures in categorical data / IODICE D'ENZA, Alfonso; Markos, Angelos. - In: STATISTICS AND COMPUTING. - ISSN 1573-1375. - 25:5(2015), pp. 1009-1022. [10.1007/s11222-014-9470-4]
File in questo prodotto:
File Dimensione Formato  
08_STCO_Low_dim_track.pdf

solo utenti autorizzati

Tipologia: Documento in Post-print
Licenza: Non specificato
Dimensione 3.59 MB
Formato Adobe PDF
3.59 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11588/744212
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 4
social impact