Low-dimensional tracking of association structures in categorical data / IODICE D'ENZA, Alfonso; Markos, Angelos. - In: STATISTICS AND COMPUTING. - ISSN 1573-1375. - 25:5(2015), pp. 1009-1022. [10.1007/s11222-014-9470-4]
Low-dimensional tracking of association structures in categorical data
Iodice D'Enza, Alfonso; Markos, Angelos
2015
Abstract
In modern applications, such as text mining and signal processing, large amounts of categorical data are produced at a high rate, and their association structures change over time. Multiple correspondence analysis (MCA) is a well-established dimension reduction method for exploring the associations within a set of categorical variables. A critical step of the MCA algorithm is a singular value decomposition (SVD) or an eigenvalue decomposition (EVD) of a suitably transformed matrix. The high computational and memory requirements of ordinary SVD and EVD make them impractical on massive or sequential data sets. Several enhanced SVD/EVD approaches have recently been introduced to overcome these issues. The aim of the present contribution is twofold: (1) to extend MCA to a split-apply-combine framework, which leads to an exact and parallel MCA implementation; and (2) to allow incremental updates (and downdates) of existing MCA solutions, which yield an approximate yet highly accurate solution. For this purpose, two incremental EVD and SVD approaches with desirable properties are revised and embedded in the context of MCA.
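One way a split-apply-combine MCA can be exact is shown in the minimal sketch below. It assumes that MCA is computed from the Burt matrix B = Z^T Z, where Z is the n x J indicator matrix of the data. Because B is a sum of per-row contributions, it equals the sum of the cross-products Z_k^T Z_k of any row chunks. Those chunk products can therefore be built in parallel and added together without approximation. This is an illustration, not necessarily the authors' procedure. The function names are hypothetical, and the set of (variable, category) columns is assumed to be known in advance. Threads stand in for whatever parallel back end would be used in practice. The subsequent EVD of the transformed Burt matrix is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]
Column = Tuple[int, str]  # (variable index, category label); assumed known in advance
Matrix = List[List[int]]


def chunk_burt(chunk: Sequence[Row], columns: Sequence[Column]) -> Matrix:
    """Hypothetical 'apply' step: Burt block Z_k^T Z_k of one chunk of rows."""
    J = len(columns)
    col_of = {c: i for i, c in enumerate(columns)}
    B = [[0] * J for _ in range(J)]
    for row in chunk:
        # Indicator row of Z: exactly one active column per variable.
        active = [col_of[(j, v)] for j, v in enumerate(row)]
        for a in active:
            for b in active:
                B[a][b] += 1
    return B


def add_matrices(A: Matrix, B: Matrix) -> Matrix:
    """Hypothetical 'combine' step: element-wise sum of two J x J matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def burt_split_apply_combine(rows: Sequence[Row], columns: Sequence[Column],
                             n_chunks: int = 2) -> Matrix:
    """Split rows into chunks, build each chunk's Burt block concurrently, then sum."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda c: chunk_burt(c, columns), chunks))
    J = len(columns)
    zero = [[0] * J for _ in range(J)]
    return reduce(add_matrices, partial, zero)


# Toy data: two categorical variables, n = 5 rows.
data = [("a", "x"), ("b", "y"), ("a", "y"), ("b", "x"), ("a", "x")]
cols = [(0, "a"), (0, "b"), (1, "x"), (1, "y")]
B_full = chunk_burt(data, cols)                        # single pass over all rows
B_split = burt_split_apply_combine(data, cols, n_chunks=2)
print(B_full == B_split)  # True
```

The chunked result matches the single-pass Burt matrix entry by entry, because it is a sum of integer counts. The off-diagonal blocks of B are the pairwise contingency tables of the variables. The grand total is n * Q^2, where Q is the number of variables, which gives 20 here. Any eigen-decomposition applied to the combined B therefore sees the same input as a non-parallel run.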
File: 08_STCO_Low_dim_track.pdf (post-print; Adobe PDF, 3.59 MB; license not specified; access restricted to authorized users)
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.