Different techniques exist to analyze multi-way data but PARAFAC is one of the most popular. The usual way of parameter estimation in PARAFAC is an alternating least squares (ALS) procedure which yields least-squares solutions and provides consistent outcomes. Together with these desirable features, the ALS procedure suffers several major flaws which might be particularly problematic for large-scale problems: slow convergence and sensitiveness to degeneracy conditions such as over-factoring, collinearity, bad initialization and local minima. Furthermore, it is well-known that algorithms which rely on least squares easily break down in the presence of outliers. The issue of nonrobustness of the ALS procedure was addressed by [1] and software for computing the robust PARAFAC model is available in the R package rrcov3way (see [4]). The other issues were addressed in a number of works proposing algorithms more efficient than ALS. However, often these do not provide stable results because the increased speed might come at the expense of accuracy. An integrated algorithm was proposed by [2] and [3] which seems to combine improved speed and stability. The purpose of this work is to elaborate this algorithm by adding capabilities for dealing with outliers which might be present in the data. The idea of a robust version of ALS is to identify enough ”good“ observations and to perform the classical ALS on these observations. This is repeated until no significant change is observed. Finally, a reweighting step is carried out to improve the efficiency of the estimators. In order to identify the ”good” observations a robust version of principal component analysis on the unfolded array is used. It is obvious that the robust procedure will be much more time consuming than the classical one, repeating many times the ALS optimization. Therefore, any improvement of the performance of the parameter estimation procedure will contribute to the improvement of the performance of the complete robust procedure. We combine the robust PARAFAC procedure proposed by [1] with the highly efficient estimation algorithm INT2 proposed by [2] in order to obtain a fast estimation and robust to outliers PARAFAC modeling technique which we call R-INT2. As before it starts by robust principal components to identify any outlying points and then iterates using the INT2 algorithm until no significant change is observed. After convergence a reweighting step with INT2 is conducted which produces the final solution. The performance of the newly proposed algorithm R-INT2 for robust estimation of trilinear PARAFAC models is demonstrated in a brief simulation study comparing classical ALS, the robust version based on ALS ([1]) and the new R-INT2. First of, all we want to verify that R-INT2 works well on data sets with and without contamination by identifying the outliers at least as good as the robust ALS algorithm retrieving solutions with good statistical quality. At the same time we want to verify that the convergence is improved significantly and thus the computational time is reduced. Future work should bring a thorough investigation of the properties of the algorithm, comparison to the many existing fast alternatives and studying the possibilities for combination with other computational algorithms.

An improved estimation procedure for robust PARAFAC model fitting

Simonacci Violetta;
2022

Abstract

Different techniques exist to analyze multi-way data but PARAFAC is one of the most popular. The usual way of parameter estimation in PARAFAC is an alternating least squares (ALS) procedure which yields least-squares solutions and provides consistent outcomes. Together with these desirable features, the ALS procedure suffers several major flaws which might be particularly problematic for large-scale problems: slow convergence and sensitiveness to degeneracy conditions such as over-factoring, collinearity, bad initialization and local minima. Furthermore, it is well-known that algorithms which rely on least squares easily break down in the presence of outliers. The issue of nonrobustness of the ALS procedure was addressed by [1] and software for computing the robust PARAFAC model is available in the R package rrcov3way (see [4]). The other issues were addressed in a number of works proposing algorithms more efficient than ALS. However, often these do not provide stable results because the increased speed might come at the expense of accuracy. An integrated algorithm was proposed by [2] and [3] which seems to combine improved speed and stability. The purpose of this work is to elaborate this algorithm by adding capabilities for dealing with outliers which might be present in the data. The idea of a robust version of ALS is to identify enough ”good“ observations and to perform the classical ALS on these observations. This is repeated until no significant change is observed. Finally, a reweighting step is carried out to improve the efficiency of the estimators. In order to identify the ”good” observations a robust version of principal component analysis on the unfolded array is used. It is obvious that the robust procedure will be much more time consuming than the classical one, repeating many times the ALS optimization. Therefore, any improvement of the performance of the parameter estimation procedure will contribute to the improvement of the performance of the complete robust procedure. We combine the robust PARAFAC procedure proposed by [1] with the highly efficient estimation algorithm INT2 proposed by [2] in order to obtain a fast estimation and robust to outliers PARAFAC modeling technique which we call R-INT2. As before it starts by robust principal components to identify any outlying points and then iterates using the INT2 algorithm until no significant change is observed. After convergence a reweighting step with INT2 is conducted which produces the final solution. The performance of the newly proposed algorithm R-INT2 for robust estimation of trilinear PARAFAC models is demonstrated in a brief simulation study comparing classical ALS, the robust version based on ALS ([1]) and the new R-INT2. First of, all we want to verify that R-INT2 works well on data sets with and without contamination by identifying the outliers at least as good as the robust ALS algorithm retrieving solutions with good statistical quality. At the same time we want to verify that the convergence is improved significantly and thus the computational time is reduced. Future work should bring a thorough investigation of the properties of the algorithm, comparison to the many existing fast alternatives and studying the possibilities for combination with other computational algorithms.
File in questo prodotto:
File Dimensione Formato  
icors2022_abstracts.pdf

accesso aperto

Licenza: Non specificato
Dimensione 277.49 kB
Formato Adobe PDF
277.49 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11588/894597
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact