Different techniques exist to analyze multiway data but PARAFAC is one of the most popular. The usual way of parameter estimation in PARAFAC is an alternating least squares (ALS) procedure which yields leastsquares solutions and provides consistent outcomes. Together with these desirable features, the ALS procedure suffers several major flaws which might be particularly problematic for largescale problems: slow convergence and sensitiveness to degeneracy conditions such as overfactoring, collinearity, bad initialization and local minima. Furthermore, it is wellknown that algorithms which rely on least squares easily break down in the presence of outliers. The issue of nonrobustness of the ALS procedure was addressed by [1] and software for computing the robust PARAFAC model is available in the R package rrcov3way (see [4]). The other issues were addressed in a number of works proposing algorithms more efficient than ALS. However, often these do not provide stable results because the increased speed might come at the expense of accuracy. An integrated algorithm was proposed by [2] and [3] which seems to combine improved speed and stability. The purpose of this work is to elaborate this algorithm by adding capabilities for dealing with outliers which might be present in the data. The idea of a robust version of ALS is to identify enough ”good“ observations and to perform the classical ALS on these observations. This is repeated until no significant change is observed. Finally, a reweighting step is carried out to improve the efficiency of the estimators. In order to identify the ”good” observations a robust version of principal component analysis on the unfolded array is used. It is obvious that the robust procedure will be much more time consuming than the classical one, repeating many times the ALS optimization. Therefore, any improvement of the performance of the parameter estimation procedure will contribute to the improvement of the performance of the complete robust procedure. We combine the robust PARAFAC procedure proposed by [1] with the highly efficient estimation algorithm INT2 proposed by [2] in order to obtain a fast estimation and robust to outliers PARAFAC modeling technique which we call RINT2. As before it starts by robust principal components to identify any outlying points and then iterates using the INT2 algorithm until no significant change is observed. After convergence a reweighting step with INT2 is conducted which produces the final solution. The performance of the newly proposed algorithm RINT2 for robust estimation of trilinear PARAFAC models is demonstrated in a brief simulation study comparing classical ALS, the robust version based on ALS ([1]) and the new RINT2. First of, all we want to verify that RINT2 works well on data sets with and without contamination by identifying the outliers at least as good as the robust ALS algorithm retrieving solutions with good statistical quality. At the same time we want to verify that the convergence is improved significantly and thus the computational time is reduced. Future work should bring a thorough investigation of the properties of the algorithm, comparison to the many existing fast alternatives and studying the possibilities for combination with other computational algorithms.
An improved estimation procedure for robust PARAFAC model fitting / Tordorov, Valentin; Simonacci, Violetta; Gallo, Michele; Trendafilov, Nickolay.  (2022), pp. 4950. ((Intervento presentato al convegno International Conference on Robust Statistics tenutosi a University of Waterloo, Canada nel 580 July 2022.
An improved estimation procedure for robust PARAFAC model fitting
Simonacci Violetta;
2022
Abstract
Different techniques exist to analyze multiway data but PARAFAC is one of the most popular. The usual way of parameter estimation in PARAFAC is an alternating least squares (ALS) procedure which yields leastsquares solutions and provides consistent outcomes. Together with these desirable features, the ALS procedure suffers several major flaws which might be particularly problematic for largescale problems: slow convergence and sensitiveness to degeneracy conditions such as overfactoring, collinearity, bad initialization and local minima. Furthermore, it is wellknown that algorithms which rely on least squares easily break down in the presence of outliers. The issue of nonrobustness of the ALS procedure was addressed by [1] and software for computing the robust PARAFAC model is available in the R package rrcov3way (see [4]). The other issues were addressed in a number of works proposing algorithms more efficient than ALS. However, often these do not provide stable results because the increased speed might come at the expense of accuracy. An integrated algorithm was proposed by [2] and [3] which seems to combine improved speed and stability. The purpose of this work is to elaborate this algorithm by adding capabilities for dealing with outliers which might be present in the data. The idea of a robust version of ALS is to identify enough ”good“ observations and to perform the classical ALS on these observations. This is repeated until no significant change is observed. Finally, a reweighting step is carried out to improve the efficiency of the estimators. In order to identify the ”good” observations a robust version of principal component analysis on the unfolded array is used. It is obvious that the robust procedure will be much more time consuming than the classical one, repeating many times the ALS optimization. Therefore, any improvement of the performance of the parameter estimation procedure will contribute to the improvement of the performance of the complete robust procedure. We combine the robust PARAFAC procedure proposed by [1] with the highly efficient estimation algorithm INT2 proposed by [2] in order to obtain a fast estimation and robust to outliers PARAFAC modeling technique which we call RINT2. As before it starts by robust principal components to identify any outlying points and then iterates using the INT2 algorithm until no significant change is observed. After convergence a reweighting step with INT2 is conducted which produces the final solution. The performance of the newly proposed algorithm RINT2 for robust estimation of trilinear PARAFAC models is demonstrated in a brief simulation study comparing classical ALS, the robust version based on ALS ([1]) and the new RINT2. First of, all we want to verify that RINT2 works well on data sets with and without contamination by identifying the outliers at least as good as the robust ALS algorithm retrieving solutions with good statistical quality. At the same time we want to verify that the convergence is improved significantly and thus the computational time is reduced. Future work should bring a thorough investigation of the properties of the algorithm, comparison to the many existing fast alternatives and studying the possibilities for combination with other computational algorithms.File  Dimensione  Formato  

icors2022_abstracts.pdf
accesso aperto
Licenza:
Non specificato
Dimensione
277.49 kB
Formato
Adobe PDF

277.49 kB  Adobe PDF  Visualizza/Apri 
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.