This special issue of Statistical Analysis and Data Mining collects papers presented at the 12th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS), held in Cassino, Italy, September 11–13, 2019. The CLADAG group, founded in 1997, promotes advanced methodological research in multivariate statistics, with a special focus on Data Analysis and Classification. CLADAG is a member of the International Federation of Classification Societies (IFCS). It organizes a biennial international scientific meeting and schools on classification and data analysis, publishes a newsletter, and cooperates with other IFCS member societies in organizing their conferences. Founded in 1985, the IFCS is a federation of national, regional, and linguistically-based classification societies aimed at promoting classification research. Previous CLADAG meetings were held in Pescara (1997), Roma (1999), Palermo (2001), Bologna (2003), Parma (2005), Macerata (2007), Catania (2009), Pavia (2011), Modena and Reggio Emilia (2013), Cagliari (2015), and Milano (2017). The best papers from the conference were submitted to this special issue, and five of them were selected for publication following a blind peer-review process. The manuscripts deal with different data analysis topics: mixtures of distributions, compositional data analysis, Markov chains for web usability, and survival analysis, with applications to high-throughput, eye-tracking, and insurance transaction data. The paper by S.X. Lee et al. proposes a parallelization strategy for the Expectation-Maximization (EM) algorithm, with a special focus on the estimation of finite mixtures of flexible distributions such as the canonical fundamental skew t distribution (CFUST). The parallel implementation of the EM algorithm is suitable for single-threaded and multi-threaded processors as well as for single-machine and multiple-node systems.
The EM algorithm is also discussed in the paper by L. Scrucca, which provides a fast and efficient Modal EM algorithm for identifying the modes of a density estimated through a finite mixture of Gaussian distributions with parsimonious component covariance structures. The proposed approach is based on an iterative procedure that identifies the local maxima by exploiting features of the underlying Gaussian mixture model. Motivated by applications in high-throughput compositional data analysis, the paper by N. Štefelová et al. proposes a data-driven weighting strategy to enhance marker identification through PLS regression with compositional predictors. The weighting strategy draws on the correlation structure between the response variable and pairwise log-ratios. Its practical relevance is illustrated through an analysis of metabolite signals associated with the emission of greenhouse gases from cattle. The paper by G. Zammarchi et al. uses Markov chains to analyse the usability of a university website through eye-tracking methodology. With the aim of improving the website's usability, the paper compares the performance of high school and university students in terms of time to completion, number of fixations, and difficulty ratio across ten different tasks. Data from a commercial insurance company in the Czech Republic are instead used by D. Zapletal to compare the efficacy of several survival analysis models within an insurance transaction framework. The ability to identify relevant explanatory variables through the Cox proportional hazards model and some competing risks models (i.e., the cause-specific and the sub-distribution hazard models) is assessed on a large data set consisting of more than 200,000 individuals. In brief, this special issue is in line with the CLADAG goal of supporting the interchange of ideas in Classification and Data Analysis.
We strongly believe it represents the scientific character of the CLADAG community well, and we invite all readers to join the next CLADAG conference, which will be held in Florence, September 11 to 13, 2021.

CLADAG 2019 Special Issue: Selected Papers on Classification and Data Analysis (editorial)

Porzio G. C.; Vistocco D.
2021

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/858334