Robust Tree-Based Incremental Imputation Method for Data Fusion

D'Ambrosio, Antonio; Aria, Massimo; Siciliano, Roberta
2007

Abstract

Data Fusion and Data Grafting are concerned with combining files and information coming from different sources. The problem is not to extract data from a single database, but to merge information collected from different sample surveys. The typical data fusion situation consists of two data samples: the former is a complete data matrix X from a first survey, and the latter, Y, contains a certain number of missing variables. The aim is to complete the matrix Y using the knowledge acquired from X. Thus, the goal is to define the correlation structure that joins the two data matrices to be merged. In this paper, we provide an innovative methodology for Data Fusion based on an incremental imputation algorithm in tree-based models. In addition, we consider robust tree validation by boosting iterations. A relevant advantage of the proposed method is that it works for a mixed data structure including both numerical and categorical variables. As benchmarking methods we consider explicit methods, such as standard trees and multiple regression, as well as an implicit method based on principal component analysis. An extensive simulation study shows that the proposed method is more accurate than the other methods.
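The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes purely numerical data, uses a tiny hand-written regression tree, omits the boosting-based validation, and reads "incremental" as imputing one variable at a time so that each imputed variable can serve as a predictor for the next. The function names (`fit_tree`, `predict`, `fuse`) and the toy records are hypothetical.

```python
# Illustrative sketch only: a tiny regression tree used to impute the
# variables missing from a recipient file, one variable at a time.
# All names and data below are hypothetical, not taken from the paper.

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(rows, target, predictors, depth=0, max_depth=2, min_leaf=2):
    """Fit a small regression tree; rows are dicts of numeric values."""
    y = [r[target] for r in rows]
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return leaf
    best = None
    for p in predictors:
        # Candidate cut points: every observed value except the largest.
        for cut in sorted({r[p] for r in rows})[:-1]:
            left = [r for r in rows if r[p] <= cut]
            right = [r for r in rows if r[p] > cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([r[target] for r in left]) + sse([r[target] for r in right])
            if best is None or cost < best[0]:
                best = (cost, p, cut, left, right)
    # Keep the leaf if no admissible split reduces the squared error.
    if best is None or best[0] >= sse(y):
        return leaf
    _, p, cut, left, right = best
    return {
        "var": p,
        "cut": cut,
        "left": fit_tree(left, target, predictors, depth + 1, max_depth, min_leaf),
        "right": fit_tree(right, target, predictors, depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Walk down the tree until a leaf is reached and return its mean."""
    while "var" in tree:
        tree = tree["left"] if row[tree["var"]] <= tree["cut"] else tree["right"]
    return tree["value"]


def fuse(donor, recipient, common, missing):
    """Impute each missing variable in turn; imputed variables join the predictors."""
    predictors = list(common)
    completed = [dict(r) for r in recipient]
    for target in missing:
        tree = fit_tree(donor, target, predictors)
        for row in completed:
            row[target] = predict(tree, row)
        predictors.append(target)
    return completed


# Donor file: complete records. Recipient file: only the common variable.
donor = [
    {"age": 20, "income": 10.0, "spend": 2.0},
    {"age": 25, "income": 12.0, "spend": 2.5},
    {"age": 40, "income": 30.0, "spend": 9.0},
    {"age": 45, "income": 32.0, "spend": 9.5},
]
recipient = [{"age": 22}, {"age": 43}]

completed = fuse(donor, recipient, common=["age"], missing=["income", "spend"])
```

Here the complete file plays the role of X and the file to be completed plays the role of Y. Handling categorical variables, as the abstract claims the proposed method does, would require classification trees for categorical targets and a different split criterion, neither of which is attempted here.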
ISBN: 9783540748243
Robust Tree-Based Incremental Imputation Method for Data Fusion / D'Ambrosio, Antonio; Aria, Massimo; Siciliano, Roberta. - Print. - Volume 4723/2007:(2007), pp. 174-183. [10.1007/978-3-540-74825-0]
Files in this item:
IDA2007_springer.pdf (not available)
Type: Post-print document
License: Private/restricted access
Size: 662.64 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/115275