Active learning for spectroscopic data regression / Douak, Fouzi; Melgani, Farid; Alajlan, Naif; Pasolli, Edoardo; Bazi, Yakoub; Benoudjit, Nabil. - In: JOURNAL OF CHEMOMETRICS. - ISSN 0886-9383. - 26:7(2012), pp. 374-383. [10.1002/cem.2443]
Active learning for spectroscopic data regression
Pasolli, Edoardo
2012
Abstract
In this work, we introduce an active learning approach for the estimation of chemical concentrations from spectroscopic data. Its main objective is to select training samples so as to minimize the error of the regression process while keeping the number of training samples small, thereby reducing the costs related to training sample collection. In particular, we propose two different active learning strategies developed for regression approaches based on partial least squares regression, ridge regression, kernel ridge regression, and support vector regression. The first strategy uses a pool of regressors in order to select the samples with the greatest disagreement among the regressors of the pool, while the second one adds samples that are distant from the current training samples in the feature space. For support vector regression, a specific strategy based on selecting the samples distant from the support vectors is proposed. Experimental results on three different real data sets are reported and discussed.

File | Type | License | Size | Format
---|---|---|---|---
Douak_2012.pdf (authorized users only) | Post-print document | Private/restricted access | 1.07 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
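
The two sample-selection strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the pool of regressors is built here by bootstrap resampling of the training set, disagreement is measured as the variance of the pool's predictions, distance is Euclidean in the input feature space, and ridge regression stands in for any of the four regressors. The function names (`fit_ridge`, `select_by_disagreement`, `select_by_distance`) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                        Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def predict_ridge(model, X):
    w, b = model
    return X @ w + b

def select_by_disagreement(X_train, y_train, X_pool, n_select,
                           n_regressors=5, rng=None):
    """Pick the unlabeled samples on which a pool of regressors disagrees most.

    Assumption: the pool is built from bootstrap resamples of the training
    set, and disagreement is the variance of the pool's predictions.
    """
    rng = np.random.default_rng(rng)
    preds = []
    for _ in range(n_regressors):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        model = fit_ridge(X_train[idx], y_train[idx])
        preds.append(predict_ridge(model, X_pool))
    disagreement = np.var(np.stack(preds), axis=0)
    return np.argsort(disagreement)[::-1][:n_select]

def select_by_distance(X_train, X_pool, n_select):
    """Pick the unlabeled samples farthest from the current training set.

    Assumption: distance is Euclidean, and each candidate is scored by its
    distance to the nearest training sample.
    """
    diffs = X_pool[:, None, :] - X_train[None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    return np.argsort(nearest)[::-1][:n_select]

if __name__ == "__main__":
    # Synthetic stand-in for spectra; real spectroscopic data would go here.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(20, 3))
    y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
    X_pool = rng.normal(size=(50, 3))
    print(select_by_disagreement(X_train, y_train, X_pool, 5, rng=0))
    print(select_by_distance(X_train, X_pool, 5))
```

In an active learning loop, the returned indices would be sent for labeling (for example, laboratory measurement of the concentration), moved from the pool into the training set, and the regressor retrained before the next selection round.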