Active learning for spectroscopic data regression

Douak, Fouzi; Melgani, Farid; Alajlan, Naif; Pasolli, Edoardo; Bazi, Yakoub; Benoudjit, Nabil

doi:10.1002/cem.2443

In this work, we introduce an active learning approach for the estimation of chemical concentrations from spectroscopic data. Its main objective is to select training samples so as to minimize the regression error while keeping the number of training samples small, thereby reducing the cost of training sample collection. In particular, we propose two different active learning strategies developed for regression approaches based on partial least squares regression, ridge regression, kernel ridge regression, and support vector regression. The first strategy uses a pool of regressors to select the samples on which the regressors of the pool disagree most, while the second adds samples that are distant from the current training samples in the feature space. For support vector regression, a specific strategy based on selecting the samples distant from the support vectors is proposed. Experimental results on three different real data sets are reported and discussed.
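
The two general selection strategies described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the pool of regressors is built or how disagreement is measured, so the sketch assumes a pool of ridge regressors trained on bootstrap resamples, prediction variance as the disagreement measure, and Euclidean distance to the nearest training sample for the second strategy. All function names and parameters (`fit_ridge`, `pool_disagreement`, `distance_to_training`, `select`, `alpha`, `n_regressors`) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                        Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def pool_disagreement(X_train, y_train, X_pool,
                      n_regressors=5, alpha=1.0, seed=0):
    """Score each unlabeled sample by the variance of the pool's predictions.

    Assumption: the pool is built from bootstrap resamples of the
    current training set; the source does not specify this.
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_regressors):
        idx = rng.integers(0, n, size=n)  # bootstrap resample
        w, b = fit_ridge(X_train[idx], y_train[idx], alpha)
        preds.append(X_pool @ w + b)
    return np.var(np.stack(preds), axis=0)

def distance_to_training(X_train, X_pool):
    """Score each unlabeled sample by its Euclidean distance to the
    nearest current training sample (assumed distance measure)."""
    diffs = X_pool[:, None, :] - X_train[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def select(scores, batch_size):
    """Indices of the batch_size highest-scoring unlabeled samples."""
    return np.argsort(scores)[::-1][:batch_size]
```

In an active learning loop under these assumptions, the selected samples would be labeled (that is, their concentrations measured), moved from the unlabeled set to the training set, and the regressor retrained before the next selection round.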

Active learning for spectroscopic data regression / Douak, Fouzi; Melgani, Farid; Alajlan, Naif; Pasolli, Edoardo; Bazi, Yakoub; Benoudjit, Nabil. - In: JOURNAL OF CHEMOMETRICS. - ISSN 0886-9383. - 26:7(2012), pp. 374-383. [10.1002/cem.2443]