Vannucci, G.; Siciliano, R.; Saltelli, A. (2026). Global sensitivity analysis in random forests: unveiling generative variable importance. Statistical Methods & Applications. ISSN 1618-2510. DOI: 10.1007/s10260-026-00839-y
Global sensitivity analysis in random forests: unveiling generative variable importance
Vannucci, G. (first author; contribution: Methodology); Siciliano, R.; Saltelli, A.
2026
Abstract
This paper introduces the Global Sensitivity Analysis framework, commonly used in mathematical modeling to assess input uncertainty, as a novel approach to understanding variable importance in machine learning. Specifically, we consider Random Forests and aim to extend their utility beyond mere prediction. While Random Forests are highly accurate "black-box" models, their internal mechanisms often remain obscure. Traditional variable importance measures primarily quantify the contribution of each feature to predictive performance. We propose a generative variable importance ranking based on sensitivity analysis that detects the intrinsic importance of each input feature to the underlying data-generating process. As a result, it provides crucial insight into how the response is genuinely determined by the dependence structure of the predictors. A simulation study shows that our methodology not only offers deeper insight into model uncertainty but also advances explainable machine learning by enhancing explanatory power and revealing complex relationships beyond predictive performance alone.
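The record above does not reproduce the paper's algorithm. As a rough sketch of the kind of pipeline the abstract describes, one could treat a fitted Random Forest as a surrogate for the unknown response surface and compute Sobol' sensitivity indices over its predictions, where the first-order index S_i = Var(E[Y | X_i]) / Var(Y) gives the share of output variance attributable to input X_i alone and the total index S_Ti adds its interactions. The data-generating function, sample sizes, and use of scikit-learn with SALib below are all illustrative assumptions, not the authors' setup.

# A minimal sketch, assuming a Sobol'-based pipeline; this is NOT the
# authors' implementation. A fitted Random Forest stands in for the
# unknown response surface, and SALib's Sobol' estimators rank inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from SALib.sample import saltelli
from SALib.analyze import sobol

rng = np.random.default_rng(0)

# Hypothetical data-generating process (illustrative only): y depends
# strongly on x1, weakly on x2, and not at all on x3.
n = 2000
X_train = rng.uniform(0.0, 1.0, size=(n, 3))
y_train = (np.sin(2 * np.pi * X_train[:, 0])
           + 0.3 * X_train[:, 1]
           + rng.normal(0.0, 0.1, n))

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# Sobol' design over the input space; the forest's predictions replace
# evaluations of the true (unknown) model.
problem = {"num_vars": 3,
           "names": ["x1", "x2", "x3"],
           "bounds": [[0.0, 1.0]] * 3}
X_sobol = saltelli.sample(problem, 1024)
Si = sobol.analyze(problem, rf.predict(X_sobol))

# First-order (S1) and total-order (ST) indices yield a generative
# importance ranking: S1 is each input's own variance share, ST adds
# its interactions.
for name, s1, st in zip(problem["names"], Si["S1"], Si["ST"]):
    print(f"{name}: S1 = {s1:.3f}, ST = {st:.3f}")

Ranking by the Sobol' indices rather than by the forest's own impurity- or permutation-based importances is what makes such a ranking generative in the abstract's sense: it measures each input's contribution to the response surface itself rather than its contribution to the model's predictive loss.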


