A novel augmentation strategy for credit scoring modeling

La Gatta, Valerio; Postiglione, Marco; Sperli', Giancarlo

doi:10.1007/s00521-024-10452-3

In recent years, social lending platforms have been increasingly used as virtual environments where borrowers can interact directly with lenders without any intermediary. As a result, reliable credit scoring, i.e., assessing whether a client is able to fully repay a loan, has become of utmost importance to reduce the risk that lenders are not repaid. In this context, machine learning tools are increasingly adopted to design automatic credit scoring systems, but data imbalance still penalizes their predictive performance: the vast majority of clients can afford repayment, so learning to classify "bad" borrowers depends on the few instances where the loan was not paid back. In this paper, we target the data imbalance problem and propose a novel data augmentation strategy to improve the predictive performance of credit scoring models. The proposed methodology performs data augmentation by injecting into the dataset synthetic instances generated along the decision boundary of the decision model. We assessed the effectiveness of the proposed augmentation strategy on a million-scale dataset from Lending Club, the largest social lending platform, and found that it improves the performance of several classification models, also when compared with other state-of-the-art approaches.
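
The abstract does not specify how the synthetic instances are generated, so the following is only a minimal sketch of one way to place minority-class points along a model's decision boundary, and it is not the authors' algorithm. It assumes a hypothetical `score` function returning the model's estimated probability of default, and it bisects the segment between a "bad" and a "good" borrower until the score reaches the threshold. The names `boundary_point` and `augment`, the toy logistic model, and the choice to label the new points as "bad" are all illustrative assumptions.

```python
import math
import random


def boundary_point(score, x_bad, x_good, threshold=0.5, steps=30):
    """Bisect the segment x_bad -> x_good for the threshold crossing.

    Assumes score(x_bad) >= threshold > score(x_good). The returned point
    stays on the "bad" side of the boundary, so it can keep the "bad" label.
    """
    lo, hi = 0.0, 1.0  # lo: fraction still scored "bad"; hi: scored "good"
    for _ in range(steps):
        mid = (lo + hi) / 2
        x_mid = [b + mid * (g - b) for b, g in zip(x_bad, x_good)]
        if score(x_mid) >= threshold:
            lo = mid
        else:
            hi = mid
    return [b + lo * (g - b) for b, g in zip(x_bad, x_good)]


def augment(score, bad, good, n_new, threshold=0.5, seed=0):
    """Return n_new synthetic "bad" instances lying next to the boundary."""
    rng = random.Random(seed)
    # Keep only instances that the model already places on the expected side.
    bad_ok = [x for x in bad if score(x) >= threshold]
    good_ok = [x for x in good if score(x) < threshold]
    if not bad_ok or not good_ok:
        return []
    return [
        boundary_point(score, rng.choice(bad_ok), rng.choice(good_ok), threshold)
        for _ in range(n_new)
    ]


# Toy decision model (an assumption): logistic score over two features.
def score(x):
    return 1.0 / (1.0 + math.exp(-(x[0] + x[1] - 1.0)))


bad = [[2.0, 1.5], [1.8, 2.2]]  # few "bad" borrowers
good = [[-1.0, 0.0], [0.0, -0.5], [-0.5, -1.0], [0.2, 0.1]]
synthetic = augment(score, bad, good, n_new=5)
```

In this sketch the synthetic points would be appended to the training set with the minority label before refitting the classifier; whether the paper labels or filters its generated instances this way is not stated in the abstract.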

A novel augmentation strategy for credit scoring modeling / La Gatta, V., Postiglione, M., Sperli', G.. - In: NEURAL COMPUTING & APPLICATIONS. - ISSN 1433-3058. - 37:9(2025), pp. 6663-6675. [10.1007/s00521-024-10452-3]