Track: Undergraduate Student Paper Competition
Abstract
Debt collection by debt collection agencies (DCAs) has become more difficult because of the pandemic. Nevertheless, over the last year the population has incurred more debt, while the number of defaulted loans has decreased. This has created an opportunity for DCAs to establish strategies to improve the debt collection process. In this work, the CRISP-DM methodology is implemented in an Ecuadorian DCA to develop a machine learning algorithm that predicts a debtor’s payment probability and to establish a debt collection strategy. An unbalanced dataset with 7,447,856 records is gathered, cleaned, and preprocessed to train a Random Forest Classifier, Gradient Boosting Machine, Logistic Regression, and Multi-Layer Perceptron using a random under-sampling technique. The models’ performance is compared using sensitivity, specificity, and AUC as evaluation metrics. The best-performing algorithm is the Gradient Boosting Machine, with a sensitivity of 0.97, a specificity of 0.93, and an AUC of 0.98 on the validation set. This algorithm also makes it possible to identify the most discriminative features for the prediction: the days past due, the number of days between the acquisition of the account and the default date, the name of the business category, the name of the prior account owner, and the number of direct contacts performed by a robot.
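The abstract names random under-sampling and the sensitivity and specificity metrics without describing how they were implemented. As a rough illustration only, the sketch below shows one common way to do both in plain Python. The function names `random_undersample` and `sensitivity_specificity`, the toy data, and the label encoding (1 = debtor pays, 0 = debtor does not pay) are assumptions made for this example and are not taken from the study.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows of the larger class until both classes are the same size.

    Hypothetical helper: assumes binary labels encoded as 0 and 1.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    # Sample n indices from each class without replacement, then shuffle.
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy unbalanced data (assumed for illustration): 5 positives, 95 negatives.
rows = list(range(100))
labels = [1] * 5 + [0] * 95
bal_rows, bal_labels = random_undersample(rows, labels, seed=42)

# Toy predictions (assumed): 2 true positives, 1 false negative,
# 3 true negatives, 1 false positive.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this toy run the balanced set keeps all 5 positives and 5 randomly chosen negatives. A common practice, which the abstract does not confirm for this study, is to under-sample only the training split so that validation metrics reflect the original class distribution.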