Abstract
In human resources operations, anticipating employee turnover is crucial for effective planning, such as recruitment and training planning. This study explores the feasibility of predicting employee turnover from non-job-specific features and identifies the candidate characteristics that matter most for that prediction. These features comprise demographic and educational data, so they apply across industries rather than being confined to a specific one. The dataset is imbalanced, with skewed class proportions, which makes it necessary to include every data point for a comprehensive analysis. A comparative investigation evaluated sampling methods for handling imbalanced data, including up-sampling and down-sampling, alongside classification algorithms such as ensemble learning techniques and Support Vector Machines (SVM). The importance of each feature was then determined using the best-performing model, Random Forest, which achieved an accuracy of 87.3% and an area under the curve (AUC) of 87.3%, exceeding the results of previous studies on the same dataset. Together, these metrics indicate that the model classifies employees into potential turnover and non-turnover categories with high accuracy while keeping false positives and false negatives low, which is crucial for decision-making processes.
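As a rough illustration of the resampling and evaluation ideas named above, the following minimal sketch uses only the Python standard library. It is not the study's pipeline: the function names (`up_sample`, `down_sample`, `accuracy`, `roc_auc`) are hypothetical, the 90/10 toy labels are invented for the example rather than taken from the study's dataset, and no Random Forest is trained. It assumes AUC refers to the area under the ROC curve, and is meant only to show what random up-sampling and down-sampling do to class counts and why accuracy alone can mislead on imbalanced data.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def up_sample(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    is as large as the largest one (random over-sampling)."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def down_sample(rows, labels, seed=0):
    """Randomly drop rows of larger classes until every class
    is as small as the smallest one (random under-sampling)."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for this sketch): 90 "stay" (0) and 10 "leave" (1).
labels = [0] * 90 + [1] * 10
rows = [[i] for i in range(100)]

up_rows, up_labels = up_sample(rows, labels)
down_rows, down_labels = down_sample(rows, labels)

# A trivial model that always predicts "stay" with a constant score.
always_stay_acc = accuracy(labels, [0] * 100)
always_stay_auc = roc_auc(labels, [0.0] * 100)
```

On this toy data, up-sampling yields 90 rows per class and down-sampling yields 10 per class. The trivial "always stay" model reaches an accuracy of 0.9 but an AUC of only 0.5, which is one reason to consider accuracy and AUC together on imbalanced data. In practice, resampling would normally be applied to the training split only, so that evaluation reflects the original class proportions.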