Track: Machine Learning
Abstract
Whether a high-complexity hospital or a smaller clinic, every healthcare center must withstand constant pressure from the incoming flow of new patients. While some patients require only a simple medical procedure, others need further examination and may have to remain under observation for some time. This situation is particularly difficult during a public health crisis. Because infrastructure, supplies, and medical staff are limited resources, they must be used efficiently. This research focuses on the use of ensemble machine learning algorithms to develop models that predict the destination of patients who are discharged after a stay in an intensive care unit (ICU).
The investigation followed a four-phase methodology: analysis, design, development, and validation. During the analysis phase, patient records collected from a public hospital were extensively reviewed and preprocessed. Then, during the design phase, several machine learning algorithms, both single learners and ensemble methods, were compared and selected for the investigation, among them Linear Regression, Decision Tree, Stacking, Bagging, and Random Forest. The remaining phases, development and validation, were completed using data processing software. For all proposed models, 10-fold cross-validation was applied instead of a simple hold-out split.
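The 10-fold cross-validation scheme can be sketched in plain Python as follows. This is a minimal, dependency-free illustration, not the authors' tooling (the abstract only mentions "data processing software"); the names `k_fold_indices` and `cross_val_accuracy` are hypothetical, and each candidate model is assumed to be a `fit(X, y)` function that returns a `predict(row)` function.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: fit(X, y) -> predict(row) -> label
Fit = Callable[[List[Sequence[float]], List[int]], Callable[[Sequence[float]], int]]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; every index lands in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # reproducible shuffle
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        folds.append((train, test))
        start += size
    return folds


def cross_val_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], fit: Fit,
                       k: int = 10, seed: int = 0) -> float:
    """Mean correct-prediction rate of `fit` over k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

Each candidate algorithm would be passed in as a different `fit` function and the mean rates compared. Unlike a single hold-out split, every record is used for testing exactly once.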
For the purposes of this research, twenty thousand patient records collected in 2020 were considered. The complete dataset was split into two subsets: one containing 80% of the data for training and testing, and another containing the remaining 20% for validation.
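A sketch of the 80/20 split, assuming the records are shuffled uniformly at random before splitting. The abstract does not say whether the split was random or stratified by destination, and `holdout_split` and its seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(records: Sequence[T], dev_fraction: float = 0.8,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `records` and split it into (development, validation)."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(records)  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


# 20,000 placeholder record IDs stand in for the patient records.
dev, val = holdout_split(range(20_000))
```

With 20,000 records this yields 16,000 development records and 4,000 validation records. If one destination is much rarer than the other, a stratified split that preserves the class proportions in both subsets would be a reasonable alternative.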
During model development, only the training and test data were used. The validation data were used solely to measure the models' performance on unseen data.
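The separation between development and validation data can be made explicit in code: the model is fitted on development rows only, and the validation rows are touched once, for scoring. This is a hypothetical sketch; `correct_prediction_rate` and `score_on_validation` are illustrative names, and the "correct prediction rate" is assumed to be plain accuracy.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Fit = Callable[[List[Row], List[int]], Callable[[Row], int]]


def correct_prediction_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the true destination."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def score_on_validation(fit: Fit, dev_X: List[Row], dev_y: List[int],
                        val_X: List[Row], val_y: List[int]) -> float:
    """Train on development data only; the validation rows are used once, for scoring."""
    predict = fit(dev_X, dev_y)
    return correct_prediction_rate([predict(x) for x in val_X], val_y)
```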
Results revealed that, regardless of the size of the training and test datasets, the correct prediction rates were notably consistent. The proposed ensemble scheme, made up of three base learners plus a meta-algorithm, consistently led to correct prediction rates close to 82%.
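The abstract describes the final model as three base learners plus a meta-algorithm but does not name them. The dependency-free sketch below shows the general stacking pattern under stated assumptions: the three base learners (a decision stump, a nearest-centroid rule, and 3-nearest-neighbours) and the lookup-table meta-learner are hypothetical stand-ins, not the authors' choices, and destinations are assumed to be integer labels (for example 0 = sent home, 1 = referred to another unit).

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], int]
Learner = Callable[[List[Row], List[int]], Predictor]


def majority(labels: Sequence[int]) -> int:
    """Most frequent label; ties go to the smallest label so results are deterministic."""
    counts = Counter(labels)
    return max(sorted(counts), key=lambda lab: counts[lab])


def sq_dist(a: Row, b: Row) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_stump(X: List[Row], y: List[int]) -> Predictor:
    """Hypothetical base learner 1: one-feature threshold rule with the best training accuracy."""
    best = (-1, 0, 0.0, 0, 0)  # (hits, feature, threshold, left label, right label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = majority(left)
            r_lab = majority(right) if right else l_lab
            hits = left.count(l_lab) + right.count(r_lab)
            if hits > best[0]:
                best = (hits, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_centroid(X: List[Row], y: List[int]) -> Predictor:
    """Hypothetical base learner 2: assign each row to the class with the closest mean vector."""
    counts = Counter(y)
    sums: Dict[int, List[float]] = {lab: [0.0] * len(X[0]) for lab in counts}
    for row, lab in zip(X, y):
        sums[lab] = [s + v for s, v in zip(sums[lab], row)]
    cents = {lab: [s / counts[lab] for s in sums[lab]] for lab in counts}
    return lambda row: min(sorted(cents), key=lambda lab: sq_dist(row, cents[lab]))


def fit_knn(X: List[Row], y: List[int], k: int = 3) -> Predictor:
    """Hypothetical base learner 3: majority label among the k nearest training rows."""
    data = list(zip(X, y))
    return lambda row: majority(
        [lab for _, lab in sorted(data, key=lambda d: sq_dist(row, d[0]))[:k]])


def fit_lookup_meta(Z: List[Tuple[int, ...]], y: List[int]) -> Callable[[Tuple[int, ...]], int]:
    """Hypothetical meta-learner: majority true label for each pattern of base-learner votes."""
    seen: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for z, lab in zip(Z, y):
        seen[z].append(lab)
    rules = {z: majority(labs) for z, labs in seen.items()}
    # Vote patterns never seen during training fall back to a plain majority vote.
    return lambda z: rules.get(z, majority(z))


def fit_stacking(X: List[Row], y: List[int], bases: List[Learner], folds: int = 5) -> Predictor:
    """Train the meta-learner on out-of-fold base predictions, then refit the bases on all rows."""
    n = len(y)
    oof = [[0] * len(bases) for _ in range(n)]
    for r in range(folds):
        test = range(r, n, folds)  # interleaved folds: rows r, r + folds, ...
        train = [i for i in range(n) if i % folds != r]
        for j, learn in enumerate(bases):
            predict = learn([X[i] for i in train], [y[i] for i in train])
            for i in test:
                oof[i][j] = predict(X[i])
    meta = fit_lookup_meta([tuple(z) for z in oof], y)
    fitted = [learn(X, y) for learn in bases]
    return lambda row: meta(tuple(b(row) for b in fitted))
```

The meta-learner is trained on out-of-fold predictions rather than on predictions for rows the base learners have already seen; otherwise it would learn to trust base learners that merely memorized their training data. Library implementations such as scikit-learn's `StackingClassifier` follow the same idea with cross-validated base predictions.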
In conclusion, the proposed models showed that, based on the existing data, high correct prediction rates can be achieved with an ensemble scheme. In this case, it was possible to predict with reasonable certainty whether a patient would be referred to another unit or sent home after his or her stay in the ICU.