Track: Machine Learning
Abstract
This research develops ensemble-based models to predict the list price of properties for sale in Valdivia (Chile) in 2020. It follows a classic four-stage methodology: analysis, design, development, and validation. In the analysis stage, data are gathered and preprocessed. In the design stage, attributes of interest are selected and grouped into six domains, which are combined to build 16 predictive models; the comparison metrics are also selected at this stage: correlation coefficient, mean absolute error (MAE), and root mean squared error (RMSE). Development and validation are carried out entirely in WEKA. A total of 34 attributes and 228 properties are considered. The dataset is split into a subset for training and testing (80%) and a subset for validation (20%). List prices are predicted using a stacking ensemble with a support vector machine as the meta-learner and a linear regression, a decision tree, and an artificial neural network as base learners. In the best case, predictions and actual list prices have a correlation coefficient of 0.90 and a percentage MAE of 26%. In conclusion, some of the proposed models can help predict list prices, but prediction errors remain significant.
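As an illustration of the three comparison metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the WEKA workflow used in the study. The function names and the sample prices are hypothetical and chosen only to show how the correlation coefficient, MAE, and RMSE are computed from paired actual and predicted list prices.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between paired actual and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical list prices (arbitrary units); not data from the study.
actual_prices = [100.0, 150.0, 200.0, 250.0]
predicted_prices = [110.0, 140.0, 210.0, 240.0]

mae_value = mae(actual_prices, predicted_prices)
rmse_value = rmse(actual_prices, predicted_prices)
corr_value = correlation(actual_prices, predicted_prices)

print(f"MAE: {mae_value:.2f}")
print(f"RMSE: {rmse_value:.2f}")
print(f"Correlation: {corr_value:.4f}")
```

A high correlation coefficient can coexist with a large MAE, because correlation measures how well predictions track the ordering and spread of prices, while MAE and RMSE measure the size of the errors in price units.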