Track: Data Analytics
Abstract
Numerically coded categorical variables are sometimes included in a regression model as continuous predictors. In this paper we show that the fit of a regression model can differ greatly depending on whether such variables are treated as continuous or as factors. With a small example, we show that the adjusted R-squared can be negative in the former case and close to one in the latter. We use data visualization to explain the difference. Using car data from Consumer Reports, we build models with categorical variables to show how to improve the fit and to explain outliers.
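
The contrast between the two treatments can be illustrated with a minimal sketch. The data below are hypothetical and chosen only for illustration (they are not the example used in the paper): a category coded 1, 2, 3 whose group means are roughly 10, 0 and 10, so there is no linear trend in the codes. The helper `adjusted_r2` and all variable names are assumptions of this sketch.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R-squared of an OLS fit; X must already contain an intercept column."""
    n, k = X.shape                      # k counts the intercept as a parameter
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1))

# Hypothetical data: numeric category codes and a response with no linear trend
codes = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
y = np.array([10.1, 9.9, 10.0, 0.1, -0.1, 0.0, 10.0, 10.1, 9.9])

# Treatment 1: the code enters as a single continuous predictor
X_continuous = np.column_stack([np.ones(len(codes)), codes]).astype(float)

# Treatment 2: the code enters as a factor (dummy columns, level 1 as baseline)
X_factor = np.column_stack([np.ones(len(codes)), codes == 2, codes == 3]).astype(float)

adj_continuous = adjusted_r2(X_continuous, y)
adj_factor = adjusted_r2(X_factor, y)

print(f"continuous: {adj_continuous:.4f}")
print(f"factor:     {adj_factor:.4f}")
```

In this sketch the fitted slope of the continuous model is essentially zero, so its adjusted R-squared is negative, while the factor model recovers the three group means and its adjusted R-squared exceeds 0.99.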