Numerically coded categorical variables are sometimes included in a regression model as continuous predictors. In this paper we show that the fit of a regression model can differ greatly depending on whether such variables are treated as continuous or as factors. With a small example we show that the adjusted R-squared can be negative in the former case and close to one in the latter. We use data visualization to explain the difference. We build models with categorical variables, using car data from Consumer Reports, to show how to improve the fit and to explain outliers.
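The contrast in adjusted R-squared can be illustrated with a minimal sketch. The data below are made up for illustration and are not the paper's example; the function names are hypothetical. The sketch relies on two standard facts: simple linear regression has a closed-form least-squares solution, and the fitted values of a least-squares model with a single factor are the per-level means of the response.

```python
from collections import defaultdict
from statistics import mean


def adjusted_r2(y, fitted, n_params):
    """Adjusted R-squared; n_params counts predictors, excluding the intercept."""
    n = len(y)
    y_bar = mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)


def fit_continuous(x, y):
    """Least-squares line y = a + b*x, treating the codes as a continuous predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]


def fit_factor(x, y):
    """Treat x as a factor: the least-squares fitted value is the mean of each level."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    level_means = {level: mean(vals) for level, vals in groups.items()}
    return [level_means[xi] for xi in x]


# Hypothetical data: three numeric codes whose response is not monotone in the code.
x = [1, 1, 2, 2, 3, 3]
y = [0.0, 0.2, 10.0, 10.2, 0.0, 0.2]

# One slope for the continuous model; (number of levels - 1) dummies for the factor model.
adj_continuous = adjusted_r2(y, fit_continuous(x, y), n_params=1)
adj_factor = adjusted_r2(y, fit_factor(x, y), n_params=len(set(x)) - 1)

print(f"continuous: {adj_continuous:.3f}, factor: {adj_factor:.3f}")
```

In this sketch the middle level has a much higher response than the two outer levels, so a straight line through the codes explains nothing and the adjusted R-squared is negative, while the factor model captures each level's mean and the adjusted R-squared is close to one.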
Track: Data Analytics
Published in: 3rd North American International Conference on Industrial Engineering and Operations Management, Washington D.C., USA
Publisher: IEOM Society International
Date of Conference: September 27-29, 2018
ISBN: 978-1-5323-5946-0
ISSN/E-ISSN: 2169-8767