Track: Transportation and Traffic
Abstract
Machine learning models have shown high prediction accuracy because they are free of the data distribution assumptions imposed by classical statistical methods; violating those assumptions leads to incorrect and inaccurate results. One of the most important applications of machine learning is predicting the severity of traffic crashes from a number of independent factors related to the crash. This paper compares four machine learning methods: logistic regression, k-nearest neighbors, decision trees, and random forests. The data set comprises more than 40,000 crashes that occurred in Riyadh, Saudi Arabia, during 2012-2016. Decision trees and random forests were the most accurate algorithms, and logistic regression was the least accurate. The type of crash was the most important factor in determining crash severity, followed by the time of the crash. The number of parties involved in the crash and the lighting condition were the least important factors.