A Comparative Analysis of Machine Learning Models for Forecasting Air Quality Index in New Delhi
A S Jagadeeswara Rao, Karthikeyan Arumugam, Sandaka Surya Narendra,
and Muhammed Navas T
Department of Mathematics, School of Advanced Sciences
Vellore Institute of Technology, Vellore - 632014, Tamil Nadu, India.
alamuri.sri2024@vitstudent.ac.in, karthikeyan.2024a@vitstudent.ac.in, surya.2024@vitstudent.ac.in, muhammednavas.t@vit.ac.in
Abstract
Accurate forecasting of urban air quality is challenging because atmospheric dynamics are complex and nonlinear. Many existing models struggle to capture both short-term pollutant fluctuations and long-term seasonal trends, which limits their usefulness for public health and policy decisions. This study addresses this gap with a comparative analysis of forecasting models for the Air Quality Index (AQI) in New Delhi. We use a daily dataset spanning September 2017 to October 2024, preprocessed with forward-backward filling for missing values and Winsorization to mitigate outlier effects. We benchmark classical time-series approaches (ARIMA), deep learning networks (LSTM, GRU), and tree-based methods (Random Forest, XGBoost). In our experiments, a tuned Random Forest model achieved the best performance, with a coefficient of determination (R²) of 0.8689 and a Root Mean Squared Error (RMSE) of 35.94, outperforming deep learning models such as GRU (R² = 0.8361). These findings highlight the robustness of optimized tree-based models for this application and support their use as a decision-support tool for air quality management.
Keywords
Air Quality Index, Forecasting, Machine Learning, Time Series Analysis, Random Forest
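As an illustration of the preprocessing and evaluation steps named in the abstract, the following minimal sketch shows forward-backward filling, percentile-based Winsorization, and the two reported metrics (RMSE and R²). It is not the authors' implementation. The function names, the use of `None` to mark missing values, and the 5th/95th percentile caps are assumptions made for this example, and only the Python standard library is used so that the sketch is self-contained.

```python
import math


def fill_forward_backward(values):
    """Fill None gaps with the previous observation (forward pass),
    then fill any remaining leading gaps with the next one (backward pass)."""
    filled = list(values)
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def percentile(sorted_vals, q):
    """Percentile by linear interpolation; q is a fraction in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def winsorize(values, lower=0.05, upper=0.95):
    """Clip values to the [lower, upper] percentile range.
    The default caps are an assumption, not taken from the paper."""
    s = sorted(values)
    lo_cap, hi_cap = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cap), hi_cap) for v in values]


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For a daily AQI series held in a table, the same two preprocessing steps are more commonly written with pandas (`Series.ffill()`, `Series.bfill()`, and `Series.clip()` with quantile bounds); the explicit loops above are only meant to make the logic visible.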