3rd European International Conference on Industrial Engineering and Operations Management

A Scalable Approach for Vehicle Routing Problem with Reinforcement Learning

Cheung Yu Lo & Ka Man Lee
Publisher: IEOM Society International
0 Paper Citations
1 Views
1 Downloads
Track: Undergradute Research Competition
Abstract

This paper proposed a scalable approach for solving vehicle routing problems by breaking down a large scale problem to small sub-problems where the reinforcement learning model is employed recursively. The approach is inspired by large scale vehicle routing problem in a city with some pre-defined district clusters (E.g. Hong Kong). The result within every sub-problems is obtained using a sequence to sequence deep learning network trained by policy gradient. Despite the time required for training, the model outperforms traditional heuristics models with a comparable amount of time. 

Bello et al. (2016) have proposed a framework for solving neural combinatorial problems with reinforcement learning. A pointer network (Vinyals et al., 2015) has been employed to process sequential inputs. Nazar et al. (2018) have replaced the pointer network with an LSTM recurrent neural network that accepts both static and dynamic inputs. Our proposed model broke down the vehicle routing problem to a city level problem (VRP for a group of districts) and many district level problems (VRP for groups of nodes) given the districts labels. Each problem is solved using the  reinforcement learning model given its generic properties proven in previous studies.

The model receives an input  array  . Each input  composes of the concatenation of a 2D array, the location for each node and the demand in the specific location, as well as the corresponding district . The location of nodes are represented as latitudes and longitudes which will be mapped to cartesian coordinates. The result will be passed to the encoder of the network. An attention mechanism is employed for constraining the amount of computation in the model.

Parameters of the network are updated using the REINFORCE algorithm in which the baseline in the equation is estimated by a critic network which learns the approximated time required for tours travelling. The network will be trained and updated concurrently with the sequence to sequence (actor) network. The rewards generated by the environment equal to the negative amount of tour travelling time. The model objective is approaching the optimal policy  that assign high possibility to the tour that require less amount of time. The stochastic policy generated by the network are calculated.

The centroid of the districts will be calculated and pass to the city level model. In every step of the city level model, a district level model with independent parameters is executed. The result of the district level model will be passed back to the city level model and the city level model will proceed to the next step by policy sampling.


The preliminary results indicated that the model outperforms traditional heuristic algorithms such as Clarke-Wright Savings Heuristic and Sweep Heuristic. The significance of the study is that it provides an approach for solving large scale vehicle routing problem which is highly computational expensive with traditional algorithms.

Published in: 3rd European International Conference on Industrial Engineering and Operations Management, Pilsen, Czech Republic

Publisher: IEOM Society International
Date of Conference: July 23-26, 2019

ISBN: 978-1-5323-5949-1
ISSN/E-ISSN: 2169-8767