Deep Reinforcement Learning Intelligent Traffic Signal Control

In the first 30 seconds, the agent has learned nothing and takes exploratory actions 100% of the time; a large queue accumulates and performance is poor. In the last 30 seconds, after significant learning, the agent takes exploitative actions 100% of the time, i.e. actions which yield high reward (a reduction in vehicle delay); only very small queues form and performance is high.
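
The shift from fully exploratory to fully exploitative behaviour can be illustrated with a minimal epsilon-greedy sketch. The linear decay schedule and the names `epsilon_at` and `select_action` are assumptions for illustration only; the post does not state how epsilon was annealed.

```python
import random


def epsilon_at(step, total_steps, eps_start=1.0, eps_end=0.0):
    """Linearly anneal epsilon from eps_start to eps_end.

    The linear schedule is an assumption; only the endpoints
    (100% exploration at the start, 100% exploitation at the end)
    come from the description above.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: pick any signal phase uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the phase with the highest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 1.0` every action is random, which matches the early behaviour described above; with `epsilon = 0.0` the agent always picks the action with the highest estimated value.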

paper – https://arxiv.org/abs/1611.01142

2000 simulated hours, ~4 days wall time. DQN trained with Q-learning and an ε-greedy policy, 4-layer (2 convolutional + 2 dense) architecture. SUMO traffic simulation software, Theano + Keras code for the neural network.
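
As a rough sketch of the Q-learning part, the regression target a DQN is usually trained toward is the observed reward plus the discounted best action value of the next state. The function name `q_learning_target` and the discount factor `gamma=0.95` are assumptions here, not values taken from the post or the paper.

```python
def q_learning_target(reward, next_q_values, gamma=0.95, terminal=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    reward        -- e.g. the reduction in vehicle delay after the action
    next_q_values -- the network's action-value estimates for the next state
    gamma         -- discount factor (assumed value, for illustration)
    terminal      -- True if the simulation episode ended at this step
    """
    if terminal:
        # No future return to bootstrap from at the end of an episode.
        return reward
    return reward + gamma * max(next_q_values)
```

The network's output for the chosen action is then regressed toward this target, while the outputs for the other actions are left unchanged.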

Post time: Jun-17-2017