My interest in using reinforcement learning in traffic simulation goes back a while. It had always irked me that in every simulation run, agents know nothing about the network: they know only their origins and destinations, and everything else, such as the shortest route or which lane to choose, is fed to them during the simulation. Moreover, if something unexpected occurs in the network, say an incident or a lane closure, vehicles are rerouted strictly along the shortest path, on the assumptions that familiar drivers (Paramics jargon) have access to real-time travel time information and that they have a zero indifference band (i.e., they switch routes even if the alternative is just one second faster). Every familiar agent that receives the new information then reroutes, and the alternative route, most likely through some urban network, becomes congested under the extra volume. Agents are assumed not to learn from this: in the next cycle they make the same mistake, and in the next simulation run the same scenario plays out.
I started working on training agents through their repeated experiences in the simulation network, and applied the Q-learning method to model vehicles' gap acceptance behavior at an uncontrolled intersection, where the agents' ultimate goals are to ensure safety and reduce wait time. This is a single decision process, and rather easy to implement. Looking back at it, I realize I could have done a better job. Anyway, the paper is published and the link is below:
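To make the idea concrete, here is a minimal tabular Q-learning sketch of a gap acceptance decision at an unsignalized intersection. Everything in it is illustrative, not taken from the paper: gaps are discretized into ten bins, the actions are simply wait or accept, and the reward numbers (crash penalty, wait-time penalty, 5-second critical gap) are assumed values chosen for the example.

```python
import random

random.seed(0)

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
GAP_STATES = range(10)   # approaching gap size, discretized to 0..9 seconds
ACTIONS = (0, 1)         # 0 = wait for the next gap, 1 = accept this gap

q = {(s, a): 0.0 for s in GAP_STATES for a in ACTIONS}

def reward(gap, action):
    """Illustrative reward: accepting a short gap risks a crash (large
    penalty), accepting a safe gap earns a bonus, waiting costs a small
    delay penalty. The 5 s critical gap is an assumed threshold."""
    if action == 1:
        return 10.0 if gap >= 5 else -100.0
    return -1.0

def choose(gap):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(gap, a)])

def train(episodes=5000):
    for _ in range(episodes):
        gap = random.choice(GAP_STATES)
        done = False
        while not done:
            a = choose(gap)
            r = reward(gap, a)
            if a == 1:                        # gap accepted: episode ends
                target = r
                done = True
            else:                             # waited: a new random gap arrives
                next_gap = random.choice(GAP_STATES)
                target = r + GAMMA * max(q[(next_gap, b)] for b in ACTIONS)
            q[(gap, a)] += ALPHA * (target - q[(gap, a)])
            if not done:
                gap = next_gap

train()
# The learned greedy policy should wait on short gaps and accept long ones.
print([max(ACTIONS, key=lambda a: q[(g, a)]) for g in GAP_STATES])
```

With these assumed rewards, the agent learns a threshold policy: the large crash penalty quickly drives down the value of accepting short gaps, while the small wait penalty keeps waiting tolerable until a safe gap arrives.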
Bartin, B. (2017). “Simulation of Vehicles’ Gap Acceptance Decision Using Reinforcement Learning.” Uludag University Journal of the Faculty of Engineering. Vol. 22, No. 2. DOI: 10.17482/uumfd.338803 (Link)
I then worked on the same kind of one-time decision process, this time modelling vehicles' lane selection at a toll plaza. I used XCS learning to simulate vehicles' lane selection decisions in a microscopic simulation model of a toll plaza. The underlying idea is to train a vehicle (agent) with the XCS framework over multiple simulation runs: the agent adapts to the network by learning its possible states through its environmental input sensors and makes decisions based on assigned objectives. During learning, the agent's objectives are to minimize wait time and crash risk, and it progressively improves its decision-making from experience. I tested the approach on a hypothetical toll plaza simulation model developed in Paramics, using the Paramics API to take control of the agent as it approaches the toll plaza and assign its lane selection through the XCS framework.
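For readers unfamiliar with XCS, here is a much-simplified sketch of its core loop for a lane selection decision. It is not the implementation from the paper: the sensor encoding (a binary string over lane conditions), the hand-seeded classifiers, and all the constants are assumptions for illustration, and a real XCS also performs covering, subsumption, and a genetic algorithm on the action sets, all omitted here for brevity.

```python
BETA = 0.2                       # learning rate for prediction/error/fitness
ALPHA_F, NU, E0 = 0.1, 5, 0.01   # accuracy function parameters

class Classifier:
    """A condition-action rule with a payoff prediction, prediction error,
    and accuracy-based fitness, as in the XCS classifier system."""
    def __init__(self, condition, action):
        self.condition = condition   # string over {'0','1','#'} ('#' = don't care)
        self.action = action         # lane index to choose
        self.prediction = 10.0       # initial payoff prediction
        self.error = 0.0
        self.fitness = 0.1

    def matches(self, state):
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

def select_action(population, state):
    """Build the match set, form a fitness-weighted prediction per action
    (the XCS prediction array), and pick the best action and its action set."""
    match_set = [cl for cl in population if cl.matches(state)]
    best, best_pred = None, float('-inf')
    for a in {cl.action for cl in match_set}:
        cls = [cl for cl in match_set if cl.action == a]
        pred = (sum(cl.prediction * cl.fitness for cl in cls)
                / sum(cl.fitness for cl in cls))
        if pred > best_pred:
            best, best_pred = a, pred
    return best, [cl for cl in match_set if cl.action == best]

def update(action_set, payoff):
    """Widrow-Hoff updates of prediction and error, then relative
    accuracy-based fitness sharing within the action set."""
    for cl in action_set:
        cl.prediction += BETA * (payoff - cl.prediction)
        cl.error += BETA * (abs(payoff - cl.prediction) - cl.error)
    accuracies = [1.0 if cl.error < E0 else ALPHA_F * (cl.error / E0) ** -NU
                  for cl in action_set]
    total = sum(accuracies)
    for cl, k in zip(action_set, accuracies):
        cl.fitness += BETA * (k / total - cl.fitness)

# Hypothetical usage: state '10' might mean "lane 0 queued, lane 1 free".
pop = [Classifier('1#', 0), Classifier('1#', 1)]
action, action_set = select_action(pop, '10')
update(action_set, payoff=0.0)   # payoff would come from the simulation
```

Over repeated runs, classifiers whose action earns higher payoff gain prediction and fitness, so the prediction array increasingly favors the better lane, which is the mechanism that lets the agent adapt its lane choice with experience.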
The paper has been accepted for presentation and potential publication at the 97th Transportation Research Board Annual Meeting; revisions are underway.