A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking

Jun Zhang, Wenxuan Ao*, Junbo Yan, Depeng **, Yong Li
Department of Electronic Engineering, BNRist, Tsinghua University
[email protected]
* These authors contributed equally to this work.

Abstract

With the development of artificial intelligence techniques, transportation system optimization is evolving from traditional methods that rely on expert experience to simulation- and learning-based decision optimization methods. Learning-based optimization methods require extensive interaction with highly realistic microscopic traffic simulators. However, existing microscopic traffic simulators are computationally inefficient in large-scale scenarios and therefore significantly slow down the data sampling process of optimization algorithms. In addition, the optimization scenarios supported by existing simulators are limited, mainly focusing on traffic signal control. To address these challenges and limitations, we propose the first open-source GPU-accelerated large-scale microscopic simulator for transportation system simulation. In the large-scale scenario with more than a million vehicles, the simulator iterates at 84.09Hz, an 88.92-fold computational speedup over the best baseline. Based on the simulator, we implement a set of microscopic and macroscopic controllable objects and metrics to support most typical transportation system optimization scenarios. These controllable objects and metrics are all exposed through a Python API for ease of use. We choose five important and representative transportation system optimization scenarios and benchmark classical rule-based algorithms, reinforcement learning, and black-box optimization in four cities. The code is available at https://github.com/tsinghua-fib-lab/moss-benchmark under the MIT License.

1 Introduction

With increasing urbanization and residents’ travel demand, urban transportation systems face heavier traffic pressure, which brings higher commuting costs, environmental pollution, and other social problems, affecting the sustainable development of cities [13, 33, 30]. To alleviate these problems, governments usually build more transportation infrastructure and optimize the existing infrastructure to enhance the system’s capacity. Such transportation system optimization methods include, for instance, traffic signal control and congestion pricing. However, the traditional transportation system optimization process is highly dependent on the experience of experts, which is labor-intensive and often sub-optimal [22]. With the development of reinforcement learning [23, 27] and black-box optimization [10, 5], learning-based optimization methods have great potential for improving the transportation system. But since all of these methods rely on extensive interaction with the environment for feedback, the environment must model the transportation system as realistically as possible and provide feedback quickly. In the field of transportation, simulators that simulate the motion of individual vehicles to provide realistic results are referred to as microscopic simulators.

At present, there are several available microscopic simulators that can evaluate the efficiency of the transportation system, including SUMO [1], CityFlow [39], and CBLab [19]. However, these simulators face the following two key challenges:

• Computational inefficiency in large-scale scenarios. Since the urban transportation system is a complex system with strong direct spatial and temporal correlation between different regions, a traffic improvement in one area may lead to congestion in another area. Therefore, the effect of transportation system optimization should be evaluated from a global city-level perspective, which requires large-scale microscopic simulation. However, existing simulators typically use CPUs for computation. The most popular open-source simulator, SUMO, still uses a single-threaded computing architecture, which significantly reduces the efficiency of the data sampling process of optimization algorithms such as reinforcement learning. Even though CityFlow and CBLab use multi-threading techniques, it still takes more than 100 seconds to simulate 1 hour in a scenario of about 100,000 vehicles. Due to the large number of environment interactions required by optimization methods, especially learning-based methods, we need simulators that can run large-scale scenarios at a frequency of at least 10Hz in order to adopt these methods for transportation system optimization.

• Limited supported optimization scenarios. In order to improve the efficiency of the transportation system, traffic management authorities usually apply a variety of transportation system optimization methods, including traffic signal control optimization, intersection lane turn assignment, tidal lanes, congestion pricing, etc. If these methods can be used jointly, transportation system efficiency can be improved further. However, existing simulators and related optimization studies usually focus on only a few of these scenarios, such as the traffic signal control optimization problem [35, 20, 37], and ignore the others. This prevents traffic managers from fully evaluating and comparing the effectiveness of various transportation system optimization methods, from microscopic ones such as traffic signal control [42, 35, 38, 40] to macroscopic ones such as congestion pricing [2, 4, 25]. To improve on this, a simulator should support most common transportation system optimization methods and scenarios.

To address the above challenges, and considering that the independent per-vehicle computation in microscopic simulation matches the GPU architecture and that GPUs offer massive computational power compared to CPUs, we propose the first open-source GPU-accelerated large-scale microscopic simulator for transportation system simulation. This simulator adopts a parallel-friendly design of computational flow and data partitioning, and designs efficient indexes for sensing between vehicles. Based on these, we implement microscopic traffic simulation on CUDA and substantially improve the scale and efficiency of simulation. In the largest scenario with more than a million vehicles, the simulator iterates at 84.09Hz, which is 88.92 times faster than the best baseline. To support the optimization of various scenarios, the simulator also implements a set of microscopic and macroscopic controllable objects and metrics, and provides a Python application programming interface (API) via pybind11 (https://github.com/pybind/pybind11). By combining controllable objects and metrics, we implement five typical transportation system optimization scenarios for benchmarking, including traffic signal control, dynamic lane assignment within junctions, tidal lane control, congestion pricing, and road planning, and evaluate the performance of classical rule-based algorithms, reinforcement learning algorithms, and black-box optimization algorithms for these scenarios in 4 large cities: Bei**g, Shanghai, Paris, and New York.

Table 1: Comparison of microscopic simulators for transportation system. The Scale field indicates the approximate number of vehicles that can be computed by this simulator at a simulation computation frequency of 10Hz.

Simulator		SUMO [1]	CityFlow [39]	CBLab [19]	Ours
Scale (10Hz)		<10,000	~130,000	~150,000	>10,000,000
Controllable Objects	Traffic Signal	✓	✓	✓	✓
	Lane/Road Max Speed	✓	×	✓	✓
	Lane Function	×	×	×	✓
	Vehicle Route	✓	✓	✓	✓
Metrics	Lane Queue Length	✓	✓	✓	✓
	Road Travelling Time	✓	×	×	✓
	Average Travelling Time	✓	✓	✓	✓
	Throughput	✓	×	×	✓

In short, our contributions are two-fold. First, we propose a high-performance large-scale microscopic simulator for transportation system simulation on GPU and implement microscopic and macroscopic controllable objects and metrics to support transportation system optimization. Second, we choose and implement five typical transportation system optimization scenarios and benchmark common optimization algorithms in four cities to show the usability of our proposed simulator.

2 Related Works

2.1 Existing Simulators for Transportation System

Existing simulators for transportation systems can be divided into three categories based on the level of simplification of the simulation models: microscopic simulators, mesoscopic simulators, and macroscopic simulators. Macroscopic simulators [21, 9] typically do not model individual vehicles, but rather treat the vehicles as a fluid described by velocity and density. Mesoscopic simulators often speed up the simulation by simplifying the vehicle motion models. For instance, MATSIM [32] uses a uniform motion model with intersection waiting queues [8] to model vehicles and does not consider acceleration and deceleration. Since macroscopic and mesoscopic simulators oversimplify vehicle motion, they are not usually used for AI-based transportation system optimization. Among the microscopic simulators, SUMO [1], CityFlow [39], and CBLab [19] are popular simulators for transportation system optimization. SUMO offers a rich set of controllable objects and metrics. However, due to its software architecture, SUMO can almost exclusively use one CPU core for computation, which leads to the small simulation scale shown in Table 1. CityFlow and CBLab both use a multi-threaded architecture for computational acceleration, which improves computational speed by about 20~30 times on 64-threaded CPUs relative to SUMO. But with city-scale simulations of at least 100,000 vehicles, it still takes them minutes to simulate an hour, which constrains the speed at which reinforcement learning algorithms learn by interacting with the environment. Besides, in terms of controllable objects, CityFlow only provides interfaces for setting traffic signal phases and vehicle routes, while CBLab adds the setting of road speed limits as an additional feature. In terms of metrics, both CityFlow and CBLab provide lane queue length and average traveling time (ATT) directly. Most of these controllable objects and metrics are designed for traffic signal optimization, and other optimization scenarios cannot be directly implemented with them. Overall, there is a lack of simulators that can simulate efficiently and provide rich controllable objects and metrics to support transportation system optimization problems in large-scale scenarios.

2.2 Existing Transportation System Optimization Methods

Existing methods for optimizing transportation systems can be classified into rule-based and learning-based methods. Rule-based methods use expert experience to design and improve rules for control and optimization, e.g. the maximum pressure algorithm [31] in traffic signal control and the $\Delta\textit{-tolling}$ algorithm [28] in congestion pricing. Such methods are difficult to adapt to complex and changing traffic conditions and only consider local optimization. Learning-based methods usually use reinforcement learning [23, 27] to search for a globally optimal solution through a large number of trials in a simulation environment. The traffic signal control problem is the most extensively studied problem in the field of transportation system optimization, with both rule-based methods [31] and learning-based methods [42, 35, 38, 40, 36, 24]. Reinforcement learning has also been adopted for the congestion pricing problem [2, 4, 25, 26, 34]. By comparison, other transportation system optimization scenarios such as dynamic lane assignment [16, 43, 11] and tidal lane control [18, 41, 17] have received much less attention from researchers, and existing works only focus on small-scale problems. This is most likely due to the lack of simulators that support multiple scenarios simultaneously, including those mentioned above.

3 The Simulator

Figure 1: The framework and pipeline of the proposed simulator (best viewed in color).

In this section, we give an overview of the design of our proposed simulator for efficient microscopic traffic simulation in large-scale urban scenarios and of its user interface, as shown in Figure 1.

3.1 System Design

Microscopic traffic simulation is the process of modeling each vehicle in the transportation system and simulating it in discrete time. One simulation step usually represents 1 second of change in the real world. In large-scale scenarios with hundreds of thousands of vehicles, the large number of vehicle model calculations consumes a lot of computational power, resulting in a low running speed.

The development of modern computational acceleration hardware provides the basis for a solution to this problem. Single instruction multiple data (SIMD), as the basic computational model of hardware acceleration cards such as GPUs, trades off instruction flexibility for the ability to parallelize a large number of homogeneous tasks and has been used with great success in areas such as matrix arithmetic acceleration and 3D image rendering. In microscopic traffic simulation, the simulation models of individual vehicles are also highly homogeneous and therefore highly compatible with the SIMD computational model.

However, before we can simply write vehicle simulation models as CUDA code, we need to address two problems posed by the need for vehicles to sense each other. First, in an iteration, a vehicle needs to read the position, speed, and other attributes of other vehicles as inputs to the simulation model for computing appropriate driving behaviors such as accelerating, decelerating, and changing lanes. Thereafter, the vehicle also modifies its own attributes such as position and speed based on the decided driving behavior. This leads to read/write conflicts on vehicle data, which affect the correctness of the simulation results. Second, the sensing behavior of a vehicle is spatially localized. Specifically, the range that a vehicle needs to sense includes only the front vehicle in the current lane and the front and rear vehicles in the adjacent lanes. Thus, implementing SIMD-friendly vehicle sensing indexes for this retrieval task is the key to fully utilizing the massive arithmetic power of modern computational acceleration cards. Our proposed simulator designs a two-phase parallel process for read/write separation and linked-list-based vehicle sensing indexes to solve the above two problems, respectively. The details are described below.

Two-phase Parallel Process for Read/Write Separation. To ensure that vehicles always read the previous step’s attributes of other vehicles during computation, and to keep those reads from interfering with each vehicle’s own computation and attribute updates, we divide the vehicle’s attributes into two partitions: snapshot and runtime. The snapshot is a read-only data partition that always holds the public attributes of the previous step for other vehicles to access. The runtime is a private read-write partition whose attributes are changed after the vehicle completes its simulation calculations. To replicate data from the runtime partition to the snapshot partition, we also divide each iteration into two sequential phases, the prepare phase and the update phase. The prepare phase performs vehicle data replication in parallel and updates the vehicle sensing indexes based on the new snapshot data. In the update phase, each vehicle performs sensing to obtain the snapshot attributes of other vehicles and runs the car-following model [29] and the lane-changing model [12, 7] to update its own runtime attributes. This process avoids read/write conflicts on vehicle data, which on the one hand ensures the correctness of the calculation results, and on the other hand makes the calculation flow more suitable for the SIMD computational model due to its mutex-free structure and highly homogeneous calculation procedures.
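The snapshot/runtime separation described above can be illustrated with a small CPU-side sketch. This is not the simulator's CUDA implementation: the `Vehicle` fields, the parameter values, and the toy gap-based acceleration rule are assumptions chosen for illustration, and the rule merely stands in for the car-following model of [29]. The point is only that the update phase reads snapshots and writes runtime state, so the result does not depend on the order in which vehicles are processed.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    # Runtime partition: private, read-write.
    s: float  # position along the lane (m)
    v: float  # speed (m/s)
    # Snapshot partition: read-only copy of the previous step.
    snap_s: float = 0.0
    snap_v: float = 0.0


def prepare(vehicles):
    """Prepare phase: replicate runtime attributes into the snapshot."""
    for veh in vehicles:
        veh.snap_s, veh.snap_v = veh.s, veh.v


def update(vehicles, dt=1.0, gap_target=10.0, gain=0.5, v_max=15.0):
    """Update phase: read only snapshots, write only the vehicle's own runtime."""
    order = sorted(range(len(vehicles)), key=lambda i: vehicles[i].snap_s)
    for rank, i in enumerate(order):
        veh = vehicles[i]
        if rank + 1 < len(order):
            leader = vehicles[order[rank + 1]]
            gap = leader.snap_s - veh.snap_s
            acc = gain * (gap - gap_target)  # toy rule (assumption)
        else:
            acc = gain * (v_max - veh.snap_v)  # free road: approach v_max
        veh.v = min(max(veh.snap_v + acc * dt, 0.0), v_max)
        veh.s = veh.snap_s + veh.v * dt


def step(vehicles):
    prepare(vehicles)
    update(vehicles)
```

Because no vehicle ever reads another vehicle's runtime partition, each loop body in `update` could in principle run as an independent GPU thread without locks.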

Linked-list-based Vehicle Sensing Indexes. For SIMD-friendly indexing of relative spatial positions, the intuitive choice of a binary tree search is not appropriate. Tree searching leads to control flow divergence, which significantly reduces efficiency under the SIMD computational model. Therefore, we choose a bidirectional ordered linked list as the index data structure. One linked list records all vehicles in one lane in order of spatial location. Each node on the linked list additionally contains two sets of pointers to the front and rear vehicles in the left lane and the right lane, respectively. With such an index structure, vehicle sensing always requires only one pointer operation, avoiding the control flow divergence problem. Since there are usually only a small number of vehicles entering or leaving the lane at each step, and the order of the vehicles already on the lane is basically unchanged, the number of operations such as adding nodes, deleting nodes, and reordering the linked list during the index update is relatively small, so the impact on computational performance is acceptable. With this design, we address the second problem by providing SIMD-friendly vehicle sensing indexes with low update cost for vehicle simulation model computation.
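A minimal sketch of such an index is given below, under stated assumptions: the `Node` fields, `build_lane`, and `link_side` are hypothetical names, the index is rebuilt from scratch instead of being updated incrementally, and everything runs on the CPU. It only shows the shape of the data structure: once the pointers are set, sensing the front vehicle or a neighbor in an adjacent lane is a single pointer read with no search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    vid: int   # vehicle id
    s: float   # position along the lane (m)
    front: Optional["Node"] = None        # next vehicle ahead in the same lane
    back: Optional["Node"] = None         # next vehicle behind in the same lane
    left_front: Optional["Node"] = None   # nearest vehicle ahead in the left lane
    left_back: Optional["Node"] = None    # nearest vehicle behind in the left lane
    right_front: Optional["Node"] = None  # same for the right lane
    right_back: Optional["Node"] = None


def build_lane(vehicles):
    """vehicles: list of (vid, s). Returns nodes ordered by s and linked both ways."""
    nodes = [Node(vid, s) for vid, s in sorted(vehicles, key=lambda p: p[1])]
    for rear, ahead in zip(nodes, nodes[1:]):
        rear.front, ahead.back = ahead, rear
    return nodes


def link_side(lane, neighbor, side):
    """Fill <side>_front / <side>_back for every node with one merge sweep."""
    j = 0
    for node in lane:
        # Advance to the first neighbor vehicle at or ahead of this node.
        while j < len(neighbor) and neighbor[j].s < node.s:
            j += 1
        setattr(node, side + "_front", neighbor[j] if j < len(neighbor) else None)
        setattr(node, side + "_back", neighbor[j - 1] if j > 0 else None)
```

For example, after `link_side(lane, left, "left")`, a lane-changing model can read `node.left_front` and `node.left_back` directly to check the gap in the target lane.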

To help make the simulator user-friendly, we also provide a toolchain for building simulator inputs and the simulator’s Python API.

Simulator Inputs and Toolchain. Following the usual microscopic traffic simulation setup, the simulator inputs are map data and travel demand. The map data describes the geospatial attributes and topological relationships of road networks and the candidate traffic signal phases of junctions. Travel demand describes each vehicle’s origin, destination, departure time, and chosen route. These inputs are stored in a binary format defined by Protobuf (https://protobuf.dev/). To facilitate the construction of simulator inputs, we have developed a toolchain available at https://github.com/tsinghua-fib-lab/mosstool. The toolchain mainly provides map building based on OpenStreetMap (https://openstreetmap.org/) and real travel demand generation based on globally available public data such as satellite imagery. By using this toolchain, users can quickly build maps, generate travel demands, and subsequently begin simulation and optimization.

Python API. The simulator exposes the C interfaces as a Python API via pybind11. The Python API consists of a series of initialization functions, getter functions, setter functions, and the next_step function that controls the progress of the simulation. The setter functions usually also provide batch versions with the _batch suffix to minimize the data transfer overhead of large numbers of calls. This Python API does not directly provide gymnasium-style reinforcement learning environments, but rather requires users to build the environment by combining the above functions according to the needs of each scenario.
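As an illustration of how such an environment might be assembled, the sketch below wraps an engine in a gymnasium-style `reset`/`step` interface for signal control. `StubEngine` is a stand-in written for this example, not the real simulator; it only mimics the method names `next_step`, `set_tl_phase`, and `get_lane_waiting_at_end_vehicle_counts` that appear in this paper, and their argument and return types here are assumptions. The `SignalEnv` class, its reward, and its parameters are likewise hypothetical.

```python
import random


class StubEngine:
    """Stand-in for the simulator (assumption): random queues, no traffic model."""

    def __init__(self, n_junctions=2, n_lanes=4, seed=0):
        self._rng = random.Random(seed)
        self.n_lanes = n_lanes
        self.phases = [0] * n_junctions
        self.steps = 0

    def set_tl_phase(self, junction_id, phase_index):
        self.phases[junction_id] = phase_index

    def next_step(self):
        self.steps += 1

    def get_lane_waiting_at_end_vehicle_counts(self):
        return [self._rng.randint(0, 5) for _ in range(self.n_lanes)]


class SignalEnv:
    """Gymnasium-style wrapper: one action sets one phase per junction."""

    def __init__(self, make_engine, horizon=10, interval=30):
        self.make_engine = make_engine  # factory, so reset() gets a fresh engine
        self.horizon = horizon          # decisions per episode
        self.interval = interval        # simulation steps between decisions
        self.engine = None
        self.decisions = 0

    def reset(self):
        self.engine = self.make_engine()
        self.decisions = 0
        return self.engine.get_lane_waiting_at_end_vehicle_counts(), {}

    def step(self, action):
        for junction_id, phase_index in enumerate(action):
            self.engine.set_tl_phase(junction_id, phase_index)
        for _ in range(self.interval):
            self.engine.next_step()
        obs = self.engine.get_lane_waiting_at_end_vehicle_counts()
        reward = -float(sum(obs))  # fewer queued vehicles is better
        self.decisions += 1
        truncated = self.decisions >= self.horizon
        return obs, reward, False, truncated, {}
```

With the real engine, the per-junction loop would be a natural place to use a `_batch` setter instead, to reduce the number of calls.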

3.2 Controllable Objects

To support the major transportation system optimization scenarios, we provide the following APIs for the controllable objects of the transportation infrastructure and traffic participants, where the simulator instance in Python is always denoted by engine.

Traffic Signal. The simulator allows the user to set the traffic signal control policies for given junctions via engine.set_tl_policy(id, policy). The policy enumeration includes MANUAL, FIXED_TIME, MAX_PRESSURE and NONE. Under the MANUAL policy, the user can change the current phase and duration of the signal via engine.set_tl_phase(id, phase_index) and engine.set_tl_duration(id, duration). The FIXED_TIME policy indicates that the fixed phase procedure built into the map data is used. The MAX_PRESSURE policy indicates that the adaptive maximum pressure algorithm [31] is used. The NONE policy indicates that there is no signaling.
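To make the idea behind the maximum pressure policy concrete, the following sketch selects a phase from queue lengths. It follows the common textbook formulation, in which the pressure of a movement is the queue on its incoming lane minus the queue on its outgoing lane, and is not claimed to match the simulator's built-in MAX_PRESSURE implementation. The function name and the data layout (phases as lists of lane pairs, queues as a dictionary) are assumptions.

```python
def max_pressure_phase(phases, queue):
    """Return the index of the phase with the largest total pressure.

    phases: list of phases, each a list of (in_lane, out_lane) movements.
    queue:  dict mapping lane id -> number of queued vehicles.
    """
    def pressure(phase):
        return sum(queue[i] - queue[o] for i, o in phase)

    return max(range(len(phases)), key=lambda k: pressure(phases[k]))


# Two phases at a toy junction: north-south and east-west.
phases = [[("n_in", "s_out")], [("e_in", "w_out")]]
queue = {"n_in": 3, "s_out": 1, "e_in": 8, "w_out": 2}
chosen = max_pressure_phase(phases, queue)
```

Under the MANUAL policy, an index chosen this way could then be applied with engine.set_tl_phase(id, phase_index).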

Lane. Lanes in the simulator include both clearly marked lanes on the roadway and "virtual" lanes within junctions that connect the two roadways. For lanes, the user can first set their maximum speed via engine.set_lane_max_speed(id, max_speed). Secondly, the user can set whether the lane is restricted from passing via engine.set_lane_restriction(id, flag).

Road. To support dynamic changes in lane function combinations, each roadway is pre-configured with multiple lane function combination plans. Here, the function of a lane refers to whether it is used for going straight, turning left, or turning right. The user can set the road’s lane function plan via engine.set_road_lane_plan(id, plan_index).

Vehicle. The user can change the route and destination lane position of a vehicle via engine.set_vehicle_route(vehicle_id, route, end_lane, end_s).

In addition to these controllable objects, the user can also change the map before simulating to build optimization scenarios.

3.3 Metrics

To make it easier for users to calculate common microscopic and macroscopic metrics, we also provide the following metric APIs.

Lane Queue Length. Lane queue length is used to count the number of vehicles waiting to be released at the end of the lane, which is a microscopic metric often used as an input to traffic signal control algorithms. The metric is provided via engine.get_lane_waiting_at_end_vehicle_counts().

Road Traveling Time. Road traveling time indicates the time taken by vehicles to pass through the road under the current traffic flow on the road, which is a microscopic metric that directly shows how congested the road is. The metric is provided via engine.get_road_average_vehicle_speed().

Average Traveling Time. Average traveling time (ATT) is the average time taken by all vehicles to complete a trip. It is a commonly used macroscopic metric that directly reflects the overall efficiency of the transportation system. The metric is provided via engine.get_finished_vehicle_average_traveling_time().

Throughput. Throughput (TP) is used to indicate how many vehicles complete a trip in a given time period. It is also commonly used as a macroscopic metric for assessing the efficiency and capacity of a transportation system. The metric is provided via engine.get_finished_vehicle_count().
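Two small post-processing helpers show how these getters might be combined into the metrics named above. Both are sketches under assumptions: road lengths are taken from the map data by the user, the average speed is in m/s, and the finished vehicle count is treated as cumulative. The helper names and the `min_speed` floor are hypothetical.

```python
def road_travel_times(lengths_m, avg_speeds_mps, min_speed=0.5):
    """Estimate per-road traveling time (s) as length / average speed.

    min_speed avoids division by zero on fully blocked or empty roads.
    """
    return [length / max(v, min_speed) for length, v in zip(lengths_m, avg_speeds_mps)]


def windowed_throughput(cumulative_counts):
    """Turn cumulative finished-vehicle counts into per-window throughput."""
    return [b - a for a, b in zip(cumulative_counts, cumulative_counts[1:])]


times = road_travel_times([500.0, 200.0], [10.0, 0.0])
per_window = windowed_throughput([0, 120, 300, 450])
```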

4 Transportation System Optimization Scenarios

As shown in Figure 2, we choose three microscopic optimization scenarios and two macroscopic ones for benchmarking. The former focus on both junction-level and roadway-level transportation infrastructure control. The latter cover the pre-construction planning phase as well as the post-construction management phase.

Traffic Signal Control. Traffic signal control is the most convenient approach to optimize the transportation system, which is also the scenario where the AI optimization methods are most widely used in the research field of transportation. The approach adjusts the phase and duration of traffic signals at junctions to control the number of vehicles passing in different directions, making full use of road resources to reduce the time spent by vehicles in the transportation system. Therefore, the appropriate setting of signal phasing and timing taking into account the interactions between junctions will substantially affect the efficiency of the transportation system.

Dynamic Lane Assignment within Junctions. Dynamic lane assignment within junctions refers to the adaptive reallocation of lane functions, such as going straight, turning left, or turning right, across all lanes at the junctions based on real-time traffic conditions. For example, when the number of left-turning vehicles in a particular direction at a junction increases, the method increases the number of lanes on the corresponding roadway used for left turns and decreases the number of lanes used for going straight, thereby decreasing the waiting time for vehicles at the junction. How to make the correct dynamic lane assignment based on the current situation and a prediction of the future is an important transportation system optimization problem.

Tidal Lane Control. Tidal lanes are a classical traffic management strategy to manage the increased traffic pressure that is predominantly in one direction during morning and evening rush hours. This method increases roadway capacity and reduces congestion by redirecting lane usage. For instance, during the morning rush hour, more lanes might be designated for inbound traffic, while in the evening, the direction is reversed to accommodate outbound traffic. Thus, optimization of the timing and direction of tidal lane adjustments can improve commuting efficiency throughout the city.

Congestion Pricing. Congestion pricing is a macroscopic traffic management strategy that uses congestion charges for vehicles driving into specific areas or roads to control and reduce traffic flow, thereby improving traffic conditions. Through such pricing tactics, vehicles will change to routes with lower costs. From a global perspective, a good pricing strategy will balance the traffic flow and traffic pressure in different areas, and thus improve the overall traffic congestion situation.

Road Planning. Building new roads is the most direct way to increase the carrying capacity of the transportation system. Properly planning the location of new roads and their relationship to existing roads is a prerequisite for maximizing the return on investment. In this scenario, we consider a large set of potential new road candidates and use optimization approaches to identify the road combinations that best improve the efficiency of the overall transportation system under specific constraints such as total distance, total investment, or number of roads.

5 Experiments

Simulator Performance. To illustrate the computational performance of our proposed simulator, we compare the computational efficiency of SUMO, CityFlow, CBLab, and our proposed simulator for different road network scales and numbers of vehicles.

We adopt the datasets from CBLab [19] with the MIT License, which include 6 real-world city datasets and 9 synthesized datasets. We simulate 3600 steps for all the datasets and record the total running time as the performance of each simulator. All simulations are conducted in the same hardware environment with an Intel(R) Xeon(R) Platinum 8462Y CPU (64 threads) and an NVIDIA GeForce RTX 4090 GPU. As shown in Figure 3, the results indicate that our proposed simulator achieves a large performance improvement over existing simulators. On the largest dataset, the running time of ours is 42.81s and that of the best baseline (CityFlow) is 3806.7s, a relative performance improvement of 88.92 times.
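As a quick check, the iteration frequency and speedup quoted in the abstract follow directly from the step count and the two running times reported for the largest dataset:

```python
steps = 3600          # simulated steps per run
ours_s = 42.81        # running time of the proposed simulator (s)
baseline_s = 3806.7   # running time of the best baseline, CityFlow (s)

freq_hz = steps / ours_s        # iterations per second
speedup = baseline_s / ours_s   # relative performance improvement

print(f"{freq_hz:.2f} Hz, {speedup:.2f}x speedup")
```

This gives 84.09 Hz and an 88.92-fold speedup, consistent with the figures stated in the abstract and introduction.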

To benchmark the optimization algorithms for the five transportation system optimization scenarios described above, we choose Bei**g, Shanghai, Paris and New York as test cities. The road networks of these cities are built using our mosstool toolchain. The real origin-destination (OD) matrices of these cities are also generated by mosstool using generative AI methods. For these synthetic datasets, we keep only the morning and evening peaks of vehicle departure times to challenge the optimization algorithms in each scenario. The total number of vehicles is scaled based on the generated real OD matrix to construct travel demand data for three congestion levels: smooth (marked as City-S), normal (marked as City-N), and congested (marked as City-C). More information on the synthetic datasets is provided in Appendix A. We evaluate the optimization effectiveness of different algorithms under the above cities and congestion levels, using ATT and TP as global metrics for comparison. In the following, we report the optimization algorithms compared in each of the five scenarios and their performance under normal congestion. The detailed experimental settings and the complete results are presented in Appendix B and Appendix C, respectively, due to the page limit.

Traffic Signal Control Benchmark. In this scenario, the task is to choose the best traffic signal phase from the list of available phases for each junction. We compare rule-based algorithms, including the fixed-time algorithm [15] and the maximum pressure algorithm [31], with reinforcement learning-based algorithms, including FRAP [42], MPLight [3], CoLight [35], Efficient-MPLight [38], Advanced-MPLight and Advanced-CoLight [40], as well as a pressure-based model trained with PPO [27]. The algorithms are trained and tested in the morning rush hour scenario, from 7:00 to 10:00. The results are presented in Table 2.

Table 2: The benchmark results for the traffic signal control scenario with normal traffic conditions.

Method	Bei**g-N		Shanghai-N		Paris-N		New York-N
	ATT	TP	ATT	TP	ATT	TP	ATT	TP
FixedTime	4843.05	120049	4324.07	131020	4245.86	60020	4682.07	72725
MaxPressure	4580.41	132055	4045.75	144640	3984.20	64405	4309.91	83927
FRAP	5105.22	112321	4671.52	121896	4404.23	58481	5002.48	68196
MPLight	4790.12	124991	4674.93	122198	3980.07	64921	4196.37	85331
CoLight	5108.88	112672	4640.50	124565	4413.91	58513	4989.17	68184
Efficient-MPLight	5101.14	113272	4480.28	132995	4364.89	60428	5019.92	67303
Advanced-MPLight	5049.47	117568	4603.57	127911	4281.72	61470	5031.15	67322
Advanced-CoLight	5107.23	112778	4661.07	123336	4408.14	58929	5014.66	68292
PPO	4452.05	136630	4143.19	141768	4017.61	64277	4254.92	84411

Dynamic Lane Assignment within Junctions Benchmark. In this scenario, the task is to assign the direction, e.g. left or straight, for the in-going lanes of each junction. We compare the following methods: 1) NoChange, where we do not change the direction of the lane and leave it as it is, 2) Random, where we randomly change the direction in every period, 3) Rule, where we estimate the number of vehicles going in each direction and choose the direction with the maximum number of vehicles, 4) PPO, where we use a PPO-trained model to estimate the number of vehicles. The above algorithms are trained and tested in the morning rush hour scenario, from 7:00 to 10:00. The results are presented in Table 3.
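The Rule baseline can be sketched in a few lines. The function name, the dictionary layout, and the tie-breaking choice (keep the current direction when it is among the maxima) are assumptions made for this example and are not taken from the benchmark code.

```python
def pick_direction(estimated_counts, current):
    """Choose the direction with the most estimated vehicles.

    estimated_counts: dict mapping direction -> estimated vehicle count.
    current: the direction currently assigned, kept on ties to avoid flapping.
    """
    best = max(estimated_counts.values())
    if estimated_counts.get(current) == best:
        return current
    return max(estimated_counts, key=estimated_counts.get)


choice = pick_direction({"left": 12, "straight": 7}, current="straight")
```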

Table 3: The benchmark results for the dynamic lane assignment scenario with normal traffic conditions.

Method	Bei**g-N		Shanghai-N		Paris-N		New York-N
	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4846.70	119890	4322.21	131145	4245.84	60020	4674.08	72870
Random	4839.71	120338	4324.96	131216	4176.70	61810	4636.11	74055
Rule	4761.33	123673	4258.22	133346	4155.11	62366	4615.01	74254
PPO	4769.98	122929	4256.51	133379	4160.89	61792	4614.52	73907

Tidal Lane Control Benchmark. In this scenario, the task is to switch the direction of the tidal lane to be forward or backward. We compare the following methods: 1) NoChange, where we disable the tidal lane, 2) Random, where we randomly change the direction in every period, 3) Rule, where we count the number of vehicles going in each direction and choose the direction with the maximum number of vehicles, 4) PPO, where we use a PPO-trained model to estimate the number of vehicles. The above algorithms are trained and tested in the morning rush hour scenario, from 7:00 to 10:00. The results are presented in Table 4.

Table 4: The benchmark results for the tidal lane control scenario with normal traffic conditions.

Method	Beijing-N		Shanghai-N		Paris-N		New York-N
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4844.57	120105	4334.80	130604	4224.94	60794	4675.10	72834
Random	4827.75	120778	4338.40	130283	4216.17	60946	4665.83	73335
Rule	4823.40	120901	4313.48	131315	4192.85	61636	4638.03	74284
PPO	4820.48	120936	4304.29	132167	4187.37	61738	4628.47	74756

Congestion Pricing Benchmark. In this scenario, each driver has three candidate routes and the task is to set the price of each road to motivate drivers to choose the route that avoids congested areas. We compared $\Delta$-toll [28] and EBGtoll [26] with two baselines: 1) NoChange, where we do not set the prices, 2) Random, where the drivers randomly choose a route. The above algorithms are trained and tested in the morning rush hour scenario, from 7:00 to 10:00. The results are presented in Table 5.

Table 5: The benchmark results for the congestion pricing scenario with normal traffic conditions.

Method	Beijing-N		Shanghai-N		Paris-N		New York-N
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4840.18	120207	4328.34	130621	4239.23	60100	4681.75	72765
Random	5190.37	105640	4422.00	131346	4144.47	62284	4830.86	68976
$\Delta$-toll	4667.77	131747	4096.62	147533	4040.68	65024	4549.26	78182
EBGtoll	5637.30	80476	4637.64	116624	4240.82	60362	5096.60	59154

Road Planning Benchmark. In this scenario, the algorithms are asked to select at most 30 roads from 50 candidates for construction to minimize post-construction ATT. We compared 5 methods: 1) NoChange, where none of these 50 roads is built, 2) Random, where we select random roads to build, 3) Rule-based, where we select the 30 roads with the highest vehicle counts to build, 4) simulated annealing (SA) [14], 5) Bayesian optimization (GeneralBO) [6]. The above algorithms are tested on both the morning peak from 6:00 to 12:00 and the evening peak from 17:00 to 23:00, and the metrics are averaged over the two periods. The results are presented in Table 6.

Table 6: The benchmark results for the road planning scenario with normal traffic conditions.

Method	Beijing-N		Shanghai-N		Paris-N		New York-N
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
No-Change	8439.44	161722	6699.66	161722	7193.90	76327	7892.50	105533
Random	8304.71	163148	6567.12	181063	7106.19	75839	7967.24	102996
Rule	8235.49	164247	6570.72	181504	7177.88	76176	7956.08	102951
SA	8332.44	163660	6590.66	180954	7154.16	76164	7871.23	105507
GeneralBO	8242.19	164182	6721.78	178790	7161.74	75979	7759.88	106883

6 Conclusion

In this paper, we propose a high-performance, GPU-powered, large-scale microscopic simulator for transportation system simulation and optimization. We also benchmark different optimization algorithms on five transportation system optimization scenarios with different traffic flows in four cities. Interested researchers can use the same pipeline to benchmark most cities around the world with our open-source simulator and toolchain. We believe that the proposed simulator will encourage more researchers to work on urban transportation system optimization. We hope that this will not only support more research on transportation system optimization scenarios, but also promote the development of urban transportation systems towards AI-driven intelligent transportation systems.

References

[1] Michael Behrisch, Laura Bieker, Jakob Erdmann, and Daniel Krajzewicz. Sumo–simulation of urban mobility: an overview. In Proceedings of SIMUL 2011, The Third International Conference on Advances in System Simulation. ThinkMind, 2011.
[2] Hamid Mirzaei Buini, Guni Sharon, Stephen D. Boyles, Tony Givargis, and Peter Stone. Enhanced delta-tolling: Traffic optimization via policy gradient reinforcement learning. 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 47–52, 2018.
[3] C. Chen, H. Wei, N. Xu, G. Zheng, M. Yang, Y. Xiong, K. Xu, and Z. Li. Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 3414–3421, 2020.
[4] Haipeng Chen, Bo An, Guni Sharon, Josiah P. Hanna, Peter Stone, Chunyan Miao, and Yeng Chai Soh. Dyetc: Dynamic electronic toll collection for traffic congestion alleviation. In AAAI Conference on Artificial Intelligence, 2018.
[5] Alberto Costa and Giacomo Nannicini. Rbfopt: an open-source library for black-box optimization with costly function evaluations. Mathematical Programming Computation, 10:597–629, 2018.
[6] Alexander I Cowen-Rivers, Wenlong Lyu, Zhi Wang, Rasul Tutunov, Hao Jianye, Jun Wang, and Haitham Bou Ammar. Hebo: Heteroscedastic evolutionary bayesian optimisation. arXiv preprint arXiv:2012.03826, page 7, 2020.
[7] Shuo Feng, Xintao Yan, Haowei Sun, Yiheng Feng, and Henry X Liu. Intelligent driving intelligence test for autonomous vehicles with naturalistic and adversarial environment. Nature communications, 12(1):748, 2021.
[8] Christian Gawron. An iterative algorithm to determine the dynamic user equilibrium in a traffic simulation model. International Journal of Modern Physics C, 9(03):393–407, 1998.
[9] PTV Group. Transport Planning Software | PTV Visum. https://www.ptvgroup.com/en/products/ptv-visum. Accessed: 2024-06-03.
[10] Nikolaus Hansen, Anne Auger, Raymond Ros, Steffen Finck, and Petr Pošík. Comparing results of 31 algorithms from the black-box optimization benchmarking bbob-2009. In Proceedings of the 12th annual conference companion on Genetic and evolutionary computation, pages 1689–1696, 2010.
[11] Qize Jiang, Jingze Li, Weiwei Sun, and Baihua Zheng. Dynamic lane traffic signal control with group attention and multi-timescale reinforcement learning. In International Joint Conference on Artificial Intelligence, 2021.
[12] Arne Kesting, Martin Treiber, and Dirk Helbing. General lane-changing model mobil for car-following models. Transportation Research Record, 1999(1):86–94, 2007.
[13] Leonard Kirago, Michael J Gatari, Örjan Gustafsson, and August Andersson. Black carbon emissions from traffic contribute substantially to air pollution in nairobi, kenya. Communications Earth & Environment, 3(1):74, 2022.
[14] Scott Kirkpatrick, C Daniel Gelatt Jr, and Mario P Vecchi. Optimization by simulated annealing. science, 220(4598):671–680, 1983.
[15] P. Koonce and L. Rodegerdts. Traffic signal timing manual. Technical report, United States. Federal Highway Administration, 2008.
[16] Lili Li, Zhao wei Qu, Xian min Song, and Dianhai Wang. Research on variable lane signalized control method. 2009 International Conference on Measuring Technology and Mechatronics Automation, 3:575–578, 2009.
[17] Tao Li, Nengmin Wang, Meng Zhang, and Zheng wen He. Dynamic reversible lane optimization in autonomous driving environments: Balancing efficiency and safety. Journal of Industrial and Management Optimization, 2023.
[18] Xu Li, Jun-Hua Chen, and Hao Wang. Study on flow direction changing method of reversible lanes on urban arterial roadways in china. Procedia - Social and Behavioral Sciences, 96:807–816, 2013.
[19] Chumeng Liang, Zherui Huang, Yicheng Liu, Zhanyu Liu, Guanjie Zheng, Hanyuan Shi, Kan Wu, Yuhao Du, Fuliang Li, and Zhenhui Jessie Li. Cblab: Supporting the training of large-scale traffic control policies with scalable traffic simulation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4449–4460, 2023.
[20] Yiling Liu, Guiyang Luo, Quan Yuan, Jinglin Li, Lei Jin, Bo Chen, and Rui Pan. Gplight: grouped multi-agent reinforcement learning for large-scale traffic signal control. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 199–207, 2023.
[21] HS Mahmassani. Dynamic traffic assignment and simulation for advanced network informatics (dynasmart). In the 2nd International Seminar on Urban Traffic Networks, 1992, 1992.
[22] Michael G McNally. The four-step model. In Handbook of transport modelling, volume 1, pages 35–53. Emerald Group Publishing Limited, 2007.
[23] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[24] Afshin Oroojlooy, M. Nazari, Davood Hajinezhad, and Jorge Silva. Attendlight: Universal attention-based reinforcement learning model for traffic signal control. Advances in Neural Information Processing Systems, 2020.
[25] Venktesh Pandey and Stephen D. Boyles. Multiagent reinforcement learning algorithm for distributed dynamic pricing of managed lanes. 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 2346–2351, 2018.
[26] Wei Qiu, Haipeng Chen, and Bo An. Dynamic electronic toll collection via multi-agent deep reinforcement learning with edge-based graph convolutional networks. In IJCAI, pages 4568–4574, 2019.
[27] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[28] Guni Sharon, Michael W Levin, Josiah P Hanna, Tarun Rambha, Stephen D Boyles, and Peter Stone. Network-wide adaptive tolling for connected and automated vehicles. Transportation Research Part C: Emerging Technologies, 84:142–157, 2017.
[29] Martin Treiber, Ansgar Hennecke, and Dirk Helbing. Congested traffic states in empirical observations and microscopic simulations. Physical review E, 62(2):1805, 2000.
[30] Martin Treiber, Arne Kesting, and Christian Thiemann. How much does traffic congestion increase fuel consumption and emissions? applying a fuel consumption model to the ngsim trajectory data. In 87th Annual Meeting of the Transportation Research Board, Washington, DC, volume 71, pages 1–18, 2008.
[31] Pravin Varaiya. Max pressure control of a network of signalized intersections. Transportation Research Part C: Emerging Technologies, 36:177–195, 2013.
[32] Kay W Axhausen, Andreas Horni, and Kai Nagel. The multi-agent transport simulation MATSim. Ubiquity Press, 2016.
[33] Qi Wang, Haixia Feng, Haiying Feng, Yue Yu, Jian Li, and Erwei Ning. The impacts of road traffic on urban air quality in Jinan based gwr and remote sensing. Scientific reports, 11(1):15512, 2021.
[34] Yiheng Wang, Hexi Jin, and Guanjie Zheng. Ctrl: Cooperative traffic tolling via reinforcement learning. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022.
[35] Hua Wei, Nan Xu, Huichu Zhang, Guanjie Zheng, Xinshi Zang, Chacha Chen, Weinan Zhang, Yanmin Zhu, Kai Xu, and Zhenhui Li. Colight: Learning network-level cooperation for traffic signal control. In Proceedings of the 28th ACM international conference on information and knowledge management, pages 1913–1922, 2019.
[36] Qiang Wu, Ming Li, Jun Shen, Linyuan Lü, Bo Du, and Kecheng Zhang. Transformerlight: A novel sequence modeling based traffic signaling mechanism via gated transformer. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023.
[37] Qiang Wu, Mingyuan Li, Jun Shen, Linyuan Lü, Bo Du, and Ke Zhang. Transformerlight: A novel sequence modeling based traffic signaling mechanism via gated transformer. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2639–2647, 2023.
[38] Qiang Wu, Liang Zhang, Jun Shen, Linyuan Lü, Bo Du, and Jianqing Wu. Efficient pressure: Improving efficiency for signalized intersections. arXiv preprint arXiv:2112.02336, 2021.
[39] Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Weinan Zhang, Yong Yu, Haiming Jin, and Zhenhui Li. Cityflow: A multi-agent reinforcement learning environment for large scale city traffic scenario. In The world wide web conference, pages 3620–3624, 2019.
[40] Liang Zhang, Qiang Wu, Jun Shen, Linyuan Lü, Bo Du, and Jianqing Wu. Expression might be enough: representing pressure and demand for reinforcement learning based traffic signal control. In International Conference on Machine Learning, pages 26645–26654. PMLR, 2022.
[41] Zuoting Zhang and Suhua Tang. Enhancing urban road network by combining route planning and dynamic lane reversal. 2021 Thirteenth International Conference on Mobile Computing and Ubiquitous Network (ICMU), pages 1–6, 2021.
[42] Guanjie Zheng, Yuanhao Xiong, Xinshi Zang, Jie Feng, Hua Wei, Huichu Zhang, Yong Li, Kai Xu, and Zhenhui Li. Learning phase competition for traffic signal control. In Proceedings of the 28th ACM international conference on information and knowledge management, pages 1963–1972, 2019.
[43] Lihua Zhou, Juanjuan Li, and Kangkang Ding. Research on variable lane control method based on traffic priority. In International Conferences on Artificial Intelligence, Information Processing and Cloud Computing, 2019.

Checklist

1. For all authors…
  (a) Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? [Yes]
  (b) Did you describe the limitations of your work? [No]
  (c) Did you discuss any potential negative societal impacts of your work? [No] We do not think that the traffic simulator will have any negative societal impacts.
  (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results…
  (a) Did you state the full set of assumptions of all theoretical results? [N/A] There is no theoretical result in the paper.
  (b) Did you include complete proofs of all theoretical results? [N/A] There is no theoretical result in the paper.
3. If you ran experiments (e.g. for benchmarks)…
  (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Provided in our codes URL.
  (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Provided in Appendix A and Appendix B.
  (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [No]
  (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets…
  (a) If your work uses existing assets, did you cite the creators? [Yes] We use CBLab datasets for performance evaluation and already cite it in the paper.
  (b) Did you mention the license of the assets? [Yes]
  (c) Did you include any new assets either in the supplemental material or as a URL? [Yes] We provide all the scenario codes and our synthetic datasets in the Github URL.
  (d) Did you discuss whether and how consent was obtained from people whose data you’re using/curating? [N/A] The data are generated from public data sources.
  (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A] The data are generated without any personally identifiable information or offensive content.
5. If you used crowdsourcing or conducted research with human subjects…
  (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A] No crowdsourcing was used and no human subjects were included.
  (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A] No crowdsourcing was used and no human subjects were included.
  (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A] No crowdsourcing was used and no human subjects were included.

Supplementary material

We present the following items in the supplementary material section:

1. Datasets for transportation system optimization benchmarking. (Section A)
2. The settings of transportation system optimization scenarios. (Section B)
3. Complete benchmark results. (Section C)

Appendix A Datasets for Transportation System Optimization Benchmarking

In this section, we describe in detail the process of constructing the datasets used for transportation system optimization benchmarking. We chose four representative large cities around the world, Beijing, Shanghai, Paris, and New York, as our targets. We use OpenStreetMap (OSM, https://openstreetmap.org/) as the data source for map construction, and a diffusion model that takes publicly available data, such as satellite imagery, as input to generate realistic travel origin-destination (OD) matrices as travel demand.

Map Building. First, based on mosstool, we selected the bounding boxes shown in Table 7 for each city to build the map.

Table 7: Geometry bounding boxes of four cities.

Bounding Box	maximum latitude	minimum latitude	maximum longitude	minimum longitude
Beijing	40.131	39.771	116.626	116.158
Shanghai	31.389	31.100	121.676	121.313
Paris	48.949	48.745	2.514	2.131
New York	40.941	40.567	-73.697	-74.058

Specifically, we first use code like https://github.com/tsinghua-fib-lab/mosstool/blob/main/examples/map_osm2geojson.py to convert OSM data within the bounding boxes to GeoJSON format, and then use code like https://github.com/tsinghua-fib-lab/mosstool/blob/main/examples/build_map.py to build maps from the GeoJSON data. The statistics of the maps of the four cities are shown in Table 8.

Table 8: Statistics of the four maps.

Statistics	# of roads	# of junctions
Beijing	25945	11953
Shanghai	14837	6270
Paris	14411	6588
New York	19046	8339

Realistic OD Matrix Generation. In order to generate realistic travel demands of the four cities, we perform OD matrix generation based on a diffusion model that has been pre-trained in several regions around the world. The diffusion model is also provided in mosstool.

Obtaining Travel Demand of Different Congestion Levels. To avoid introducing too many variables, we assume that all trips are made by driving. In addition, to better represent commuting traffic, we restrict the departure times of all vehicles to the morning and evening peaks, and we scale the total traffic volume to obtain the travel demand under different congestion levels. For the morning peak, we sample each vehicle's arrival time from a uniform distribution between 8 o’clock and 9 o’clock. We then subtract the estimated travel time from the arrival time to obtain the departure time. The estimated travel time is calculated by dividing the route length by the vehicle speed, which is set to $60\,\mathrm{km/h}$ in our experiments.

Similarly, we create an evening peak group by exchanging the origin and destination of individuals from the morning peak flows. Their departure times are set to be uniformly distributed between 17 o’clock and 18 o’clock.
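The departure-time construction described above can be sketched as follows. This is a minimal illustration rather than the released pipeline: the function names, the use of seconds since midnight, and the route length in meters are assumptions made for the example; only the 60 km/h speed and the two time windows come from the text.

```python
import random

SPEED_KMH = 60.0  # constant speed used for the travel-time estimate (from the text)


def estimated_travel_time_s(route_length_m: float, speed_kmh: float = SPEED_KMH) -> float:
    """Estimated travel time in seconds: route length divided by vehicle speed."""
    return route_length_m / (speed_kmh / 3.6)


def morning_departure_s(route_length_m: float, rng: random.Random) -> float:
    """Sample an arrival time uniformly between 8:00 and 9:00, then subtract
    the estimated travel time to obtain the departure time."""
    arrival_s = rng.uniform(8 * 3600, 9 * 3600)
    return arrival_s - estimated_travel_time_s(route_length_m)


def evening_trip(origin, destination, rng: random.Random):
    """Build the evening counterpart of a morning trip: origin and destination
    are exchanged and the departure is uniform between 17:00 and 18:00."""
    departure_s = rng.uniform(17 * 3600, 18 * 3600)
    return destination, origin, departure_s
```

For a 6 km route, the estimated travel time is 360 s, so the sampled morning departure falls between 7:54 and 8:54.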

After completing the above steps, we assign each vehicle the shortest-time route and remove the vehicles that are unable to reach their destinations. For each city, we scale the number of vehicles and observe the arrival rate, i.e., the fraction of vehicles that successfully reach their destinations, to construct datasets with different congestion levels.

The arrival rate of the dataset is determined as the minimum rate between the morning and evening peak periods. Specifically, an arrival rate of $80\%$ is considered congested, $90\%$ is considered normal, and $95\%$ is considered smooth. Based on the above rates, we construct the travel demand datasets under different congestion levels in the four cities, and the relevant statistics are shown in Table 9.

Table 9: # of trips of datasets.

Congestion Level	Smooth	Normal	Congested
Beijing	350838	439280	571412
Shanghai	348880	436888	612160
Paris	154276	202664	251236
New York	218712	262706	306078
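The arrival-rate criterion used to define the congestion levels can be sketched as below. The thresholds and the minimum over the two peak periods follow the description above; the function names and the nearest-target lookup `closest_level` are hypothetical helpers for illustration, since the text does not state how the scaling factor is searched.

```python
# Target arrival rates for each congestion level (from the text).
TARGET_ARRIVAL_RATE = {"smooth": 0.95, "normal": 0.90, "congested": 0.80}


def dataset_arrival_rate(
    morning_arrived: int, morning_total: int, evening_arrived: int, evening_total: int
) -> float:
    """Arrival rate of a dataset: the minimum of the morning and evening rates."""
    return min(morning_arrived / morning_total, evening_arrived / evening_total)


def closest_level(rate: float) -> str:
    """Hypothetical helper: the congestion level whose target rate is nearest."""
    return min(TARGET_ARRIVAL_RATE, key=lambda level: abs(TARGET_ARRIVAL_RATE[level] - rate))
```

In practice, one would presumably rescale the number of vehicles and re-simulate until the measured rate matches the desired target.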

Appendix B The Settings of Transportation System Optimization Scenarios

All experiments are conducted in the same hardware environment with an Intel(R) Xeon(R) Platinum 8462Y CPU (64 threads) and an NVIDIA GeForce RTX 4090 GPU. The training time varies across different scenarios. The optimal hyper-parameters are grid-searched and hard-coded into the released code. Please refer to the release files for detailed hyper-parameter settings.

B.1 Traffic Signal Control.

Scenario. There are multiple junctions in the road network with traffic signals to be controlled. Each junction has a list of available traffic signal phases predefined according to the geometry of the junction, like the number and direction of the incoming and outgoing lanes. Every $T=30$ seconds, the agent has to choose one phase from the list to be applied in the next period.

Observation. The observation includes the geometry of the junction and the number of (all/waiting) vehicles on each lane.

Action. Choose one phase from the given list.

Reward. The negative of the average number of waiting vehicles on the incoming lanes.

Training. The learning-based methods are all trained for 4 hours.
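As an illustration of the reward defined above, the following sketch computes it from per-lane waiting counts. The function name and the plain list of counts are assumptions for the example; how the counts are read from the simulator is not shown here.

```python
from typing import Sequence

DECISION_INTERVAL_S = 30  # T = 30 seconds between phase decisions (from the text)


def junction_reward(waiting_per_incoming_lane: Sequence[int]) -> float:
    """Negative of the mean number of waiting vehicles over the incoming lanes.

    Returns 0.0 for a junction without incoming lanes (an assumption made
    here to avoid a division by zero).
    """
    if not waiting_per_incoming_lane:
        return 0.0
    return -sum(waiting_per_incoming_lane) / len(waiting_per_incoming_lane)
```

With three incoming lanes holding 2, 0, and 4 waiting vehicles, the reward is -2.0, so fewer queued vehicles yield a higher reward.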

B.2 Dynamic Lane Assignment within Junctions.

Scenario. There are multiple roads in the road network with dynamic lanes at the end where the roads connect to junctions. Each road has exactly one dynamic lane whose direction can be either LEFT or STRAIGHT. Every $T=30$ seconds, the agent has to assign the direction of the dynamic lane.

Observation. The observation includes the geometry of the junction and the number of (all/waiting) vehicles on each lane.

Action. Choose one of the two directions.

Reward. The negative of the average number of waiting vehicles on the lanes of the road.

Training. The learning-based methods are all trained for 3 hours.
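The Rule baseline for this scenario picks the direction with the larger estimated number of vehicles. A minimal sketch is given below; the `Direction` enum, the function name, and the tie-breaking choice (keep the current direction) are assumptions for illustration and are not taken from the released code.

```python
from enum import Enum


class Direction(Enum):
    LEFT = "LEFT"
    STRAIGHT = "STRAIGHT"


def rule_based_direction(
    vehicles_turning_left: int, vehicles_going_straight: int, current: Direction
) -> Direction:
    """Assign the dynamic lane to the direction with more estimated vehicles."""
    if vehicles_turning_left > vehicles_going_straight:
        return Direction.LEFT
    if vehicles_going_straight > vehicles_turning_left:
        return Direction.STRAIGHT
    # Tie: keep the current assignment (assumption, avoids needless switching).
    return current
```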

B.3 Tidal Lane Control.

Scenario. There are multiple road pairs in the road network with tidal lanes. Each road pair has exactly one tidal lane in the center whose direction can be either FORWARD or BACKWARD. Every $T=180$ seconds, the agent has to choose the direction of the tidal lane.

Observation. The observation includes the geometry of the road and the number of (all/waiting) vehicles on each lane.

Action. Choose one of the two directions.

Reward. The negative of the average number of waiting vehicles on the lanes of the road.

Training. The learning-based methods are all trained for 3 hours.

B.4 Congestion Pricing.

Scenario. All the roads in the road network can be set with a congestion price for vehicles traveling through it. Every $T=20$ seconds, the agent can change the prices according to the traffic condition.

Observation. The observation includes the geometry of the road network and the number of (all/waiting) vehicles on each lane.

Action. Set the prices for each road.

Reward. The number of finished vehicles in the past period.

Training. The learning-based methods are all trained for 3 hours.
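For intuition on how a per-road price can be updated every period, the sketch below follows the general form of the $\Delta$-tolling rule as it is commonly described for [28]: the toll is proportional to the delay on the road (current travel time minus free-flow travel time) and is smoothed against the previous toll. The function and parameter names, the clamp of negative delays to zero, and any parameter values are assumptions; the benchmark's implementation may differ.

```python
def delta_toll_update(
    prev_toll: float,
    travel_time_s: float,
    free_flow_time_s: float,
    beta: float,
    smoothing: float,
) -> float:
    """One toll update for a single road.

    beta scales the delay into a price; smoothing in (0, 1] weights the new
    delay-based toll against the previous toll.
    """
    # Delay relative to free flow; clamped at zero (assumption).
    delay_s = max(travel_time_s - free_flow_time_s, 0.0)
    return smoothing * beta * delay_s + (1.0 - smoothing) * prev_toll
```

A road at free-flow speed thus sees its toll decay towards zero, while a road with growing delay sees its toll rise.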

B.5 Road Planning.

Scenario. The road network contains multiple roads that were newly constructed during the past five years. Each of these roads has one of two statuses, KEEP or REMOVE. The algorithms observe the ATT and are asked to minimize it by setting each road's status to KEEP or REMOVE.

Candidate Roads Identification. For each city, we extract driving roads from OSM of 2019 and match every road in our map to the road network of 2019. The matching is evaluated on three aspects: the distance between the two roads, the distance from the middle point of the road in our map, and the difference in highway level between the two roads. Any road that cannot be matched with any road in 2019 is identified as a newly constructed road and regarded as a candidate road. The spatial and length statistics of candidate roads are shown in Figure 4. We select the 50 roads with the highest number of vehicles from each candidate set as the optimization set for the algorithm.

Table 10: # of candidate roads.

Basic Statistics	# of roads
Beijing	263
Shanghai	136
Paris	156
New York	612
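To make the road planning search space concrete, the sketch below shows a rule-based selection by vehicle count and a generic simulated annealing loop over subsets of the candidate roads, using the budget of at most 30 roads stated for this benchmark. The objective `evaluate_att` is a hypothetical callback that would run the simulator and return the ATT; the neighborhood move, cooling schedule, and step count are illustrative assumptions, not the settings of the released code.

```python
import math
import random
from typing import Callable, FrozenSet, Sequence

MAX_BUILT = 30  # at most 30 roads may be selected (from the benchmark description)


def rule_based_selection(vehicle_counts: Sequence[int], k: int = MAX_BUILT) -> FrozenSet[int]:
    """Select the k candidate roads with the highest vehicle counts."""
    ranked = sorted(range(len(vehicle_counts)), key=lambda i: vehicle_counts[i], reverse=True)
    return frozenset(ranked[:k])


def simulated_annealing(
    n_candidates: int,
    evaluate_att: Callable[[FrozenSet[int]], float],
    rng: random.Random,
    steps: int = 200,
    t_start: float = 100.0,
    t_end: float = 1.0,
    k: int = MAX_BUILT,
) -> FrozenSet[int]:
    """Search for a set of at most k roads with low ATT (lower is better)."""
    current = frozenset(rng.sample(range(n_candidates), k))
    current_att = evaluate_att(current)
    best, best_att = current, current_att
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbor: toggle one candidate road; skip moves that exceed the budget.
        neighbor = current ^ {rng.randrange(n_candidates)}
        if len(neighbor) > k:
            continue
        neighbor_att = evaluate_att(neighbor)
        delta = neighbor_att - current_att
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_att = neighbor, neighbor_att
            if current_att < best_att:
                best, best_att = current, current_att
    return best
```

Each call to `evaluate_att` corresponds to a full simulation run, so the number of steps directly bounds the simulation budget of the search.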

Appendix C Complete Benchmark Results

C.1 Traffic Signal Control.

Table 11: The benchmark results for the traffic signal control scenario with smooth traffic conditions.

Method	Beijing-S		Shanghai-S		Paris-S		New York-S
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
FixedTime	4426.51	109588	3878.57	120449	3749.65	52459	4369.44	67046
MaxPressure	4099.14	122737	3523.45	132028	3477.37	55123	3951.76	77018
FRAP	4741.04	102556	4261.76	113613	3920.69	51301	4725.86	63572
MPLight	4086.96	121620	3474.95	133044	3486.59	55946	3840.82	78125
CoLight	4757.27	102433	4221.91	115244	3922.72	51351	4714.33	63553
Efficient-MPLight	4529.54	110199	4013.16	121301	3953.45	51236	4390.86	69279
Advanced-MPLight	4750.34	102649	3997.82	120529	3786.51	53982	4378.73	70352
Advanced-CoLight	4740.79	103172	4245.06	114927	3945.78	51164	4748.60	63463
PPO	4005.74	124001	3636.27	130485	3485.31	55915	3914.77	77007

Table 12: The benchmark results for the traffic signal control scenario with normal traffic conditions.

Method	Beijing-N		Shanghai-N		Paris-N		New York-N
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
FixedTime	4843.05	120049	4324.07	131020	4245.86	60020	4682.07	72725
MaxPressure	4580.41	132055	4045.75	144640	3984.20	64405	4309.91	83927
FRAP	5105.22	112321	4671.52	121896	4404.23	58481	5002.48	68196
MPLight	4790.12	124991	4674.93	122198	3980.07	64921	4196.37	85331
CoLight	5108.88	112672	4640.50	124565	4413.91	58513	4989.17	68184
Efficient-MPLight	5101.14	113272	4480.28	132995	4364.89	60428	5019.92	67303
Advanced-MPLight	5049.47	117568	4603.57	127911	4281.72	61470	5031.15	67322
Advanced-CoLight	5107.23	112778	4661.07	123336	4408.14	58929	5014.66	68292
PPO	4452.05	136630	4143.19	141768	4017.61	64277	4254.92	84411

Table 13: The benchmark results for the traffic signal control scenario with congested traffic conditions.

Method	Beijing-C		Shanghai-C		Paris-C		New York-C
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
FixedTime	5274.90	130419	4928.58	145308	4587.84	66608	4923.74	76165
MaxPressure	5106.47	142654	4790.71	158804	4371.45	71874	4596.29	87897
FRAP	5490.86	122061	5205.55	133039	4743.12	64711	5219.06	71448
MPLight	5496.34	121452	4740.95	160551	4378.05	72643	4468.59	90632
CoLight	5484.53	122508	5185.19	135131	4750.70	64463	5214.91	72091
Efficient-MPLight	5375.55	130151	5104.42	142562	4718.07	65795	5225.18	71556
Advanced-MPLight	5358.05	129512	5129.86	138727	4650.76	67593	5227.87	71713
Advanced-CoLight	5481.23	122564	5207.07	133709	4756.00	64517	5222.79	71697
PPO	4975.47	149589	4814.59	155279	4394.63	71491	4535.97	90111

C.2 Dynamic Lane Assignment within Junctions.

Table 14: The benchmark results for the dynamic lane assignment scenario with smooth traffic conditions.

Method	Beijing-S		Shanghai-S		Paris-S		New York-S
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4426.43	109588	3878.56	120449	3749.64	52459	4369.47	67046
Random	4435.44	109450	3875.35	120896	3681.61	53474	4344.24	67271
Rule	4340.58	112462	3807.56	122688	3675.52	53244	4310.52	67737
PPO	4345.00	111391	3804.21	122692	3663.37	53629	4309.67	67626

Table 15: The benchmark results for the dynamic lane assignment scenario with normal traffic conditions.

Method	Beijing-N		Shanghai-N		Paris-N		New York-N
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4846.70	119890	4322.21	131145	4245.84	60020	4674.08	72870
Random	4839.71	120338	4324.96	131216	4176.70	61810	4636.11	74055
Rule	4761.33	123673	4258.22	133346	4155.11	62366	4615.01	74254
PPO	4769.98	122929	4256.51	133379	4160.89	61792	4614.52	73907

Table 16: The benchmark results for the dynamic lane assignment scenario with congested traffic conditions.

Method	Beijing-C		Shanghai-C		Paris-C		New York-C
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	5275.00	130419	4928.51	145308	4587.81	66608	4923.79	76165
Random	5294.29	129976	4932.60	144774	4539.27	68410	4907.50	77460
Rule	5221.05	134661	4875.04	148393	4521.80	68795	4863.84	78469
PPO	5240.12	131854	4874.53	148228	4522.36	68413	4854.74	78857

C.3 Tidal Lane Control.

Table 17: The benchmark results for the tidal lane control scenario with smooth traffic conditions.

Method	Beijing-S		Shanghai-S		Paris-S		New York-S
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4422.68	109912	3884.78	120156	3723.22	52992	4388.48	66269
Random	4418.86	110150	3887.78	119940	3711.32	53313	4360.21	67295
Rule	4416.80	109997	3870.30	120366	3687.63	53626	4341.27	68113
PPO	4411.00	110161	3857.94	121344	3674.53	53922	4325.07	68775

Table 18: The benchmark results for the tidal lane control scenario with normal traffic conditions.

Method	Beijing-N		Shanghai-N		Paris-N		New York-N
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4844.57	120105	4334.80	130604	4224.94	60794	4675.10	72834
Random	4827.75	120778	4338.40	130283	4216.17	60946	4665.83	73335
Rule	4823.40	120901	4313.48	131315	4192.85	61636	4638.03	74284
PPO	4820.48	120936	4304.29	132167	4187.37	61738	4628.47	74756

Table 19: The benchmark results for the tidal lane control scenario with congested traffic conditions.

Method	Beijing-C		Shanghai-C		Paris-C		New York-C
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	5278.35	130133	4937.88	144546	4575.58	67201	4923.74	76253
Random	5274.19	130456	4937.78	143921	4568.53	67661	4903.82	77236
Rule	5259.98	131304	4908.70	146094	4555.47	68025	4877.93	79096
PPO	5258.72	131709	4909.53	146145	4544.70	68274	4860.92	79731

C.4 Congestion Pricing.

Table 20: The benchmark results for the congestion pricing scenario with smooth traffic conditions.

Method	Beijing-S		Shanghai-S		Paris-S		New York-S
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4433.37	109604	3875.91	120444	3743.18	52611	4368.11	66984
Random	4865.59	96231	3969.04	121137	3705.23	53904	4571.66	63670
$\Delta$-toll	4267.05	118077	3630.38	133056	3611.00	54762	4246.56	72188
EBGtoll	5348.10	75078	4230.05	107683	3759.70	52424	4837.93	55155

Table 21: The benchmark results for the congestion pricing scenario with normal traffic conditions.

Method	Beijing-N		Shanghai-N		Paris-N		New York-N
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	4840.18	120207	4328.34	130621	4239.23	60100	4681.75	72765
Random	5190.37	105640	4422.00	131346	4144.47	62284	4830.86	68976
$\Delta$-toll	4667.77	131747	4096.62	147533	4040.68	65024	4549.26	78182
EBGtoll	5637.30	80476	4637.64	116624	4240.82	60362	5096.60	59154

Table 22: The benchmark results for the congestion pricing scenario with congested traffic conditions.

Method	Beijing-C		Shanghai-C		Paris-C		New York-C
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
NoChange	5281.92	129904	4927.87	145154	4588.83	66603	4928.00	76140
Random	5558.59	115184	5027.27	143400	4488.19	69075	5066.64	72916
$\Delta$-toll	5141.39	143518	4767.74	162829	4399.95	72309	4801.01	82602
EBGtoll	5912.51	86945	5182.92	128301	4571.64	66971	5275.42	63052

C.5 Road Planning.

Table 23: The benchmark results for the road planning scenario with smooth traffic conditions.

Method	Beijing-S		Shanghai-S		Paris-S		New York-S
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
No-Change	7411.53	137793	5432.11	151689	5997.46	62016	6954.14	92756
Random	7216.43	139070	5325.55	151828	5899.89	62070	7176.53	88947
Rule	7240.73	138795	5352.50	152124	5907.31	62155	6942.72	92463
SA	7272.01	139326	5358.86	151983	5772.82	62990	6921.12	92754
GeneralBO	7102.82	139983	5297.74	152046	5755.44	62478	6808.24	93858

Table 24: The benchmark results for the road planning scenario with normal traffic conditions.

Method	Beijing-N		Shanghai-N		Paris-N		New York-N
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
No-Change	8439.44	161722	6699.66	161722	7193.90	76327	7892.50	105533
Random	8304.71	163148	6567.12	181063	7106.19	75839	7967.24	102996
Rule	8235.49	164247	6570.72	181504	7177.88	76176	7956.08	102951
SA	8332.44	163660	6590.66	180954	7154.16	76164	7871.23	105507
GeneralBO	8242.19	164182	6721.78	178790	7161.74	75979	7759.88	106883

Table 25: The benchmark results for the road planning scenario with congested traffic conditions.

Method	Beijing-C		Shanghai-C		Paris-C		New York-C
Method	ATT	TP	ATT	TP	ATT	TP	ATT	TP
No-Change	9684.37	191786	8793.09	216625	8310.09	87786	8624.63	117606
Random	9496.80	195463	8651.96	219199	8230.79	87704	8663.12	115278
Rule	9489.22	195427	8656.58	218214	8250.89	87872	8682.73	114650
SA	9487.05	195918	8600.00	219711	8188.43	88650	8589.73	118036
GeneralBO	9369.14	198446	8508.84	221049	8161.15	88803	8542.19	118579