Model Predictive Simulation Using Structured Graphical Models and Transformers
Xinghua Lou
Google DeepMind
Meet Dave
Google DeepMind
Shrinu Kushagra
Google DeepMind
Miguel Lázaro-Gredilla
Google DeepMind
Kevin Murphy
Google DeepMind
Abstract
We propose an approach to simulating trajectories of multiple interacting agents (road users)
based on transformers
and probabilistic graphical models (PGMs),
and apply it to the Waymo SimAgents challenge.
The transformer baseline is based on the
MTR model (Shi et al., 2024),
which predicts multiple future trajectories
conditioned on the past trajectories
and static road layout features.
We then improve upon these generated trajectories
using a PGM whose factors encode prior knowledge, such as a preference for smooth trajectories and the avoidance of collisions with static obstacles and other moving agents.
We perform (approximate) MAP inference in this PGM using the Gauss-Newton method.
Finally we sample $K = 32$ trajectories
for each of the $N$ agents for the next
$T = 8\Delta$ time steps,
where $\Delta$ is the sampling rate per second.
Following the Model Predictive Control (MPC) paradigm,
we only return the first element of our forecasted
trajectories at each step,
and then we replan, so that the simulation can constantly adapt to its changing environment.
We therefore call our approach "Model Predictive Simulation" or MPS.
We show that MPS improves upon the MTR baseline,
especially in safety critical metrics such as collision rate.
Furthermore, our approach is compatible with any underlying forecasting model, and does not require extra training, so we believe it is a valuable contribution to the community.
keywords:
Multi-Agent Planning, Probabilistic Graphical Model, Model Predictive Control, Transformer
1 Introduction
The use of transformers to create generative models to simulate agent trajectories,
trained on large datasets such as Waymo Open Data
(Ettinger et al., 2021),
has become very popular in recent years.
Most previous work has focused on improving the architecture (Nayakanti et al., 2023; Shi et al., 2024),
the training objective (Ngiam et al., 2021; Shi et al., 2024),
the trajectory representation (Seff et al., 2023; Philion et al., 2023) or the speed (Zhou et al., 2023) of these transformer-based models.
This paper tackles the problem from an orthogonal and complementary angle – namely the use of prior knowledge, encoded using a probabilistic graphical model (PGM).
We perform approximate MAP inference in the PGM to “post process”
the trajectory proposals from a base transformer
model, to increase their realism
and compliance with constraints, such as collision avoidance.
To ensure that our predicted forecasts are adaptive to the changing environment, we replan at each step, following the principle of model predictive control (MPC),
which is widely used for controlling complex dynamical systems (Schwenzer et al., 2021).
We therefore call our approach
Model Predictive Simulation (MPS).
Our MPS approach differs from previous PGM methods for trajectory simulation, such as
JFP (Luo et al., 2023),
in several ways.
First, we explicitly include (data-dependent) factors
for collision avoidance
and smooth trajectories,
so we have better control over the generated trajectories.
Second, our approach is iterative (being based on MPC), while JFP commits to the trajectory proposals at $t = 0$ and is thus open loop.
Third, our approach uses the Gauss-Newton method to compute the joint MAP estimate,
whereas JFP is based on discrete belief propagation methods to choose amongst a finite
set of candidate trajectories.
The overall simulation pseudocode is shown in Algo. 1.
It generates a set of $K$ trajectories,
each of length $T$,
for $N$ agents
given the scene context $c$.
(The exact value of $N$ depends
on the number of agents that
are visible in $c$.)
We denote the generated output by
$s_{1:T}^{1:N,1:K}$,
where $s_t^{n,k}$ is the state (2d location and velocity) of the $n$'th agent
at time step $t$ in sample $k$.
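As a rough illustration of the closed-loop structure described above, the following Python sketch rolls every sample forward one step at a time, replanning at each step. The names `simulate` and `constant_velocity_planner` are hypothetical, the planner is a toy stand-in for the MPS step, the scene context is omitted, and the nesting of the sample and time loops is an assumption rather than the authors' Algo. 1.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float, float]  # (x, y, vx, vy): 2d location and velocity

def simulate(
    history: List[List[State]],            # history[t][n]: state of agent n at past step t
    num_rollouts: int,                     # number of sampled trajectories
    num_steps: int,                        # length of each trajectory
    plan_next_state: Callable[[List[List[State]]], List[State]],
) -> List[List[List[State]]]:
    """Return rollouts[k][t][n], built one step at a time (closed loop)."""
    rollouts = []
    for _ in range(num_rollouts):
        past = [list(step) for step in history]
        future = []
        for _ in range(num_steps):
            next_states = plan_next_state(past)  # replan from the latest states
            past.append(next_states)
            future.append(next_states)
        rollouts.append(future)
    return rollouts

def constant_velocity_planner(past, rate_hz=10.0):
    """Toy stand-in for the planner: advance every agent at constant velocity."""
    dt = 1.0 / rate_hz
    return [(x + vx * dt, y + vy * dt, vx, vy) for (x, y, vx, vy) in past[-1]]

# One past step and two agents.
history = [[(0.0, 0.0, 1.0, 0.0), (5.0, 0.0, 0.0, 2.0)]]
rollouts = simulate(history, num_rollouts=3, num_steps=4,
                    plan_next_state=constant_velocity_planner)
```

Because the planner only ever sees the states produced so far, any planner with the same signature can be swapped in without changing the loop.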
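Inner loop
Input:
Scene context $c$,
agent history $h$,
num. agents $N$,
future planning horizon $H$,
number of rollouts $K$,
transformer proposal $\pi$
Output: Predicted next state for each agent, $s_{t+1}^{1:N}$
for $k = 1$ to $K$ do
Sample goals and anchor points $(g^{1:N,k}, a_{1:H}^{1:N,k})$ from $\pi$ given $c$ and $h$
Initialize the joint trajectory $s_{1:H}^{1:N,k}$ from the anchor points and refine it with the PGM
end for
Sample $k^*$ from a softmin over the trajectory energies $E^{1:K}$
return the first step of the sampled trajectory, $s_{1}^{1:N,k^*}$
Algorithm 2: Model Predictive Simulation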
At each step $t$, the simulator calls our MPS
algorithm to generate a prediction
for the next state of each agent. The pseudocode for this is shown in Algo. 2.
The approach is as follows.
First we use the MTR transformer model
(Shi et al., 2024)
to sample a set of goal locations,
$g^{1:N}$, one for each agent,
as well as a sequence of anchor points
leading to each goal,
$a_{1:H}^{1:N}$,
where $H$ is the planning or forecast horizon.
We do this $K$ times in parallel,
to create a set of possible futures.
We then use the PGM to generate
$K$ joint trajectories (for all agents), using the method described below.
Finally we evaluate the energy of each generated
trajectory, $E^{k}$,
sample one of the low energy (high probability) ones to get $k^*$,
and return the first step of this sampled trajectory.
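The steps above (propose, refine, score, sample, return the first step) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `mps_step`, `softmin_sample` and the `toy_*` functions are hypothetical names, the proposal is random noise rather than a transformer, the refinement is the identity rather than PGM inference, and the temperature parameter is not taken from the paper.

```python
import math
import random

def softmin_sample(energies, rng, temperature=1.0):
    """Sample an index with probability proportional to exp(-energy / temperature)."""
    lowest = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - lowest) / temperature) for e in energies]
    return rng.choices(range(len(energies)), weights=weights, k=1)[0]

def mps_step(num_candidates, propose, refine, energy, rng):
    """One inner-loop step: propose, refine, score, sample, return the first step."""
    candidates = []
    for _ in range(num_candidates):
        goals, anchors = propose(rng)               # stand-in for the transformer proposal
        candidates.append(refine(goals, anchors))   # stand-in for PGM inference
    energies = [energy(c) for c in candidates]
    k_star = softmin_sample(energies, rng)
    return candidates[k_star][0]                    # first step of the chosen joint trajectory

def toy_propose(rng, num_agents=2, horizon=5):
    """Noisy anchor points along x for each agent; the goal is the last anchor."""
    anchors = [[(0.1 * (t + 1) + rng.gauss(0.0, 0.05), float(n))
                for n in range(num_agents)] for t in range(horizon)]
    return anchors[-1], anchors

def toy_refine(goals, anchors):
    return anchors  # identity: no smoothing in this sketch

def toy_energy(trajectory):
    """Penalize jerky x-motion with squared second differences per agent."""
    total = 0.0
    for t in range(1, len(trajectory) - 1):
        for n in range(len(trajectory[t])):
            total += (trajectory[t + 1][n][0] - 2.0 * trajectory[t][n][0]
                      + trajectory[t - 1][n][0]) ** 2
    return total

next_states = mps_step(8, toy_propose, toy_refine, toy_energy, random.Random(0))
```

Sampling from the softmin, rather than always taking the lowest-energy candidate, keeps some diversity across rollouts while still favoring low-energy futures.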
Graphical model
Figure 1: Factor Graph for agents
unrolled for planning steps.
Circles are random variables,
gray squares are fixed factors.
Figure 2: Joint probability model.
The key to our method is the probabilistic graphical model (PGM) that improves upon the trajectories proposed by MTR.
The factor graph is shown
in Fig. 1
and the corresponding conditional joint distribution is given in Fig. 2. The model was inspired by Patwardhan et al. (2022), who use Gaussian belief propagation.
We now explain each of the factors.
First we have factors which compare
a candidate trajectory to the original proposal.
The motion factor penalizes the distance between the location of agent $n$ at time $t$
and the predicted location (anchor point) $a_t^n$
for that agent and time, as computed by the transformer proposal.
This ensures the trajectory stays close to the initial proposal.
The proximity to goal factor penalizes the distance between the final location of agent $n$
and the goal $g^n$ for that agent,
as predicted by the transformer proposal.
This ensures the trajectory ends close to where we expect.
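To make the two proposal-based factors concrete, here is a minimal Python sketch for a single agent. It assumes both factors are squared Euclidean distances, which is one common choice and not necessarily the exact form used by the authors; `motion_energy` and `goal_energy` are hypothetical names.

```python
def motion_energy(locations, anchors):
    """Sum of squared distances between trajectory locations and anchor points."""
    return sum((x - ax) ** 2 + (y - ay) ** 2
               for (x, y), (ax, ay) in zip(locations, anchors))

def goal_energy(locations, goal):
    """Squared distance between the final trajectory location and the goal."""
    x, y = locations[-1]
    return (x - goal[0]) ** 2 + (y - goal[1]) ** 2

anchors = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
goal = (3.0, 0.0)
on_proposal = list(anchors)                      # trajectory equal to the proposal
shifted = [(x, y + 0.5) for (x, y) in anchors]   # same trajectory moved 0.5 sideways
```

A trajectory that coincides with the proposal has zero energy under both factors, while the shifted one is penalized at every step by the motion factor and once by the goal factor.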
Second we have factors defined from "physics".
We define a factor that penalizes
deviation from linear motion.
It is written in terms of the location components and
the velocity components of the agent states,
together with the sampling rate $\Delta$.
We also define a factor that penalizes
changes in direction.
We used weight 2.0 for and 1.0 for all other factors.
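The two physics factors can be illustrated with a short Python sketch. It assumes a constant-velocity prediction over one sampling interval for the linear-motion term and a squared heading difference for the direction term; these forms and the function names are assumptions for illustration, not the paper's equations, and the factor weights are left out.

```python
import math

def linear_motion_energy(states, rate_hz):
    """Penalize the gap between each location and a constant-velocity prediction."""
    dt = 1.0 / rate_hz
    total = 0.0
    for (x0, y0, vx0, vy0), (x1, y1, _, _) in zip(states, states[1:]):
        total += (x1 - (x0 + vx0 * dt)) ** 2 + (y1 - (y0 + vy0 * dt)) ** 2
    return total

def direction_change_energy(states):
    """Penalize squared changes in heading between consecutive velocities."""
    total = 0.0
    for (_, _, vx0, vy0), (_, _, vx1, vy1) in zip(states, states[1:]):
        delta = math.atan2(vy1, vx1) - math.atan2(vy0, vx0)
        delta = math.atan2(math.sin(delta), math.cos(delta))  # wrap to [-pi, pi]
        total += delta ** 2
    return total

# States are (x, y, vx, vy); the first trajectory moves at 1 m/s along x at 10 Hz.
straight = [(0.1 * k, 0.0, 1.0, 0.0) for k in range(5)]
turning = [(0.0, 0.0, 1.0, 0.0), (0.1, 0.0, 0.0, 1.0)]  # 90 degree change of heading
```

The straight trajectory incurs essentially no energy under either factor, whereas the abrupt turn is penalized by the direction term.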
Third we have factors derived from static obstacles
on the road.
Each such factor evaluates a Gaussian field, centered and rotated according to the agent's location, at the coordinates of the road edges (part of the context $c$).
Finally, we have pairwise collision factors between agents.
Each of these evaluates the Gaussian field of one agent at
the 9 collision checking points (CCP) of the other agent (4 corners, 4 centers of the sides, and the center of the agent).
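The following Python sketch shows one way to build the 9 collision checking points of a box-shaped agent and to evaluate another agent's Gaussian field at them. The field widths (`sigma_long`, `sigma_lat`), the use of a heading angle to rotate the field, the summation over points, and all function names are assumptions made for illustration.

```python
import math

def collision_checking_points(center, heading, length, width):
    """9 points of an oriented box: 4 corners, 4 side centers, and the center."""
    cx, cy = center
    c, s = math.cos(heading), math.sin(heading)
    half_l, half_w = length / 2.0, width / 2.0
    local = [(sx * half_l, sy * half_w) for sx in (-1, 0, 1) for sy in (-1, 0, 1)]
    return [(cx + c * u - s * v, cy + s * u + c * v) for (u, v) in local]

def gaussian_field(point, center, heading, sigma_long, sigma_lat):
    """Anisotropic Gaussian bump centered on an agent and aligned with its heading."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(heading), math.sin(heading)
    u = c * dx + s * dy       # offset along the heading
    v = -s * dx + c * dy      # lateral offset
    return math.exp(-0.5 * ((u / sigma_long) ** 2 + (v / sigma_lat) ** 2))

def collision_energy(agent, other, sigma_long=2.0, sigma_lat=1.0):
    """Sum the field of `agent` over the collision checking points of `other`."""
    points = collision_checking_points(other["center"], other["heading"],
                                       other["length"], other["width"])
    return sum(gaussian_field(p, agent["center"], agent["heading"],
                              sigma_long, sigma_lat) for p in points)

ego = {"center": (0.0, 0.0), "heading": 0.0, "length": 4.0, "width": 2.0}
near = {"center": (1.0, 0.0), "heading": 0.0, "length": 4.0, "width": 2.0}
far = {"center": (50.0, 0.0), "heading": 0.0, "length": 4.0, "width": 2.0}
points = collision_checking_points(near["center"], near["heading"],
                                   near["length"], near["width"])
```

The same `gaussian_field` could be evaluated at road-edge coordinates to sketch the static obstacle factor.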
Inference
Inference on the factor graph is equivalent to minimizing a non-linear, non-convex quadratic optimization problem defined over .
For efficiency reasons, we developed a two-step approach. First, we use the Gauss–Newton method to solve a partial model that only consists of , and factors,
as these can all be evaluated in parallel
across agents using
individual trajectory models.
This step produces smoothed trajectories, which are then frozen.
Second, we sample joint trajectories for agents according to their probability (unnormalized energy),
and use the and factors
to score their quality. After repeating this times, the best joint trajectories are sampled from a softmin operation over the scores of the samples.
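For readers unfamiliar with Gauss-Newton, the sketch below applies it to a toy single-agent smoothing problem in plain Python. It is not the authors' JAXopt implementation: the residuals (proximity to anchors, smoothness, one Gaussian obstacle), the finite-difference Jacobian, the small damping term and the backtracking step are all assumptions chosen to keep the example self-contained.

```python
import math

def residuals(p, anchors, obstacle, sigma=1.0, w_smooth=2.0):
    """Stack residuals for a 1-D trajectory of lateral offsets p (toy factors)."""
    r = [pi - ai for pi, ai in zip(p, anchors)]                  # stay near the proposal
    r += [w_smooth * (p[i + 1] - 2.0 * p[i] + p[i - 1])          # smoothness
          for i in range(1, len(p) - 1)]
    r += [math.exp(-0.5 * ((pi - obstacle) / sigma) ** 2) for pi in p]  # obstacle field
    return r

def cost(f, p):
    return 0.5 * sum(ri * ri for ri in f(p))

def jacobian(f, p, eps=1e-6):
    """Forward-difference Jacobian of the residual vector."""
    r0 = f(p)
    J = [[0.0] * len(p) for _ in r0]
    for j in range(len(p)):
        q = list(p)
        q[j] += eps
        rj = f(q)
        for i in range(len(r0)):
            J[i][j] = (rj[i] - r0[i]) / eps
    return r0, J

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gauss_newton(f, p0, iters=10, damping=1e-9):
    p = list(p0)
    for _ in range(iters):
        r, J = jacobian(f, p)
        n, m = len(p), len(r)
        JTJ = [[sum(J[i][a] * J[i][b] for i in range(m)) + (damping if a == b else 0.0)
                for b in range(n)] for a in range(n)]
        JTr = [sum(J[i][a] * r[i] for i in range(m)) for a in range(n)]
        step = solve(JTJ, [-g for g in JTr])      # normal equations: JTJ step = -JTr
        scale = 1.0
        while scale > 1e-4:                       # backtracking keeps the cost from rising
            candidate = [pi + scale * si for pi, si in zip(p, step)]
            if cost(f, candidate) <= cost(f, p):
                p = candidate
                break
            scale *= 0.5
    return p

anchors = [0.0, 0.4, 0.1, 0.5, 0.2, 0.6]          # jagged proposal
f = lambda p: residuals(p, anchors, obstacle=0.3)
p0 = list(anchors)
p_opt = gauss_newton(f, p0)
```

Each iteration linearizes the residuals around the current trajectory and solves a small linear system, which is why factors that decouple across agents can be optimized per agent in parallel.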
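Column groups: META METRIC (REALISM); KINEMATIC (LINEAR SPEED, LINEAR ACCEL., ANG. SPEED, ANG. ACCEL.); INTERACTIVE (DIST. TO OBJ., COLLISION, TTC); MAP (DIST. TO ROAD, OFFROAD).
| WAYMO LEADERBOARD | REALISM | LINEAR SPEED | LINEAR ACCEL. | ANG. SPEED | ANG. ACCEL. | DIST. TO OBJ. | COLLISION | TTC | DIST. TO ROAD | OFFROAD | minADE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SMART | 0.7511 | 0.3646 | 0.4057 | 0.4231 | 0.5844 | 0.3769 | 0.9655 | 0.8317 | 0.6590 | 0.9362 | 1.5447 |
| MVTE | 0.7301 | 0.3506 | 0.3530 | 0.4974 | 0.5999 | 0.3742 | 0.9049 | 0.8309 | 0.6655 | 0.9071 | 1.6769 |
| MPS (Ours) | 0.7416 | 0.3137 | 0.3049 | 0.4705 | 0.5834 | 0.3593 | 0.9629 | 0.8070 | 0.6651 | 0.9366 | 1.4841 |
Table 1: WOSAC Leaderboard: SMART (2024 winner) vs. MVTE (2023 winner) vs. MPS (ours).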
3 Experimental Evaluation
Benchmark We evaluated MPS on the 2024 Waymo Open Sim Agents Challenge
(Montali et al., 2023),
where the task is to simulate 32 realistic rollouts, 8s into the future, of all agents in the scene given their 1s history. The simulation needs to be closed-loop and factorized between the ADV and other agents, which MPS satisfies naturally.
Implementation Details We implemented the factors and the inference in JAX (https://github.com/google/jax), using JAXopt (https://jaxopt.github.io/) for the Gauss-Newton method. We leveraged JAX’s just-in-time (JIT) compilation and observed great scalability. To speed up simulation, we execute the next 10 steps of each plan at every MPS iteration.
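Executing several planned steps per iteration trades some reactivity for fewer planner calls. The Python sketch below illustrates the bookkeeping with a hypothetical `toy_plan` whose "state" is a plain integer; the 80-step rollout corresponds to 8s under an assumed 10 Hz sampling rate.

```python
def rollout_with_chunked_replanning(plan, state, total_steps, steps_per_replan):
    """Execute `steps_per_replan` steps of each plan before replanning."""
    executed, num_replans = [], 0
    while len(executed) < total_steps:
        trajectory = plan(state)                  # full plan from the current state
        num_replans += 1
        remaining = total_steps - len(executed)
        chunk = trajectory[:min(steps_per_replan, remaining)]
        executed.extend(chunk)
        state = chunk[-1]                         # continue from the last executed step
    return executed, num_replans

def toy_plan(state, horizon=20):
    """Toy planner: the state is an integer that increases by one per step."""
    return [state + k for k in range(1, horizon + 1)]

executed, num_replans = rollout_with_chunked_replanning(
    toy_plan, 0, total_steps=80, steps_per_replan=10)
```

With these settings the planner is called 8 times instead of 80, at the cost of reacting to other agents only every 10 steps.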
We trained our own MTR model
using the open source code (https://github.com/sshaoshuai/MTR).
We removed local attention and reduced the number of source polylines to 512. The training data
is augmented by adding extra interacting agents,
and by applying random history dropouts. We followed the original training setup except for the number of epochs (50), the batch size (8) and the LR schedule ([25, 30, 35, 40, 45]). Training took about 3 days on 16 A100s.
We used only the official Waymo Open Motion Dataset v1.2.1
and did not use any Lidar or Camera data. We did not need any additional training and we did not use ensembles.
Sim Agents 2024 Results We ranked number 4 among all methods (Table 1). We outperformed the 2023 winner MVTE (Wang et al., 2023), which also uses MTR (Shi et al., 2024), and are approximately 1 point behind the 2024 winner SMART (Wu et al., 2024) on the REALISM meta metric (0.7416 vs. 0.7511). MPS achieved near-top performance in a few safety-critical metrics such as COLLISION (0.9629) and OFFROAD (0.9366), showing the effectiveness of the priors in our model. MPS underperformed on LINEAR SPEED and LINEAR ACCEL. We speculate this is because MPS can generate diverse rollouts that are very different from the logged data used for metric evaluation.
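Column groups: META METRIC (REALISM); KINEMATIC (LINEAR SPEED, LINEAR ACCEL., ANG. SPEED, ANG. ACCEL.); INTERACTIVE (DIST. TO OBJ., COLLISION, TTC); MAP (DIST. TO ROAD, OFFROAD).
| METHOD | REALISM | LINEAR SPEED | LINEAR ACCEL. | ANG. SPEED | ANG. ACCEL. | DIST. TO OBJ. | COLLISION | TTC | DIST. TO ROAD | OFFROAD | minADE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MTR+RAND | 0.7019 | 0.3922 | 0.3530 | 0.3899 | 0.3304 | 0.3691 | 0.8491 | 0.8164 | 0.6706 | 0.9207 | 1.3084 |
| MPS | 0.7418 | 0.3158 | 0.3056 | 0.4664 | 0.5818 | 0.3604 | 0.9617 | 0.8094 | 0.6651 | 0.9374 | 1.4841 |
Table 2: Ablation study: comparing MPS to MTR with random trajectory sampling.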
Ablation Study To evaluate the value of the PGM priors, we compare MPS to the same MTR model with random trajectory sampling (MTR+RAND) on the validation dataset. As shown in Table 2, MPS improved safety-critical metrics such as COLLISION (0.8491 to 0.9617) and OFFROAD (0.9207 to 0.9374), as well as the overall REALISM score (0.7019 to 0.7418), but underperformed on LINEAR SPEED and LINEAR ACCEL. for the same reason discussed above.
Qualitative Study Qualitatively, MPS generates diverse (multi-modal) predictions (Fig. 3),
and each prediction
contains realistic traffic patterns such as lane merging, unprotected left turns, and yielding (Fig. 4).
We explored an approach
that can improve on any trajectory simulation
model by adding domain-specific priors,
and performing inference in the corresponding PGM. We believe combining prior-driven (top-down) and data-driven (bottom-up) methods
is key to building robust and reliable autonomous driving planning and simulation.
Acknowledgments: We thank Joseph Ortiz and Wolfgang Lehrach for many useful discussions and suggestions.
References
Ettinger et al. (2021)
S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp,
C. R. Qi, Y. Zhou, et al.
Large scale interactive motion forecasting for autonomous driving:
The Waymo Open Motion Dataset.
In ICCV, 2021.
Luo et al. (2023)
W. Luo, C. Park, A. Cornman, B. Sapp, and D. Anguelov.
JFP: Joint future prediction with interactive multi-agent modeling
for autonomous driving.
In CoRL, 2023.
Montali et al. (2023)
N. Montali, J. Lambert, P. Mougin, A. Kuefler, N. Rhinehart, M. Li, C. Gulino,
T. Emrich, Z. Yang, S. Whiteson, B. White, and D. Anguelov.
The Waymo Open Sim Agents Challenge.
In NIPS Datasets Track, May 2023.
Nayakanti et al. (2023)
N. Nayakanti, R. Al-Rfou, A. Zhou, K. Goel, K. S. Refaat, and B. Sapp.
Wayformer: Motion forecasting via simple & efficient attention
networks.
In ICRA. IEEE, 2023.
Ngiam et al. (2021)
J. Ngiam, B. Caine, V. Vasudevan, Z. Zhang, H.-T. L. Chiang, J. Ling,
R. Roelofs, A. Bewley, C. Liu, A. Venugopal, et al.
Scene transformer: A unified architecture for predicting multiple
agent trajectories.
arXiv preprint arXiv:2106.08417, 2021.
Patwardhan et al. (2022)
A. Patwardhan, R. Murai, and A. J. Davison.
Distributing collaborative multi-robot planning with Gaussian
belief propagation.
In IEEE Robotics and Automation Letters, Mar. 2022.
Philion et al. (2023)
J. Philion, X. B. Peng, and S. Fidler.
Trajeglish: Learning the language of driving scenarios.
arXiv preprint arXiv:2312.04535, 2023.
Schwenzer et al. (2021)
M. Schwenzer, M. Ay, T. Bergs, and D. Abel.
Review on model predictive control: An engineering perspective.
The International Journal of Advanced Manufacturing
Technology, 117(5):1327–1349, 2021.
Seff et al. (2023)
A. Seff, B. Cera, D. Chen, M. Ng, A. Zhou, N. Nayakanti, K. S. Refaat,
R. Al-Rfou, and B. Sapp.
MotionLM: Multi-agent motion forecasting as language modeling.
In ICCV, 2023.
Shi et al. (2024)
S. Shi, L. Jiang, D. Dai, and B. Schiele.
MTR++: Multi-agent motion prediction with symmetric scene modeling
and guided intention querying.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2024.
Wang et al. (2023)
Y. Wang, T. Zhao, and F. Yi.
Multiverse Transformer: 1st place solution for Waymo Open Sim Agents
challenge 2023.
arXiv preprint arXiv:2306.11868, 2023.
Wu et al. (2024)
W. Wu, X. Feng, Z. Gao, and Y. Kan.
SMART: Scalable multi-agent real-time simulation via next-token
prediction.
arXiv preprint arXiv:2405.15677, 2024.
Zhou et al. (2023)
Z. Zhou, J. Wang, Y.-H. Li, and Y.-K. Huang.
Query-centric trajectory prediction.
In CVPR, 2023.
Appendix
| | mAP | minADE | minFDE | MissRate |
| --- | --- | --- | --- | --- |
| Vehicle | 0.4745 | 0.7526 | 1.5107 | 0.1489 |
| Pedestrian | 0.4827 | 0.3455 | 0.7251 | 0.0756 |
| Cyclist | 0.3898 | 0.7095 | 1.4299 | 0.1865 |
| Avg | 0.4490 | 0.6025 | 1.2219 | 0.1370 |
Table 3: Evaluation of our MTR model on the WOMD Motion Prediction dataset (validation).
Figure 5: Three simulated scenarios (top to bottom) at different timesteps (left to right) showcasing multi-modal behavior of agents. In the top and bottom simulation, the dark green car takes the left turn. However, in the middle simulation, it turns right. The green car, in the top and middle simulation, attempts the lane change to the left as the cars in front wait at the signal. In the middle simulation, the same green car comes to a stop in the same lane behind the traffic.
Figure 6: Two simulated scenarios (top to bottom) at different timesteps (left to right). In the top simulation, the light green car, attempting to take the right turn, stops and respects the teal car’s right of way. In the bottom simulation, by contrast, the light green car quickly takes the right turn.
Figure 7: Simulated scenario at different timesteps (left to right). The dark green car, about to take the free right, waits for the pedestrian to cross the road.
Figure 8: Simulated scenario at different timesteps (left to right). The dark green car merges into the left lane as the right lane comes to an end.
Figure 9: Simulated scenario at different timesteps (left to right). The cars come to a stop at the signal. Additionally, the teal car starts to slow down and stops as the golden car in front is waiting at the signal.
Figure 10: Simulated scenario at different timesteps (left to right). The red car waits for the purple car to pass before taking an unprotected left turn.
Figure 11: Two simulated scenarios (top to bottom) at different timesteps (left to right). The top simulation shows the brown car entering the parking lot and driving straight. In the bottom simulation, the brown car attempts to park beside the pink car. We can also observe the dark purple car waiting for the teal car to complete the U-turn before taking the free right.