Generalizable Multi‑Agent Planning from Signal Temporal Logic Specifications via Diffusion

Project page accompanying the paper submission.

3‑Minute Overview

Abstract

Multi-agent systems in the real world (e.g., drone swarms, autonomous cars, warehouse robots) must satisfy rich temporal tasks while avoiding collisions. Signal Temporal Logic (STL) elegantly encodes such objectives, but current STL planning methods face critical limitations. State-of-the-art optimization-based approaches can handle arbitrary STL specifications but scale poorly, becoming computationally impractical as the number of agents grows. Learning-based methods handle large numbers of agents with rapid planning times, but fare poorly when deployment-time objectives differ from those used during training, and they do not support planning tasks that assign different specifications to different agents (i.e., heterogeneity). This fundamental trade-off between generalizability and scalability presents a significant challenge for realizing multi-agent STL planning algorithms in practice. To overcome this challenge, we introduce a new diffusion method for multi-agent planning with STL specifications. Using a differentiable approximation of STL, we integrate the STL gradient into the denoising process, making our approach generalizable to novel formulas constructed from a fixed set of predicates available at evaluation, while achieving the same scalability as existing learning-based methods. Our method supports heterogeneous specifications and, by using diffusion models, naturally enhances plan diversity, thereby significantly reducing safety-related violations (e.g., collisions) among agents. A detailed evaluation study justifies the utility of STL-guided diffusion-based multi-agent planners for constructing generalizable, scalable, and diverse plans.
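
To make the idea of "integrating the STL gradient into the denoising process" concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single "eventually reach the goal" predicate, uses a log-sum-exp smooth maximum as the differentiable approximation (a common choice, but an assumption here), and replaces the learned denoiser with a plain gradient-ascent nudge. The names `smooth_max`, `eventually_reach`, and `guided_step` are hypothetical.

```python
import math

def smooth_max(values, k=10.0):
    """Log-sum-exp approximation of max; returns (value, softmax weights)."""
    m = max(values)
    exps = [math.exp(k * (v - m)) for v in values]
    total = sum(exps)
    return m + math.log(total) / k, [e / total for e in exps]

def eventually_reach(traj, goal, radius, k=10.0):
    """Smooth robustness of 'eventually within radius of goal' and its gradient."""
    margins, dirs = [], []
    for x, y in traj:
        dx, dy = x - goal[0], y - goal[1]
        dist = math.hypot(dx, dy)
        margins.append(radius - dist)  # positive when the waypoint is inside the goal disc
        # d(radius - dist)/dp = -(p - goal)/dist; guard the dist == 0 case
        dirs.append((-dx / dist, -dy / dist) if dist > 1e-12 else (0.0, 0.0))
    rho, weights = smooth_max(margins, k)
    grad = [(w * gx, w * gy) for w, (gx, gy) in zip(weights, dirs)]
    return rho, grad

def guided_step(traj, goal, radius, step=0.1):
    """One gradient-ascent nudge on robustness (stand-in for a guided denoising update)."""
    _, grad = eventually_reach(traj, goal, radius)
    return [(x + step * gx, y + step * gy) for (x, y), (gx, gy) in zip(traj, grad)]

# Toy trajectory that initially misses the goal (illustrative values only).
traj = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.5), (1.5, 0.6)]
goal, radius = (2.0, 2.0), 0.3
rho_before, _ = eventually_reach(traj, goal, radius)
for _ in range(50):
    traj = guided_step(traj, goal, radius)
rho_after, _ = eventually_reach(traj, goal, radius)
print(rho_before < 0.0 < rho_after)
```

In this toy setting the robustness starts negative (specification violated) and becomes positive after the guided updates. In an actual diffusion planner, the same gradient would presumably be combined with the denoiser output at each diffusion step rather than applied on its own.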

Diffusion Process visualized for N=8 agents with heterogeneous tasks

Diffusion Process visualized for N=32 agents with heterogeneous tasks

Real‑World Demonstration

Heterogeneous sequence demo with 3 goals on N=10 robots in the Robotarium testbed.

Diversity Videos

Note: These clips illustrate qualitative diversity across rollouts of the sequence specification with 3 goals, with all agents following the same task (Homogeneous).

Diffusion

GNN-ODE

STLPY

N = number of agents; Spec abbreviations: Seq. = Sequence.

Diversity Metrics
All diversity metrics for $N=16, 32$ agents in the Homogeneous task setting. D-MA: Diffusion-based Multi-Agent (Ours), G-O: GNN-ODE planner, STLPY: MILP planner. Best results are in bold.
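Each metric cell lists D-MA / G-O / STLPY.

| Spec | N | Agents per Cluster (↓) | Max Cluster Fraction (↓) | Mean Pairwise Dist. (↑) | Num Clusters (↑) | Path Entropy (↑) | Path Overlap (↓) |
|---|---|---|---|---|---|---|---|
| Branch | 16 | 1.09 / 3.81 / 2.35 | 0.12 / 0.56 / 0.33 | 2.63 / 2.10 / 2.33 | 14.67 / 4.20 / 6.80 | 0.93 / 0.88 / 0.93 | 0.00 / 0.08 / 0.00 |
| Branch | 32 | 1.03 / 6.58 / 4.10 | 0.04 / 0.79 / 0.20 | 3.43 / 2.35 / 2.35 | 31.10 / 4.87 / 7.80 | 0.90 / 0.86 / 0.92 | 0.00 / 0.00 / 0.00 |
| Cover | 16 | 1.08 / 6.40 / 3.02 | 0.12 / 0.81 / 0.47 | 2.64 / 1.76 / 2.11 | 14.87 / 2.50 / 5.30 | 0.92 / 0.87 / 0.91 | 0.00 / 0.18 / 0.00 |
| Cover | 32 | 1.03 / 11.71 / 6.27 | 0.04 / 0.91 / 0.32 | 3.43 / 1.78 / 2.09 | 31.13 / 2.73 / 5.10 | 0.90 / 0.85 / 0.91 | 0.00 / 0.00 / 0.00 |
| Loop | 16 | 1.04 / 4.40 / 3.40 | 0.09 / 0.64 / 0.58 | 2.43 / 1.83 / 1.82 | 15.33 / 3.63 / 4.70 | 0.90 / 0.83 / 0.89 | 0.00 / 8.72 / 0.27 |
| Loop | 32 | 1.26 / 10.00 / 5.33 | 0.04 / 0.86 / 0.41 | 2.23 / 1.83 / 1.85 | 25.30 / 3.20 / 6.00 | 0.87 / 0.79 / 0.87 | 0.20 / 2.41 / 0.00 |
| Seq. | 16 | 1.08 / 7.74 / 2.89 | 0.11 / 0.85 / 0.41 | 2.40 / 1.32 / 2.34 | 14.87 / 2.07 / 5.53 | 0.93 / 0.90 / 0.85 | 0.00 / 0.25 / 12.42 |
| Seq. | 32 | 1.45 / 18.82 / 32.00 | 0.20 / 0.86 / 0.25 | 2.24 / 0.75 / 0.94 | 22.07 / 1.70 / 1.00 | 0.91 / 0.90 / 0.94 | 0.00 / 0.00 / 3.66 |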
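
The page does not define the diversity metrics, so the following is only a sketch of how cluster-based metrics of this kind could be computed. It assumes mean pointwise distance between paths, single-linkage clustering with the distance threshold $d_{\tau}=1.0$ from the hyperparameter table, and hypothetical names (`path_distance`, `cluster_paths`, `diversity_metrics`). The actual definitions behind the table may differ. One relation the reported numbers are consistent with is Agents per Cluster = N / Num Clusters (e.g., 32 / 25.30 ≈ 1.26 for Loop, N = 32).

```python
import math
from itertools import combinations

def path_distance(a, b):
    """Mean pointwise Euclidean distance between two equal-length 2D paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def cluster_paths(paths, d_tau=1.0):
    """Single-linkage clustering via union-find: paths closer than d_tau are merged."""
    parent = list(range(len(paths)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(paths)), 2):
        if path_distance(paths[i], paths[j]) < d_tau:
            parent[find(i)] = find(j)
    sizes = {}
    for i in range(len(paths)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return list(sizes.values())

def diversity_metrics(paths, d_tau=1.0):
    """Cluster-based diversity summary (assumed definitions, see lead-in)."""
    sizes = cluster_paths(paths, d_tau)
    n = len(paths)
    pairs = list(combinations(paths, 2))
    return {
        "num_clusters": len(sizes),
        "agents_per_cluster": n / len(sizes),
        "max_cluster_fraction": max(sizes) / n,
        "mean_pairwise_dist": sum(path_distance(a, b) for a, b in pairs) / len(pairs),
    }

# Four toy paths: the first two are near-duplicates, the others are far apart.
paths = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.2), (1.0, 0.2)],
    [(0.0, 3.0), (1.0, 3.0)],
    [(0.0, 6.0), (1.0, 6.0)],
]
metrics = diversity_metrics(paths)
print(metrics)
```

Under these assumed definitions, the two near-duplicate paths fall into one cluster and the remaining two form singleton clusters, giving three clusters for four agents.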

Simulation Videos

Double Integrator

Diffusion

STLPY

Environment Details & Dynamics

Global setting. All simulations use a uniform time step of $\delta t = 0.03$.

State: $x_i=\bigl[p_i^x,\, p_i^y,\, v_i^x,\, v_i^y\bigr]^\top$; control: $u_i=\bigl[a_i^x,\, a_i^y\bigr]^\top$.

Continuous-time dynamics: \[ \dot{x}_i \;=\; \bigl[v_i^x,\, v_i^y,\, a_i^x,\, a_i^y\bigr]^\top, \] with relative state $e_{ij}=x_j-x_i$ for interaction features.
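
As an illustration of the dynamics above, here is a minimal simulation sketch. The page gives the continuous-time model and the time step $\delta t = 0.03$ but not the integration scheme, so forward Euler is an assumption, and the function names `step`, `rollout`, and `relative_state` are hypothetical.

```python
DT = 0.03  # uniform time step from the global setting

def step(state, control, dt=DT):
    """Forward-Euler step of x_dot = [vx, vy, ax, ay] (integration scheme assumed)."""
    px, py, vx, vy = state
    ax, ay = control
    return (px + dt * vx, py + dt * vy, vx + dt * ax, vy + dt * ay)

def rollout(state, controls, dt=DT):
    """Apply a sequence of controls and return the visited states, initial state included."""
    traj = [state]
    for u in controls:
        state = step(state, u, dt)
        traj.append(state)
    return traj

def relative_state(x_i, x_j):
    """Relative state e_ij = x_j - x_i used for interaction features."""
    return tuple(b - a for a, b in zip(x_i, x_j))

# One step from unit velocity along x with unit acceleration along y.
one_step = step((0.0, 0.0, 1.0, 0.0), (0.0, 1.0))

# 100 steps of constant unit acceleration along x, starting at rest.
final = rollout((0.0, 0.0, 0.0, 0.0), [(1.0, 0.0)] * 100)[-1]

e_ij = relative_state((0.0, 0.0, 1.0, 0.0), (1.0, 2.0, 0.0, 0.0))
print(one_step, final, e_ij)
```

With forward Euler, the position update uses the velocity from the start of the step, so the 100-step rollout ends at a velocity of about 3.0 and a position of about 4.455 rather than the exact continuous-time value of 4.5.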

Double Integrator — Results

Obs: N = no obstacles, Y = with obstacles. TtR = mean time-to-reach (steps).

Obs: N = no obstacles, Y = with obstacles. TtR = mean time-to-reach (s).

All results for the DoubleIntegrator environment (Heterogeneous specs; $N\in\{8,16,32\}$). D-MA: Diffusion-based Multi-Agent (Ours), STLPY: MILP planner. Best results are in bold.
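| Spec | Obs | N | Planning Time (s) (↓) D-MA | Planning Time (s) (↓) STLPY | Success Rate (↑) D-MA | Success Rate (↑) STLPY | TtR (↓) D-MA | TtR (↓) STLPY |
|---|---|---|---|---|---|---|---|---|
| Branch | N | 8 | 0.56 | 11.28 | 100.00 | 92.50 | 1317.54 | 1203.60 |
| Branch | N | 16 | 1.52 | 19.30 | 100.00 | 95.62 | 1498.84 | 1009.62 |
| Branch | N | 32 | 19.95 | 38.53 | 98.85 | 90.31 | 1912.65 | 1469.74 |
| Branch | Y | 8 | 0.57 | 9.91 | 97.92 | 85.00 | 1466.51 | 1136.44 |
| Branch | Y | 16 | 0.93 | 19.32 | 98.75 | 88.12 | 1686.29 | 1203.41 |
| Branch | Y | 32 | 2.17 | 39.47 | 97.60 | 83.75 | 1940.35 | 1458.93 |
| Cover | N | 8 | 1.49 | 8.03 | 99.58 | 92.50 | 1360.39 | 926.76 |
| Cover | N | 16 | 9.82 | 16.40 | 98.75 | 95.00 | 1578.78 | 1247.36 |
| Cover | N | 32 | 40.33 | 33.23 | 89.69 | 82.19 | 1914.50 | 1347.15 |
| Cover | Y | 8 | 1.56 | 8.31 | 98.75 | 85.00 | 1389.64 | 1177.51 |
| Cover | Y | 16 | 2.28 | 16.55 | 96.88 | 87.50 | 1607.36 | 1268.25 |
| Cover | Y | 32 | 4.43 | 32.81 | 87.81 | 76.88 | 1908.44 | 1343.73 |
| Loop | N | 8 | 3.92 | 49.29 | 95.83 | 92.50 | 2070.63 | 1817.73 |
| Loop | N | 16 | 7.29 | 100.19 | 96.04 | 92.50 | 2548.74 | 2030.55 |
| Loop | N | 32 | 13.77 | 197.32 | 93.65 | 91.25 | 3135.92 | 2961.40 |
| Loop | Y | 8 | 3.95 | 49.58 | 92.92 | 87.50 | 2315.57 | 1871.11 |
| Loop | Y | 16 | 7.52 | 100.53 | 93.54 | 88.12 | 2379.24 | 2097.43 |
| Loop | Y | 32 | 14.43 | 196.92 | 90.21 | 83.75 | 3124.32 | 3058.88 |
| Seq. | N | 8 | 1.83 | 4.10 | 99.58 | 92.50 | 1740.11 | 1431.82 |
| Seq. | N | 16 | 7.88 | 8.01 | 98.96 | 93.12 | 1833.66 | 1626.01 |
| Seq. | N | 32 | 31.89 | 16.28 | 90.21 | 70.94 | 2351.66 | 1852.80 |
| Seq. | Y | 8 | 2.03 | 3.98 | 96.25 | 82.50 | 1580.93 | 1234.13 |
| Seq. | Y | 16 | 2.54 | 7.97 | 96.04 | 83.75 | 1834.80 | 1780.33 |
| Seq. | Y | 32 | 4.42 | 15.81 | 85.73 | 66.25 | 2324.78 | 1850.46 |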

Supplementary Tables

Hyperparameters
Hyperparameters used in the experiments.
| Group | Hparam. | Value | Description |
|---|---|---|---|
| Diffusion Parameters | \(N_{\text{diff}}\) | 256 | Number of diffusion steps. |
| Diffusion Parameters | \(\lambda_{\text{STL}}\) | 1.0 | Weight for the STL loss. |
| Diffusion Parameters | \(\lambda_{\text{ach}}\) | 0.1 | Weight for the achievable loss. |
| Diffusion Parameters | \(N_{\text{sample}}\) | 40 | Max. number of plan resamples. |
| Diffusion Parameters | \(k_{\text{ach}}\) | 0.1 | Fraction of diffusion steps to compute achievable loss. |
| Diffusion Parameters | \(\epsilon_{\text{resample}}\) | 0.1 | Min. STL loss for resampling. |
| Diffusion Parameters | \(\sigma_{\max}\) | 80 | Maximum noise level. |
| Diffusion Parameters | \(\sigma_{\min}\) | 0.002 | Minimum noise level. |
| Environment Parameters | \(T^{\text{train}}_h\) | 1000 | Time horizon in training. |
| Environment Parameters | \(T^{\text{eval}}_h\) | 3000 | Time horizon in evaluation. |
| Evaluation Parameters | \(d_{\tau}\) | 1.0 | Distance threshold for clustering. |
| Evaluation Parameters | \(r_{\text{grid}}\) | 0.3 | Resolution for occupancy grid. |

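
The resampling hyperparameters \(N_{\text{sample}}\) and \(\epsilon_{\text{resample}}\) suggest a loop of roughly the following shape. This is a sketch under stated assumptions, not the authors' procedure: it assumes a plan is redrawn whenever its STL loss is at least \(\epsilon_{\text{resample}}\), that the best plan seen so far is kept, and that the first draw counts toward the total. The function `plan_with_resampling` and its callback arguments are hypothetical.

```python
N_SAMPLE = 40        # max. number of plan resamples (from the hyperparameter table)
EPS_RESAMPLE = 0.1   # min. STL loss that triggers a resample (from the hyperparameter table)

def plan_with_resampling(sample_plan, stl_loss, n_sample=N_SAMPLE, eps=EPS_RESAMPLE):
    """Draw plans until one has STL loss below eps or the budget is exhausted.

    sample_plan: zero-argument callable returning a candidate plan (hypothetical).
    stl_loss:    callable mapping a plan to a non-negative loss (hypothetical).
    Returns (best_plan, best_loss, num_draws).
    """
    best_plan, best_loss, draws = None, float("inf"), 0
    for _ in range(n_sample):
        plan = sample_plan()
        draws += 1
        loss = stl_loss(plan)
        if loss < best_loss:
            best_plan, best_loss = plan, loss
        if best_loss < eps:
            break  # good enough, stop resampling
    return best_plan, best_loss, draws

# Toy usage: the "plan" is just a number and the loss is the plan itself.
scripted = iter([0.5, 0.3, 0.05, 0.9])
plan, loss, draws = plan_with_resampling(lambda: next(scripted), lambda p: p)
print(plan, loss, draws)

# A sampler that never satisfies the threshold uses the whole budget.
_, _, exhausted = plan_with_resampling(lambda: 0.5, lambda p: p)
print(exhausted)
```

In the toy run, the third draw is the first with a loss below the threshold, so the loop stops there and the fourth scripted value is never drawn.
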
Dataset Coverage

SingleIntegrator Predicate Coverage

Top: Coverage of dataset over predicates (SingleIntegrator).

DubinsCar Trajectory Samples

Bottom: Visualizing a subset of dataset trajectories (DubinsCar).

JIT Compilation & Resampling
Compilation and Resampling times for the Dubins Car environment in the Homogeneous task setting. D‑MA: Diffusion‑based Multi‑Agent (Ours), D‑MA (No Ach.): ablation without achievable loss, D‑SA: Diffusion‑based Single‑Agent.
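| Spec | N | JIT Comp. Time (↓) D-MA | JIT Comp. Time (↓) D-MA (No Ach.) | JIT Comp. Time (↓) D-SA | Num. Resampling (↓) D-MA | Num. Resampling (↓) D-MA (No Ach.) | Num. Resampling (↓) D-SA |
|---|---|---|---|---|---|---|---|
| Branch | 8 | 74.24 | 18.24 | 7.79 | 1.12 | 1.00 | 1.00 |
| Branch | 16 | 87.83 | 16.64 | 9.79 | 1.06 | 1.00 | 1.00 |
| Branch | 32 | 120.41 | 39.78 | 12.45 | 1.06 | 1.00 | 1.47 |
| Cover | 8 | 32.25 | 10.70 | 6.04 | 1.00 | 1.12 | 1.50 |
| Cover | 16 | 79.92 | 15.07 | 7.20 | 1.12 | 1.12 | 2.25 |
| Cover | 32 | 71.00 | 23.48 | 8.73 | 1.03 | 1.00 | 2.47 |
| Loop | 8 | 130.08 | 73.77 | 22.52 | 1.88 | 1.88 | 6.38 |
| Loop | 16 | 184.14 | 131.98 | 30.35 | 1.81 | 1.81 | 7.06 |
| Loop | 32 | 287.68 | 96.92 | 42.36 | 3.12 | 3.06 | 6.91 |
| Seq. | 8 | 57.98 | 8.82 | 6.02 | 1.75 | 1.00 | 1.12 |
| Seq. | 16 | 43.11 | 16.71 | 6.58 | 1.06 | 1.06 | 1.44 |
| Seq. | 32 | 82.42 | 14.48 | 8.25 | 1.41 | 1.41 | 1.62 |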