3‑Minute Overview

Abstract

Multi-agent systems in the real world (e.g., drone swarms, autonomous cars, warehouse robots) must satisfy rich temporal tasks while avoiding collisions. Signal Temporal Logic (STL) elegantly encodes such objectives, but current STL planning methods face critical limitations. State-of-the-art optimization-based approaches can handle arbitrary STL specifications but struggle with scalability, becoming computationally impractical as the number of agents grows. Learning-based methods efficiently handle large numbers of agents with rapid planning times, but they fare poorly when deployment-time objectives differ from those used during training, and they do not support tasks that require different specifications to be assigned to different agents (i.e., heterogeneity). This fundamental trade-off between generalizability and scalability presents a significant challenge for realizing multi-agent STL planning algorithms in practice. To overcome this challenge, we introduce a new diffusion method for multi-agent planning with STL specifications. Using a differentiable approximation of STL, we integrate the STL gradient into the denoising process, making our approach generalizable to novel formulas constructed from a fixed set of predicates available at evaluation time, while achieving the same scalability as existing learning-based methods. Our method supports heterogeneous specifications and, because it uses diffusion models, naturally enhances plan diversity, thereby significantly reducing safety violations (e.g., collisions) among agents. A detailed evaluation study demonstrates the utility of STL-guided diffusion-based multi-agent planners for constructing generalizable, scalable, and diverse plans.
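The abstract relies on a differentiable approximation of STL whose gradient guides denoising, but this page does not specify the approximation. The sketch below assumes one common choice: replacing the min/max in STL robustness semantics with temperature-scaled log-sum-exp. The helper names (`soft_max`, `soft_min`, `reach_and_avoid`) and the temperature `beta` are hypothetical. It is written in pure Python so that it runs without dependencies; in an autodiff framework such as JAX, the gradient of the returned value with respect to the trajectory would supply the guidance signal.

```python
import math


def soft_max(values, beta=10.0):
    """Smooth upper bound on max(values) via log-sum-exp; tends to max as beta grows."""
    m = max(values)  # subtract the max before exponentiating for numerical stability
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta


def soft_min(values, beta=10.0):
    """Smooth lower bound on min(values): soft_min(x) = -soft_max(-x)."""
    return -soft_max([-v for v in values], beta)


def reach(p, goal, radius):
    """Robustness of the predicate ||p - goal|| <= radius (positive iff satisfied)."""
    return radius - math.dist(p, goal)


def avoid(p, center, radius):
    """Robustness of the predicate ||p - center|| >= radius (positive iff satisfied)."""
    return math.dist(p, center) - radius


def reach_and_avoid(traj, goal, goal_radius, obstacle, obstacle_radius, beta=10.0):
    """Smooth robustness of F(reach goal) AND G(avoid obstacle) over a 2-D trajectory."""
    eventually = soft_max([reach(p, goal, goal_radius) for p in traj], beta)      # F: max over time
    always = soft_min([avoid(p, obstacle, obstacle_radius) for p in traj], beta)  # G: min over time
    return soft_min([eventually, always], beta)                                   # AND: min
```

Larger `beta` tightens the approximation (for $n$ values the soft maximum exceeds the true maximum by at most $\log n/\beta$), at the cost of gradients that concentrate on a single time step.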

Diffusion Process visualized for N=8 agents with heterogeneous tasks

Diffusion Process visualized for N=32 agents with heterogeneous tasks

Real‑World Demonstration

Heterogeneous sequence demo with 3 goals on N=10 robots in the Robotarium testbed.

Diversity Videos

Note: These clips illustrate qualitative diversity across rollouts for the three-goal sequence specification, with all agents following the same task (Homogeneous).

Videos (8 Agents and 32 Agents): Diffusion, GNN-ODE, and STLPY rollouts for each agent count.

N = number of agents; Spec abbreviations: Seq. = Sequence.

Diversity Metrics

Diversity Metrics Explanation

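Notation. We treat the trajectory of each successful agent $i\in\{1,\ldots,M\}$ as a polyline $\tau_{s_i}=\{\mathbf{x}^{(i)}_t\}_{t=0}^{T-1}\subset\mathbb{R}^2$ over $T$ time steps; only the $M$ successful agents (those with valid trajectories) enter the diversity metrics.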

Path Overlap (↓)

Occupancy grid. Overlay a uniform grid of resolution $r_{\text{grid}}=0.3\,\mathrm{m}$ on the workspace $[x_{\min},x_{\max}]\times[y_{\min},y_{\max}]$ (e.g., $x_{\max}=y_{\max}=6$, $x_{\min}=y_{\min}=-0.5$). For agent $i$, the binary occupancy mask is \[ O^{(i)}_{u,v}=\mathbf{1}\!\left[(u,v)\in \text{cells}\bigl(\tau_{s_i}\bigr)\right], \] where $\text{cells}(\tau_{s_i})$ is the set of grid cells visited by the trajectory. The normalised overlap among the $M$ agents is the intersection-over-union of their footprints: \[ \text{Overlap} \;=\; \frac{\displaystyle\sum_{u,v}\Bigl(\bigwedge_{i=1}^{M}O^{(i)}_{u,v}\Bigr)} {\displaystyle\sum_{u,v}\Bigl(\bigvee_{i=1}^{M}O^{(i)}_{u,v}\Bigr) + \varepsilon}, \qquad \varepsilon=10^{-6}, \] where $\wedge$/$\vee$ denote logical AND/OR over agents. $\text{Overlap}\!\approx\!1$ means identical footprints; $\approx\!0$ means that few or no cells are visited by every agent.
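A minimal sketch of the overlap metric, assuming that $\text{cells}(\tau_{s_i})$ is the set of cells containing at least one waypoint (cells crossed only between waypoints are ignored) and that cell indices come from flooring offsets by $r_{\text{grid}}$. The function names are illustrative, not taken from the authors' code. Storing each mask as a Python set of visited cells turns the AND/OR over agents into set intersection and union.

```python
import math

R_GRID = 0.3          # grid resolution r_grid (metres)
X_MIN = Y_MIN = -0.5  # workspace lower bounds from the example above


def cell(point, r=R_GRID, x_min=X_MIN, y_min=Y_MIN):
    """Index (u, v) of the grid cell containing a 2-D point."""
    x, y = point
    return (math.floor((x - x_min) / r), math.floor((y - y_min) / r))


def footprint(traj):
    """cells(tau): cells that contain at least one waypoint (assumption)."""
    return {cell(p) for p in traj}


def path_overlap(trajs, eps=1e-6):
    """Cells visited by every agent divided by cells visited by any agent."""
    masks = [footprint(t) for t in trajs]
    shared = set.intersection(*masks)  # logical AND over agents
    any_visit = set.union(*masks)      # logical OR over agents
    return len(shared) / (len(any_visit) + eps)
```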

Path Entropy (↑)

Using the same grid, build the cumulative visit-count grid \[ \mathrm{occ}_{u,v}=\sum_{i=1}^{M}\sum_{t=0}^{T-1}\mathbf{1}\!\left[(u,v)\in \text{cell}(\mathbf{x}^{(i)}_t)\right], \] which, unlike the binary masks $O^{(i)}_{u,v}$, counts every time step an agent spends in a cell, and the visited set $\mathcal V=\{(u,v)\mid \mathrm{occ}_{u,v}>0\}$ with $V=|\mathcal V|$. Define \[ p_{u,v}=\frac{\mathrm{occ}_{u,v}}{\sum_{(a,b)\in\mathcal V}\mathrm{occ}_{a,b}},\quad (u,v)\in\mathcal V, \] and report the normalised Shannon entropy \[ \hat H \;=\; -\frac{1}{\log V}\sum_{(u,v)\in\mathcal V} p_{u,v}\log p_{u,v}, \] which is defined for $V>1$, lies in $[0,1]$, and equals $1$ when every visited cell receives the same number of visits.
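The entropy can be sketched in the same way, under the same waypoint-to-cell assumption; the names are hypothetical. Returning 0 when only one cell is visited ($V=1$, where $\log V=0$) is an assumption the text does not address.

```python
import math
from collections import Counter

R_GRID = 0.3          # grid resolution r_grid (metres)
X_MIN = Y_MIN = -0.5  # workspace lower bounds from the example above


def cell(point, r=R_GRID, x_min=X_MIN, y_min=Y_MIN):
    """Index (u, v) of the grid cell containing a 2-D point."""
    return (math.floor((point[0] - x_min) / r), math.floor((point[1] - y_min) / r))


def path_entropy(trajs):
    """Normalised Shannon entropy of the cumulative visit counts occ_{u,v}."""
    # One count per agent per time step; only visited cells appear, so keys form the set V.
    occ = Counter(cell(p) for traj in trajs for p in traj)
    V = len(occ)
    if V < 2:
        return 0.0  # log V = 0 makes the ratio undefined; 0 is an assumed convention
    total = sum(occ.values())
    H = -sum((c / total) * math.log(c / total) for c in occ.values())
    return H / math.log(V)
```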

Pairwise Distance & Cluster Statistics

Distances. For each pair $(i,j)$, compute the discrete Fréchet distance $D_{ij}=d_F\!\bigl(\tau_{s_i},\tau_{s_j}\bigr)$, forming a symmetric matrix $D\in\mathbb{R}^{M\times M}$. The mean pairwise distance (reported as “Mean Pairwise Dist. (↑)”) is \[ \overline{D} \;=\; \frac{1}{M^2}\sum_{i=1}^{M}\sum_{j=1}^{M} D_{ij}, \] where higher values indicate more separation among successful plans.
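The discrete Fréchet distance can be computed with the standard $O(nm)$ dynamic program of Eiter and Mannila. The sketch below uses that recurrence (the page does not say which implementation the authors use) and then averages over all $M^2$ ordered pairs, zero diagonal included, exactly as in the formula above. Function names are illustrative.

```python
import math


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between polylines P and Q (lists of 2-D points)."""
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: distance between prefixes P[:i+1], Q[:j+1]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def distance_matrix(trajs):
    """Symmetric matrix D with D[i][j] = d_F(tau_i, tau_j) and a zero diagonal."""
    M = len(trajs)
    D = [[0.0] * M for _ in range(M)]
    for i in range(M):
        for j in range(i + 1, M):
            D[i][j] = D[j][i] = discrete_frechet(trajs[i], trajs[j])
    return D


def mean_pairwise_distance(D):
    """(1 / M^2) * sum of D[i][j] over all i, j, diagonal included."""
    M = len(D)
    return sum(sum(row) for row in D) / M**2
```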

Graph & clusters. Build an undirected graph $G=(\mathcal{A},E)$ on the agent set $\mathcal{A}=\{1,\ldots,M\}$ with \[ (i,j)\in E \;\Longleftrightarrow\; 0 < D_{ij}\le d_{\tau},\quad d_{\tau} = 1\,\mathrm{m}. \] Let $\text{ConnComp}(G)$ be the set of connected components, $k=|\text{ConnComp}(G)|$, and $C_c$ the $c$-th component. We report:

Num. Clusters (↑): $k$ (larger $\Rightarrow$ more distinct route groups).
Max Cluster Fraction (↓): $\displaystyle\max_c |C_c|/M$ (smaller $\Rightarrow$ no single dominant route).
Agents per Cluster (↓): $\displaystyle \frac{1}{k}\sum_c |C_c| = \frac{M}{k}$, since the components partition the agents (smaller $\Rightarrow$ fewer agents share each route group).
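The clustering step can be sketched with a union-find structure, which yields the same connected components as a graph search; the function name and returned keys are hypothetical, and `d_tau` stands for the clustering threshold ($d_\tau$ in the hyperparameter table). The sketch follows the definition literally, so pairs with $D_{ij}=0$ (identical trajectories) are not linked, and membership is transitive: two agents farther apart than the threshold can still share a cluster through an intermediate agent.

```python
from collections import Counter


def cluster_stats(D, d_tau=1.0):
    """Link agents i, j when 0 < D[i][j] <= d_tau and summarise the connected components."""
    M = len(D)
    parent = list(range(M))  # union-find forest over agent indices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    for i in range(M):
        for j in range(i + 1, M):
            if 0 < D[i][j] <= d_tau:  # strict lower bound, as in the edge definition
                parent[find(i)] = find(j)

    sizes = Counter(find(i) for i in range(M))  # component root -> |C_c|
    k = len(sizes)
    return {
        "num_clusters": k,
        "max_cluster_fraction": max(sizes.values()) / M,
        "agents_per_cluster": M / k,
    }
```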

Legend: “↑” higher is better; “↓” lower is better. These metrics complement the success and safety metrics by characterising plan diversity.

All diversity metrics for $N=16, 32$ agents in the Homogeneous task setting. D-MA: Diffusion-based Multi-Agent (Ours), G-O: GNN-ODE planner, STLPY: MILP planner. Best results are in **bold**.
Spec	N	Agents per Cluster (↓)			Max Cluster Fraction (↓)			Mean Pairwise Dist. (↑)			Num Clusters (↑)			Path Entropy (↑)			Path Overlap (↓)
Spec	N	D-MA	G-O	STLPY	D-MA	G-O	STLPY	D-MA	G-O	STLPY	D-MA	G-O	STLPY	D-MA	G-O	STLPY	D-MA	G-O	STLPY
Branch	16	1.09	3.81	2.35	0.12	0.56	0.33	2.63	2.10	2.33	14.67	4.20	6.80	0.93	0.88	0.93	0.00	0.08	0.00
Branch	32	1.03	6.58	4.10	0.04	0.79	0.20	3.43	2.35	2.35	31.10	4.87	7.80	0.90	0.86	0.92	0.00	0.00	0.00
Cover	16	1.08	6.40	3.02	0.12	0.81	0.47	2.64	1.76	2.11	14.87	2.50	5.30	0.92	0.87	0.91	0.00	0.18	0.00
Cover	32	1.03	11.71	6.27	0.04	0.91	0.32	3.43	1.78	2.09	31.13	2.73	5.10	0.90	0.85	0.91	0.00	0.00	0.00
Loop	16	1.04	4.40	3.40	0.09	0.64	0.58	2.43	1.83	1.82	15.33	3.63	4.70	0.90	0.83	0.89	0.00	8.72	0.27
Loop	32	1.26	10.00	5.33	0.04	0.86	0.41	2.23	1.83	1.85	25.30	3.20	6.00	0.87	0.79	0.87	0.20	2.41	0.00
Seq.	16	1.08	7.74	2.89	0.11	0.85	0.41	2.40	1.32	2.34	14.87	2.07	5.53	0.93	0.90	0.85	0.00	0.25	12.42
Seq.	32	1.45	18.82	32.00	0.20	0.86	0.25	2.24	0.75	0.94	22.07	1.70	1.00	0.91	0.90	0.94	0.00	0.00	3.66

Simulation Videos

Double Integrator

Videos (by number of agents): Diffusion and STLPY rollouts.

Environment Details & Dynamics

Global setting. All simulations use a uniform time step of $\delta t = 0.03$.

State: $x_i=\bigl[p_i^x,\, p_i^y,\, v_i^x,\, v_i^y\bigr]^\top$; control: $u_i=\bigl[a_i^x,\, a_i^y\bigr]^\top$.

Continuous-time dynamics: \[ \dot{x}_i \;=\; \bigl[v_i^x,\, v_i^y,\, a_i^x,\, a_i^y\bigr]^\top, \] with relative state $e_{ij}=x_j-x_i$ for interaction features.
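The page gives continuous-time dynamics and a time step but not the integrator. Assuming forward Euler discretisation with $\delta t = 0.03$, one double-integrator step and the relative-state feature $e_{ij}$ can be sketched as follows (names are illustrative):

```python
DT = 0.03  # uniform simulation time step delta t


def double_integrator_step(x, u, dt=DT):
    """Forward-Euler step of xdot = [vx, vy, ax, ay] (discretisation is an assumption)."""
    px, py, vx, vy = x
    ax, ay = u
    return (px + dt * vx, py + dt * vy, vx + dt * ax, vy + dt * ay)


def relative_state(x_i, x_j):
    """Interaction feature e_ij = x_j - x_i, taken component-wise."""
    return tuple(b - a for a, b in zip(x_i, x_j))
```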

Double Integrator — Results

Obs: N = no obstacles, Y = with obstacles. TtR = mean time-to-reach (steps).


All results for the DoubleIntegrator environment (Heterogeneous specs; $N\in\{8,16,32\}$). D-MA: Diffusion-based Multi-Agent (Ours), STLPY: MILP planner. Best results are in **bold**.
Spec	Obs	N	Planning Time (s) (↓)		Success Rate (↑)		TtR (↓)
Spec	Obs	N	D-MA	STLPY	D-MA	STLPY	D-MA	STLPY
Branch	N	8	0.56	11.28	100.00	92.50	1317.54	1203.60
		16	1.52	19.30	100.00	95.62	1498.84	1009.62
		32	19.95	38.53	98.85	90.31	1912.65	1469.74
	Y	8	0.57	9.91	97.92	85.00	1466.51	1136.44
		16	0.93	19.32	98.75	88.12	1686.29	1203.41
		32	2.17	39.47	97.60	83.75	1940.35	1458.93
Cover	N	8	1.49	8.03	99.58	92.50	1360.39	926.76
		16	9.82	16.40	98.75	95.00	1578.78	1247.36
		32	40.33	33.23	89.69	82.19	1914.50	1347.15
	Y	8	1.56	8.31	98.75	85.00	1389.64	1177.51
		16	2.28	16.55	96.88	87.50	1607.36	1268.25
		32	4.43	32.81	87.81	76.88	1908.44	1343.73
Loop	N	8	3.92	49.29	95.83	92.50	2070.63	1817.73
		16	7.29	100.19	96.04	92.50	2548.74	2030.55
		32	13.77	197.32	93.65	91.25	3135.92	2961.40
	Y	8	3.95	49.58	92.92	87.50	2315.57	1871.11
		16	7.52	100.53	93.54	88.12	2379.24	2097.43
		32	14.43	196.92	90.21	83.75	3124.32	3058.88
Seq.	N	8	1.83	4.10	99.58	92.50	1740.11	1431.82
		16	7.88	8.01	98.96	93.12	1833.66	1626.01
		32	31.89	16.28	90.21	70.94	2351.66	1852.80
	Y	8	2.03	3.98	96.25	82.50	1580.93	1234.13
		16	2.54	7.97	96.04	83.75	1834.80	1780.33
		32	4.42	15.81	85.73	66.25	2324.78	1850.46

Dubins Car

Videos (by number of agents): Diffusion and STLPY rollouts.

Environment Details & Dynamics

Global setting. All simulations use a uniform time step of $\delta t = 0.03$.

State of agent $i$: $x_i = \bigl[p_i^x,\, p_i^y,\, \theta_i,\, v_i\bigr]^\top$. Control input: $u_i = \bigl[\omega_i,\, a_i\bigr]^\top$ (angular rate and linear acceleration magnitude).

Continuous-time dynamics: \[ \dot x_i \;=\; \bigl[v_i \cos\theta_i,\; v_i \sin\theta_i,\; \omega_i,\; a_i\bigr]^\top. \] For interaction features we use $e_{ij}=e_j(x_j)-e_i(x_i)$ with \[ e_i(x_i)=\bigl[p_i^x,\; p_i^y,\; v_i\cos\theta_i,\; v_i\sin\theta_i\bigr]^\top. \]
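Under the same forward-Euler assumption ($\delta t = 0.03$), a Dubins-car step and the edge feature $e_{ij}$ can be sketched as below; names are illustrative. One plausible benefit of the embedding $e_i(x_i)$ is that it expresses heading and speed as a Cartesian velocity, so the feature does not jump when $\theta$ wraps around $\pm\pi$.

```python
import math

DT = 0.03  # uniform simulation time step delta t


def dubins_step(x, u, dt=DT):
    """Forward-Euler step of the Dubins-car dynamics (discretisation is an assumption)."""
    px, py, theta, v = x
    omega, a = u
    return (
        px + dt * v * math.cos(theta),
        py + dt * v * math.sin(theta),
        theta + dt * omega,
        v + dt * a,
    )


def embed(x):
    """e_i(x_i) = [p_x, p_y, v cos(theta), v sin(theta)]."""
    px, py, theta, v = x
    return (px, py, v * math.cos(theta), v * math.sin(theta))


def edge_feature(x_i, x_j):
    """e_ij = e_j(x_j) - e_i(x_i)."""
    return tuple(b - a for a, b in zip(embed(x_i), embed(x_j)))
```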

Dubins Car — Results

Obs: N = no obstacles, Y = with obstacles. TtR = mean time-to-reach (steps).

All results for the Dubins Car environment (Heterogeneous specs; $N\in\{8,16,32\}$). D-MA: Diffusion-based Multi-Agent (Ours), D-SA: Diffusion-based Single-Agent, STLPY: MILP planner. Best results are in **bold**.
Spec	Obs	N	Planning Time (s) (↓)			Success Rate (↑)			TtR (↓)
Spec	Obs	N	D-MA	D-SA	STLPY	D-MA	D-SA	STLPY	D-MA	D-SA	STLPY
Branch	N	8	0.20	0.54	8.61	100.00	97.50	97.50	1327.00	1237.50	746.07
		16	0.79	0.33	18.81	99.38	87.92	90.00	1252.00	1322.37	832.75
		32	2.10	0.68	43.80	93.12	65.00	68.65	1453.71	1579.60	1216.69
	Y	8	0.27	0.22	11.49	95.00	90.42	83.75	1334.66	1234.21	670.08
		16	0.84	0.31	22.30	90.62	82.50	78.12	1341.26	1327.89	1045.21
		32	2.26	0.46	44.12	87.19	55.00	66.88	1753.57	1620.55	998.40
Cover	N	8	0.42	1.22	7.24	100.00	100.00	97.50	1514.21	1430.00	997.07
		16	1.24	0.62	14.19	98.75	86.67	92.50	1452.65	1373.83	1090.52
		32	3.55	1.11	28.56	90.31	60.62	72.50	1553.56	1604.45	1286.58
	Y	8	0.83	0.42	9.12	90.00	87.08	85.00	1599.75	1246.25	854.22
		16	1.38	0.58	18.37	90.00	79.38	80.62	1659.24	1382.99	1141.86
		32	4.14	0.81	37.35	82.50	55.42	62.81	1591.70	1638.02	1097.58
Loop	N	8	0.84	5.07	45.45	97.50	90.00	95.00	2224.92	2313.10	1708.86
		16	3.18	2.48	91.60	96.25	87.50	82.50	2307.25	2338.47	1839.81
		32	6.36	2.98	244.22	82.19	67.81	53.85	2316.42	2730.77	2010.56
	Y	8	0.84	1.80	60.62	87.92	83.33	72.50	2625.00	2167.89	1795.36
		16	1.42	2.55	126.32	85.00	80.83	73.75	2511.69	2386.08	1880.43
		32	2.19	2.98	240.85	71.04	61.98	43.75	2858.89	2837.73	2101.98
Seq.	N	8	0.27	0.33	2.44	100.00	97.50	100.00	1887.00	1466.14	1277.50
		16	1.25	0.61	4.73	98.75	87.50	77.50	1694.25	1805.62	1705.25
		32	1.05	0.66	9.76	81.25	49.27	46.88	2041.64	2051.86	1824.10
	Y	8	0.43	0.32	4.59	87.92	85.42	76.25	1676.99	1727.93	1295.20
		16	1.87	0.42	9.08	88.75	75.62	68.75	1878.02	1747.22	1529.92
		32	4.75	0.68	18.86	71.25	42.19	41.88	1843.99	2022.89	1799.16

Single Integrator

Videos (by number of agents): Diffusion and STLPY rollouts.

Environment Details & Dynamics

Global setting. All simulations use a uniform time step of $\delta t = 0.03$.

State: $x_i=\bigl[p_i^x,\, p_i^y\bigr]^\top$; control: velocity $v_i=\bigl[v_i^x,\, v_i^y\bigr]^\top$.

Continuous-time dynamics: \[ \dot{x}_i \;=\; v_i. \] Relative position $e_{ij}=x_j-x_i$ is used as the edge feature for interactions.
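A sketch of the single-integrator update together with edge features for every ordered pair of agents. It assumes a fully connected interaction graph, which the page does not state (a sensing radius could prune edges), and forward Euler with $\delta t = 0.03$; names are illustrative.

```python
DT = 0.03  # uniform simulation time step delta t


def single_integrator_step(x, v, dt=DT):
    """Forward-Euler step of xdot = v for a 2-D position (discretisation is an assumption)."""
    return (x[0] + dt * v[0], x[1] + dt * v[1])


def edge_features(positions):
    """Map each ordered pair (i, j), i != j, to the relative position x_j - x_i."""
    return {
        (i, j): (xj[0] - xi[0], xj[1] - xi[1])
        for i, xi in enumerate(positions)
        for j, xj in enumerate(positions)
        if i != j
    }
```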

Single Integrator — Results

Obs: N = no obstacles, Y = with obstacles. TtR = mean time-to-reach (steps).

All results for the SingleIntegrator environment (Heterogeneous specs; $N\in\{8,16,32\}$). D-MA: Diffusion-based Multi-Agent (Ours), D-SA: Diffusion-based Single-Agent, STLPY: MILP planner. Best results are in **bold**.
Spec	Obs	N	Planning Time (s) (↓)			Success Rate (↑)			TtR (↓)
Spec	Obs	N	D-MA	D-SA	STLPY	D-MA	D-SA	STLPY	D-MA	D-SA	STLPY
Branch	N	8	0.46	0.27	9.79	100.00	100.00	92.50	1288.88	1107.45	498.36
		16	1.35	0.29	19.37	100.00	100.00	95.62	1123.75	1165.91	536.30
		32	4.62	0.31	38.12	100.00	100.00	96.25	1633.31	1309.40	531.44
	Y	8	0.54	0.27	9.93	94.17	89.58	81.25	1234.49	1489.94	665.80
		16	0.73	0.30	19.29	93.54	90.00	85.62	1323.92	1549.64	709.70
		32	1.07	0.31	38.62	92.81	91.67	87.19	1778.98	1677.16	734.18
Cover	N	8	0.43	0.29	8.08	100.00	100.00	100.00	1115.27	1117.58	564.79
		16	1.81	0.44	16.23	100.00	100.00	100.00	1176.00	1429.29	593.31
		32	7.21	0.47	33.03	100.00	100.00	100.00	1477.85	1583.75	900.85
	Y	8	0.42	0.31	8.18	92.92	85.83	81.25	1430.52	1479.96	853.77
		16	0.62	0.43	16.45	92.29	88.12	85.62	1529.05	1540.77	887.44
		32	1.09	0.47	32.89	92.81	89.90	88.12	1555.41	1448.32	947.44
Loop	N	8	3.98	1.83	53.08	97.50	96.25	92.50	1512.35	1699.71	941.06
		16	7.17	2.40	103.33	97.50	96.46	95.62	1915.74	1781.47	998.75
		32	12.54	3.88	214.74	100.00	100.00	100.00	2317.23	1695.50	1118.85
	Y	8	3.77	1.86	52.86	90.83	81.25	78.75	1904.90	1856.04	1233.15
		16	7.31	2.55	105.73	90.83	86.67	81.88	1886.95	1952.12	1119.33
		32	12.76	3.92	218.89	90.73	87.19	84.38	2206.91	2154.67	1423.70
Seq.	N	8	0.70	0.43	4.02	100.00	100.00	100.00	1485.80	1205.51	669.49
		16	4.76	0.50	7.95	100.00	100.00	100.00	1271.25	1289.96	693.78
		32	24.75	0.66	16.27	100.00	100.00	100.00	1602.76	1493.86	815.63
	Y	8	1.25	0.45	4.11	91.67	89.58	81.25	1573.66	1715.81	947.60
		16	1.81	0.50	8.12	93.33	88.75	83.12	1721.08	1824.43	1006.02
		32	3.35	0.64	16.39	91.77	87.50	80.94	2243.08	1988.05	1387.37

Supplementary Tables

Hyperparameters

Hyperparameters used in the experiments.
Hparam.	Value	Description
Diffusion Parameters
$N_{\text{diff}}$	256	Number of diffusion steps.
$\lambda_{\text{STL}}$	1.0	Weight for the STL loss.
$\lambda_{\text{ach}}$	0.1	Weight for the achievable loss.
$N_{\text{sample}}$	40	Max. number of plan resamples.
$k_{\text{ach}}$	0.1	Fraction of diffusion steps to compute achievable loss.
$\epsilon_{\text{resample}}$	0.1	STL-loss threshold above which a plan is resampled.
$\sigma_{\max}$	80	Maximum noise level.
$\sigma_{\min}$	0.002	Minimum noise level.
Environment Parameters
$T^{\text{train}}_h$	1000	Time horizon in training.
$T^{\text{eval}}_h$	3000	Time horizon in evaluation.
Evaluation Parameters
$d_{\tau}$	1.0	Distance threshold for clustering.
$r_{\text{grid}}$	0.3	Resolution for occupancy grid.
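The noise bounds $\sigma_{\max}=80$ and $\sigma_{\min}=0.002$ coincide with the defaults of the EDM formulation of Karras et al. (2022). Assuming, without confirmation from this page, that the planner also uses EDM's discretisation with $\rho=7$, the $N_{\text{diff}}=256$ noise levels would be generated as below. EDM's sampler additionally appends a final $\sigma=0$, which is omitted here; the function name is hypothetical.

```python
def karras_sigmas(n_steps=256, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced uniformly in sigma**(1/rho)."""
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [(hi + i / (n_steps - 1) * (lo - hi)) ** rho for i in range(n_steps)]
```

Spacing the levels uniformly in $\sigma^{1/\rho}$ with a large $\rho$ places more denoising steps at low noise levels, where fine trajectory detail is resolved.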

Dataset Coverage

SingleIntegrator Predicate Coverage


Top: Coverage of dataset over predicates (SingleIntegrator).

DubinsCar Trajectory Samples


Bottom: Visualizing a subset of dataset trajectories (DubinsCar).

JIT Compilation & Resampling

What's Measured

JIT (XLA) compilation. We use JAX/XLA just‑in‑time compilation. The first call for a given planner/specification shape triggers a one‑time compile; subsequent evaluations reuse the compiled graph. We therefore separate this cost from the planning metrics and report it explicitly as “JIT Comp. Time (↓)”.
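A timing harness in the spirit of this measurement times the first call separately from later ones, so the difference approximates the one-time compile cost. This is a generic sketch, not the authors' benchmarking code, and the function name is hypothetical. Because JAX dispatches work asynchronously, a JAX planner should be wrapped so that it calls `.block_until_ready()` on its output before the timer stops.

```python
import statistics
import time


def time_first_and_steady(fn, *args, repeats=5):
    """Return (first-call time, median of later calls) in seconds.

    For a jax.jit-compiled function, the first call for a new input shape also
    traces and compiles, so first - steady approximates the compile time.
    """
    t0 = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - t0

    steady = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        steady.append(time.perf_counter() - t0)
    return first, statistics.median(steady)
```

Alternatively, JAX's ahead-of-time API (`jax.jit(f).lower(*args).compile()`) isolates the compilation step explicitly.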

Resampling. Our diffusion planner may resample candidate plans when the intermediate STL loss exceeds a threshold ($\epsilon_{\text{resample}}$). Each resampling iteration has an approximately fixed runtime cost, but the number of resamples depends on the specification and the number of agents. We therefore report “Num. Resampling (↓)” alongside the other metrics.
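The resampling rule can be sketched as a bounded retry loop using the threshold $\epsilon_{\text{resample}}=0.1$ and the budget $N_{\text{sample}}=40$ from the hyperparameter table. This is a simplification: the page checks an intermediate STL loss during denoising, whereas the sketch scores finished samples. `sample_plan` and `stl_loss` are hypothetical callables, and keeping the lowest-loss plan when the budget runs out is an assumption.

```python
def plan_with_resampling(sample_plan, stl_loss, eps_resample=0.1, max_samples=40):
    """Draw plans until one has STL loss <= eps_resample or the budget runs out.

    Returns (best_plan, best_loss, num_samples_drawn); the count includes the first draw.
    """
    best_plan, best_loss = None, float("inf")
    for n in range(1, max_samples + 1):
        plan = sample_plan()          # one (hypothetical) diffusion rollout
        loss = stl_loss(plan)
        if loss < best_loss:          # keep the best candidate seen so far
            best_plan, best_loss = plan, loss
        if loss <= eps_resample:      # at or below the threshold: accept, stop resampling
            break
    return best_plan, best_loss, n
```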

Achievable-loss ablation. We also include an ablation variant that removes the achievable loss term, denoted “D‑MA (No Ach.)”. This variant changes both compile costs (fewer STL constraints) and resampling behaviour.

Planning times in the main tables exclude the one‑time JIT compilation, and resampling counts are reported separately for transparency.

JIT compilation times and resampling counts for the Dubins Car environment in the Homogeneous task setting. D‑MA: Diffusion‑based Multi‑Agent (Ours), D‑MA (No Ach.): ablation without achievable loss, D‑SA: Diffusion‑based Single‑Agent.
Spec	N	JIT Comp. Time (↓)			Num. Resampling (↓)
Spec	N	D‑MA	D‑MA (No Ach.)	D‑SA	D‑MA	D‑MA (No Ach.)	D‑SA
Branch	8	74.24	18.24	7.79	1.12	1.00	1.00
	16	87.83	16.64	9.79	1.06	1.00	1.00
	32	120.41	39.78	12.45	1.06	1.00	1.47
Cover	8	32.25	10.70	6.04	1.00	1.12	1.50
	16	79.92	15.07	7.20	1.12	1.12	2.25
	32	71.00	23.48	8.73	1.03	1.00	2.47
Loop	8	130.08	73.77	22.52	1.88	1.88	6.38
	16	184.14	131.98	30.35	1.81	1.81	7.06
	32	287.68	96.92	42.36	3.12	3.06	6.91
Seq.	8	57.98	8.82	6.02	1.75	1.00	1.12
	16	43.11	16.71	6.58	1.06	1.06	1.44
	32	82.42	14.48	8.25	1.41	1.41	1.62


Generalizable Multi‑Agent Planning from Signal Temporal Logic Specifications via Diffusion
