Constrained Decoding for Robot Foundation Models

Anonymous Authors
Under Review  |  2025

SpecDec biases action selection on‑the‑fly so that robot trajectories provably satisfy user‑defined Signal Temporal Logic (STL) safety and mission constraints, with no model fine‑tuning or retraining. In the video, at around 0:07, constrained decoding prevents the model from entering the room because of the geofence specification. The model then finds an alternate route and explores other rooms until it finds the target object, an apple.

Abstract

Recent advances in robotic foundation models enable end‑to‑end policies that map multimodal observations directly to action sequences. While these policies generalize across tasks, they lack explicit notions of safety and correctness. We introduce SpecDec, a constrained decoding framework that enforces Signal Temporal Logic (STL) specifications at inference time, guaranteeing that generated trajectories satisfy formal constraints without retraining and irrespective of the underlying model. Comprehensive experiments on state‑of‑the‑art navigation models demonstrate that specification‑guided decoding not only filters unsafe actions but can also condition generation, achieving high task success while provably adhering to safety rules.
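
To make the idea of filtering unsafe actions at inference time concrete, here is a minimal sketch of a single constrained decoding step for a geofence rule of the form "always stay outside a forbidden box". It is an illustration under stated assumptions, not the SpecDec implementation. The function names, the axis-aligned box, the discrete candidate actions, and the simple additive dynamics (next state = state + action) are all hypothetical.

```python
import math

def outside_box_robustness(point, box):
    """STL robustness of 'point is NOT inside box'.

    box = (xmin, ymin, xmax, ymax). Positive means the point is outside
    the box, negative means it is inside.
    """
    x, y = point
    xmin, ymin, xmax, ymax = box
    return max(xmin - x, x - xmax, ymin - y, y - ymax)

def always_robustness(trajectory, box):
    """Robustness of 'always outside box': the worst case over the trajectory."""
    return min(outside_box_robustness(p, box) for p in trajectory)

def constrained_step(state, logits, actions, box):
    """Pick the highest-scoring action whose predicted next state is safe.

    Assumes (hypothetically) that the next state is state + action.
    Unsafe candidates are masked with -inf before the argmax.
    """
    masked = []
    for logit, (dx, dy) in zip(logits, actions):
        nxt = (state[0] + dx, state[1] + dy)
        safe = outside_box_robustness(nxt, box) > 0.0
        masked.append(logit if safe else -math.inf)
    if all(m == -math.inf for m in masked):
        raise ValueError("no candidate action satisfies the specification")
    best = max(range(len(actions)), key=lambda i: masked[i])
    return actions[best]

# Hypothetical example: the model prefers moving right, into the forbidden room.
forbidden_room = (1.0, -1.0, 3.0, 1.0)
candidates = [(1.5, 0.0), (0.0, 1.5), (-1.0, 0.0)]
scores = [2.0, 1.0, 0.5]
chosen = constrained_step((0.0, 0.0), scores, candidates, forbidden_room)
```

In this example the top-scoring action would land inside the forbidden box, so it is masked and the next-best safe action is chosen instead. The sketch checks only the predicted next state. It does not check the path between states or longer horizons, and it uses no information from the model beyond its scores for the candidate actions.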

BibTeX

@inproceedings{specdec2025,
  title     = {Constrained Decoding for Robot Foundation Models},
  author    = {Anonymous Authors},
  booktitle = {Under Submission},
  year      = {2025}
}