Constrained Decoding for Transformer-Based Robot Policies

Anonymous Authors

Under Review | 2025

SafeDec biases action selection on the fly so that robot trajectories provably satisfy user-defined STL safety and mission constraints, with no model fine-tuning or retraining required. In this video, at around 0:07, constrained decoding prevents the model from entering a room that the geofence specification excludes. The model then finds an alternate route and explores other rooms before finding the target object, an apple.

Abstract

Recent advances in transformer-based models enable end-to-end policies that map multimodal observations directly to action sequences. While these policies generalize across tasks, they lack explicit notions of safety and correctness. We introduce SafeDec, a constrained decoding framework that enforces Signal Temporal Logic (STL) specifications at inference time, guaranteeing that generated trajectories satisfy formal constraints without retraining and irrespective of the underlying model. Comprehensive experiments on state-of-the-art navigation models demonstrate that specification-guided decoding not only filters unsafe actions but can also condition generation, achieving high task success while provably adhering to safety rules.

An RGB view and a natural-language goal are encoded and passed through a frozen transformer policy (e.g., SPOC / FlaRE), which proposes an action distribution. Our STL-robustness-constrained decoder consults the rulebook on the left and reweights the candidate actions based on logic satisfaction (pink = original, green = safe-reweighted, red line = per-action robustness). The resulting trajectory in the bird's-eye view (orange) reaches the target objects (green circles) while avoiding exclusion zones (red squares), ensuring safety-compliant navigation without retraining.
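The reweighting step described above can be illustrated with a minimal sketch. It is not the SafeDec implementation: the function name `reweight_actions`, the temperature `beta`, the additive soft-bias rule, and the fallback when no action is safe are all assumptions made for illustration. It assumes the frozen policy exposes per-action logits and that a robustness score (positive = specification satisfied) is already available for each candidate action.

```python
import numpy as np

def reweight_actions(logits, robustness, beta=5.0, hard=False):
    """Reweight a policy's action distribution with per-action STL robustness.

    logits:     shape (A,), raw action scores from the frozen policy
    robustness: shape (A,), robustness if each action were taken
                (positive = specification satisfied, negative = violated)
    beta:       hypothetical strength of the soft bias (assumption)
    hard:       if True, violating actions receive zero probability
    """
    logits = np.asarray(logits, dtype=float)
    robustness = np.asarray(robustness, dtype=float)

    if hard:
        safe = robustness >= 0.0
        if safe.any():
            # Keep the policy's preferences among safe actions only.
            adjusted = np.where(safe, logits, -np.inf)
        else:
            # Assumed fallback: pick the least-violating action.
            adjusted = np.full_like(logits, -np.inf)
            adjusted[np.argmax(robustness)] = 0.0
    else:
        # Soft variant: shift each logit in proportion to its robustness.
        adjusted = logits + beta * robustness

    # Numerically stable softmax over the adjusted scores.
    adjusted = adjusted - np.max(adjusted)
    probs = np.exp(adjusted)
    return probs / probs.sum()

# The policy prefers action 0, but action 0 violates the specification.
logits = [2.0, 1.0, 0.5]
robustness = [-1.0, 0.5, 0.2]
hard_probs = reweight_actions(logits, robustness, hard=True)
soft_probs = reweight_actions(logits, robustness, beta=5.0)
```

In this sketch the hard variant removes violating actions outright, while the soft variant only biases the distribution, so the policy's own preferences still influence the choice among safe actions.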

The orange bars show the original action probabilities proposed by the transformer-based robot policy. Lighter bars represent previous-step probabilities, while darker bars depict the reweighted values after applying temporal logic constraints. The final selected action is shown in green, chosen for its safety and task alignment. The red curve overlays the STL robustness score for each action, guiding the decoder away from unsafe candidates in real time.
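As a rough illustration of how a per-action robustness score might be obtained for a geofence rule, the sketch below evaluates the STL formula "always stay outside a rectangular region" using the standard quantitative semantics (min for conjunction and for "always", negation flips the sign). The helper names, the axis-aligned box representation, and the assumption that a one-step prediction of the robot's next position is available for each candidate action are all hypothetical and not taken from SafeDec.

```python
def box_avoid_robustness(xy, box):
    """Robustness of 'not inside box' at a single (x, y) point.

    box is (xmin, ymin, xmax, ymax). Being inside is a conjunction of four
    inequalities, so its robustness is the min of the four margins; the
    'avoid' predicate is its negation, which flips the sign.
    """
    x, y = xy
    xmin, ymin, xmax, ymax = box
    inside = min(x - xmin, xmax - x, y - ymin, ymax - y)
    return -inside

def always_avoid(trajectory, box):
    """Robustness of 'always avoid box': the worst case over the trajectory."""
    return min(box_avoid_robustness(p, box) for p in trajectory)

def per_action_robustness(history, candidates, box):
    """Score each candidate action by the robustness of history + predicted step.

    candidates maps an action name to its predicted next (x, y) position;
    the predictor that produces these positions is assumed to exist.
    """
    return {a: always_avoid(history + [nxt], box) for a, nxt in candidates.items()}

box = (2.0, 0.0, 4.0, 2.0)             # hypothetical exclusion zone
history = [(0.0, 1.0), (1.0, 1.0)]     # positions visited so far
candidates = {
    "forward": (2.5, 1.0),             # would step into the zone
    "left": (1.0, 2.0),                # stays outside
}
scores = per_action_robustness(history, candidates, box)
```

Here `scores["forward"]` is negative because the predicted step lands inside the zone, while `scores["left"]` stays positive, which is the kind of signal the red curve conveys.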

Top-down views across indoor environments with SafeDec-induced trajectories. Red circles mark target objects, green boxes mark regions to avoid, and orange paths show safe trajectories under constrained decoding.

BibTeX

@inproceedings{safedec2025,
  title     = {Constrained Decoding for Transformer-Based Robot Policies},
  author    = {Anonymous Authors},
  booktitle = {Under Submission},
  year      = {2025}
}