PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation

Yuanzhe Liu¹,², Jingyuan Zhu¹, Yuchen Mo², Gen Li³, Xu Cao², Jin Jin⁴

Yifan Shen², Zhengyuan Li², Tianjiao Yu², Wenzhen Yuan², Fangqiang Ding⁵, Ismini Lourentzou²

¹University of Pennsylvania ²University of Illinois Urbana-Champaign ³Nanyang Technological University

⁴University of Oxford ⁵Massachusetts Institute of Technology

CVPR 2026


PALM couples structured affordance reasoning with continuous subtask progress estimation, enabling vision-language-action policies to remain temporally coherent across long-horizon robotic manipulation.

PALM teaser showing progress-aware affordance reasoning for long-horizon manipulation

PALM uses future affordance cues and progress prediction to guide long-horizon action generation.

Abstract

Recent vision-language-action models show strong promise for robotic manipulation, but they remain brittle in long-horizon, multi-step tasks. PALM addresses this limitation by structuring policy learning around interaction-centric affordance reasoning and subtask progress cues. It predicts structured future affordances that capture object relevance, contact geometry, spatial placements, and motion dynamics, then conditions a progress-aware diffusion policy on these affordance representations. The resulting policy jointly predicts actions and continuous within-subtask progress values, helping the robot decide when to continue, transition, or terminate a subtask. Across simulation and real-world experiments, PALM improves long-horizon manipulation performance, reaching a 91.8% success rate on LIBERO-LONG, improving average length on CALVIN ABCD by 12.5%, and achieving a 2× improvement over real-world baselines across three generalization settings.

Method

PALM introduces two complementary query sets on top of a multimodal VLA backbone: affordance queries that anticipate task-relevant future interaction cues, and action-progress queries that generate actions while estimating subtask completion. The affordance representation is factorized into global, local, spatial, and dynamic components, encouraging the policy to reason about what object matters, where to interact, where to place or move, and how the next interaction should unfold.
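One way to picture the two query sets is as groups of learnable tokens appended after the backbone's multimodal tokens. The sketch below only computes where each group would sit in such a sequence. The `QueryLayout` name, the group sizes, and the ordering are illustrative assumptions, not details taken from PALM.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class QueryLayout:
    """Hypothetical token budget for each query group (sizes are assumed)."""
    n_global: int = 1           # which object matters
    n_local: int = 4            # where to make contact
    n_spatial: int = 4          # where to place or move
    n_dynamic: int = 4          # how the next interaction unfolds
    n_action_progress: int = 8  # actions plus subtask progress

    def slices(self, n_context: int) -> Dict[str, slice]:
        """Map each group to its position after n_context backbone tokens."""
        groups = [
            ("global", self.n_global),
            ("local", self.n_local),
            ("spatial", self.n_spatial),
            ("dynamic", self.n_dynamic),
            ("action_progress", self.n_action_progress),
        ]
        out: Dict[str, slice] = {}
        start = n_context
        for name, size in groups:
            out[name] = slice(start, start + size)
            start += size
        return out

    def total_queries(self) -> int:
        """Number of query tokens added on top of the context tokens."""
        return (self.n_global + self.n_local + self.n_spatial
                + self.n_dynamic + self.n_action_progress)


layout = QueryLayout()
positions = layout.slices(n_context=10)
```

Placing the affordance groups before the action-progress group is one plausible choice here: it would let the action-progress queries attend to the affordance outputs, in line with the description of the policy being conditioned on affordance representations.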

Structured affordance foresight

Predicts future interaction cues for object relevance, contact geometry, spatial placement, and motion dynamics.

Progress-aware control

Jointly predicts action and continuous progress to stabilize subtask transitions and reduce repeated or skipped actions.

Long-horizon robustness

Maintains coherent behavior under object relocation, unseen lighting, and visual distractors in real-world rollouts.

PALM architecture: multimodal encoding, structured affordance queries, and action-progress diffusion decoding.
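To illustrate how a continuous progress estimate could drive subtask transitions, the sketch below advances to the next subtask once predicted progress crosses a threshold and stops after the last one. The threshold rule, the `advance` and `track` helpers, and the value 0.95 are assumptions made for illustration; the source states only that progress values help the robot decide when to continue, transition, or terminate.

```python
from typing import Iterable, List


def advance(subtask: int, progress: float, num_subtasks: int,
            threshold: float = 0.95) -> int:
    """Return the subtask index to execute next.

    An index equal to num_subtasks means the whole task has terminated.
    The threshold rule is an assumed stand-in, not PALM's actual mechanism.
    """
    if subtask >= num_subtasks:
        return subtask  # already terminated; stay terminated
    if progress >= threshold:
        return subtask + 1  # transition (or terminate after the last subtask)
    return subtask  # continue the current subtask


def track(progress_stream: Iterable[float], num_subtasks: int,
          threshold: float = 0.95) -> List[int]:
    """Replay a stream of predicted progress values and log the subtask index."""
    subtask = 0
    history: List[int] = []
    for progress in progress_stream:
        subtask = advance(subtask, progress, num_subtasks, threshold)
        history.append(subtask)
    return history


# Toy progress values for a two-subtask episode (not real model outputs).
history = track([0.2, 0.6, 0.97, 0.1, 0.5, 0.99], num_subtasks=2)
```

In this toy run the index stays at 0 until progress reaches 0.97, moves to subtask 1, and reaches the terminal index 2 after 0.99. Because progress restarts within each subtask, a low value right after a transition does not send the tracker back to the previous subtask.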

Results and Analysis

PALM is evaluated across simulation benchmarks and real-world long-horizon generalization settings. The real-world setup uses a UFACTORY xArm6 with Gripper G2 and dual RealSense D455 cameras. The long-horizon task requires a robot to complete a six-step instruction sequence while remaining robust to relocation, lighting shifts, and distractor objects.

91.8%

success rate on LIBERO-LONG

+12.5%

average-length gain on CALVIN ABCD

2×

real-world improvement over baselines
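For readers unfamiliar with the average-length metric, it is the mean number of consecutive subtasks completed per rollout. The sketch below computes it from toy counts and then a relative gain between two policies. The numbers are made up for illustration and are not PALM results, and treating the reported gain as relative is an assumption, since the text above does not say how the 12.5% figure is defined.

```python
from typing import Sequence


def average_length(completed_per_rollout: Sequence[int]) -> float:
    """Mean number of consecutive subtasks completed across rollouts."""
    if not completed_per_rollout:
        raise ValueError("need at least one rollout")
    return sum(completed_per_rollout) / len(completed_per_rollout)


def relative_gain(new: float, baseline: float) -> float:
    """Fractional improvement of new over baseline (0.25 means +25%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Toy data: consecutive subtasks completed in four rollouts of a 5-step chain.
baseline_avg = average_length([2, 3, 5, 2])
policy_avg = average_length([3, 4, 5, 3])
gain = relative_gain(policy_avg, baseline_avg)
```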

Real-world robot setup and six-step long-horizon task guided by one high-level instruction.

PALM robustness under relocation, unseen lighting, and visual distractions

Robustness settings: random relocation, unseen lighting disturbances, and multi-object visual distractions.

PALM achieves longer-horizon completion across real-world generalization settings.

Qualitative Videos

The videos below show PALM rollouts across the original task and robustness settings. Each clip illustrates how progress-aware affordance reasoning supports temporally coherent execution across multi-step manipulation.

Original

Unseen Lighting

Visual Distraction

Random Relocation

BibTeX

@article{liu2026palm,
  title={PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation},
  author={Liu, Yuanzhe and Zhu, Jingyuan and Mo, Yuchen and Li, Gen and Cao, Xu and Jin, Jin and Shen, Yifan and Li, Zhengyuan and Yu, Tianjiao and Yuan, Wenzhen and Ding, Fangqiang and Lourentzou, Ismini},
  journal={arXiv preprint arXiv:2601.07060},
  eprint={2601.07060},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  year={2026}
}