
Uncertainty in Action: Confidence Elicitation in Embodied Agents

University of Illinois Urbana-Champaign
TL;DR: We propose a framework for embodied verbalized confidence elicitation, designed for multimodal and action-driven environments. We introduce Elicitation Policies and Execution Policies to enhance confidence estimation in embodied settings. We provide the first structured analysis of open-world embodied uncertainty and identify effective methods for improving confidence calibration and failure prediction, while also pinpointing persistent challenges.

Our proposed Confidence Estimation Framework introduces a structured approach to assessing and expressing an agent’s confidence through Elicitation Policies and Execution Policies. Operating at both perception and action stages, Elicitation Policies prompt the agent to evaluate uncertainty in its observations and decisions. Execution Policies further refine confidence calibration by expanding the agent’s reasoning space, enabling more robust and context-aware decision-making.

Abstract

Expressing confidence is crucial for embodied agents as they navigate dynamic, multimodal environments where uncertainty arises from both perception and decision-making processes. To the best of our knowledge, this is the first work investigating open-world embodied confidence elicitation, focusing on settings where agents, powered by large language models and vision-language models, lack direct access to their internal reasoning processes. We introduce Elicitation Policies designed to address inductive, deductive, and abductive uncertainties, along with Execution Policies for scenario reinterpretation, action sampling, and hypothetical reasoning. Evaluating agents on calibration and failure prediction tasks in the Minecraft environment, we show that structured reasoning approaches, such as Chain-of-Thought, improve confidence calibration. However, our findings also reveal persistent challenges in distinguishing uncertainty, particularly under abductive settings, highlighting the need for more sophisticated embodied confidence elicitation methods.

Embodied Confidence Elicitation

ECE Architecture

Embodied Confidence Elicitation. Elicitation Policies enable agents to express uncertainty, while Execution Policies refine and expand confidence assessment through scenario reinterpretation, action sampling, and hypothetical reasoning. Together, they enhance confidence calibration in embodied agents. The orange text represents the vanilla elicitation policy, which incorporates the vanilla confidence prompt into the original instruction. The brown arrows denote the Scenario-Reinterpretation execution policy, prompting the agent to generate additional scene insights.
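As a concrete illustration, a verbalized elicitation policy can be realized as a prompt suffix plus a parser for the agent's reply. The sketch below is our own minimal interpretation: the prompt wording and the `build_prompt` / `parse_confidence` helpers are assumptions for illustration, not the paper's exact templates.

```python
import re

# Hypothetical prompt suffixes; the paper's exact wording may differ.
VANILLA_SUFFIX = (
    "\nAfter choosing your action, state how confident you are that it "
    "will succeed, as a percentage from 0 to 100, on a line formatted "
    "exactly as 'Confidence: <number>%'."
)

COT_SUFFIX = (
    "\nThink step by step about the scene and your plan before acting. "
    "Then state your confidence on a line formatted exactly as "
    "'Confidence: <number>%'."
)

def build_prompt(instruction: str, policy: str = "vanilla") -> str:
    """Append an elicitation suffix to the original task instruction."""
    suffix = COT_SUFFIX if policy == "cot" else VANILLA_SUFFIX
    return instruction + suffix

def parse_confidence(reply: str) -> float:
    """Extract a verbalized confidence in [0, 1] from the agent's reply."""
    match = re.search(r"Confidence:\s*(\d+(?:\.\d+)?)\s*%", reply)
    if match is None:
        return 0.5  # fall back to an uninformative confidence
    return min(max(float(match.group(1)) / 100.0, 0.0), 1.0)
```

Execution Policies would then call `build_prompt` multiple times (e.g., once per reinterpreted scene or sampled action) and aggregate the parsed confidences.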

Results


Confidence Metrics Across Elicitation Policies for three models (GPT-4V, MineLLM, and LLaMA-based STEVE) using different elicitation strategies: Vanilla (basic task understanding), Self-Intervention (reflection on own actions), Chain-of-Thought (step-by-step reasoning), Plan & Solve (explicit planning before execution), and Top-K (confidence distribution across multiple outputs), with no Execution Policies applied. The best performance for each model is in bold.
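For reference, Expected Calibration Error (ECE) bins predictions by confidence and averages the gap between each bin's mean confidence and empirical accuracy. A minimal sketch, assuming equal-width bins (the bin count is our choice, not necessarily the paper's setting):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - confidence| over equal-width bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Assign confidence exactly 1.0 to the last bin.
        in_bin = [i for i, c in enumerate(confidences)
                  if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        avg_acc = sum(1.0 if correct[i] else 0.0 for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(avg_acc - avg_conf)
    return ece
```

A perfectly calibrated agent (e.g., 95% confident and correct 95% of the time within each bin) yields an ECE of zero; overconfident verbalized estimates inflate it.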




ECE and AUROC across Models and Execution Policies. Bars show ECE (top, lower is better) and AUROC (bottom, higher is better) under different elicitation strategies. Red dashed lines mark the metrics for Vanilla elicitation with no execution policy applied.
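Failure prediction treats the elicited confidence as a score for separating successful from failed episodes, and AUROC measures how well that score ranks them. A minimal pure-Python sketch using the pairwise (Mann-Whitney) formulation, written for illustration rather than as the paper's evaluation code:

```python
def auroc(confidences, success):
    """AUROC of confidence as a success predictor: the probability that a
    randomly chosen successful episode receives a higher confidence than a
    randomly chosen failed one (ties count as half)."""
    pos = [c for c, s in zip(confidences, success) if s]
    neg = [c for c, s in zip(confidences, success) if not s]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 means confidence carries no information about impending failure; values near 1.0 mean low confidence reliably flags failing episodes.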

BibTeX

@misc{yu2025uncertaintyactionconfidenceelicitation,
  title={Uncertainty in Action: Confidence Elicitation in Embodied Agents},
  author={Tianjiao Yu and Vedant Shah and Muntasir Wahed and Kiet A. Nguyen and Adheesh Juvekar and Tal August and Ismini Lourentzou},
  year={2025},
  eprint={2503.10628},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2503.10628},
}