Amazon unveils trust framework for AI agents focusing on consistency and robustness

[2026-06-25] Author: Meteora Web

Amazon has revealed a new approach to ensuring the reliability of AI agents, one that moves beyond traditional evaluation (eval) scores. Bryan Silverthorn, director of Amazon's AGI Autonomy research lab, shared the framework exclusively with VentureBeat ahead of its presentation at VB Transform 2026. At the core of the strategy is a structured system that evaluates AI agents not just on raw performance but also on consistency, robustness, predictability, and safety.

The limitations of eval scores for AI agents

According to Silverthorn, standard benchmarks such as eval scores provide only a static snapshot of performance and fail to capture how predictably an agent behaves across different prompts, environments, and input types. This shortfall helps explain why many IT leaders are reluctant to grant agents access permissions to enterprise systems. A VentureBeat Q2 Pulse Research survey of over 100 senior technology leaders found that only 4% trust model guardrails alone, while 40% fear unauthorized access to tools or data and 27% cite prompt manipulation or injection as their top concern.
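To make the "static snapshot" point concrete, here is a minimal Python sketch that compares a single averaged score with two consistency-oriented views of the same results. It is an illustration only, not Amazon's metric: the function names, the task names, and the pass/fail data are all hypothetical.

```python
def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of runs that passed (True counts as 1, False as 0)."""
    return sum(outcomes) / len(outcomes)


def summarize(results: dict[str, list[bool]]) -> dict[str, float]:
    """Summarize pass/fail outcomes per task.

    `results` maps a task name to its outcomes across prompt variants
    or repeated runs (hypothetical data layout for this sketch).
    """
    per_task = [pass_rate(outcomes) for outcomes in results.values()]
    return {
        # The usual single number: hides which tasks are flaky.
        "average_pass_rate": sum(per_task) / len(per_task),
        # Share of tasks that passed on every variant.
        "all_variants_pass_rate": pass_rate(
            [all(outcomes) for outcomes in results.values()]
        ),
        # The least reliable task.
        "worst_task_pass_rate": min(per_task),
    }


# Hypothetical outcomes: each task run against four prompt variants.
results = {
    "refund_request": [True, True, True, True],
    "update_address": [True, False, True, True],
    "cancel_order": [True, True, False, True],
}
summary = summarize(results)
```

With this made-up data the average pass rate is about 0.83, yet only one task in three passes on every prompt variant, which is the kind of gap an averaged score does not show.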

Amazon's framework: decoupled systems and human oversight

Amazon's approach moves away from the assumption that models can be made safe solely through internal guardrails. Instead, it emphasizes decoupled systems, such as sandboxed environments where agents propose changes that are then reviewed by a human before implementation. This strategy aims to bridge the trust gap by prioritizing verifiable interactions, even in highly sensitive domains like finance, where the potential damage from an agent is significant. Silverthorn highlighted the importance of evolving from single-agent wrappers to multi-tool architectures capable of self-correcting mid-execution.
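The propose-then-review pattern described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Amazon's implementation: `ProposedChange`, `run_in_sandbox`, `apply_reviewed`, the demo agent, and the reviewer callback (standing in for a human) are all hypothetical names.

```python
import copy
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedChange:
    key: str
    new_value: object
    rationale: str


def run_in_sandbox(
    live_state: dict, agent: Callable[[dict], list[ProposedChange]]
) -> list[ProposedChange]:
    """Let the agent work on a deep copy so the live state is never touched."""
    sandbox = copy.deepcopy(live_state)
    return agent(sandbox)


def apply_reviewed(
    live_state: dict,
    proposals: list[ProposedChange],
    approve: Callable[[ProposedChange], bool],
) -> list[ProposedChange]:
    """Apply only the proposals the reviewer approves; return what was applied."""
    applied = []
    for change in proposals:
        if approve(change):
            live_state[change.key] = change.new_value
            applied.append(change)
    return applied


def demo_agent(sandbox: dict) -> list[ProposedChange]:
    # The agent may mutate its sandbox freely; this does not affect live data.
    sandbox["credit_limit"] = 50_000
    return [
        ProposedChange("credit_limit", 50_000, "customer requested increase"),
        ProposedChange("contact_email", "new@example.com", "profile update"),
    ]


def reviewer(change: ProposedChange) -> bool:
    # Stand-in for a human decision: reject any change to the credit limit.
    return change.key != "credit_limit"


live = {"credit_limit": 10_000, "contact_email": "old@example.com"}
proposals = run_in_sandbox(live, demo_agent)
applied = apply_reviewed(live, proposals, reviewer)
```

In this sketch the agent's only path to the live state runs through `apply_reviewed`, so a rejected proposal leaves the live record unchanged regardless of what the agent did inside its sandbox.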

Concrete examples of AI operating in high-reliability contexts include Stanford's simulation of the entire drug discovery cycle using 10,000 AI agents, which showed that failure rates could plummet. Meanwhile, tools like Mistral OCR 4 improve document extraction for European enterprises, integrating reliability into business processes. These developments illustrate how the industry is striving to balance capability and safety.

The future of trust in AI agents: from VB Transform to Waymo

At VB Transform 2026, taking place July 14-15 in Menlo Park, Silverthorn will elaborate on the framework in a session titled "Closing the capability-reliability gap." Manasi Joshi, director of systems intelligence and machine learning at Waymo, will give another key talk on building safe and efficient AI for the physical world. The conference offers an opportunity to explore practical solutions to the trust problem, a critical issue as enterprises increasingly delegate autonomous tasks to AI agents.

For more on AI agent challenges, see the original VentureBeat article.

Source: https://venturebeat.com/orchestration/amazon-will-present-its-framework-for-engineering-trustworthy-ai-agents-at-vb-transform-2026
