Alibaba unveils SkillWeaver: AI framework cuts agent token usage by 99%

[2026-07-03] Author: Ing. Calogero Bono

As enterprise AI systems scale to handle complex workflows, routing subtasks to the right tools remains a critical bottleneck. Researchers at Alibaba have introduced SkillWeaver, a framework that reduces token consumption by over 99% compared to naive approaches while improving routing accuracy. The key innovation is Skill-Aware Decomposition (SAD), a feedback loop that aligns task decomposition with actual tool libraries.

The challenge of skill routing in enterprise AI agents

Modern agents integrate vast skill libraries, each skill described by structured natural-language documentation. Exposing the entire library to an LLM is highly inefficient: it overwhelms context limits and consumes hundreds of thousands of tokens per query. Current frameworks treat routing as single-skill selection, but real-world queries are compositional. For instance, a request like "Download the dataset, transform it, and create visual reports" requires several tools to be sequenced. SkillWeaver addresses this by framing the problem as compositional skill routing.

How SkillWeaver and SAD work

SkillWeaver operates in three stages: Decompose, Retrieve, and Compose. First, an LLM breaks the complex query into atomic sub-tasks. Then, an embedding model compares each sub-task against the skill library to fetch the top candidates. Finally, a planner evaluates inter-skill compatibility and assembles a Directed Acyclic Graph (DAG) for parallel execution. The iterative Skill-Aware Decomposition (SAD) feedback loop refines the initial decomposition using hints from the retrieved tools, so that the granularity and vocabulary of the sub-tasks match the skills that actually exist. For example, the system might decompose a data pipeline into "api-client", "csv-parser", and "chart-gen" rather than generic steps.
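
The three stages and the SAD loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Alibaba's implementation: the skill library, its input/output types, the bag-of-words "embedding", and the string-splitting decomposer are all hypothetical stand-ins for the real embedding model and LLM calls.

```python
import math
import re
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical skill library: name -> (description, input type, output type).
SKILLS = {
    "api-client": ("download fetch dataset from http api endpoint", None, "raw"),
    "csv-parser": ("parse transform clean csv table rows", "raw", "table"),
    "chart-gen": ("create chart plot visual report from table", "table", "chart"),
    "pdf-export": ("export document to pdf file", "chart", "pdf"),
}

def embed(text):
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(subtask, k=2):
    """Retrieve: rank skills by similarity to one sub-task, keep the top k."""
    q = embed(subtask)
    scored = sorted(
        ((cosine(q, embed(f"{name} {spec[0]}")), name) for name, spec in SKILLS.items()),
        reverse=True,
    )
    return [name for score, name in scored[:k] if score > 0]

def decompose(query, hints=None):
    """Decompose: stand-in for the LLM call. Splits on commas and 'and';
    when hints are given, tags each step with its top candidate skill."""
    steps = [s.strip() for s in re.split(r",|\band\b", query) if s.strip()]
    if hints:
        steps = [f"{s} [{h[0]}]" if h else s for s, h in zip(steps, hints)]
    return steps

def skill_aware_decompose(query, max_rounds=3):
    """SAD loop: re-decompose with retrieved hints until the plan stops changing."""
    steps = decompose(query)
    for _ in range(max_rounds):
        refined = decompose(query, [retrieve(s) for s in steps])
        if refined == steps:
            break
        steps = refined
    return steps

def compose(skills):
    """Compose: make a skill depend on any skill whose output type it consumes."""
    graph = {s: {p for p in skills if SKILLS[p][2] == SKILLS[s][1]} for s in skills}
    return list(TopologicalSorter(graph).static_order())

query = "Download the dataset, transform it, and create visual reports"
steps = skill_aware_decompose(query)
# Assumes every step retrieved at least one skill, which holds for this query.
plan = compose([retrieve(s)[0] for s in steps])
print(steps)
print(plan)
```

In this toy run the second pass through the loop produces the same steps as the first, so it stops early. The point of the sketch is the shape of the feedback: retrieval results flow back into decomposition before the plan is composed, and only the handful of retrieved candidates, not the whole library, would ever need to reach the LLM.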

99.9% token reduction and 50% accuracy boost

Evaluated on CompSkillBench (300 multi-step queries using 2,209 real MCP skills), SkillWeaver with a 7B Qwen model achieved 67.7% decomposition accuracy with SAD versus 51% without, a gain of 16.7 percentage points. On hard tasks requiring 4 to 5 skills, accuracy improved by 50%. Token consumption fell from 884,000 tokens per query (LLM-Direct baseline) to just 1,160, a 99.9% reduction. Larger models (14B) performed worse without SAD due to over-decomposition, while the ReAct baseline achieved 0% accuracy. In practice, these results should translate to much lower API costs and faster response times.
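
The headline figures can be checked with a few lines of arithmetic. The snippet below uses only the numbers quoted above; the variable names are ours.

```python
baseline_tokens = 884_000      # LLM-Direct baseline, tokens per query
skillweaver_tokens = 1_160     # SkillWeaver, tokens per query

reduction = 1 - skillweaver_tokens / baseline_tokens
ratio = baseline_tokens / skillweaver_tokens

acc_with_sad = 67.7            # decomposition accuracy (%), 7B model with SAD
acc_without_sad = 51.0         # same model without SAD

point_gain = acc_with_sad - acc_without_sad
relative_gain = point_gain / acc_without_sad

print(f"token reduction: {reduction:.2%}")       # 99.87%, i.e. 99.9% rounded
print(f"baseline / SkillWeaver: {ratio:.0f}x")   # 762x
print(f"accuracy gain: {point_gain:.1f} points ({relative_gain:.1%} relative)")
```

Note that the overall SAD gain is 16.7 percentage points, or about 33% in relative terms; the 50% figure applies specifically to the hard 4-to-5-skill subset.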

Practical considerations for developers

Although the source code is not yet released, SAD can be implemented via prompt engineering and retrieval loops using libraries like LangChain or LlamaIndex. The embedding model (all-MiniLM-L6-v2) is open source; indexing 2,209 skills takes 15 seconds, with retrieval latency under 15 ms. Teams should consider adding a cross-encoder reranker to improve top-1 accuracy. A current limitation is the lack of error recovery: developers need to build their own fallback and retry mechanisms for production.
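
Since error recovery is left to the developer, one possible shape for it is a small wrapper around each skill call. The sketch below is an assumption on our part, not part of SkillWeaver: `run_with_recovery`, its parameters, and the example skills are hypothetical names used for illustration.

```python
import time

def run_with_recovery(primary, fallbacks=(), retries=2, delay=0.0):
    """Call `primary` up to `retries + 1` times with exponential backoff,
    then try each fallback once. Raises if everything fails."""
    errors = []
    for attempt in range(retries + 1):
        try:
            return primary()
        except Exception as exc:  # broad on purpose: any skill failure counts
            errors.append(exc)
            time.sleep(delay * 2 ** attempt)
    for fallback in fallbacks:
        try:
            return fallback()
        except Exception as exc:
            errors.append(exc)
    raise RuntimeError(f"all attempts failed: {errors}")

# Hypothetical skill that fails twice before succeeding.
calls = {"n": 0}

def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "dataset.csv"

def always_down():
    raise ConnectionError("service unavailable")

recovered = run_with_recovery(flaky_download)
degraded = run_with_recovery(always_down, fallbacks=[lambda: "cached.csv"], retries=1)
print(recovered, degraded)
```

In a DAG-based plan, a wrapper like this would sit around each node, so that a single failing skill either recovers or degrades gracefully instead of aborting the whole plan.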

Source: https://venturebeat.com/orchestration/new-alibaba-ai-framework-skips-loading-every-tool-cutting-agent-token-use-99
