MRAgent cuts token use to 118k per sample and beats LangMem on agentic memory benchmarks • Meteora Web Agency

Large language models face a fundamental hurdle when handling long conversations or multi-step reasoning tasks: context windows fill up quickly, and traditional retrieval systems return noise instead of signal. To address this, researchers at the National University of Singapore developed MRAgent, a framework that abandons the static retrieve-then-reason approach in favor of a mechanism in which the agent actively reconstructs the memory it needs as evidence accumulates.

The limits of passive retrieval in long-horizon tasks

In classic Retrieval-Augmented Generation pipelines, documents are retrieved via vector search or graph traversal and then passed to the LLM for reasoning. This passive approach falls short on long-horizon tasks because it keeps reasoning separate from memory access, which creates three major bottlenecks. First, the system cannot revise its retrieval strategy mid-reasoning; if an agent fetches a document and finds that a cue is missing, it cannot issue a new query based on that finding. Second, fixed similarity scores and predefined graph expansions return surface-level matches that flood the LLM's context window with irrelevant material, degrading reasoning. Third, current systems rely heavily on pre-constructed structures such as top-k results and static relevance functions, which limits their flexibility across unpredictable, long-horizon user interactions.

The Cue-Tag-Content mechanism for active memory reconstruction

To overcome these limitations, MRAgent adopts a concept inspired by cognitive neuroscience: an active and associative reconstruction process. Instead of viewing memory as a static database, the framework treats it as an interactive environment. When processing a complex query, the agent uses the LLM's reasoning abilities to explore multiple candidate retrieval paths across a structured memory graph. At each step, the LLM evaluates intermediate evidence and refines its search: it infers new constraints, pursues the most promising paths, and prunes irrelevant branches. This allows MRAgent to piece together deeply buried information without filling the context with noise.

To make this active exploration efficient, the framework organizes its database using a three-layer Cue-Tag-Content mechanism. Cues are fine-grained keywords such as entities or contextual attributes. Content is the set of stored memory units, divided into episodic memory for concrete events and semantic memory for stable facts. Tags are semantic bridges that summarize the associations between Cues and Content. The LLM first navigates from Cues to candidate Tags, evaluates these short summaries for relevance, and only then accesses the detailed content.

For example, if a user asks "How did Nate use the prize money when he won his third video game tournament?", MRAgent extracts initial cues, maps them to the graph, and sees Tags such as "Tournament Victory" and "Tournament Participation". It discards the latter, retrieves the episodic content linked to the former, selects the most relevant memory, updates its cues with "tournament earnings", and iterates until it can answer.
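
The Cue-to-Tag-to-Content navigation described above can be illustrated with a minimal Python sketch. It is not MRAgent's actual code: the class names, the `reconstruct` function, the toy memories about Nate, and the keyword-based `toy_judge` (which stands in for the LLM's relevance judgment) are all assumptions made for illustration. The loop also stops simply when no new cues turn up, which is a simplification of the stopping behavior the researchers describe.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tag:
    summary: str             # short text the agent reads before any content
    content_ids: list[str]   # memory units reachable through this tag


@dataclass
class MemoryGraph:
    cue_to_tags: dict[str, list[str]] = field(default_factory=dict)
    tags: dict[str, Tag] = field(default_factory=dict)
    contents: dict[str, str] = field(default_factory=dict)


def reconstruct(graph: MemoryGraph, query: str, cues: list[str],
                is_relevant: Callable[[str, str], bool],
                max_steps: int = 4) -> list[str]:
    """Return ids of content kept as evidence, in the order they were found."""
    evidence: list[str] = []
    visited_cues = set(cues)
    seen_tags: set[str] = set()
    frontier = list(cues)
    for _ in range(max_steps):
        next_cues: list[str] = []
        for cue in frontier:
            for tag_name in graph.cue_to_tags.get(cue, []):
                if tag_name in seen_tags:
                    continue
                seen_tags.add(tag_name)
                tag = graph.tags[tag_name]
                # Prune on the short tag summary before reading any content.
                if not is_relevant(query, tag.summary):
                    continue
                for cid in tag.content_ids:
                    text = graph.contents[cid]
                    if cid in evidence or not is_relevant(query, text):
                        continue
                    evidence.append(cid)
                    # Update cues: any known cue mentioned in the new evidence.
                    for known in graph.cue_to_tags:
                        if known in text.lower() and known not in visited_cues:
                            visited_cues.add(known)
                            next_cues.append(known)
        if not next_cues:   # nothing new to follow, so stop exploring
            break
        frontier = next_cues
    return evidence


def toy_judge(query: str, text: str) -> bool:
    """Hypothetical stand-in for an LLM call; it ignores the query and
    checks hand-picked keywords only."""
    return any(k in text.lower() for k in ("won", "prize", "earnings"))


# Invented toy data; only the tag names come from the article's example.
graph = MemoryGraph(
    cue_to_tags={
        "nate": ["Tournament Victory", "Tournament Participation"],
        "tournament earnings": ["Prize Spending"],
    },
    tags={
        "Tournament Victory": Tag("Nate won a video game tournament", ["e1"]),
        "Tournament Participation": Tag("Nate entered a tournament", ["e2"]),
        "Prize Spending": Tag("How Nate spent prize money", ["e3"]),
    },
    contents={
        "e1": "Nate won his third tournament and collected tournament earnings.",
        "e2": "Nate signed up for a local tournament.",
        "e3": "Nate spent the prize money on new gaming gear.",
    },
)

QUERY = "How did Nate use the prize money when he won his third video game tournament?"
result = reconstruct(graph, QUERY, ["nate"], toy_judge)
print(result)
```

In this toy run the "Tournament Participation" tag is rejected on its summary alone, so memory `e2` is never read, and the cue "tournament earnings" found in `e1` opens the path to `e3`.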

Benchmark performance and cost savings

The researchers tested MRAgent on the LoCoMo and LongMemEval benchmarks, which evaluate agents on long-horizon tasks across dozens of sessions and hundreds of dialogue turns. The backbone models were Gemini 2.5 Flash and Claude Sonnet 4.5. MRAgent outperformed all baselines, including standard RAG, A-Mem, MemoryOS, LangMem, and Mem0, in both accuracy and efficiency. On LongMemEval, MRAgent cut prompt token consumption to just 118k per sample, compared to 632k for A-Mem and 3.26 million for LangMem. Runtime was also nearly halved relative to A-Mem, dropping from 1,122 seconds to 586 seconds. These savings come from on-demand behavior: evaluating Tags and pruning irrelevant paths before retrieval avoids wasting tokens and context space. The system also knows when to stop, which eliminates redundant exploration.
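
To put the reported LongMemEval figures in proportion, the short calculation below derives the ratios from the numbers quoted above. The dictionary layout and variable names are illustrative; only the token counts and runtimes come from the article.

```python
# Prompt tokens per sample on LongMemEval, as reported in the article.
prompt_tokens = {"MRAgent": 118_000, "A-Mem": 632_000, "LangMem": 3_260_000}
# Runtime in seconds, as reported in the article.
runtime_s = {"MRAgent": 586, "A-Mem": 1_122}

# How many times more prompt tokens each baseline uses than MRAgent.
token_ratio = {
    name: round(tokens / prompt_tokens["MRAgent"], 1)
    for name, tokens in prompt_tokens.items()
    if name != "MRAgent"
}
# How many times longer A-Mem runs than MRAgent.
speedup = round(runtime_s["A-Mem"] / runtime_s["MRAgent"], 2)

print(token_ratio)  # {'A-Mem': 5.4, 'LangMem': 27.6}
print(speedup)      # 1.91
```

In other words, A-Mem uses roughly 5.4 times as many prompt tokens as MRAgent and LangMem roughly 27.6 times as many, while the runtime improvement over A-Mem is a factor of about 1.9.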

Practical implementation and developer considerations

Despite its effectiveness, MRAgent requires the Cue-Tag-Content structure to be prepared before the agent can query it. Developers must design the underlying memory database so that navigation and pruning stay efficient and compute costs do not explode. The framework helps here with an automated distillation pipeline that uses LLMs to process raw interaction histories and populate the memory graph. The developer's job is to implement and orchestrate this ingestion pipeline, for instance by setting up a background job or streaming pipeline that passes user interactions through prompt templates to extract metadata before storing them in a graph database. The authors emphasize that this construction phase is deliberately lightweight. The code is released on GitHub. For developers working under the EU AI Act, MRAgent offers a way to reduce computational costs while complying with regulatory requirements. More information is available at the National University of Singapore website.
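
As a rough idea of what such an ingestion job could look like, here is a minimal Python sketch. It is an assumption-laden illustration rather than the released pipeline: `MemoryStore`, `Extraction`, `PROMPT_TEMPLATE`, and `ingest` are hypothetical names, plain dictionaries stand in for a graph database, and `fake_extract` is a crude rule-based stub where a real deployment would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Extraction:
    cues: list[str]   # fine-grained keywords, e.g. entities
    tag: str          # short summary linking cues to the content
    kind: str         # "episodic" or "semantic"


@dataclass
class MemoryStore:
    # Plain dicts stand in for a graph database in this sketch.
    cue_to_tags: dict[str, set[str]] = field(default_factory=dict)
    tag_to_content: dict[str, list[str]] = field(default_factory=dict)
    contents: dict[str, tuple[str, str]] = field(default_factory=dict)


# Hypothetical prompt; the real templates are not given in the article.
PROMPT_TEMPLATE = "Extract cues, a one-line tag, and the memory type from this turn:\n{turn}"


def ingest(store: MemoryStore, turns: list[str],
           extract: Callable[[str], Extraction]) -> MemoryStore:
    """Pass each turn through the extractor and link cues, tags, and content."""
    for i, turn in enumerate(turns):
        result = extract(PROMPT_TEMPLATE.format(turn=turn))
        content_id = f"m{i}"
        store.contents[content_id] = (result.kind, turn)
        store.tag_to_content.setdefault(result.tag, []).append(content_id)
        for cue in result.cues:
            store.cue_to_tags.setdefault(cue.lower(), set()).add(result.tag)
    return store


def fake_extract(prompt: str) -> Extraction:
    """Rule-based stub standing in for an LLM call."""
    turn = prompt.split("\n", 1)[1]
    words = turn.split()
    cues = [w.strip(".,") for w in words if w[0].isupper()]
    tag = " ".join(words[:3])
    kind = "semantic" if " likes " in turn or " is " in turn else "episodic"
    return Extraction(cues=cues, tag=tag, kind=kind)


store = ingest(
    MemoryStore(),
    ["Nate won a tournament on Saturday.", "Nate likes strategy games."],
    fake_extract,
)
print(store.cue_to_tags["nate"])
print(store.contents["m1"])
```

Passing the extractor in as a function keeps the storage logic testable without a model call; in a background job or streaming setup, the same `ingest` function could be applied to each new batch of turns.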

Source: https://venturebeat.com/orchestration/new-agentic-memory-framework-uses-118k-tokens-per-query-langmem-burns-through-3-26m