What Is Agent Memory? Definition, How It Works

Key takeaways

Agent memory is everything an AI agent knows across time about users, the business, and the work — so it can reason without starting over each turn.
Naive approaches (chat buffers, vector-store RAG, markdown) break on provenance, governance, and temporality; a temporal context graph handles all three.
At enterprise scale, agent memory is implemented as a Context Lake. Zep leads LoCoMo and LongMemEval on accuracy, latency, and token use at once (benchmark results).

Agent memory is everything an AI agent knows across time about the users, the business, and the work it's doing — so it can reason, personalize, and act without starting from scratch every turn. It is the layer that lets an agent remember a user's preferences from last month, a decision made two sessions ago, and a fact that changed yesterday. Chat-history buffers, vector-store RAG, and markdown files are all attempts to implement agent memory; most break down on three things — provenance, governance, and temporality.

Why agents need memory

A stateless agent re-derives everything from the current prompt. That works for one-shot tasks and fails for anything ongoing: it forgets what the user told it, contradicts earlier facts, and can't personalize. Most agent failures aren't model-quality problems — they're context problems. The right information wasn't available at the right moment, so the agent guessed. Agent memory is how you make the right context reliably available.

What agent memory has to do

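A real agent-memory system has to handle four things that naive approaches miss. Three are the failure points named above (temporality, provenance, governance); the fourth is unification: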

Requirement	What it means	Where naive approaches fail
Unification	Combine chat, business data, and documents into one view of the user	Chat buffers see only conversations
Temporality	Track how facts change; know what's true now vs then	Flat stores mix stale and current facts
Provenance	Trace every fact back to its source	RAG returns chunks with no audit trail
Governance	Access control, retention, and audit across all memory	Bolted-on memory has none of this

How agent memory works

The strongest implementations use a temporal context graph. The agent's inputs (messages, JSON business data, documents) are ingested, and the system extracts entities, relationships, and facts into a graph. The graph is bi-temporal: each fact carries a validity window (when it was true in the world) alongside the time the system recorded it, and each fact keeps its provenance (the input it came from). When new information contradicts an old fact, the old fact is invalidated rather than overwritten, so history is preserved and the agent can correctly answer “what's true now?” or “what was true on this date?”

Example input 1: a chat message from Robbie, 2024-09-07 · 14:27

"I only wear Adidas shoes. I love them!"

Facts extracted from the message:

Robbie only wears Adidas shoes.
Robbie strongly favors Adidas shoes.

Example input 2: a Soleworks return form (soleworks.com/account/returns/SO-48219) for Order #SO-48219, Adidas Ultraboost 22

Reason for return: Product fell apart

Additional comments: "These Adidas fell apart after three weeks and I'm furious. I'll be buying Nike from now on."

Facts after the return is ingested (the earlier facts are kept as history rather than deleted):

Robbie only wears Adidas shoes.
Robbie strongly favors Adidas shoes.
Robbie’s Adidas shoes fell apart.
Robbie is returning their Adidas shoes.
Robbie is angry about their Adidas shoes.
Robbie intends to wear Nike shoes.
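The invalidate-rather-than-overwrite behavior can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how Zep or Graphiti implement it: the `Fact` and `FactStore` names are hypothetical, the caller states which facts a new fact contradicts (a real system would have to detect that itself), and the return date is made up because the example gives none.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    text: str
    source: str                          # provenance: the input that produced the fact
    valid_from: datetime                 # start of the validity window
    valid_to: Optional[datetime] = None  # None means "still valid"

    def is_valid_at(self, when: datetime) -> bool:
        return self.valid_from <= when and (self.valid_to is None or when < self.valid_to)


class FactStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, fact: Fact, contradicts: tuple[str, ...] = ()) -> None:
        # Close the validity window of contradicted facts instead of deleting them.
        for old in self.facts:
            if old.text in contradicts and old.valid_to is None:
                old.valid_to = fact.valid_from
        self.facts.append(fact)

    def as_of(self, when: datetime) -> list[str]:
        return [f.text for f in self.facts if f.is_valid_at(when)]


store = FactStore()
chat_time = datetime(2024, 9, 7, 14, 27)
return_time = datetime(2024, 9, 28, 9, 0)  # hypothetical date, not from the example

store.add(Fact("Robbie only wears Adidas shoes.", "chat message", chat_time))
store.add(
    Fact("Robbie intends to wear Nike shoes.", "return SO-48219", return_time),
    contradicts=("Robbie only wears Adidas shoes.",),
)

then = store.as_of(datetime(2024, 9, 10))  # before the return
now = store.as_of(datetime(2024, 10, 1))   # after the return
```

Both facts stay in `store.facts`, so the question "what was true on this date?" can still be answered after the preference changes.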

At retrieval time, the system returns only the relevant, token-efficient slice of memory for the current task, not the whole transcript. That keeps the context window clean (less noise, fewer hallucinations) and keeps responses fast.
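To make "relevant, token-efficient slice" concrete, here is a rough sketch of selecting facts under a token budget. It rests on loud assumptions: relevance is plain keyword overlap and tokens are counted as whitespace-separated words, both stand-ins for whatever search and tokenizer a production system would use. The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def select_context(facts: list[str], query: str, budget: int) -> list[str]:
    query_words = set(query.lower().split())
    scored = []
    for fact in facts:
        fact_words = set(fact.lower().rstrip(".").split())
        overlap = len(query_words & fact_words)
        if overlap:  # drop facts that share no words with the query
            scored.append((overlap, fact))
    # Highest overlap first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    selected, used = [], 0
    for _, fact in scored:
        cost = estimate_tokens(fact)
        if used + cost <= budget:  # skip anything that would exceed the budget
            selected.append(fact)
            used += cost
    return selected


facts = [
    "Robbie is returning their Adidas shoes.",
    "Robbie intends to wear Nike shoes.",
    "Robbie is angry about their Adidas shoes.",
]
context = select_context(facts, "which nike shoes should we recommend", budget=12)
```

With a 12-word budget, the Nike fact ranks first, one more fact fits, and the third is left out.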

Short-term vs. long-term agent memory

Agents need two kinds of memory, and conflating them causes bugs.

Short-term (working) memory is the current session — the running conversation and immediate task state. It lives in the context window and is bounded by it.
Long-term memory is everything that must persist across sessions: who the user is, decisions made, preferences, and the state of the business objects the agent touches. It can't live in the context window (too large, too noisy), so it needs a durable store that returns only the relevant slice on demand.

A capable system serves both: it keeps the working context coherent within a session and persists durable facts across sessions, then assembles the right mix into each prompt.
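As a rough sketch of that assembly step, the snippet below keeps a bounded window of recent turns (short-term) and merges it with facts pulled from a durable store (long-term). The `WorkingMemory` class, the `build_prompt` helper, the prompt layout, and the sample fact are all hypothetical, chosen only to show the split.

```python
from collections import deque


class WorkingMemory:
    """Short-term memory: the last few turns of the current session."""

    def __init__(self, max_turns: int) -> None:
        # A bounded deque drops the oldest turn once max_turns is reached.
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, content: str) -> None:
        self.turns.append(f"{role}: {content}")


def build_prompt(long_term_facts: list[str], working: WorkingMemory) -> str:
    facts = "\n".join(f"- {fact}" for fact in long_term_facts)
    turns = "\n".join(working.turns)
    return f"Known facts:\n{facts}\n\nConversation:\n{turns}"


working = WorkingMemory(max_turns=2)
working.add("user", "Hi")
working.add("assistant", "Hello! How can I help?")
working.add("user", "I'd like to upgrade to Pro.")

# Hypothetical fact, standing in for whatever the long-term store returns.
prompt = build_prompt(["Jane is on the Free plan."], working)
```

Keeping the two stores separate means the session window can be trimmed aggressively without losing anything that needs to survive the session.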

What's inside agent memory: episodes, entities, and Observations

A temporal context graph organizes long-term memory in layers, each answering a different question:

Episodes: the raw, lossless record of every input (a message, a JSON event, a document). This is the source of truth everything else traces back to.
Entities and facts: the semantic layer. The system extracts entities (people, accounts, products) and the facts and relationships between them, each with a validity window, so the graph represents what is true and when.
Observations: a derived layer that captures patterns across many episodes, such as recurring behaviors, decisions, preferences, and state transitions that no single fact holds. They are discovered automatically and surfaced as one retrievable claim, which closes the aggregation gap (a pattern spread across many facts that retrieving any single fact would miss). Example: “the user has upgraded within two weeks of each of the last three launches.”
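To illustrate what "derived from many episodes" means, the sketch below computes the upgrade pattern from the example by hand. It is only an illustration: a real system is described as discovering such patterns automatically, whereas this hard-codes one rule, and the function name and all dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def upgrade_pattern(launches: list[date], upgrades: list[date],
                    window_days: int = 14, min_launches: int = 3) -> Optional[str]:
    """Return an observation string if every recent launch was followed by an upgrade."""
    recent = sorted(launches)[-min_launches:]
    if len(recent) < min_launches:
        return None
    window = timedelta(days=window_days)
    for launch in recent:
        # Each launch needs at least one upgrade inside its window.
        if not any(launch <= up <= launch + window for up in upgrades):
            return None
    return (f"The user has upgraded within {window_days} days of each of "
            f"the last {min_launches} launches.")


# Hypothetical episode data: three launches, each followed by an upgrade.
launches = [date(2024, 1, 10), date(2024, 4, 2), date(2024, 7, 15)]
upgrades = [date(2024, 1, 18), date(2024, 4, 9), date(2024, 7, 20)]
observation = upgrade_pattern(launches, upgrades)
```

No single upgrade event carries this claim; it only exists once the events are read together, which is why it is stored as its own retrievable item.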

Adding and retrieving agent memory: a worked example

In practice, agent memory is a write path (ingest signal) and a read path (assemble context). With Zep it's a few calls, framework-agnostic:

import json

from zep_cloud import Zep
from zep_cloud.types import Message

client = Zep(api_key="YOUR_API_KEY")

# Placeholder IDs: this snippet assumes the user and thread already exist in Zep.
user_id = "YOUR_USER_ID"
thread_id = "YOUR_THREAD_ID"

# WRITE: add a conversation turn and business data to the user's graph
client.thread.add_messages(
    thread_id=thread_id,
    messages=[Message(name="Jane", role="user", content="I'd like to upgrade to Pro.")],
)
client.graph.add(
    user_id=user_id,
    type="json",
    data=json.dumps({"event": "plan_upgrade", "to": "pro", "mrr": 49}),
)

# READ: assemble relevant, token-efficient context for the next turn
context = client.thread.get_user_context(thread_id=thread_id).context

# Or query the graph directly, including derived patterns
observations = client.graph.search(
    user_id=user_id,
    query="What does this user do before upgrading?",
    scope="observations",
    limit=5,
)

The agent never receives the whole history — only the relevant facts, entities, and Observations for the task. For a full walkthrough, see how to give an AI agent long-term memory.

Agent memory vs. related ideas

vs. chat history: chat history is one input to agent memory, not memory itself. It sees only conversations and grows unboundedly.
vs. RAG: RAG retrieves static documents by similarity. Agent memory tracks evolving facts about users and the business over time. They're complementary — use RAG for documents, memory for state.
vs. a bigger context window: more tokens aren't memory. Dumping full history adds noise and cost; memory selects what matters.

Agent memory at enterprise scale: the Context Lake

Single-user memory is one thing; running memory for millions of users, agents, and data sources under enterprise governance is another. A Context Lake is the infrastructure that implements agent memory at enterprise scale — a governed system of context graphs that manages and serves everything agents need to know. It's the data-lake pattern applied to agent context: different data, different consumers, the same governance rigor.

This is what Zep provides. Zep manages, governs, and serves agent memory at scale on temporal context graphs — built on the open-source library Graphiti, running on Zep's Context Graph Engine. Governance (attribute-based access control, retention, audit) lives in the substrate; retrieval stays under 200ms p95 across millions of graphs; and on the LoCoMo and LongMemEval benchmarks Zep leads on accuracy, latency, and token efficiency at once. You can add agent memory in three lines of code, with any framework or none.

Frequently asked questions

What is agent memory in simple terms?

It's the agent's persistent knowledge of the user, the business, and the work — across sessions and sources — so it doesn't start over every conversation.

Is agent memory the same as RAG?

No. RAG retrieves static documents by similarity. Agent memory tracks evolving facts about users and the business over time, with provenance and governance. Most production agents use both.

How do you give an AI agent long-term memory?

Use a memory platform that builds a temporal context graph from the agent's inputs and serves relevant context at query time — rather than stuffing chat history into the prompt. With Zep this is three lines of code and works with any agent framework.

What's the best way to do agent memory at enterprise scale?

A Context Lake — a governed system of context graphs that manages, governs, and serves memory across millions of users with millisecond retrieval, access control, retention, and audit built in.

What are the types of agent memory?

Short-term (working) memory holds the current session in the context window; long-term memory persists facts about the user and business across sessions. A temporal context graph organizes long-term memory into episodes (raw), entities and facts (semantic), and Observations (derived patterns).

Is agent memory the same as a vector database?

No. A vector database is one storage primitive that retrieves text by similarity. Agent memory is the system that builds and serves an evolving, governed, temporal model of the user and business — it may use vector search internally, but adds relationships, temporality, provenance, and governance.

Does agent memory replace the context window?

No — it feeds it. The context window is finite; agent memory decides which facts, entities, and Observations to place into it each turn, so the agent gets what it needs without overflow or noise.

How do you evaluate agent memory?

Measure whether the system retrieves the right facts across sessions and over time — context completeness first, then answer correctness — using multi-session benchmarks like LoCoMo and LongMemEval. See how to test agent memory.
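As a minimal sketch of the "context completeness" idea, the function below scores what fraction of the facts a question needs actually showed up in the retrieved context. It assumes exact string matching and a hand-written list of expected facts; it is not how LoCoMo or LongMemEval score results, and the function name is hypothetical.

```python
def context_completeness(retrieved: list[str], expected: list[str]) -> float:
    """Fraction of expected facts that appear in the retrieved context."""
    if not expected:
        return 1.0  # nothing was required, so nothing is missing
    retrieved_set = set(retrieved)
    hits = sum(1 for fact in expected if fact in retrieved_set)
    return hits / len(expected)


expected = [
    "Robbie intends to wear Nike shoes.",
    "Robbie is returning their Adidas shoes.",
]
retrieved = [
    "Robbie intends to wear Nike shoes.",
    "Robbie strongly favors Adidas shoes.",
]
score = context_completeness(retrieved, expected)
```

Scoring retrieval separately from the final answer makes it easier to tell whether a wrong answer came from missing context or from the model's reasoning.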