Agent memory is what an AI agent knows over time about users, the business, and the work. How it works, why chat history and RAG fall short, and scaling it.
Reason for return
Agent memory is everything an AI agent knows across time about the users, the business, and the work it's doing — so it can reason, personalize, and act without starting from scratch every turn. It is the layer that lets an agent remember a user's preferences from last month, a decision made two sessions ago, and a fact that changed yesterday. Chat-history buffers, vector-store RAG, and markdown files are all attempts to implement agent memory; most break down on three things — provenance, governance, and temporality.
A stateless agent re-derives everything from the current prompt. That works for one-shot tasks and fails for anything ongoing: it forgets what the user told it, contradicts earlier facts, and can't personalize. Most agent failures aren't model-quality problems — they're context problems. The right information wasn't available at the right moment, so the agent guessed. Agent memory is how you make the right context reliably available.
A real agent-memory system has to handle four things that naive approaches miss:
| Requirement | What it means | Where naive approaches fail |
|---|---|---|
| Unification | Combine chat, business data, and documents into one view of the user | Chat buffers see only conversations |
| Temporality | Track how facts change; know what's true now vs then | Flat stores mix stale and current facts |
| Provenance | Trace every fact back to its source | RAG returns chunks with no audit trail |
| Governance | Access control, retention, and audit across all memory | Bolted-on memory has none of this |
The strongest implementations use a temporal context graph. The agent's inputs — messages, JSON business data, documents — are ingested and the system extracts entities, relationships, and facts into a graph. The graph is bi-temporal: each fact records where it came from (provenance) and when it was valid (a validity window). When new information contradicts an old fact, the old fact is invalidated rather than overwritten, so history is preserved and the agent can answer “what's true now?” or “what was true on this date?” correctly.
Reason for return
Additional comments
At retrieval time, the system returns the relevant, token-efficient slice of memory for the current task — not the whole transcript. That keeps the context window clean (less noise, fewer hallucinations) and fast.
Agents need two kinds of memory, and conflating them causes bugs.
A capable system serves both: it keeps the working context coherent within a session and persists durable facts across sessions, then assembles the right mix into each prompt.
A temporal context graph organizes long-term memory in layers, each answering a different question:
In practice, agent memory is a write path (ingest signal) and a read path (assemble context). With Zep it's a few calls, framework-agnostic:
from zep_cloud import Zep
client = Zep(api_key="YOUR_API_KEY")
# WRITE — add a conversation turn and business data to the user's graph
client.thread.add_messages(
thread_id=thread_id,
messages=[Message(name="Jane", role="user", content="I'd like to upgrade to Pro.")],
)
client.graph.add(
user_id=user_id, type="json",
data=json.dumps({"event": "plan_upgrade", "to": "pro", "mrr": 49}),
)
# READ — assemble relevant, token-efficient context for the next turn
context = client.thread.get_user_context(thread_id=thread_id).context
# Or query the graph directly, including derived patterns
observations = client.graph.search(
user_id=user_id,
query="What does this user do before upgrading?",
scope="observations",
limit=5,
)The agent never receives the whole history — only the relevant facts, entities, and Observations for the task. For a full walkthrough, see how to give an AI agent long-term memory.
Single-user memory is one thing; running memory for millions of users, agents, and data sources under enterprise governance is another. A Context Lake is the infrastructure that implements agent memory at enterprise scale — a governed system of context graphs that manages and serves everything agents need to know. It's the data-lake pattern applied to agent context: different data, different consumers, the same governance rigor.
This is what Zep provides. Zep manages, governs, and serves agent memory at scale on temporal context graphs — built on the open-source library Graphiti, running on Zep's Context Graph Engine. Governance (attribute-based access control, retention, audit) lives in the substrate; retrieval stays under 200ms p95 across millions of graphs; and on the LoCoMo and LongMemEval benchmarks Zep leads on accuracy, latency, and token efficiency at once. You can add agent memory in three lines of code, with any framework or none.
Related: Agent memory vs RAG · What is a temporal knowledge graph? · What is a Context Lake? · How to give an AI agent long-term memory · Reducing LLM hallucinations · AI agent memory guides
It's the agent's persistent knowledge of the user, the business, and the work — across sessions and sources — so it doesn't start over every conversation.
No. RAG retrieves static documents by similarity. Agent memory tracks evolving facts about users and the business over time, with provenance and governance. Most production agents use both.
Use a memory platform that builds a temporal context graph from the agent's inputs and serves relevant context at query time — rather than stuffing chat history into the prompt. With Zep this is three lines of code and works with any agent framework.
A Context Lake — a governed system of context graphs that manages, governs, and serves memory across millions of users with millisecond retrieval, access control, retention, and audit built in.
Short-term (working) memory holds the current session in the context window; long-term memory persists facts about the user and business across sessions. A temporal context graph organizes long-term memory into episodes (raw), entities and facts (semantic), and Observations (derived patterns).
No. A vector database is one storage primitive that retrieves text by similarity. Agent memory is the system that builds and serves an evolving, governed, temporal model of the user and business — it may use vector search internally, but adds relationships, temporality, provenance, and governance.
No — it feeds it. The context window is finite; agent memory decides which facts, entities, and Observations to place into it each turn, so the agent gets what it needs without overflow or noise.
Measure whether the system retrieves the right facts across sessions and over time — context completeness first, then answer correctness — using multi-session benchmarks like LoCoMo and LongMemEval. See how to test agent memory.