Agent memory is what an AI agent knows over time about users, the business, and the work. How it works, why chat history and RAG fall short, and how to scale it.
Agent memory is everything an AI agent knows across time about the users, the business, and the work it's doing — so it can reason, personalize, and act without starting from scratch every turn. It is the layer that lets an agent remember a user's preferences from last month, a decision made two sessions ago, and a fact that changed yesterday. Chat-history buffers, vector-store RAG, and markdown files are all attempts to implement agent memory; most break down on three things — provenance, governance, and temporality.
A stateless agent re-derives everything from the current prompt. That works for one-shot tasks and fails for anything ongoing: it forgets what the user told it, contradicts earlier facts, and can't personalize. Most agent failures aren't model-quality problems — they're context problems. The right information wasn't available at the right moment, so the agent guessed. Agent memory is how you make the right context reliably available.
A real agent-memory system has to handle four things that naive approaches miss:
| Requirement | What it means | Where naive approaches fail |
|---|---|---|
| Unification | Combine chat, business data, and documents into one view of the user | Chat buffers see only conversations |
| Temporality | Track how facts change; know what's true now vs then | Flat stores mix stale and current facts |
| Provenance | Trace every fact back to its source | RAG returns chunks with no audit trail |
| Governance | Access control, retention, and audit across all memory | Bolted-on memory has none of this |
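To make the unification and provenance rows concrete, here is a minimal sketch of a per-user view that merges facts from chat, business data, and documents while keeping a pointer back to each source. The `MemoryFact` and `UnifiedMemory` names, the field layout, and the sample source references are illustrative assumptions, not any particular product's schema.

```python
from dataclasses import dataclass
from collections import defaultdict


@dataclass(frozen=True)
class MemoryFact:
    user_id: str
    statement: str      # the fact itself, in plain language
    source_kind: str    # "chat", "business_data", or "document"
    source_ref: str     # provenance: pointer back to the originating item


class UnifiedMemory:
    """Toy in-memory store keyed by user (hypothetical, not a production design)."""

    def __init__(self) -> None:
        self._facts = defaultdict(list)

    def add(self, fact: MemoryFact) -> None:
        self._facts[fact.user_id].append(fact)

    def view(self, user_id: str) -> list:
        """Return every fact about a user, regardless of which source it came from."""
        return list(self._facts.get(user_id, []))

    def sources(self, user_id: str) -> set:
        """Return the kinds of sources that contributed to this user's view."""
        return {f.source_kind for f in self._facts.get(user_id, [])}


memory = UnifiedMemory()
memory.add(MemoryFact("u1", "Prefers email over phone", "chat", "session-12/msg-4"))
memory.add(MemoryFact("u1", "Plan is Pro", "business_data", "crm/accounts/u1"))
memory.add(MemoryFact("u1", "Signed the 2024 MSA", "document", "contracts/msa-2024.pdf"))

for fact in memory.view("u1"):
    print(f"{fact.statement}  <- {fact.source_kind}:{fact.source_ref}")
```

A chat buffer would only ever hold the first of these three facts; the point of the sketch is that all three land in one view and each can be traced to its origin.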
The strongest implementations use a temporal context graph. The agent's inputs (messages, JSON business data, documents) are ingested, and the system extracts entities, relationships, and facts into a graph. The graph is bi-temporal: each fact carries a validity window (when it was true in the world) alongside the time the system recorded it, plus a pointer to where it came from (provenance). When new information contradicts an old fact, the old fact is invalidated rather than overwritten, so history is preserved and the agent can correctly answer “what's true now?” or “what was true on this date?”
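The invalidate-rather-than-overwrite behavior can be sketched in a few lines. This is a simplified illustration, not a real graph engine: it models only the validity window, and it assumes each (subject, predicate) pair holds one value at a time and that facts arrive in chronological order. The `Fact` and `TemporalFacts` names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    source: str                           # provenance
    valid_from: datetime                  # start of the validity window
    valid_to: Optional[datetime] = None   # None means "still true"


class TemporalFacts:
    def __init__(self) -> None:
        self.facts = []

    def assert_fact(self, new: Fact) -> None:
        """Add a fact and close the window of any current fact it contradicts."""
        for old in self.facts:
            same_slot = (old.subject, old.predicate) == (new.subject, new.predicate)
            if same_slot and old.valid_to is None and old.value != new.value:
                old.valid_to = new.valid_from  # invalidate, do not delete
        self.facts.append(new)

    def as_of(self, subject: str, predicate: str, when: datetime) -> Optional[Fact]:
        """Return the fact whose validity window contains `when`, if any."""
        for f in self.facts:
            if (f.subject, f.predicate) != (subject, predicate):
                continue
            if f.valid_from <= when and (f.valid_to is None or when < f.valid_to):
                return f
        return None


store = TemporalFacts()
store.assert_fact(Fact("alice", "lives_in", "Berlin", "chat:2024-01-10", datetime(2024, 1, 10)))
store.assert_fact(Fact("alice", "lives_in", "Lisbon", "crm:update-881", datetime(2024, 6, 1)))

then = store.as_of("alice", "lives_in", datetime(2024, 3, 1))
now = store.as_of("alice", "lives_in", datetime(2024, 7, 1))
print(then.value, now.value)  # Berlin Lisbon
```

Because the Berlin fact is closed rather than deleted, both the "now" and the "on this date" questions resolve from the same store, and each answer still carries its `source`.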
At retrieval time, the system returns only the relevant, token-efficient slice of memory for the current task, not the whole transcript. That keeps the context window clean (less noise, fewer hallucinations) and keeps responses fast.
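As a rough illustration of returning a slice rather than the whole transcript, the sketch below ranks stored facts against the query and keeps only what fits a token budget. Both the word-overlap relevance score and the word-count token estimate are deliberately crude stand-ins (a real system would likely use graph or embedding search and a real tokenizer), and all function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def relevance(query: str, fact: str) -> int:
    """Toy relevance score: number of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(fact.lower().split()))


def select_context(query: str, facts: list, budget: int) -> list:
    """Pick the most relevant facts that fit within `budget` estimated tokens."""
    ranked = sorted(facts, key=lambda f: relevance(query, f), reverse=True)
    chosen, used = [], 0
    for fact in ranked:
        if relevance(query, fact) == 0:
            break  # everything after this point shares nothing with the query
        cost = estimate_tokens(fact)
        if used + cost <= budget:
            chosen.append(fact)
            used += cost
    return chosen


facts = [
    "user prefers email for billing notices",
    "user upgraded to the pro plan in june",
    "user's favourite colour is green",
    "billing address changed to lisbon",
]
context = select_context("which billing email should we use", facts, budget=12)
print(context)
```

Only the two billing-related facts reach the prompt here; the unrelated ones never consume context-window space.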
Single-user memory is one thing; running memory for millions of users, agents, and data sources under enterprise governance is another. A Context Lake is the infrastructure that implements agent memory at enterprise scale — a governed system of context graphs that manages and serves everything agents need to know. It's the data-lake pattern applied to agent context: different data, different consumers, the same governance rigor.
This is what Zep provides. Zep manages, governs, and serves agent memory at scale on temporal context graphs — built on the open-source library Graphiti, running on Zep's Context Graph Engine. Governance (attribute-based access control, retention, audit) lives in the substrate; retrieval stays under 200ms p95 across millions of graphs; and on the LoCoMo and LongMemEval benchmarks Zep leads on accuracy, latency, and token efficiency at once. You can add agent memory in three lines of code, with any framework or none.
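For readers unfamiliar with the governance terms, here is a generic sketch of what attribute-based access control, retention, and audit mean when applied to memory reads. It is not Zep's API or implementation; the `Record`, `Caller`, and `can_read` names, the attribute choices, and the policy itself are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    text: str
    department: str        # attribute compared against the caller
    sensitivity: int       # 0 = public ... 3 = restricted
    created_at: datetime


@dataclass(frozen=True)
class Caller:
    name: str
    department: str
    clearance: int


audit_log = []  # (caller name, record text, decision) for every read attempt


def can_read(caller: Caller, record: Record, now: datetime, retention: timedelta) -> bool:
    """Allow a read only if attributes match and the record is within retention."""
    within_retention = now - record.created_at <= retention
    allowed = (
        within_retention
        and caller.department == record.department
        and caller.clearance >= record.sensitivity
    )
    audit_log.append((caller.name, record.text, allowed))  # log every decision
    return allowed


now = datetime(2025, 1, 1)
retention = timedelta(days=365)
agent = Caller("support-bot", "support", clearance=1)
records = [
    Record("Ticket #1: refund issued", "support", 1, datetime(2024, 9, 1)),
    Record("Salary band for role X", "hr", 3, datetime(2024, 9, 1)),
    Record("Ticket from 2022", "support", 1, datetime(2022, 1, 1)),
]
visible = [r.text for r in records if can_read(agent, r, now, retention)]
print(visible)
```

The HR record is denied on attributes, the 2022 ticket on retention, and all three decisions are logged, which is the kind of check that is hard to retrofit onto a memory layer that was not designed for it.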
Related: Agent memory vs RAG · What is a Context Lake? · Reducing LLM hallucinations
It's the agent's persistent knowledge of the user, the business, and the work — across sessions and sources — so it doesn't start over every conversation.
No. RAG retrieves static documents by similarity. Agent memory tracks evolving facts about users and the business over time, with provenance and governance. Most production agents use both.
Use a memory platform that builds a temporal context graph from the agent's inputs and serves relevant context at query time — rather than stuffing chat history into the prompt. With Zep this is three lines of code and works with any agent framework.
A Context Lake — a governed system of context graphs that manages, governs, and serves memory across millions of users with millisecond retrieval, access control, retention, and audit built in.