What Is Agent Memory?

Agent memory is what an AI agent knows over time about users, the business, and the work. This guide covers how it works, why chat history and RAG fall short, and how to scale it.

Key takeaways

  • Agent memory is everything an AI agent knows across time about users, the business, and the work — so it can reason without starting over each turn.
  • Naive approaches (chat buffers, vector-store RAG, markdown) break on provenance, governance, and temporality; a temporal context graph handles all three.
  • At enterprise scale, agent memory is implemented as a Context Lake. Zep leads LoCoMo and LongMemEval on accuracy, latency, and token use at once (benchmark results).

Agent memory is everything an AI agent knows across time about the users, the business, and the work it's doing — so it can reason, personalize, and act without starting from scratch every turn. It is the layer that lets an agent remember a user's preferences from last month, a decision made two sessions ago, and a fact that changed yesterday. Chat-history buffers, vector-store RAG, and markdown files are all attempts to implement agent memory; most break down on three things — provenance, governance, and temporality.

Why agents need memory

A stateless agent re-derives everything from the current prompt. That works for one-shot tasks and fails for anything ongoing: it forgets what the user told it, contradicts earlier facts, and can't personalize. Most agent failures aren't model-quality problems — they're context problems. The right information wasn't available at the right moment, so the agent guessed. Agent memory is how you make the right context reliably available.

What agent memory has to do

A real agent-memory system has to handle four things that naive approaches miss:

Requirement | What it means | Where naive approaches fail
Unification | Combine chat, business data, and documents into one view of the user | Chat buffers see only conversations
Temporality | Track how facts change; know what's true now vs. then | Flat stores mix stale and current facts
Provenance | Trace every fact back to its source | RAG returns chunks with no audit trail
Governance | Access control, retention, and audit across all memory | Bolted-on memory has none of this
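To make the unification requirement concrete, here is a minimal Python sketch, assuming a hypothetical `Episode` record that every input type is normalized into before extraction. The names (`Episode`, `from_chat`, `from_business_record`, `from_document`, `unified_timeline`) and all IDs, timestamps, and document content are illustrative assumptions, not any product's API. The point is that chat, JSON business data, and documents land in one chronologically ordered stream, each carrying its source ID for provenance.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Episode:
    """One normalized input, whatever its origin (hypothetical schema)."""
    source_type: str      # "chat", "json", or "document"
    source_id: str        # provenance: the message, record, or document it came from
    observed_at: datetime
    text: str


def from_chat(user: str, message: str, sent_at: datetime, message_id: str) -> Episode:
    return Episode("chat", message_id, sent_at, f"{user}: {message}")


def from_business_record(record: dict, record_id: str, recorded_at: datetime) -> Episode:
    # Sort keys so the same record always serializes to the same text.
    return Episode("json", record_id, recorded_at, json.dumps(record, sort_keys=True))


def from_document(title: str, body: str, doc_id: str, published_at: datetime) -> Episode:
    return Episode("document", doc_id, published_at, f"{title}\n{body}")


def unified_timeline(*streams: list[Episode]) -> list[Episode]:
    """Merge every source into one chronologically ordered view of the user."""
    merged = [ep for stream in streams for ep in stream]
    return sorted(merged, key=lambda ep: ep.observed_at)


# Usage with hypothetical IDs, timestamps, and content.
chat = [from_chat("Robbie", "I only wear Adidas shoes. I love them!",
                  datetime(2024, 9, 7, 14, 27), "msg-1")]
orders = [from_business_record({"order": "SO-48219", "reason": "Product fell apart"},
                               "SO-48219", datetime(2024, 9, 28))]
docs = [from_document("Returns policy", "Returns accepted within 30 days.",
                      "doc-returns", datetime(2024, 1, 15))]

timeline = unified_timeline(orders, chat, docs)
for ep in timeline:
    print(ep.observed_at.date(), ep.source_type, ep.source_id)
```

Normalizing first means everything downstream (extraction, provenance tracking, access control) only has to understand one record shape.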

How agent memory works

The strongest implementations use a temporal context graph. The agent's inputs (messages, JSON business data, documents) are ingested, and the system extracts entities, relationships, and facts into a graph. Each fact records where it came from (provenance), and the graph is bi-temporal: it tracks both when a fact was valid in the world (a validity window) and when the system learned it. When new information contradicts an old fact, the old fact is invalidated rather than overwritten, so history is preserved and the agent can correctly answer “what's true now?” or “what was true on this date?”
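The invalidate-don't-overwrite behavior can be sketched in a few dozen lines of Python. This is a minimal in-memory illustration, not Zep's or Graphiti's implementation: `Fact` and `TemporalFactStore` are hypothetical names, bi-temporality is modeled as two time axes (when a fact was valid in the world, and when the store recorded it or its invalidation), and a simple same-subject, same-predicate, different-object rule stands in for whatever contradiction detection a production system uses. The return date is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    source_id: str                          # provenance: where the fact came from
    valid_at: datetime                      # valid time: when it became true in the world
    recorded_at: datetime                   # system time: when the store learned it
    invalid_at: Optional[datetime] = None   # valid time: when it stopped being true
    expired_at: Optional[datetime] = None   # system time: when the store learned that

    def holds(self, t: datetime, known_at: Optional[datetime] = None) -> bool:
        """Was this fact true at time t, given what the store knew at known_at?"""
        if known_at is not None and self.recorded_at > known_at:
            return False  # not learned yet
        invalid_at = self.invalid_at
        if known_at is not None and self.expired_at is not None and self.expired_at > known_at:
            invalid_at = None  # the invalidation had not been recorded yet
        return self.valid_at <= t and (invalid_at is None or t < invalid_at)


class TemporalFactStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, new: Fact) -> None:
        for old in self.facts:
            # Hypothetical contradiction rule: same subject and predicate, different object.
            contradicts = (old.subject == new.subject and old.predicate == new.predicate
                           and old.obj != new.obj)
            if contradicts and old.invalid_at is None and old.valid_at <= new.valid_at:
                old.invalid_at = new.valid_at      # close the validity window...
                old.expired_at = new.recorded_at   # ...and note when we learned it
        self.facts.append(new)                     # nothing is ever deleted

    def as_of(self, t: datetime, known_at: Optional[datetime] = None) -> list[Fact]:
        return [f for f in self.facts if f.holds(t, known_at)]


# Usage: the Robbie example, with a hypothetical date for the return.
store = TemporalFactStore()
store.add(Fact("Robbie", "preferred_shoe_brand", "Adidas", "chat:msg-1",
               valid_at=datetime(2024, 9, 7), recorded_at=datetime(2024, 9, 7)))
store.add(Fact("Robbie", "preferred_shoe_brand", "Nike", "return:SO-48219",
               valid_at=datetime(2024, 9, 28), recorded_at=datetime(2024, 9, 28)))

now = [f.obj for f in store.as_of(datetime(2024, 10, 1))]
then = [f.obj for f in store.as_of(datetime(2024, 9, 10))]
believed = [f.obj for f in store.as_of(datetime(2024, 10, 1), known_at=datetime(2024, 9, 20))]
print(now, then, believed)
```

Here “what's true now?” returns Nike, “what was true on September 10?” returns Adidas, and the `known_at` query reconstructs what the store believed before the return was recorded. The Adidas fact is still present with its source ID, just closed off.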

Example: Robbie, in chat on 2024-09-07 at 14:27:
"I only wear Adidas shoes. I love them!"
Facts
  • Robbie only wears Adidas shoes.
  • Robbie strongly favors Adidas shoes.

Then, in a return request at soleworks.com/account/returns/SO-48219 (Soleworks Return · Order #SO-48219 · Adidas Ultraboost 22):
Reason for return: Product fell apart
Additional comments: "These Adidas fell apart after three weeks and I'm furious. I'll be buying Nike from now on."
Facts
  • Robbie only wears Adidas shoes.
  • Robbie strongly favors Adidas shoes.
  • Robbie’s Adidas shoes fell apart.
  • Robbie is returning their Adidas shoes.
  • Robbie is angry about their Adidas shoes.
  • Robbie intends to wear Nike shoes.

At retrieval time, the system returns the relevant, token-efficient slice of memory for the current task — not the whole transcript. That keeps the context window clean (less noise, fewer hallucinations) and fast.
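A minimal Python sketch of budgeted retrieval, assuming keyword overlap as a stand-in for the semantic and graph search a real system would use, and a word count as a rough stand-in for a tokenizer. `StoredFact`, `retrieve_context`, and `token_budget` are hypothetical names, and the last sample fact is invented for illustration. It shows the two properties the paragraph describes: invalidated facts never reach the prompt, and the result is capped at a token budget instead of replaying the whole transcript.

```python
import re
from dataclasses import dataclass


@dataclass
class StoredFact:
    text: str
    is_current: bool   # False once the fact has been invalidated


def keywords(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): one token per whitespace-separated word.
    return len(text.split())


def retrieve_context(query: str, facts: list[StoredFact], token_budget: int) -> list[str]:
    """Return the most relevant current facts that fit within token_budget."""
    q = keywords(query)
    scored = []
    for f in facts:
        if not f.is_current:
            continue  # stale facts never reach the prompt
        overlap = len(q & keywords(f.text))
        if overlap:
            scored.append((overlap, f.text))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order

    selected, used = [], 0
    for _, text in scored:
        cost = approx_tokens(text)
        if used + cost <= token_budget:
            selected.append(text)
            used += cost
    return selected


facts = [
    StoredFact("Robbie only wears Adidas shoes.", is_current=False),
    StoredFact("Robbie is returning their Adidas shoes.", is_current=True),
    StoredFact("Robbie intends to wear Nike shoes.", is_current=True),
    StoredFact("Robbie prefers email over phone.", is_current=True),
]
context = retrieve_context("Which shoes should we recommend to Robbie?", facts, token_budget=12)
print(context)
```

With a 12-token budget, the two shoe-related current facts fit and the weakly related email preference is dropped; the invalidated "only wears Adidas" fact is never considered.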

Agent memory vs. related ideas

  • vs. chat history: chat history is one input to agent memory, not memory itself. It sees only conversations and grows unboundedly.
  • vs. RAG: RAG retrieves static documents by similarity. Agent memory tracks evolving facts about users and the business over time. They're complementary — use RAG for documents, memory for state.
  • vs. a bigger context window: more tokens isn't memory. Dumping full history adds noise and cost; memory selects what matters.

Agent memory at enterprise scale: the Context Lake

Single-user memory is one thing; running memory for millions of users, agents, and data sources under enterprise governance is another. A Context Lake is the infrastructure that implements agent memory at enterprise scale — a governed system of context graphs that manages and serves everything agents need to know. It's the data-lake pattern applied to agent context: different data, different consumers, the same governance rigor.

This is what Zep provides. Zep manages, governs, and serves agent memory at scale on temporal context graphs — built on the open-source library Graphiti, running on Zep's Context Graph Engine. Governance (attribute-based access control, retention, audit) lives in the substrate; retrieval stays under 200ms p95 across millions of graphs; and on the LoCoMo and LongMemEval benchmarks Zep leads on accuracy, latency, and token efficiency at once. You can add agent memory in three lines of code, with any framework or none.
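To illustrate what "governance lives in the substrate" means, here is a minimal Python sketch of a hypothetical `ContextLake` class that keeps many per-user graphs behind one policy layer: an attribute-based access check (tenant and sensitivity attributes), a retention window, and an audit log of every read decision. This is not Zep's API; the class names, attributes, tenant names, and the one-year retention window are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Graph:
    owner_tenant: str
    sensitivity: str   # resource attribute, e.g. "standard" or "restricted"
    facts: list[tuple[datetime, str]] = field(default_factory=list)  # (recorded_at, text)


@dataclass
class Principal:
    agent_id: str
    tenant: str
    clearance: set[str]   # subject attribute: sensitivity levels this agent may read


class ContextLake:
    """Hypothetical sketch: many graphs behind one governance layer."""

    def __init__(self, retention: timedelta) -> None:
        self.graphs: dict[str, Graph] = {}
        self.retention = retention
        self.audit_log: list[tuple[datetime, str, str, str]] = []  # (when, agent, graph, decision)

    def _allowed(self, who: Principal, g: Graph) -> bool:
        # ABAC: the decision depends on attributes of both the caller and the resource.
        return who.tenant == g.owner_tenant and g.sensitivity in who.clearance

    def read(self, who: Principal, graph_id: str, now: datetime) -> list[str]:
        g = self.graphs[graph_id]
        allowed = self._allowed(who, g)
        self.audit_log.append((now, who.agent_id, graph_id, "allow" if allowed else "deny"))
        if not allowed:
            raise PermissionError(f"{who.agent_id} may not read {graph_id}")
        # Facts older than the retention window are never served.
        return [text for recorded_at, text in g.facts if now - recorded_at <= self.retention]

    def enforce_retention(self, now: datetime) -> None:
        # Physically purge expired facts from every graph.
        for g in self.graphs.values():
            g.facts = [(t, text) for t, text in g.facts if now - t <= self.retention]


lake = ContextLake(retention=timedelta(days=365))
lake.graphs["user:robbie"] = Graph("soleworks", "standard", [
    (datetime(2024, 9, 7), "Robbie only wears Adidas shoes."),
    (datetime(2022, 1, 1), "Robbie asked about sizing."),   # older than the window
])
support = Principal("support-agent", "soleworks", {"standard"})
outsider = Principal("other-agent", "acme", {"standard"})

visible = lake.read(support, "user:robbie", now=datetime(2024, 10, 1))
try:
    lake.read(outsider, "user:robbie", now=datetime(2024, 10, 1))
except PermissionError:
    pass
print(visible, [entry[3] for entry in lake.audit_log])
```

Putting these checks in the shared read path, rather than in each agent, means every agent gets the same policy and every decision is logged.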

Frequently asked questions

What is agent memory in simple terms?

It's the agent's persistent knowledge of the user, the business, and the work — across sessions and sources — so it doesn't start over every conversation.

Is agent memory the same as RAG?

No. RAG retrieves static documents by similarity. Agent memory tracks evolving facts about users and the business over time, with provenance and governance. Most production agents use both.

How do you give an AI agent long-term memory?

Use a memory platform that builds a temporal context graph from the agent's inputs and serves relevant context at query time — rather than stuffing chat history into the prompt. With Zep this is three lines of code and works with any agent framework.

What's the best way to do agent memory at enterprise scale?

A Context Lake — a governed system of context graphs that manages, governs, and serves memory across millions of users with millisecond retrieval, access control, retention, and audit built in.