
Agent Memory vs. RAG: What's the Difference, and When to Use Each

RAG retrieves static documents by similarity; agent memory tracks evolving facts about users over time. The differences, where RAG breaks, and using both.

Robbie · 2024-09-07 · 14:27
“I only wear Adidas shoes. I love them!”
Facts
  • Robbie only wears Adidas shoes.
  • Robbie strongly favors Adidas shoes.

Soleworks · Return · Order #SO-48219 · Adidas Ultraboost 22 (soleworks.com/account/returns/SO-48219)
Reason for return: Product fell apart
Additional comments: “These Adidas fell apart after three weeks and I'm furious. I'll be buying Nike from now on.”
Facts
  • Robbie only wears Adidas shoes.
  • Robbie strongly favors Adidas shoes.
  • Robbie’s Adidas shoes fell apart.
  • Robbie is returning their Adidas shoes.
  • Robbie is angry about their Adidas shoes.
  • Robbie intends to wear Nike shoes.

Key takeaways

  • RAG retrieves static documents by similarity; agent memory tracks evolving facts about users and the business over time, with provenance.
  • Using RAG as memory causes forgetting and contradictions — it can't handle change, relationships, or “true now vs. true then.”
  • Use both: RAG for documents, a temporal context graph for state. Zep reports 94.7% LoCoMo accuracy at sub-200ms retrieval (benchmark results).

RAG retrieves static documents by semantic similarity; agent memory tracks evolving facts about users and the business over time, with provenance and the ability to know what's true now versus what was true before. They solve different problems and the best agents use both — but using RAG as your agent's memory is a common mistake that causes forgetting and contradictions.

The short version

| | RAG | Agent memory |
|---|---|---|
| Stores | Document chunks + embeddings | Entities, relationships, facts in a temporal graph |
| Retrieves by | Semantic similarity to the query | Relevance + recency + relationships, scoped to the user |
| Handles change over time | No: chunks are static | Yes: facts have validity windows; old facts are invalidated |
| Tracks provenance | Limited (which chunk) | Yes: every fact traces to its source episode |
| Best for | Static knowledge: docs, FAQs, policies | Stateful context: users, customers, decisions across sessions |

Where RAG works

RAG is the right tool for grounding answers in a static body of knowledge: product documentation, support articles, policies, research. You chunk the corpus, embed it, retrieve the chunks nearest to a query, and the model answers from them. For “what do the docs say about X,” RAG is excellent and usually sufficient.
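As a rough illustration of the chunk, embed, retrieve loop described above, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `chunk`, `embed`, `cosine`, and `retrieve` helpers are hypothetical, the word-count "embedding" stands in for a real embedding model, and the three policy sentences are made-up sample documents.

```python
import math
import re
from collections import Counter

def chunk(text: str, max_words: int = 12) -> list[str]:
    """Split text into fixed-size word windows (real systems usually split on structure)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Made-up sample corpus, for illustration only.
docs = [
    "Returns are accepted within 30 days of delivery.",
    "Standard shipping takes three to five business days.",
    "Gift cards cannot be exchanged for cash.",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]
top = retrieve("How many days do I have for returns?", index, k=1)
print(top)
```

The retrieved chunk would then be placed in the model's prompt. Nothing in the index records when a chunk was written or whether it still holds, which is fine for a corpus that does not change.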

Where RAG breaks as memory

The moment your agent needs to remember a user across time, document RAG starts to fail:

  • It doesn't track change. If a user's preference, plan, or status changes, RAG may retrieve both the old and new statements with no idea which is current. The agent contradicts itself.
  • “Similar” isn't “needed.” Similarity search returns the chunks that most resemble the query, not necessarily the specific fact required to answer it. It is the retrieval version of the “5 tools and it doesn't call the right one” problem.
  • No relationships. RAG sees isolated chunks, not that “Sarah is the admin for Acme, which downgraded last month.” Memory is relational.
  • History grows without structure. Dumping conversation history into a vector store re-introduces the noise and staleness memory is supposed to solve.

This is why “just use RAG” produces agents that feel forgetful even with a big retrieval index.
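The first failure mode can be reproduced with a toy similarity search over Robbie's two statements from the example at the top of this page. This is a sketch under stated assumptions: the word-count `embed` function is a stand-in for a real embedding model, the `search` helper is hypothetical, and the third message is invented filler.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding (word counts); a stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Conversation history dumped into a "vector store", oldest first.
history = [
    "I only wear Adidas shoes. I love them!",
    "These Adidas fell apart after three weeks and I'm furious. I'll be buying Nike from now on.",
    "Can you check where my order is?",  # invented filler message
]

def search(query: str, k: int = 2) -> list[tuple[float, str]]:
    """Rank stored messages by similarity to the query; time plays no part."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), text) for text in history]
    return sorted(scored, reverse=True)[:k]

results = search("Does Robbie like Adidas or Nike shoes?")
for score, text in results:
    print(f"{score:.2f}  {text}")
```

Both statements come back, and with this toy scoring the older one ranks first. A real embedding model would produce different scores, but the result would still carry no signal about which statement is current, so the model is left to guess.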

What agent memory does instead

Agent memory builds a temporal context graph from the agent's inputs: it extracts entities, relationships, and facts; records provenance and a validity window for each; and invalidates facts when they change. At query time it returns the relevant, token-efficient slice of what's true now (or what was true at a chosen time) for this user — not the nearest document chunks. That's what makes an agent consistent across sessions.

Fact: Robbie strongly favors Adidas shoes.
traced_from: Chat message · user_8a32e1f9 · 2024-09-07
“I only wear Adidas shoes. I love them!”
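The behaviour described in this section (a validity window and provenance on every fact, invalidation on change, and "as of" queries) can be sketched with a small in-memory store. This is not Zep's or Graphiti's API: `Fact`, `FactStore`, `assert_fact`, and `as_of` are hypothetical names, the `prefers_brand` predicate is invented, and the date of the return is assumed because the example does not give one. The sketch also invalidates by matching subject and predicate, which is a deliberate simplification of however a real system detects that a fact has changed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str                            # provenance: the episode the fact came from
    valid_from: datetime
    invalid_at: Optional[datetime] = None  # None means the fact is still true

class FactStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, fact: Fact) -> None:
        """Add a fact and close any open fact with the same subject and predicate."""
        for old in self.facts:
            same_slot = (old.subject, old.predicate) == (fact.subject, fact.predicate)
            if same_slot and old.invalid_at is None:
                old.invalid_at = fact.valid_from  # keep the old fact, but end its window
        self.facts.append(fact)

    def as_of(self, subject: str, when: datetime) -> list[Fact]:
        """Return the facts about a subject whose validity window contains `when`."""
        return [
            f for f in self.facts
            if f.subject == subject
            and f.valid_from <= when
            and (f.invalid_at is None or when < f.invalid_at)
        ]

store = FactStore()
store.assert_fact(Fact("Robbie", "prefers_brand", "Adidas",
                       source="chat message, user_8a32e1f9",
                       valid_from=datetime(2024, 9, 7, 14, 27)))
store.assert_fact(Fact("Robbie", "prefers_brand", "Nike",
                       source="return form, order SO-48219",
                       valid_from=datetime(2024, 9, 28)))  # hypothetical date

now = store.as_of("Robbie", datetime(2024, 10, 1))
then = store.as_of("Robbie", datetime(2024, 9, 10))
print([f.obj for f in now], [f.obj for f in then])
```

The old fact is never deleted. Its window is closed, so the store can answer both "what is true now" and "what was true then", and each answer still points back to the message or form it came from.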

Use both — here's how they fit

The right architecture is usually layered: RAG for documents, agent memory for state. Ground factual answers about your knowledge base with RAG; ground the agent's understanding of the user, customer, and ongoing work with memory. Zep is built for the memory layer. It's the Context Lake for AI agents: it manages, governs, and serves agent memory on temporal context graphs (built on the open-source Graphiti), with sub-200ms retrieval and benchmark-leading accuracy on LoCoMo and LongMemEval. It complements your document RAG rather than replacing it.
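One way to picture the layering is a prompt assembled from two independent sources. The sketch below is only an illustration: `build_context` is a hypothetical helper, the document chunk is a made-up policy sentence standing in for RAG output, and the user facts stand in for whatever the memory layer returns as currently true.

```python
def build_context(question: str, doc_chunks: list[str], user_facts: list[str]) -> str:
    """Assemble one prompt from two sources: current user facts and document chunks."""
    lines = ["Known facts about the user (current):"]
    lines += [f"- {fact}" for fact in user_facts]
    lines += ["", "Relevant documentation:"]
    lines += [f"- {chunk}" for chunk in doc_chunks]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)

prompt = build_context(
    "Can I still return my shoes?",
    # Stand-in for document RAG output (made-up policy text).
    doc_chunks=["Returns are accepted within 30 days of delivery."],
    # Stand-in for the memory layer's current facts about this user.
    user_facts=[
        "Robbie is returning their Adidas shoes.",
        "Robbie intends to wear Nike shoes.",
    ],
)
print(prompt)
```

Keeping the two sections separate lets each layer do what it is suited to: the documentation answers the policy question, and the user facts tell the agent who is asking and what has changed for them.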



Frequently asked questions

Can I just use RAG for agent memory?

You can, but it breaks on change-over-time, relationships, and “similar vs. needed.” For multi-session, personalized agents, a temporal context graph is the right tool; keep RAG for static documents.

Is agent memory a type of RAG?

No. Both retrieve context, but RAG retrieves static document chunks by similarity, while agent memory retrieves evolving, provenance-tracked facts from a temporal graph scoped to the user.

Do I need both?

Usually yes. Documents → RAG. User/business state over time → agent memory. Most production agents combine them.

What does Zep replace — my vector database or my RAG stack?

Neither, necessarily. Zep provides the agent-memory layer (the Context Lake). It works alongside document RAG and integrates with any framework.