We're hiring! Come build with us
Zep
AI Agents Guide

How to Give an AI Agent Long-Term Memory

Give an AI agent long-term memory with a temporal context graph that persists facts across sessions and serves relevant context per turn. Approaches compared.

Chat
RRobbie2024-09-07
I only wear Adidas shoes. I love them!
Business data
soleworks.com/returns/SO-48219
SoleworksReturn · Adidas Ultraboost 22

Reason for return

Product fell apart
These Adidas fell apartafter three weeks. I’ll be buying Nike from now on.
FactsExtracted · 3
  • Robbie strongly favors Adidas shoes.
  • Robbie’s Adidas Ultraboost 22 fell apart.
  • Robbie will buy Nike next.
EntitiesRelationshipsTimeline

Key takeaways

  • Give an agent long-term memory with a memory layer that persists facts across sessions and serves the relevant slice per turn — not by stuffing chat history into the context window.
  • Use a temporal context graph: facts with provenance and validity, retrieved relevantly; add it in three lines of code, with any framework.
  • Govern and scale it as a Context Lake — ABAC, retention, audit, sub-200ms retrieval. Zep reports 94.7% LoCoMo accuracy (results).

You give an AI agent long-term memory by adding a memory layer that persists what the agent learns — about the user, the business, and the work — across sessions, and serves the relevant slice back into the prompt at run time. The wrong way is to keep appending chat history to the context window; it grows without bound, adds noise, and mixes stale facts with current ones. The right way is a temporal context graph that stores facts with provenance and validity and retrieves only what's relevant. Here's how.

The approaches, compared

ApproachHow it worksBreaks down when
Stuff full chat history into the promptAppend every turnContext window fills; cost/noise rise; stale facts contradict current ones
Summarize historyPeriodically compress the transcriptLossy; loses specifics; no provenance or “as of” time
Vector store / RAG over historyEmbed turns, retrieve similar ones“Similar” ≠ “needed”; no relationships; no temporal reasoning
Temporal context graphExtract facts + relationships with validity + provenance; retrieve relevant current contextThe right tool — handles change, relationships, and provenance

Step by step

1. Choose a memory layer, not a bigger window

A larger context window lets you pass more text; it isn't memory. You still need to decide what to pass. A dedicated memory layer builds persistent structure from the agent's inputs and selects the relevant context per turn.

2. Ingest every source the agent touches

Long-term memory should unify more than chat: messages, business data (JSON), and documents. Feeding all sources into one graph is what lets the agent know the user and the business — not just the conversation.

3. Let the system build a temporal context graph

Rather than storing raw text, extract entities, relationships, and facts into a bi-temporal graph: each fact records where it came from (provenance) and when it was valid. When a fact changes, the old one is invalidated, not overwritten — so the agent can answer “what's true now?” and “what was true then?”

4. Retrieve relevant context at run time — not the whole history

On each turn, fetch the token-efficient, relevant slice of memory and add it to the prompt. This keeps the context window clean and latency low, and it's the step that actually reduces hallucination.

8 candidates · ranked for task
ObsJane upgrades within 2 weeks of each launch.
FactJoined Aug 2024.
FactCurrently on Pro v4.
FactAccount billing monthly.
SumRecent chats: power-user features.
SumPast tickets: rate limits.
ObsTickets pair with plan changes.
FactLast login 12h ago.
Context block1,847 / 2,000
ObsJane upgrades within 2 weeks of each launch.
FactCurrently on Pro v4.
SumRecent chats: power-user features.
ObsTickets pair with plan changes.

5. Add memory in a few lines of code

With Zep, this is three lines and works with any agent framework — or none:

# Add the turn to memory and get assembled context back in one call
response = client.thread.add_messages(
    thread_id=thread_id,
    messages=[Message(name="Jane", role="user", content="I'd like to upgrade my plan...")],
    return_context=True,
)

# Add business data to the same user's graph
client.graph.add(user_id=user_id, type="json",
                 data=json.dumps({"event": "plan_upgrade", "to": "pro", "mrr": 49}))

# Retrieve relevant context for the next turn
user_context = client.thread.get_user_context(thread_id=thread_id)

6. Govern and scale it

In production you need more than storage: access control over what each agent can see, retention policies, audit, and millisecond retrieval across many users. That's the difference between a memory feature and memory infrastructure — a Context Lake.

What this gets you

An agent that remembers preferences and decisions across sessions, stays consistent when facts change, grounds answers in what's actually known about the user and business, and does it in milliseconds at scale. Zep is the Context Lake for AI agents — it manages, governs, and serves agent memory on temporal context graphs (built on the open-source Graphiti), with sub-200ms p95 retrieval and benchmark-leading accuracy on LoCoMo and LongMemEval.


Related: What is agent memory? · Agent memory vs RAG · What is a temporal knowledge graph? · Benchmark results

Frequently asked questions

What's the simplest way to give an agent long-term memory?

Add a memory layer that builds a context graph from the agent's inputs and returns relevant context per turn. With Zep it's three lines of code, framework-agnostic.

Why not just use a bigger context window?

More tokens isn't memory — you still must choose what to include, and dumping full history adds noise, cost, and contradictions. Memory selects the relevant, current facts.

Can I use long-term memory with my existing agent framework?

Yes. A good memory layer is framework-agnostic and works with LangGraph, custom agents, or none.

How is this different from RAG?

RAG retrieves static documents by similarity. Long-term agent memory tracks evolving, provenance-stamped facts about the user and business over time. Use both: RAG for documents, memory for state.