How to Give an AI Agent Long-Term Memory (Guide)

Key takeaways

Give an agent long-term memory with a memory layer that persists facts across sessions and serves the relevant slice per turn — not by stuffing chat history into the context window.
Use a temporal context graph: store facts with provenance and validity periods, and retrieve only the relevant ones. You can add it in three lines of code, with any framework.
Govern and scale it as a Context Lake, with attribute-based access control (ABAC), retention, audit, and sub-200ms retrieval. Zep reports 94.7% accuracy on the LoCoMo benchmark.

You give an AI agent long-term memory by adding a memory layer that persists what the agent learns — about the user, the business, and the work — across sessions, and serves the relevant slice back into the prompt at run time. The wrong way is to keep appending chat history to the context window; it grows without bound, adds noise, and mixes stale facts with current ones. The right way is a temporal context graph that stores facts with provenance and validity and retrieves only what's relevant. Here's how.

The approaches, compared

Approach	How it works	Breaks down when
Stuff full chat history into the prompt	Append every turn	Context window fills; cost/noise rise; stale facts contradict current ones
Summarize history	Periodically compress the transcript	Lossy; loses specifics; no provenance or “as of” time
Vector store / RAG over history	Embed turns, retrieve similar ones	“Similar” ≠ “needed”; no relationships; no temporal reasoning
Temporal context graph	Extract facts + relationships with validity + provenance; retrieve relevant current context	Built for these cases: handles change, relationships, and provenance

Step by step

1. Choose a memory layer, not a bigger window

A larger context window lets you pass more text; it isn't memory. You still need to decide what to pass. A dedicated memory layer builds persistent structure from the agent's inputs and selects the relevant context per turn.

2. Ingest every source the agent touches

Long-term memory should unify more than chat: messages, business data (JSON), and documents. Feeding all sources into one graph is what lets the agent know the user and the business — not just the conversation.

3. Let the system build a temporal context graph

Rather than storing raw text, extract entities, relationships, and facts into a bi-temporal graph: each fact records where it came from (provenance), when it was true in the world, and when the system learned it. When a fact changes, the old one is invalidated rather than overwritten, so the agent can answer both “what's true now?” and “what was true then?”
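To make "invalidate, don't overwrite" concrete, here is a minimal in-memory sketch. It is not Graphiti's or Zep's implementation: the `Fact` and `TemporalFactStore` names, the field names, and the sample data are all hypothetical. Each fact carries two time dimensions: event time (when it was true) and ingestion time (when it was recorded or superseded).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    source: str                            # provenance: where the fact came from
    valid_from: datetime                   # event time: when it became true
    recorded_at: datetime                  # ingestion time: when the system learned it
    valid_to: Optional[datetime] = None    # event time: when it stopped being true
    expired_at: Optional[datetime] = None  # ingestion time: when it was superseded


class TemporalFactStore:
    """Toy bi-temporal store: a new fact invalidates the old one instead of overwriting it.

    Simplification: assumes facts for a subject/predicate arrive in chronological order.
    """

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, fact: Fact) -> None:
        for old in self.facts:
            same_key = (old.subject, old.predicate) == (fact.subject, fact.predicate)
            if same_key and old.valid_to is None:
                old.valid_to = fact.valid_from     # close the old validity interval
                old.expired_at = fact.recorded_at  # record when it was superseded
        self.facts.append(fact)  # the old fact stays in history

    def current(self, subject: str, predicate: str) -> Optional[Fact]:
        """What's true now: the fact whose validity interval is still open."""
        for f in self.facts:
            if (f.subject, f.predicate) == (subject, predicate) and f.valid_to is None:
                return f
        return None

    def as_of(self, subject: str, predicate: str, when: datetime) -> Optional[Fact]:
        """What was true then: the fact whose validity interval contains `when`."""
        for f in self.facts:
            if (f.subject, f.predicate) != (subject, predicate):
                continue
            if f.valid_from <= when and (f.valid_to is None or when < f.valid_to):
                return f
        return None


# Hypothetical history: Jane moves from a Basic plan to Pro.
store = TemporalFactStore()
store.add(Fact("Jane", "plan", "Basic", "chat:2024-08-01",
               valid_from=datetime(2024, 8, 1), recorded_at=datetime(2024, 8, 1)))
store.add(Fact("Jane", "plan", "Pro", "billing:plan_upgrade",
               valid_from=datetime(2025, 3, 10), recorded_at=datetime(2025, 3, 11)))

print(store.current("Jane", "plan").value)                       # Pro
print(store.as_of("Jane", "plan", datetime(2024, 12, 1)).value)  # Basic
```

Both facts remain in `store.facts`; only the Basic fact's validity interval is closed, which is what lets the "as of" query still answer correctly.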

4. Retrieve relevant context at run time — not the whole history

On each turn, fetch the relevant, token-efficient slice of memory and add it to the prompt. This keeps the context window clean and latency low, and grounding the model in relevant, current facts is what reduces hallucination.
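The selection step can be sketched as ranking candidate memories and packing them into a fixed token budget. This is a hypothetical illustration, not Zep's ranking: the `Memory` type, the precomputed relevance scores, the token counts, and the simple greedy-by-score packing are assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    kind: str     # "fact", "observation", or "summary"
    text: str
    score: float  # relevance to the current task (assumed to be precomputed)
    tokens: int   # approximate token cost in the prompt


def build_context_block(candidates: list[Memory], budget: int) -> tuple[list[Memory], int]:
    """Greedily add the highest-scoring memories that still fit the token budget."""
    selected: list[Memory] = []
    used = 0
    for m in sorted(candidates, key=lambda c: c.score, reverse=True):
        if used + m.tokens <= budget:
            selected.append(m)
            used += m.tokens
    return selected, used


# Hypothetical scores and token counts for Jane's candidate memories.
candidates = [
    Memory("observation", "Jane upgrades within 2 weeks of each launch.", 0.92, 500),
    Memory("fact", "Joined Aug 2024.", 0.20, 250),
    Memory("fact", "Currently on Pro v4.", 0.88, 400),
    Memory("fact", "Account billing monthly.", 0.35, 300),
    Memory("summary", "Recent chats: power-user features.", 0.75, 600),
    Memory("summary", "Past tickets: rate limits.", 0.40, 700),
    Memory("observation", "Tickets pair with plan changes.", 0.70, 300),
    Memory("fact", "Last login 12h ago.", 0.10, 250),
]

selected, used = build_context_block(candidates, budget=2000)
context_block = "\n".join(f"[{m.kind}] {m.text}" for m in selected)
print(f"{used} / 2000 tokens")  # 1800 / 2000 tokens
print(context_block)
```

Greedy packing is only one heuristic; a production system may also deduplicate, weight recency, or drop facts that have been invalidated.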

Figure: 8 candidate memories about Jane (facts, observations, and summaries) are ranked for the task; four are selected into the context block (1,847 / 2,000 tokens): “Jane upgrades within 2 weeks of each launch,” “Currently on Pro v4,” “Recent chats: power-user features,” and “Tickets pair with plan changes.”

5. Add memory in a few lines of code

With Zep, this is three lines and works with any agent framework — or none:

import json
import os

from zep_cloud.client import Zep
from zep_cloud.types import Message

client = Zep(api_key=os.environ["ZEP_API_KEY"])
user_id, thread_id = "jane-123", "thread-jane-1"  # placeholders: IDs of an existing Zep user and thread

# Add the turn to memory and get assembled context back in one call
response = client.thread.add_messages(
    thread_id=thread_id,
    messages=[Message(name="Jane", role="user", content="I'd like to upgrade my plan...")],
    return_context=True,
)

# Add business data to the same user's graph
client.graph.add(user_id=user_id, type="json",
                 data=json.dumps({"event": "plan_upgrade", "to": "pro", "mrr": 49}))

# Retrieve relevant context for the next turn
user_context = client.thread.get_user_context(thread_id=thread_id)

6. Govern and scale it

In production you need more than storage: access control over what each agent can see, retention policies, audit, and millisecond retrieval across many users. That's the difference between a memory feature and memory infrastructure — a Context Lake.
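As a rough illustration of what "access control, retention, and audit" mean at retrieval time, here is a hypothetical sketch. It is not Zep's ABAC API: the `StoredFact`, `Agent`, and `MemoryPolicy` names, the sensitivity attribute, and the data are invented for this example. Retention here only filters what is served; a real retention policy would also delete the data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class StoredFact:
    text: str
    sensitivity: str      # attribute on the data, e.g. "internal" or "pii"
    created_at: datetime


@dataclass
class Agent:
    name: str
    clearances: set[str]  # attribute on the requester: sensitivities it may read


@dataclass
class MemoryPolicy:
    retention: timedelta  # facts older than this are never served
    audit_log: list[str] = field(default_factory=list)

    def visible_facts(self, agent: Agent, facts: list[StoredFact], now: datetime) -> list[StoredFact]:
        """Apply retention, then an attribute check, and log every decision."""
        allowed: list[StoredFact] = []
        for f in facts:
            if now - f.created_at > self.retention:
                decision = "expired"
            elif f.sensitivity not in agent.clearances:
                decision = "denied"
            else:
                decision = "allowed"
                allowed.append(f)
            self.audit_log.append(f"{now:%Y-%m-%d} {agent.name} {decision}: {f.text}")
        return allowed


# Hypothetical data: a support agent cleared for internal facts, but not PII.
facts = [
    StoredFact("Currently on Pro v4.", "internal", datetime(2025, 1, 1)),
    StoredFact("Billing address on file.", "pii", datetime(2025, 5, 1)),
    StoredFact("Past tickets: rate limits.", "internal", datetime(2024, 3, 1)),
]
policy = MemoryPolicy(retention=timedelta(days=365))
support_bot = Agent("support-bot", {"internal"})

visible = policy.visible_facts(support_bot, facts, now=datetime(2025, 6, 1))
print([f.text for f in visible])  # ['Currently on Pro v4.']
for entry in policy.audit_log:
    print(entry)
```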

What belongs in long-term memory

Long-term memory is most valuable when it's fed more than chat. Useful sources to ingest into the same user graph:

Conversation — every user and assistant message (the running relationship).
Business data — transactions, plan changes, support tickets, app events, CRM records. This is what lets the agent know the user's situation, not just what they typed.
Documents — emails, transcripts, and files tied to the user.

Send any of it as messages or via graph.add (JSON, text, or message). The system extracts entities and facts from all of it into one temporal graph, so retrieval can draw on the whole picture.
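One way to route non-chat sources is a small normalizer that turns each record into the `type` and `data` arguments used by the `graph.add` call shown earlier; conversation turns would still go through `thread.add_messages`. The `to_graph_payload` helper and the source categories are hypothetical, and the Zep call is left as a comment so the sketch runs without a client.

```python
import json
from typing import Any


def to_graph_payload(kind: str, record: Any) -> dict[str, str]:
    """Hypothetical helper: turn a non-chat record into `type`/`data` arguments for graph.add."""
    if kind == "business":
        # Transactions, plan changes, tickets, app events, CRM rows: send as JSON.
        return {"type": "json", "data": json.dumps(record, sort_keys=True)}
    if kind == "document":
        # Emails, transcripts, and files: send the extracted text.
        return {"type": "text", "data": str(record)}
    raise ValueError(f"unsupported source kind: {kind!r}")


payloads = [
    to_graph_payload("business", {"event": "plan_upgrade", "to": "pro", "mrr": 49}),
    to_graph_payload("document", "Email from Jane: asking about rate limits on the Pro plan."),
]
for p in payloads:
    print(p["type"], p["data"])
    # With a configured Zep client this would become:
    # client.graph.add(user_id=user_id, **p)
```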

Retrieving the right slice (including patterns)

At run time, pull the assembled context — and, when useful, the derived Observations (cross-session patterns) via a dedicated search scope:

# Assembled, token-efficient context for the current turn
context = client.thread.get_user_context(thread_id=thread_id).context

# Or query derived patterns directly
patterns = client.graph.search(
    user_id=user_id,
    query="What does this user do before they upgrade?",
    scope="observations",
    limit=5,
)

The agent receives the relevant facts, entities, and patterns — never the entire history.
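Zep derives Observations itself, and this guide does not describe how. To show what a cross-session pattern like "Jane upgrades within 2 weeks of each launch" encodes, here is a toy, hypothetical computation over timestamped events; the function name, the 14-day window, the `min_ratio` threshold, and the dates are assumptions.

```python
from datetime import date, timedelta
from typing import Optional


def launch_upgrade_observation(
    launches: list[date],
    upgrades: list[date],
    window_days: int = 14,
    min_ratio: float = 1.0,
) -> Optional[str]:
    """Return a pattern statement if enough launches are followed by an upgrade within the window."""
    if not launches:
        return None
    window = timedelta(days=window_days)
    # Count launches that have at least one upgrade inside [launch, launch + window].
    hits = sum(1 for launch in launches if any(launch <= u <= launch + window for u in upgrades))
    if hits / len(launches) < min_ratio:
        return None
    return f"Upgrades within {window_days} days of a launch ({hits} of {len(launches)} launches)."


# Hypothetical event history for one user.
launches = [date(2024, 9, 1), date(2025, 1, 15), date(2025, 5, 1)]
upgrades = [date(2024, 9, 10), date(2025, 1, 20), date(2025, 5, 12)]

print(launch_upgrade_observation(launches, upgrades))
# Upgrades within 14 days of a launch (3 of 3 launches).
```

The point is that a pattern spans several sessions and events; no single fact in the graph states it.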

What this gets you

An agent that remembers preferences and decisions across sessions, stays consistent when facts change, grounds answers in what's actually known about the user and business, and does it in milliseconds at scale. Zep is the Context Lake for AI agents — it manages, governs, and serves agent memory on temporal context graphs (built on the open-source Graphiti), with sub-200ms p95 retrieval and benchmark-leading accuracy on LoCoMo and LongMemEval.


Frequently asked questions

What's the simplest way to give an agent long-term memory?

Add a memory layer that builds a context graph from the agent's inputs and returns relevant context per turn. With Zep it's three lines of code, framework-agnostic.

Why not just use a bigger context window?

More tokens isn't memory — you still must choose what to include, and dumping full history adds noise, cost, and contradictions. Memory selects the relevant, current facts.

Can I use long-term memory with my existing agent framework?

Yes. A good memory layer is framework-agnostic and works with LangGraph, custom agents, or none.

How is this different from RAG?

RAG retrieves static documents by similarity. Long-term agent memory tracks evolving, provenance-stamped facts about the user and business over time. Use both: RAG for documents, memory for state.

What should I store in long-term memory?

Conversation, business data (transactions, tickets, events), and documents — all ingested into one per-user temporal graph so the agent knows the user's full situation, not just the chat.

Where does long-term memory live?

Outside the context window, in a durable per-user store (a temporal context graph). At enterprise scale that store is a Context Lake — governed and served in milliseconds.

How do I surface patterns across sessions, not just facts?

Retrieve Observations — derived patterns (recurring behaviors, decisions, preferences) surfaced across the graph — via the Observations search scope or a context template.