Give an AI agent long-term memory with a temporal context graph that persists facts across sessions and serves relevant context per turn. Approaches compared.
You give an AI agent long-term memory by adding a memory layer that persists what the agent learns — about the user, the business, and the work — across sessions, and serves the relevant slice back into the prompt at run time. The wrong way is to keep appending chat history to the context window; it grows without bound, adds noise, and mixes stale facts with current ones. The right way is a temporal context graph that stores facts with provenance and validity and retrieves only what's relevant. Here's how.
| Approach | How it works | Breaks down when |
|---|---|---|
| Stuff full chat history into the prompt | Append every turn | Context window fills; cost/noise rise; stale facts contradict current ones |
| Summarize history | Periodically compress the transcript | Lossy; loses specifics; no provenance or “as of” time |
| Vector store / RAG over history | Embed turns, retrieve similar ones | “Similar” ≠ “needed”; no relationships; no temporal reasoning |
| Temporal context graph | Extract facts and relationships with validity and provenance; retrieve relevant current context | Not in these cases: it handles change, relationships, and provenance |
A larger context window lets you pass more text; it isn't memory. You still need to decide what to pass. A dedicated memory layer builds persistent structure from the agent's inputs and selects the relevant context per turn.
Long-term memory should unify more than chat: messages, business data (JSON), and documents. Feeding all sources into one graph is what lets the agent know the user and the business — not just the conversation.
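As a rough illustration of unifying sources, the sketch below normalizes chat messages, JSON business events, and document text into one common record before ingestion. The `Episode` shape and the `from_*` helpers are hypothetical names chosen for this example; they are not part of any particular library.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Episode:
    """One unit of input for a memory layer (hypothetical shape)."""
    user_id: str
    source: str          # "message", "json", or "text"
    body: str
    observed_at: datetime


def from_message(user_id: str, role: str, content: str, at: datetime) -> Episode:
    # Chat turns keep the speaker role inline so later extraction can use it.
    return Episode(user_id, "message", f"{role}: {content}", at)


def from_business_event(user_id: str, payload: dict, at: datetime) -> Episode:
    # Business data is serialized deterministically so identical events compare equal.
    return Episode(user_id, "json", json.dumps(payload, sort_keys=True), at)


def from_document(user_id: str, text: str, at: datetime) -> Episode:
    return Episode(user_id, "text", text.strip(), at)


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
episodes = [
    from_message("jane", "user", "I'd like to upgrade my plan", now),
    from_business_event("jane", {"event": "plan_upgrade", "to": "pro"}, now),
    from_document("jane", "  Pro plan includes priority support.  ", now),
]
sources = [e.source for e in episodes]
```

Because every input ends up in the same shape and is keyed by the same `user_id`, a single extraction step can build one graph per user regardless of where the data originated.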
Rather than storing raw text, extract entities, relationships, and facts into a bi-temporal graph: each fact records where it came from (provenance) and when it was valid. When a fact changes, the old one is invalidated, not overwritten — so the agent can answer “what's true now?” and “what was true then?”
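To make the invalidate-don't-overwrite idea concrete, here is a minimal in-memory sketch. `Fact` and `FactStore` are hypothetical names for illustration, not Zep's or Graphiti's data model, and the sketch assumes facts arrive in chronological order. Each fact carries its provenance, the time it became true, and the time the store learned of it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    source: str                            # provenance: where the fact came from
    valid_at: datetime                     # when the fact became true in the world
    recorded_at: datetime                  # when the store learned about it
    invalid_at: Optional[datetime] = None  # set when a newer fact supersedes this one


class FactStore:
    """Hypothetical in-memory store; assumes facts arrive in chronological order."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, subject, predicate, value, source, valid_at, recorded_at) -> None:
        # Close the validity window of the current fact instead of deleting it.
        for fact in self.facts:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.invalid_at is None:
                fact.invalid_at = valid_at
        self.facts.append(Fact(subject, predicate, value, source, valid_at, recorded_at))

    def as_of(self, subject, predicate, when) -> Optional[Fact]:
        # Return the fact whose validity window contains `when`, if any.
        for fact in self.facts:
            if (fact.subject, fact.predicate) != (subject, predicate):
                continue
            if fact.valid_at <= when and (fact.invalid_at is None or when < fact.invalid_at):
                return fact
        return None


store = FactStore()
store.assert_fact("jane", "plan", "starter", "crm_export.json",
                  valid_at=datetime(2024, 1, 1), recorded_at=datetime(2024, 1, 2))
store.assert_fact("jane", "plan", "pro", "thread:42",
                  valid_at=datetime(2024, 6, 1), recorded_at=datetime(2024, 6, 1))

now_fact = store.as_of("jane", "plan", datetime(2024, 7, 1))   # what's true now?
then_fact = store.as_of("jane", "plan", datetime(2024, 3, 1))  # what was true then?
```

Both facts stay in the store. The older one simply has a closed validity window, which is what lets the two queries return different answers.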
On each turn, fetch the token-efficient, relevant slice of memory and add it to the prompt. This keeps the context window clean and latency low, and it's the step that actually reduces hallucination.
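One way to picture the per-turn step is a greedy selection under a token budget. The sketch below assumes relevance scores are already computed by some retriever and uses a crude word count in place of a real tokenizer. `select_context` and `build_prompt` are hypothetical helpers, not a library API.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude stand-in: counts whitespace-separated words.
    # A real system would use the model's own tokenizer.
    return len(text.split())


def select_context(scored_facts: List[Tuple[float, str]], budget: int) -> List[str]:
    """Pick facts by descending relevance until the token budget is used up."""
    chosen: List[str] = []
    used = 0
    for _score, text in sorted(scored_facts, key=lambda item: item[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen


def build_prompt(context: List[str], user_message: str) -> str:
    facts_block = "\n".join(f"- {fact}" for fact in context)
    return f"Known facts:\n{facts_block}\n\nUser: {user_message}"


scored_facts = [
    (0.9, "Jane is on the Pro plan"),
    (0.2, "Jane once asked about the logo color"),
    (0.7, "Jane prefers email over phone"),
]
context = select_context(scored_facts, budget=12)
prompt = build_prompt(context, "Can I change my billing date?")
```

The low-scoring fact is left out because it would exceed the budget, so the prompt stays bounded no matter how much the agent has stored.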
With Zep, this takes three calls and works with any agent framework, or with none:

    # Assumes the zep-cloud Python SDK; thread_id and user_id are defined elsewhere
    import json
    import os

    from zep_cloud.client import Zep
    from zep_cloud.types import Message

    client = Zep(api_key=os.environ["ZEP_API_KEY"])

    # Add the turn to memory and get assembled context back in one call
    response = client.thread.add_messages(
        thread_id=thread_id,
        messages=[Message(name="Jane", role="user", content="I'd like to upgrade my plan...")],
        return_context=True,
    )

    # Add business data to the same user's graph
    client.graph.add(user_id=user_id, type="json",
                     data=json.dumps({"event": "plan_upgrade", "to": "pro", "mrr": 49}))

    # Retrieve relevant context for the next turn
    user_context = client.thread.get_user_context(thread_id=thread_id)

In production you need more than storage: access control over what each agent can see, retention policies, audit logging, and millisecond retrieval across many users. That's the difference between a memory feature and memory infrastructure: a Context Lake.
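As a simplified picture of what governance adds on top of storage, the sketch below filters stored records by agent scope and retention window and writes an audit entry for each read. All names here (`MemoryRecord`, `AgentPolicy`, `visible_records`) are hypothetical. A production system would typically enforce retention by deleting data rather than filtering at read time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class MemoryRecord:
    user_id: str
    scope: str            # e.g. "billing" or "support"
    text: str
    created_at: datetime


@dataclass(frozen=True)
class AgentPolicy:
    agent_id: str
    allowed_scopes: FrozenSet[str]   # access control: what this agent may see
    retention: timedelta             # how long records remain readable


def visible_records(records: List[MemoryRecord], policy: AgentPolicy, user_id: str,
                    now: datetime, audit_log: List[Tuple[str, str, str, int]]) -> List[MemoryRecord]:
    cutoff = now - policy.retention
    result = [
        r for r in records
        if r.user_id == user_id
        and r.scope in policy.allowed_scopes
        and r.created_at >= cutoff
    ]
    # Audit: record who read whose memory, when, and how many records were returned.
    audit_log.append((now.isoformat(), policy.agent_id, user_id, len(result)))
    return result


records = [
    MemoryRecord("jane", "billing", "Upgraded to Pro", datetime(2025, 1, 10)),
    MemoryRecord("jane", "support", "Reported a login issue", datetime(2025, 1, 12)),
    MemoryRecord("jane", "billing", "Old invoice dispute", datetime(2023, 1, 1)),
    MemoryRecord("bob", "billing", "On the Starter plan", datetime(2025, 1, 11)),
]
policy = AgentPolicy("billing-agent", frozenset({"billing"}), timedelta(days=365))
audit_log: List[Tuple[str, str, str, int]] = []
visible = visible_records(records, policy, "jane", datetime(2025, 1, 15), audit_log)
```

The billing agent sees only Jane's recent billing record. Her support history, her expired record, and other users' data are all excluded, and the read is logged.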
The result is an agent that remembers preferences and decisions across sessions, stays consistent when facts change, grounds answers in what's actually known about the user and business, and does it in milliseconds at scale. Zep is the Context Lake for AI agents: it manages, governs, and serves agent memory on temporal context graphs (built on the open-source Graphiti), with sub-200ms p95 retrieval and benchmark-leading accuracy on LoCoMo and LongMemEval.
Add a memory layer that builds a context graph from the agent's inputs and returns relevant context per turn. With Zep it takes three calls and is framework-agnostic.
A larger context window isn't memory: you still must choose what to include, and dumping full history adds noise, cost, and contradictions. A memory layer selects the relevant, current facts.
A good memory layer is framework-agnostic: it works with LangGraph, with custom agents, or with no framework at all.
RAG retrieves static documents by similarity. Long-term agent memory tracks evolving, provenance-stamped facts about the user and business over time. Use both: RAG for documents, memory for state.