When does building your own agent memory stop working, and how do you decide before it costs a rewrite? A framework across build, buy, open source, and frameworks.
Every agent that does more than answer one-shot questions needs agent memory: what it knows across time about its users, the business, and the world it operates in. Without it, the agent starts from scratch every turn. With it, the agent can reason about the user and act on what it already knows.
Many teams build memory themselves. It starts as a chat-history buffer. Then a vector store gets added. Then a layer of glue code holds the two together. The whole thing works in the demo, so it ships. This page is about what happens after the demo: when the build path stops working, and how to decide before it costs you a rewrite. Building your own is sometimes the right call, and the framework below should hold up in a code review and a budget meeting.
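The build-vs-buy math goes wrong in the same place almost every time: the problem gets under-scoped. A buffer of recent messages looks like memory, so the estimate gets written against that. The estimate is for a different, smaller problem. A production memory system has to do five things at once: ingest and extract facts from every source, not just chat; track bi-temporal validity so it knows what is true now versus then; retrieve the most valuable context within a token budget; enforce governance (entity-level access control, retention, audit); and do all of it for millions of subjects with sub-200ms retrieval.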
Each capability looks tractable on its own. The cost lives in the interactions. The hard part is making all of them hold together at once: temporal correctness that still respects access control while retrieval stays under 200ms. That is a much harder system than any single feature suggests.
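To make the interaction concrete, here is a minimal sketch of a read path that has to satisfy two of those constraints in the same pass: a fact must have been true at the time being asked about, and the caller must be allowed to read it. The `Fact` shape, the `readers` set, and the `visible_facts` helper are all hypothetical names invented for illustration, and the sketch models only a validity window, not a full bi-temporal record.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str                    # entity the fact is about
    text: str
    valid_from: datetime            # when the fact became true
    invalid_at: Optional[datetime]  # when it stopped being true; None = still true
    readers: frozenset              # principals allowed to read it (hypothetical ACL)


def visible_facts(facts, principal, as_of):
    """Return facts that were true at `as_of` AND that `principal` may read."""
    return [
        f for f in facts
        if principal in f.readers
        and f.valid_from <= as_of
        and (f.invalid_at is None or as_of < f.invalid_at)
    ]


# Made-up data: Ana moved in March 2024; Bo belongs to a different agent.
facts = [
    Fact("user:ana", "Ana lives in Berlin", datetime(2023, 1, 1),
         datetime(2024, 3, 1), frozenset({"agent:ana"})),
    Fact("user:ana", "Ana lives in Lisbon", datetime(2024, 3, 1),
         None, frozenset({"agent:ana"})),
    Fact("user:bo", "Bo is on the Pro plan", datetime(2024, 1, 1),
         None, frozenset({"agent:bo"})),
]

now_view = visible_facts(facts, "agent:ana", datetime(2024, 6, 1))
past_view = visible_facts(facts, "agent:ana", datetime(2023, 6, 1))
```

Even in this toy form, both predicates have to run on every query. Doing that inside a latency budget at scale is where the real engineering cost sits.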
The first version is a weekend of work: recent messages in a buffer, embeddings in a vector database, a similarity search at query time. For a prototype or a single-user tool, that is a reasonable place to stop. The trouble starts as usage grows, and it tends to arrive in a predictable order.
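As a rough picture of that first version, the sketch below keeps a buffer of recent messages and a list standing in for a vector store, then ranks by cosine similarity at query time. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `NaiveMemory` is a hypothetical name. None of this is any particular library's API.

```python
import math
from collections import Counter, deque


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class NaiveMemory:
    def __init__(self, buffer_size: int = 4):
        self.buffer = deque(maxlen=buffer_size)  # recent messages
        self.index = []                          # (vector, text) pairs

    def add(self, message: str) -> None:
        self.buffer.append(message)
        self.index.append((embed(message), message))

    def search(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(q, item[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]


memory = NaiveMemory()
memory.add("I live in Berlin")
memory.add("My favorite editor is Vim")
memory.add("I live in Lisbon now")
hits = memory.search("where do I live", k=2)
```

In this example `hits` contains both the Berlin and the Lisbon message, with the older one ranked first. Nothing in the index records that the second statement replaced the first.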
First, stale facts. The vector store has no concept of time, so it retrieves a fact the user has since contradicted, and the agent acts on it. Now you are writing logic to detect and supersede outdated facts: the temporal problem you skipped earlier, back to collect its debt.
Then context bloat. As history grows, similarity search returns more loosely related chunks. The prompt gets longer, inference costs rise, and answer quality slips because the signal is buried under noise. You start building re-ranking and summarization to compensate.
Then access boundaries. A customer asks the obvious security question: can one user's data reach another user's agent? Nothing in a shared vector index prevents it, and retrofitting entity-level access control into a system built without it is expensive and risky.
Finally, queries the vector store cannot answer at all. “Which of this user's projects share a stakeholder with the account that just churned?” is a graph question, and cosine similarity does not answer it.
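To show why the stakeholder question is graph-shaped, here is a small sketch that answers it by following edges rather than comparing vectors. The entities, relation names, and the `projects_sharing_stakeholder` helper are made up for illustration. A real system would use a graph store rather than an in-memory dictionary.

```python
from collections import defaultdict

# (subject, relation, object) triples; all names are hypothetical.
edges = [
    ("user:ana", "owns", "project:atlas"),
    ("user:ana", "owns", "project:borealis"),
    ("project:atlas", "has_stakeholder", "person:kim"),
    ("project:borealis", "has_stakeholder", "person:lee"),
    ("account:acme", "has_stakeholder", "person:kim"),
    ("account:acme", "status", "churned"),
]

# Adjacency map: (subject, relation) -> set of objects.
out = defaultdict(set)
for subject, relation, obj in edges:
    out[(subject, relation)].add(obj)


def projects_sharing_stakeholder(user: str, account: str) -> list:
    """Projects owned by `user` with at least one stakeholder in common with `account`."""
    account_people = out[(account, "has_stakeholder")]
    return sorted(
        project for project in out[(user, "owns")]
        if out[(project, "has_stakeholder")] & account_people
    )


shared = projects_sharing_stakeholder("user:ana", "account:acme")
```

The answer depends on a two-hop join through a shared person node. No single stored chunk contains it, which is why similarity ranking alone cannot produce it.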
The deeper cost is not any single fix. It is that you now own a system. Someone is on call for it. Someone maintains the eval harness, migrates it when the underlying models change, re-indexes when the schema moves, and debugs retrieval-quality regressions. That is a team working on memory infrastructure instead of the product you are actually trying to build. The initial build might be a few engineer-months; the maintenance is a standing tax with no end date. None of this means building is wrong. It means the build estimate has to include the second system — the one you grow into, not the buffer you start with.
Between hand-rolling memory and adopting a dedicated platform sits a real field of options. Three categories matter, and each is the right pick for some teams.
Each option is good enough for the job it was built for. None of them combines bi-temporal facts, entity-level governance on every query, ingestion across chat and business systems, and sub-200ms retrieval at scale on a neutral stack. That combination is the job of a dedicated platform.
| Requirement | Build from scratch | mem0 | Mastra | AgentCore | Graphiti | Zep |
|---|---|---|---|---|---|---|
| Ingests beyond chat (CRM, billing, docs, events) | DIY | Conversation only | Conversation only | Conversational + custom events | Yes | Yes |
| Bi-temporal facts / point-in-time | DIY | No, mutated in place | No | No, extraction-based | Yes | Yes |
| Entity-level governance (ABAC, retention, audit) | DIY | Account-level | Outside memory layer | Outside memory layer | DIY | Yes, in the substrate |
| Sub-200ms at millions of subjects | DIY | Depends on setup | Framework-bound | Managed on AWS | Depends on setup | Yes, 155–162ms |
| Neutral across model and cloud | Yes | Yes | Yes | No, AWS/Bedrock | Yes | Yes |
| Who runs the surrounding system | You | You | You | AWS | You | Zep |
Buying agent memory means the five capabilities above arrive as infrastructure rather than a roadmap. Ingestion, temporal facts, governed retrieval, and scale are someone else's maintenance burden.
Zep is agent memory at enterprise scale. The Context Lake is how we believe it should be done: a governed system of context graphs that manages, governs, and serves everything an agent needs to know. It ingests every source the agent touches and unifies them into one context graph per subject — the conversation is one input among many. Zep runs the open-source Graphiti framework inside a managed system and serves the result through its proprietary Context Graph Engine, with sub-200ms retrieval whether you have one context graph or millions. Every query is filtered by entity-level access control; facts carry validity windows and invalidate cleanly when they change; retrieval returns a prompt-ready context block that fits a token budget. SOC 2 Type II and HIPAA come with it, and it stays neutral across model and cloud — the opposite trade from a hyperscaler primitive. On the public benchmarks Zep reports 94.7% on LoCoMo at 155ms and 90.2% on LongMemEval at 162ms (benchmark results). The durable difference is architectural.
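The idea of a context block that fits a token budget can be sketched generically. The code below is not Zep's API or algorithm. It is a hypothetical greedy packer that assumes facts arrive already ranked by relevance and approximates token counts by counting whitespace-separated words, where a real system would use the model's tokenizer.

```python
def build_context_block(ranked_facts: list, token_budget: int) -> str:
    """Greedily pack already-ranked facts into a block within `token_budget`."""
    lines, used = [], 0
    for fact in ranked_facts:
        cost = len(fact.split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            continue  # does not fit; a shorter fact later in the list still might
        lines.append(f"- {fact}")
        used += cost
    return "\n".join(lines)


# Made-up facts, highest relevance first.
ranked = [
    "Ana moved to Lisbon in March",
    "Ana prefers email over phone calls for support",
    "Ana is on the Pro plan",
]
block = build_context_block(ranked, token_budget=12)
```

With a budget of 12, the first and third facts fit and the second is skipped. Whether to skip or stop at the first overflow is a design choice: skipping fills the budget more completely, while stopping preserves strict rank order.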
What you give up is real, and worth stating plainly. You take on a dependency and a pricing relationship. You have less control over the internals than you would with code you wrote. And adopting any memory system is integration work, not a switch you flip.
Three objections usually decide this.
Data control: where does the data live, and who can touch it? Deployment models include fully managed, plus bring-your-own-key and bring-your-own-cloud for keeping data in your own environment.
Lock-in: the graph-construction layer, Graphiti, is open source, so the foundation is not a black box you can never leave.
The “we're different” worry: customization lives in how graphs are built and how context is assembled, not in forking the storage engine.
One question decides it. Is agent memory your product, or plumbing beneath your product?
Build it yourself when memory is the thing you sell, or a core differentiator you have to own end to end. Build when your requirements are narrow and stable: a single tenant, no cross-user governance, and a modest scale that is not going to move. Build when you have the team, and the real appetite, to carry the maintenance for years, not just through the launch.
Buy when memory sits beneath the product rather than being the product. Buy when enterprise requirements show up early, because access control, audit, retention, and data residency are the capabilities that are most expensive to retrofit. Buy when time-to-production and keeping your engineers on your core product matter more than owning the internals.
Take these questions to your team before you commit:
In the cost comparison, build is engineer-months plus a recurring maintenance tax; buy is a subscription plus integration effort. The line item that dominates is rarely the license. It is the maintenance.
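The shape of that comparison can be written down as simple arithmetic. Every figure in the sketch below is a placeholder in arbitrary currency units, chosen only to make the formula runnable. None of them is an estimate or a quoted price, and the output is only meaningful once you substitute your own numbers.

```python
def total_cost(upfront: float, monthly: float, months: int) -> float:
    """One-time cost plus a recurring cost over `months`."""
    return upfront + monthly * months


# Placeholder inputs (assumptions, not data). Replace with your own.
ENGINEER_MONTH = 10_000                 # loaded cost of one engineer-month
months = 36                             # planning horizon

build_upfront = 4 * ENGINEER_MONTH      # initial build, in engineer-months
build_monthly = 0.5 * ENGINEER_MONTH    # standing maintenance: half an engineer
buy_upfront = 1 * ENGINEER_MONTH        # integration effort
buy_monthly = 2_000                     # subscription

build_total = total_cost(build_upfront, build_monthly, months)
buy_total = total_cost(buy_upfront, buy_monthly, months)

# Fraction of the build total that is maintenance rather than the initial build.
maintenance_share = (build_monthly * months) / build_total
```

The useful output is `maintenance_share`: it shows how much of the build total comes from the recurring term over your chosen horizon, which is the quantity an initial estimate tends to leave out.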
The right answer depends on whether memory is your product or your plumbing, and that is a question only your team can answer. If you want the open core and intend to run the surrounding system yourself, start with Graphiti. If memory is plumbing you would rather not build and operate, talk to us.
Build it when memory is the product you sell or a differentiator you must own end to end, your requirements are narrow and stable, and you have a team to carry the maintenance for years. Buy it when memory sits beneath the product, enterprise requirements (access control, audit, retention, residency) show up early, and time-to-production matters more than owning the internals.
Usually in a predictable order as usage grows: stale facts (the vector store has no concept of time), context bloat (similarity search returns more loosely related chunks), access boundaries (a shared index can't keep one user's data out of another's agent), and graph-shaped queries cosine similarity can't answer. The deeper cost is that you now own a system someone is on call for.
Five things at once: ingest and extract facts from every source (not just chat), track bi-temporal validity so it knows what's true now versus then, retrieve the most valuable context within a token budget, enforce governance (entity-level access control, retention, audit), and do all of it for millions of subjects with sub-200ms retrieval. The cost lives in the interactions between these, not in any single feature.
Yes. Graphiti builds a bi-temporal context graph with entity extraction, fact invalidation, and hybrid retrieval — the most capable open option. mem0 is a self-hosted vector + key-value store with a graph add-on. Either way, the system around the store — scale, multi-tenant isolation, governance, and compliance — is yours to run.
The initial build is a few engineer-months; the maintenance is a standing tax with no end date — an eval harness, model migrations, re-indexing on schema changes, and retrieval-quality debugging. In the cost comparison, build is engineer-months plus that recurring tax; buy is a subscription plus integration effort. The line item that dominates is rarely the license — it's the maintenance.
With Zep, the graph-construction layer is the open-source Graphiti, so the foundation isn't a black box you can never leave, and it stays neutral across model and cloud. Deployment models include fully managed plus bring-your-own-key and bring-your-own-cloud for keeping data in your own environment — the opposite trade from a hyperscaler primitive bound to one vendor's stack.