Zep vs. Vectorize Hindsight

A neutral look at token efficiency and governance

Vectorize Hindsight and Zep are both dedicated agent-memory systems that score near the top of LongMemEval. The decision isn't which is more accurate — it's which token-efficiency and operational profile fit your use case.

Key takeaways

The accuracy is roughly equal; the context cost isn't

Both are agent-memory systems that score at the top of LongMemEval (Hindsight reports 91.4%; Zep reports 90.2%) — a single benchmark won't decide your choice.
Hindsight's 91.4% is measured at an 8,192-token retrieval budget, about 1.9× the ~4,408 tokens Zep uses for 90.2% on the same benchmark. That extra context is fed to your answer LLM on every call, so at the headline numbers Hindsight costs roughly double the memory tokens per query, which means real money and added latency at scale.
The other production differences are enterprise governance, managed operations, proven scale, and deployment control, where Zep is purpose-built. (Both systems do temporal reasoning and provenance — see the table.)
Hindsight is MIT open source with a biomimetic memory model that you self-host and operate; its managed/hosted offering is newer. Zep ships a managed Context Lake with SOC 2 Type II, HIPAA, and BYOK/BYOC today.
Pick by need: open-source self-hosting and a human-memory-style model → Hindsight; governed agent memory at enterprise scale, at lower per-query context cost → Zep.

The distinction

An open-source self-hosted system vs. a managed Context Lake

What Vectorize Hindsight is. Hindsight (vectorize.io, GitHub) is an open-source (MIT) agent-memory system from Vectorize. It organizes memory with biomimetic structures: World (facts), Experiences (the agent's own history), and Mental Models (learned understanding formed by reflecting on raw memories). It integrates in about two lines of code and reports 91.4% on LongMemEval. It offers self-hosting (Docker/embedded), and Vectorize is building a hosted cloud version for managed, production features.

What Zep is. Zep is a dedicated, managed agent-memory platform — the Context Lake. It builds bi-temporal context graphs (via the open-source Graphiti) in which every fact carries a validity window and provenance, so the agent reasons over what's true now vs then and can audit any answer to its source. It serves millions of graphs at sub-200ms p95, governs memory in the substrate (ABAC, retention, audit), and deploys managed, BYOK, or BYOC. Zep reports 90.2% on LongMemEval and 94.7% on LoCoMo (results); architecture in the Zep paper.
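To make the "validity window" idea concrete, here is a minimal sketch of a bi-temporal fact record and a point-in-time query. It is an illustration only, not Graphiti's or Zep's actual schema or API: the `Fact` class, its field names, and `facts_as_of` are hypothetical, and the example data is invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact record; field names are illustrative, not a real schema."""
    subject: str
    predicate: str
    obj: str
    valid_from: datetime           # when the fact became true in the world
    valid_to: Optional[datetime]   # when it stopped being true (None = still true)
    recorded_at: datetime          # when the system learned the fact
    source_episode: str            # provenance: where the fact was extracted from


def facts_as_of(facts: list[Fact], world_time: datetime, known_by: datetime) -> list[Fact]:
    """Return facts valid at world_time, using only facts recorded by known_by.

    Simplification: a full bi-temporal store would also track when valid_to
    was set, so that invalidations themselves can be replayed.
    """
    return [
        f for f in facts
        if f.recorded_at <= known_by
        and f.valid_from <= world_time
        and (f.valid_to is None or world_time < f.valid_to)
    ]


facts = [
    Fact("alice", "works_at", "Acme", datetime(2023, 1, 1), datetime(2024, 6, 1),
         datetime(2023, 1, 5), "episode-1"),
    Fact("alice", "works_at", "Globex", datetime(2024, 6, 1), None,
         datetime(2024, 6, 3), "episode-2"),
]

then = facts_as_of(facts, datetime(2023, 9, 1), datetime(2025, 1, 1))
now = facts_as_of(facts, datetime(2025, 1, 1), datetime(2025, 1, 1))
```

Here `then` contains only the Acme fact and `now` only the Globex fact, and each result still carries its `source_episode`, which is what makes an answer auditable back to its origin.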

Agent Runtime

LangChain·LlamaIndex·CrewAI·Google ADK·custom

Any agent framework — or none. The Context Lake is invoked through a single SDK.

Ingestion

chat·JSON·documents·app events

Raw signal arrives from any source the agent touches.

Context Assembly

context blocks·templates·token-efficient

Relevant context is assembled on demand into token-efficient blocks.

Graphiti

entity extraction·relationships·ontology·invalidation

Signal becomes a temporal context graph as new facts arrive and stale ones are invalidated.

Retrieval

sub-200ms·auto-optimized·provenance-linked·policy-filtered

Selects what's relevant and what adds the most information within the token budget.

Governance

ABAC·multi-tenant isolation·customer key encryption·retention policies·audit·provenance

Native to the substrate, not a layer bolted on. Every read and write is policy-gated for access and provenance; retention runs across the data lifecycle.

Context Graph Engine

entities·facts & edges·decision traces·episodes

Temporal context graph with provenance — sub-200ms retrieval at scale.

How they compare

Vectorize Hindsight vs. Zep, side by side

	Vectorize Hindsight	Zep
Model	Biomimetic (World / Experiences / Mental Models)	Bi-temporal context graph (facts + provenance + validity)
LongMemEval (self-reported)	91.4%	90.2% (also 94.7% LoCoMo)
Context per query at that score	~8,192 tokens (Budget.HIGH in their runner)	~4,408 tokens — roughly half
Temporal reasoning	Yes — temporal retrieval arm + temporal indexes on fact lifespans; facts carry temporal links (capped at 20/fact)	Bi-temporal edges: “what's true now / what was true then,” automatic fact invalidation, point-in-time queries
Provenance	Yes — facts trace to the originating message; observations record their source facts	Yes — every fact traces to its source episode
Open source	Yes (MIT), self-hosted	Graphiti (the graph library) is open source
Managed / hosted	Cloud version in development	Managed cloud available today
Access control	No built-in RBAC or ABAC (no users/roles/attribute policies). Static API key, off by default; multi-tenant isolation only via a custom-coded extension. MCP tool allowlisting limits surface area, not per-user access	ABAC in the substrate — attribute-based policies govern what each agent/user can read
Audit & retention	audit_log table + /audit-logs endpoint, disabled by default; configurable audit retention. No legal hold	Audit, retention policies + legal hold in the substrate
Compliance / operations	Self-hosted OSS — you certify and operate it. No SOC 2 / HIPAA from the vendor	Managed service: SOC 2 Type II, HIPAA
Deployment	Self-host (Docker/embedded) + forthcoming cloud	Managed, BYOK, or BYOC (AWS/GCP/Azure)
Scale (public)	Single Postgres (pgvector/HNSW + BM25 + graph + temporal indexes); stateless API + worker processes scale horizontally; vector search ~10–50ms on 100K+ facts. No published multi-tenant scale figures	Millions of graphs per deployment, sub-200ms p95

A note on the benchmark comparison

Read accuracy alongside tokens, on a matched backbone

The published comparison isn't a controlled, matched-backbone head-to-head, by Vectorize's own account. Their repo states that only Hindsight's score was independently reproduced (Virginia Tech, The Washington Post) and that “other scores are self-reported by software vendors.” The Zep figure they cite (71.2%) is the number from Zep's 2025 paper, not Zep's current 90.2%, so the comparison pairs Hindsight's reproduced Gemini-3 Pro result against Zep's older self-reported figure.

Mechanically the two systems are similar: Hindsight extracts facts with an LLM on ingest (“retain”), and on recall runs vector + BM25 + graph + temporal retrieval merged with reciprocal-rank fusion and a cross-encoder reranker, trimmed to a token limit — the same shape as Graphiti/Zep. (Its “LLM-free recall” claim applies to recall, not ingest.) Hindsight publishes a separate speed/cost benchmark but no per-query latency/token figure in the accuracy table.
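The recall shape described above (several retrieval arms merged with reciprocal-rank fusion, then trimmed to a token limit) can be sketched as follows. This is a generic illustration, not either vendor's code: the function names, the fact ids, the token counts, and the conventional smoothing constant `k = 60` are all assumptions, and the cross-encoder reranking step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each item scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


def trim_to_budget(ranked: list[str], token_counts: dict[str, int], budget: int) -> list[str]:
    """Walk items in rank order, skipping any item that would exceed the token budget."""
    kept: list[str] = []
    used = 0
    for item in ranked:
        cost = token_counts[item]
        if used + cost <= budget:
            kept.append(item)
            used += cost
    return kept


# Hypothetical ranked results from four retrieval arms.
vector_hits = ["f1", "f2", "f3"]
bm25_hits = ["f2", "f4", "f1"]
graph_hits = ["f2", "f3"]
temporal_hits = ["f4", "f2"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits, temporal_hits])
context = trim_to_budget(fused, {"f1": 300, "f2": 500, "f3": 400, "f4": 200}, budget=1000)
```

With these inputs `fused` is `["f2", "f4", "f1", "f3"]` and `context` keeps `["f2", "f4", "f1"]`. The budget passed to the trimming step is the knob the benchmark discussion turns on: a larger budget admits more facts and raises the token count handed to the answer model.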

Context size (from Hindsight's own benchmark code). Their LongMemEval runner defaults to an 8,192-token retrieval budget at Budget.HIGH (thinking_budget=500), with the answer model at high reasoning effort. That's roughly 1.9× the ~4,408-token context Zep reports on the same benchmark, so the 91.4% is achieved with about double the retrieved context (and a top-tier backbone). On a token-matched basis the 1.2-point gap could narrow or reverse. (Hindsight's leaner LoCoMo quality benchmark uses a 4,096-token “low” budget.) Read accuracy alongside latency and tokens, on a matched backbone, against Zep's current numbers. (Hindsight paper: arXiv 2512.12818.)
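The arithmetic behind the "roughly 1.9×" figure, and what it could mean for spend, can be worked through in a few lines. The two token counts come from the text above; the price per million input tokens and the monthly query volume are placeholder assumptions to replace with your own numbers. The sketch also treats the 8,192-token budget as fully used on every query, which is an upper bound.

```python
HINDSIGHT_TOKENS = 8_192   # retrieval budget cited above
ZEP_TOKENS = 4_408         # approximate context size cited above

ratio = HINDSIGHT_TOKENS / ZEP_TOKENS
extra_tokens = HINDSIGHT_TOKENS - ZEP_TOKENS

# Hypothetical inputs: substitute your own price and traffic.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00   # USD, assumed
QUERIES_PER_MONTH = 1_000_000           # assumed


def monthly_memory_cost(tokens_per_query: int) -> float:
    """Monthly cost in USD of the retrieved memory context alone."""
    return tokens_per_query * QUERIES_PER_MONTH * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


print(f"ratio: {ratio:.2f}x, extra tokens per query: {extra_tokens}")
print(f"monthly: ${monthly_memory_cost(HINDSIGHT_TOKENS):,.0f} "
      f"vs ${monthly_memory_cost(ZEP_TOKENS):,.0f}")
```

The ratio comes to about 1.86, which rounds to the 1.9× quoted above, with 3,784 extra tokens per query. Under the assumed price and volume, that is $24,576 versus $13,224 per month for memory context alone.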

When to choose

Pick the tool that fits the problem

Choose Vectorize Hindsight when

You want an open-source system you can self-host or embed, you like the biomimetic World / Experiences / Mental-Models model, and your priority is LongMemEval-style recall.

Open-source self-hosting or embedding is a hard requirement
You prefer the biomimetic, human-memory-style model
LongMemEval-style recall is the top priority, and the MIT license makes it easy to try

Choose Zep when

Memory has to be governed and operated at enterprise scale today, at lower per-query context cost.

Bi-temporal reasoning and provenance for auditability
Attribute-based access control and retention in the substrate
SOC 2 Type II / HIPAA and deployment control (managed, BYOK, or BYOC)
Proven performance across millions of graphs at sub-200ms p95
S&P Global Market Intelligence (451 Research) initiated coverage on Zep as a likely de facto partner in the enterprise agent stack

Get started

Add governed agent memory at half the context cost

FAQ

Frequently asked questions

Is Hindsight or Zep more accurate?

Both score near the top of LongMemEval (Hindsight 91.4%, Zep 90.2%; Zep also reports 94.7% on LoCoMo) — effectively a tie. The more useful question is at what cost: Hindsight's number is measured at an ~8,192-token retrieval budget vs Zep's ~4,408 — about double the context your answer model processes on every query. For the same accuracy, that's roughly 2× the memory-token cost and added latency at scale.

Which is cheaper to run per query?

At the published accuracy levels, Zep feeds your answer LLM about half the memory tokens Hindsight does (~4,408 vs ~8,192), so per-query token cost and latency are lower. Hindsight's recall path itself is LLM-free and fast (100–600ms); the cost difference is in how much retrieved context each system hands to the answer model.

Is Hindsight open source?

Yes, MIT-licensed. Zep's graph library, Graphiti, is also open source; Zep's managed platform and Context Graph Engine are commercial.

Which is better for enterprise?

Zep is purpose-built for governed memory at scale (ABAC, retention, audit, SOC 2 Type II, HIPAA, BYOK/BYOC, millions of graphs). Evaluate both against your governance, deployment, and scale requirements.

Does Hindsight support RBAC, ABAC, and audit logging?

Hindsight has no built-in RBAC or ABAC — no users, roles, or attribute-based access policies. Its built-in auth is a single static API key (off by default); multi-tenant isolation requires coding a custom extension, and the only finer-grained controls are MCP tool allowlisting (which tools are exposed) and a config-field permission hook. Audit logging exists but is disabled by default, and there's no legal hold or vendor compliance certification. Zep provides ABAC, retention with legal hold, and audit in the substrate, as a managed SOC 2 Type II / HIPAA service. If access control and auditability are requirements, that's a meaningful gap to weigh.
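For readers weighing this gap, the sketch below shows what an attribute-based check looks like in principle: access is decided by matching attributes of the caller and of the memory item against policies, with deny as the default. It is a generic illustration, not Zep's policy language or API. `Policy`, `is_allowed`, and every attribute name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical rule: grant `action` when subject and resource attributes all match."""
    action: str
    subject_attrs: dict
    resource_attrs: dict


def is_allowed(policies: list, subject: dict, resource: dict, action: str) -> bool:
    """Default deny; allow only if some policy's required attributes are all present and equal."""
    def matches(required: dict, actual: dict) -> bool:
        return all(actual.get(key) == value for key, value in required.items())

    return any(
        p.action == action
        and matches(p.subject_attrs, subject)
        and matches(p.resource_attrs, resource)
        for p in policies
    )


policies = [
    Policy(
        action="read",
        subject_attrs={"role": "support_agent", "tenant": "acme"},
        resource_attrs={"tenant": "acme", "sensitivity": "low"},
    )
]
agent = {"role": "support_agent", "tenant": "acme"}

same_tenant = is_allowed(policies, agent, {"tenant": "acme", "sensitivity": "low"}, "read")
other_tenant = is_allowed(policies, agent, {"tenant": "globex", "sensitivity": "low"}, "read")
write_attempt = is_allowed(policies, agent, {"tenant": "acme", "sensitivity": "low"}, "write")
```

Only `same_tenant` is `True`. A single static API key cannot express this kind of per-user, per-record decision, which is the difference the comparison table points at.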

Can I self-host?

Hindsight supports self-hosting today. Zep offers managed cloud, BYOK, and BYOC (in your VPC); Graphiti can also be self-hosted standalone.