The runtime underneath Zep

The Context Graph Engine

Most graph databases are built to hold one large graph. The Context Lake workload is the opposite: millions of smaller graphs, mostly cold, all temporal, all governed. Different workload, different runtime.

Read the docs

Architecture

Built for the workload, not retrofitted.

The Engine is shaped for the Context Lake workload from the data model up. Millions of graphs. Sparse activity per graph. High aggregate throughput. Governance applied independently to each.

Every architectural decision — tiered storage, in-memory adjacency, native ABAC, bi-temporal edges — follows from the workload.

API

Write

mutate

Read

search (BFS, PageRank, Semantic, BM25) · patterns · list / get

Typed surface for mutations, traversals, graph algorithms, and lookups.

From API · writes

WAL

append-only · replicated · ordered

Mutations are durably appended to the write-ahead log before acknowledgment.

From API · reads

Graph Primitives

Algorithms

BFS · PageRank · pattern matching · path analysis · temporal weighting

Structure

adjacency & CSR matrices · vector index · BM25 index

Graph algorithms, traversals, and pattern matching run over compact in-memory structures.

Governance

ABAC · multi-tenant isolation · customer key encryption · retention policies · audit · provenance

Native to the substrate, not a layer bolted on. Every read and write is policy-gated for access and provenance; retention runs across the data lifecycle.

Tiered Data Layer

Hot · RAM · microseconds

Warm · Local NVMe · microseconds

Cold · Object Store · milliseconds

Hot graphs serve at memory speed. Inactive graphs are evicted to local NVMe and then to object storage, and rehydrate in milliseconds.

Durable Storage

WAL · content · metadata · graphs · indexes

Snapshots and the WAL persist on a multi-AZ, highly durable object and document store.

Context Graph Engine — request path from typed API down to durable storage

Scale

Scaling in constant time.

Retrieval latency stays nearly constant as the graph count grows. Current production deployments sustain thousands of mutations and queries per second across millions of graphs, with p50 latency unchanged from a thousand graphs to a million.

Figure: Retrieval latency vs. graph count (production, last 30 days). Series: p50 retrieval, p95 retrieval.

Tiered storage

Storage that tracks activity.

Three tiers. Hot graphs live in RAM at microsecond latency. Warm graphs sit on local NVMe. Cold graphs rest on object storage and rehydrate in milliseconds.

Cost tracks active graphs, not total graphs. A deployment with one million graphs and one percent of them hot keeps roughly 10,000 graphs resident and pays for about one percent of the memory that holding every graph in RAM would require.
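
The tiering described above can be sketched as a toy in-process model. This is only an illustration under stated assumptions: the class `TieredGraphStore`, its LRU eviction policy, and the capacity parameters are hypothetical and not taken from the Engine, and plain dictionaries stand in for RAM, local NVMe, and the object store.

```python
from collections import OrderedDict


class TieredGraphStore:
    """Toy three-tier store (hypothetical): hot -> warm -> cold, LRU demotion."""

    def __init__(self, hot_capacity, warm_capacity):
        self.hot = OrderedDict()   # stand-in for RAM, least recently used first
        self.warm = OrderedDict()  # stand-in for local NVMe
        self.cold = {}             # stand-in for the object store
        self.hot_capacity = hot_capacity
        self.warm_capacity = warm_capacity

    def put(self, graph_id, graph):
        # A write makes the graph hot and drops stale copies in lower tiers.
        self.warm.pop(graph_id, None)
        self.cold.pop(graph_id, None)
        self.hot[graph_id] = graph
        self.hot.move_to_end(graph_id)
        self._evict()

    def get(self, graph_id):
        if graph_id in self.hot:
            self.hot.move_to_end(graph_id)
            return self.hot[graph_id]
        # Rehydrate from a lower tier and promote to hot.
        if graph_id in self.warm:
            graph = self.warm.pop(graph_id)
        elif graph_id in self.cold:
            graph = self.cold.pop(graph_id)
        else:
            raise KeyError(graph_id)
        self.put(graph_id, graph)
        return graph

    def tier_of(self, graph_id):
        for name in ("hot", "warm", "cold"):
            if graph_id in getattr(self, name):
                return name
        return None

    def _evict(self):
        # Demote least recently used graphs when a tier overflows.
        while len(self.hot) > self.hot_capacity:
            gid, graph = self.hot.popitem(last=False)
            self.warm[gid] = graph
        while len(self.warm) > self.warm_capacity:
            gid, graph = self.warm.popitem(last=False)
            self.cold[gid] = graph


def hot_graph_count(total_graphs, hot_percent):
    """Graphs resident in RAM when hot_percent of all graphs are active."""
    return total_graphs * hot_percent // 100
```

With one million graphs and one percent hot, `hot_graph_count(1_000_000, 1)` gives 10,000 resident graphs, which is the quantity that memory cost follows in this model.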

Hot · RAM · microseconds · resident: adjacency & CSR, vector & BM25 indexes

Warm · Local NVMe · microseconds · paged: recent inactives, ready to rehydrate

Cold · Object Store · milliseconds · rehydrate: long-tail graphs, cost-optimized

§ 05 · Retrieval

Unified retrieval.

Vector similarity, BM25, graph traversal, and pattern matching run over the same data. One query, one ranked answer — no separate retrieval stack to stitch together.
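
One way to picture "one query, one ranked answer" is rank fusion over the per-method result lists. The sketch below uses reciprocal rank fusion (RRF), a standard technique chosen here purely for illustration. The page does not say how the Engine combines scores, so the function name, the constant `k=60`, and the sample result lists are all assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of item ids into one ranking.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties break alphabetically for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical per-method results for the same query over the same graph.
vector_hits = ["a", "b", "c"]   # vector similarity
bm25_hits = ["b", "a", "d"]     # BM25 keyword match
graph_hits = ["b", "c", "a"]    # BFS from a seed node

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits])
print(fused)  # ['b', 'a', 'c', 'd']
```

Item `b` wins because it ranks first or second in every list, while `d` appears in only one list and lands last.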

§ 06 · Primitives

Graph primitives in memory.

Hot graphs are held as adjacency lists and CSR matrices. Cache-friendly, deterministic layout, ready for matrix operations.

BFS, PageRank, pattern matching, path analysis, and temporal weighting run at microsecond latencies over those structures.
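
As a minimal sketch of the in-memory layout, the code below builds a CSR (compressed sparse row) adjacency structure from an edge list and runs BFS over it. The function names and the sample graph are illustrative assumptions, and plain Python lists stand in for the packed arrays a real engine would use.

```python
from collections import deque


def build_csr(num_nodes, edges):
    """Return (indptr, indices) for a directed graph given (src, dst) pairs.

    The neighbors of node u are indices[indptr[u]:indptr[u + 1]].
    """
    indptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        indptr[src + 1] += 1              # out-degree of each node
    for i in range(num_nodes):
        indptr[i + 1] += indptr[i]        # prefix sum gives row offsets
    indices = [0] * len(edges)
    cursor = indptr[:-1].copy()           # next free slot in each row
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


def bfs(indptr, indices, start):
    """Breadth-first search; returns {node: hop distance from start}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in indices[indptr[node]:indptr[node + 1]]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
indptr, indices = build_csr(5, edges)
print(indptr)                   # [0, 2, 3, 4, 5, 5]
print(indices)                  # [1, 2, 3, 3, 4]
print(bfs(indptr, indices, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

Each node's neighbors sit in one contiguous slice, which is what makes the layout cache-friendly and lets the same arrays double as a sparse matrix.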

§ 07 · Governance & temporality

Governance and temporality, native.

ABAC, multi-tenant isolation, customer key encryption, retention policies, audit, and provenance are properties of the substrate, not a layer above it. Every read and write is policy-gated.
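
To make "policy-gated" concrete, here is a minimal attribute-based access control (ABAC) sketch with deny-by-default evaluation and an audit trail. Everything in it is hypothetical: the `Policy` and `GatedGraph` classes, the attribute names (`tenant`, `roles`), and the policy shape are assumptions for illustration, not the Engine's policy model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Condition = Callable[[dict, dict], bool]  # (subject attrs, resource attrs) -> bool


@dataclass(frozen=True)
class Policy:
    actions: frozenset
    conditions: Tuple[Condition, ...]


def same_tenant(subject, resource):
    tenant = subject.get("tenant")
    return tenant is not None and tenant == resource.get("tenant")


def has_role(role):
    def check(subject, resource):
        return role in subject.get("roles", ())
    return check


def is_allowed(subject, resource, action, policies):
    """Deny by default; allow if some policy covers the action and all its conditions hold."""
    return any(
        action in p.actions and all(c(subject, resource) for c in p.conditions)
        for p in policies
    )


class GatedGraph:
    """Toy graph whose reads and writes all pass through the policy check."""

    def __init__(self, attributes, policies):
        self.attributes = attributes  # resource attributes, e.g. owning tenant
        self.policies = policies
        self.edges = []
        self.audit = []               # (subject id, action, allowed) records

    def _check(self, subject, action):
        allowed = is_allowed(subject, self.attributes, action, self.policies)
        self.audit.append((subject.get("id"), action, allowed))
        if not allowed:
            raise PermissionError(f"{action} denied for {subject.get('id')}")

    def write(self, subject, edge):
        self._check(subject, "write")
        self.edges.append(edge)

    def read(self, subject):
        self._check(subject, "read")
        return list(self.edges)
```

Because the check sits inside `read` and `write` rather than in a wrapper around them, there is no code path that reaches the data without producing a decision and an audit record, which is the property the section describes.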

Edges are bi-temporal: every edge carries four timestamps. Point-in-time queries, automatic invalidation, and temporal weighting follow from the data model.
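
A point-in-time query over four-timestamp edges might look like the sketch below. The page does not name the timestamps, so the field names (`valid_from`, `valid_to`, `recorded_at`, `superseded_at`) and their semantics are assumptions: one pair bounds when a fact held in the world, the other bounds when the system believed it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Edge:
    src: str
    relation: str
    dst: str
    valid_from: datetime               # fact became true in the world
    valid_to: Optional[datetime]       # fact stopped being true (None = still true)
    recorded_at: datetime              # system learned this version
    superseded_at: Optional[datetime]  # system replaced this version (None = current)


def _covers(start, end, t):
    """True if t falls in the half-open interval [start, end)."""
    return start <= t and (end is None or t < end)


def edges_as_of(edges, valid_time, known_time):
    """Edges true at valid_time, according to what was known at known_time."""
    return [
        e for e in edges
        if _covers(e.valid_from, e.valid_to, valid_time)
        and _covers(e.recorded_at, e.superseded_at, known_time)
    ]


d = datetime
history = [
    # Original belief: Alice works at Acme, open-ended. Superseded on 2023-06-10.
    Edge("alice", "works_at", "acme", d(2020, 1, 1), None, d(2020, 1, 5), d(2023, 6, 10)),
    # Corrected version: the Acme job ended on 2023-06-01.
    Edge("alice", "works_at", "acme", d(2020, 1, 1), d(2023, 6, 1), d(2023, 6, 10), None),
    # New fact recorded the same day.
    Edge("alice", "works_at", "globex", d(2023, 6, 1), None, d(2023, 6, 10), None),
]

now_view = edges_as_of(history, d(2023, 7, 1), d(2024, 1, 1))
stale_view = edges_as_of(history, d(2023, 7, 1), d(2023, 6, 5))
print([e.dst for e in now_view])    # ['globex']
print([e.dst for e in stale_view])  # ['acme']
```

The second query reproduces what the system would have answered before the correction arrived. In this model, invalidation amounts to closing an interval rather than deleting the edge.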

§ 08 · Durability

Durable by default.

Mutations are appended to a replicated, ordered write-ahead log before acknowledgment. Snapshots, indexes, and content persist on multi-AZ durable storage. Point-in-time recovery is available across the data lifecycle.
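
The append-before-acknowledge rule can be sketched with a single-file log. This is a simplified illustration, not the Engine's implementation: the `WriteAheadLog` class, the JSON-lines record format, and the sequence-number scheme are assumptions, and replication is left out entirely.

```python
import json
import os


class WriteAheadLog:
    """Toy append-only, ordered log backed by one local file (hypothetical)."""

    def __init__(self, path):
        self.path = path
        # Resume numbering after any records already on disk.
        self.next_seq = sum(1 for _ in self.replay())
        self._file = open(path, "a", encoding="utf-8")

    def append(self, mutation):
        """Persist the mutation, then return its sequence number as the ack."""
        seq = self.next_seq
        self._file.write(json.dumps({"seq": seq, "mutation": mutation}) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())  # on stable storage before we acknowledge
        self.next_seq += 1
        return seq

    def replay(self, up_to_seq=None):
        """Yield records in order; stop after up_to_seq for point-in-time recovery."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if up_to_seq is not None and record["seq"] > up_to_seq:
                    break
                yield record

    def close(self):
        self._file.close()
```

The caller only sees a sequence number after `fsync` returns, so an acknowledged mutation survives a process crash. Replaying up to a chosen sequence number rebuilds state as of that point, which is the idea behind point-in-time recovery.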

Get started

Talk to the team building it.
