Detailed comparison

Names, sources, and where others beat us.

Evaluating memory tools side by side? The landing page keeps its comparison plain and vendor-free — this page is the detailed version. Below: the public benchmark context and the full capability matrix, with names named, sources cited, and the rows where competitors lead left honestly unclaimed.

How we compare

Everyone benchmarks recall. We compete one layer up.

On the public long-memory benchmarks, the leaders' reported scores span roughly 49% to 85%. We'll publish our own measured number rather than borrow one. The row that actually decides whether your agent is trustworthy is the one no one else fills.

System	LongMemEval score	Notes
Supermemory	~85%	self-reported
Zep	~64%	temporal graph
Mem0	~49%	vector
UltraMemory	measuring	our own harness

Benchmark: LongMemEval (GPT-4o judge). Competitor figures are self-reported or drawn from 2026 third-party write-ups; sources: vectorize.io, agentmarketcap, supermemory.ai. Long-memory benchmark results depend heavily on methodology and are disputed between vendors.

Capability	UltraMemory	Mem0	Zep	Supermemory
Finds the right memory by meaning	✓	✓	✓	✓
Keeps facts straight as they change over time	✓	—	✓	◐
Says “I’m not sure” instead of guessing	✓	—	—	◐
Learns which approaches work — and keeps them	✓	—	—	—
Gets sharper on its own, every night	✓	◐	◐	◐
Plugs into any agent (open standard)	✓	✓	◐	✓

✓ offered · ◐ partial or different mechanism · — not offered. Reflects publicly documented capabilities as of mid-2026; vendors ship fast — tell us if something's out of date. Zep leads on temporal modeling; we don't claim otherwise.

Want to see how updates and superseded facts actually behave? See: How updates work (bitemporal supersede).