Detailed comparison
Names, sources, and where others beat us.
Evaluating memory tools side by side? The landing page keeps its comparison plain and vendor-free; this page is the detailed version. Below you'll find the public benchmark context and the full capability matrix, with vendors named, sources cited, and no claim made on the rows where competitors lead.
How we compare
Everyone benchmarks recall. We compete one layer up.
On the public long-memory benchmarks, the leaders cluster in the same band, and we'll publish our own measured number rather than borrow one. The row that actually decides whether your agent is trustworthy is the one no one else fills.
Benchmark: LongMemEval, scored with a GPT-4o judge. Competitor figures are self-reported or taken from 2026 third-party write-ups (vectorize.io, agentmarketcap, supermemory.ai). Long-memory benchmark results depend heavily on methodology and are disputed between vendors.
| Capability | UltraMemory | Mem0 | Zep | Supermemory |
|---|---|---|---|---|
| Finds the right memory by meaning | ✓ | ✓ | ✓ | ✓ |
| Keeps facts straight as they change over time | ✓ | — | ✓ | ◐ |
| Says “I’m not sure” instead of guessing | ✓ | — | — | ◐ |
| Learns which approaches work — and keeps them | ✓ | — | — | — |
| Gets sharper on its own, every night | ✓ | ◐ | ◐ | ◐ |
| Plugs into any agent (open standard) | ✓ | ✓ | ◐ | ✓ |
✓ offered · ◐ partial or different mechanism · — not offered. Reflects publicly documented capabilities as of mid-2026; vendors ship fast — tell us if something's out of date. Zep leads on temporal modeling; we don't claim otherwise.
Want to see how updates and supersedes actually behave? How updates work (bitemporal supersede) →