Nandeshou

Why AI Memory Is Broken — And What It Actually Takes to Fix It

By Tony Richards

CEO & Chief Architect, Nandeshou

If you have ever worked seriously with AI tools, you have probably tried to build your own memory system.

Maybe it was a folder of markdown files you fed into context. Maybe it was a vector database with a retrieval layer on top. Maybe it was an elaborate note-taking system that you kept perfectly synchronized with your AI workflow for about three weeks before it quietly fell apart.

You are not bad at this. The tools are.

The Wall Everyone Hits

The standard approach to AI memory is what I call the Pull model. You ask a question. The system searches for relevant context. It appends whatever it finds to your prompt. The AI responds.

This works well enough for simple lookups. It falls apart for everything else.

The problem is fundamental. A Pull model is reactive: it retrieves context only when your request happens to trigger a retrieval. If you change subjects without using the right keywords, the system misses the switch. If the relevant context is buried in the middle of a long conversation, the AI loses track of it even when it is technically still in the window. If you are working across multiple domains (a software project, a novel, a business strategy), the system has no way to know which version of reality you are currently operating in, so it blends them together in ways that range from unhelpful to actively wrong.
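
To make the failure concrete, here is a minimal sketch of a Pull pipeline. Everything in it is an illustrative assumption rather than any real product's code: the names `Note`, `keyword_overlap`, `pull_context`, and `build_prompt` are hypothetical, and plain word overlap stands in for the vector similarity a real retrieval layer would use.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One stored piece of context (hypothetical structure)."""
    text: str


def keyword_overlap(query: str, text: str) -> int:
    """Count words shared by query and note; a crude stand-in for vector similarity."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def pull_context(query: str, notes: list[Note], top_k: int = 2) -> list[Note]:
    """Reactive retrieval: a note is returned only if the query itself matches it."""
    scored = [(keyword_overlap(query, note.text), note) for note in notes]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in hits[:top_k]]


def build_prompt(query: str, notes: list[Note]) -> str:
    """Append whatever retrieval found to the question."""
    context = "\n".join(note.text for note in pull_context(query, notes))
    return f"Context:\n{context}\n\nQuestion: {query}"


notes = [
    Note("auth service uses JWT tokens"),
    Note("launch date moved to March"),
]

# Shares the words "auth" and "tokens" with the first note, so that note is found.
found = pull_context("which tokens does auth use", notes)

# Asks about the launch without using any stored word, so nothing is retrieved.
missed = pull_context("when do we ship", notes)
```

The second query is about the launch date, but because it shares no words with the stored note, the prompt goes to the model with no context at all. Embedding-based similarity narrows that gap but does not remove it: retrieval still runs only when the query triggers it.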

The result is an AI that feels brilliant for twenty minutes and then starts confidently making things up — not because the model is bad, but because the memory architecture failed it.

Context Is Not Retrieval. It Is State.

Here is the insight that changed how we think about this problem.

The goal of an AI memory system is not to find relevant information when asked. It is to maintain an accurate model of what is true, what is in progress, and what matters — and to keep that model current as the conversation evolves.

That is not a search problem. It is a state management problem.

The difference matters enormously in practice. A search system asks: “What chunks of text are similar to this query?” A state management system asks: “What does the AI need to know right now to be genuinely useful?” The first question is easy to answer mechanically. The second requires the system to understand what is happening in the conversation, not just what was said.

This is the foundation of how Kura — the memory system inside Omni Core — works. Instead of treating your conversation history as a log of text to search, it treats it as a living state that the system actively maintains. We call this the Push model: rather than waiting for you to trigger a retrieval, the system continuously tracks where you are and ensures the right context is always present before you need it.
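
As a rough illustration of the difference, the sketch below keeps an explicit state object that is updated on every turn and derives the next turn's context from that state instead of from the query text. It is not Kura's implementation: `ConversationState`, `observe`, `switch_topic`, and `context` are hypothetical names, and the topic is passed in by hand, whereas a real system would have to infer it from the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """Hypothetical state model: what is true, grouped by topic, plus where we are now."""
    active_topic: str = "general"
    facts: dict[str, dict[str, str]] = field(default_factory=dict)

    def observe(self, topic: str, key: str, value: str) -> None:
        """Record or overwrite a fact as the conversation evolves, whether or not anyone asks."""
        self.active_topic = topic
        self.facts.setdefault(topic, {})[key] = value

    def switch_topic(self, topic: str) -> None:
        """Move to another topic (assumed to be detected upstream)."""
        self.active_topic = topic

    def context(self) -> str:
        """Context for the next turn, derived from state and not from the query wording."""
        entries = self.facts.get(self.active_topic, {})
        return "\n".join(f"{key}: {value}" for key, value in entries.items())


state = ConversationState()
state.observe("auth", "token format", "JWT")
state.observe("roadmap", "launch", "March")
state.observe("auth", "token format", "PASETO")  # a later decision replaces the earlier one

auth_context = state.context()
state.switch_topic("roadmap")
roadmap_context = state.context()
```

Nothing here depends on the wording of the next question. The context is ready before the question arrives, and a reversed decision replaces the old value instead of sitting beside it as a competing text chunk.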

Memory That Knows What It Is

Not all knowledge behaves the same way. A fact is different from a plan. A plan is different from a timeline. A timeline is different from a persona. Treating them identically — storing everything as text chunks in a flat vector store — means losing the structure that makes each type of knowledge useful.

Kura maintains different kinds of memory for different kinds of thought.

Facts are stored as stable knowledge that updates only when something is genuinely corrected. Plans are stored as state machines — tracking current status, blockers, decisions made, and what comes next. Timelines are stored non-linearly, because narrative time and calendar time are different things and conflating them breaks both. Personas define not what the AI knows, but how it expresses what it knows — the filter, not the content.
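
One way to picture the distinction between a fact and a plan is the sketch below. It is an assumption-laden illustration, not Kura's schema: the `Fact`, `Plan`, and `PlanStatus` types, the four statuses, and the allowed transitions are all invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class PlanStatus(Enum):
    PROPOSED = "proposed"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    DONE = "done"


# Hypothetical transition table: which status changes are legal from each state.
ALLOWED = {
    PlanStatus.PROPOSED: {PlanStatus.IN_PROGRESS},
    PlanStatus.IN_PROGRESS: {PlanStatus.BLOCKED, PlanStatus.DONE},
    PlanStatus.BLOCKED: {PlanStatus.IN_PROGRESS},
    PlanStatus.DONE: set(),
}


@dataclass(frozen=True)
class Fact:
    """Stable knowledge: immutable, and replaced only through an explicit correction."""
    key: str
    value: str

    def corrected(self, value: str) -> "Fact":
        return Fact(self.key, value)


@dataclass
class Plan:
    """A plan as a small state machine with blockers and recorded decisions."""
    name: str
    status: PlanStatus = PlanStatus.PROPOSED
    blockers: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)

    def transition(self, new_status: PlanStatus) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {new_status.value}")
        self.status = new_status

    def block(self, reason: str) -> None:
        self.transition(PlanStatus.BLOCKED)
        self.blockers.append(reason)

    def unblock(self) -> None:
        self.transition(PlanStatus.IN_PROGRESS)
        self.blockers.clear()


plan = Plan("auth migration")
plan.transition(PlanStatus.IN_PROGRESS)
plan.decisions.append("use short-lived tokens")
plan.block("waiting on security review")

fact = Fact("token format", "JWT").corrected("PASETO")
```

A flat vector store could hold the sentence "the migration is blocked on security review", but it could not answer "what is the current status?" without guessing which of several similar sentences is the latest. The state machine has exactly one answer.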

When you are working on a project and the conversation shifts from architecture decisions to timeline planning, Kura does not perform a new search. It switches cognitive modes — bringing forward the right structure for the right kind of thinking. The AI does not just remember what you said. It remembers what kind of thing you were doing, and it picks up in that mode.

The Dragon Problem

Here is the concrete version of why this matters.

Suppose you use the same AI assistant for your work life and your creative writing. In one conversation you are planning a product roadmap. In another you are writing a fantasy novel with elaborate lore, fictional characters, and dragons.

In a naive memory system, these contexts bleed into each other. The AI that helped you build a siege scene last Tuesday brings that energy to your quarterly planning on Wednesday. Details leak. Tones mix. The fictional version of a colleague’s name surfaces at the wrong moment.

Kura maintains hard isolation between contexts. Your work branch and your fiction branch are not just tagged differently; they are architecturally walled off from each other. The AI operating in your work context cannot see your fiction context. It will never pull a dragon out of memory when you ask for a flight to London, because the dragon does not exist in the reality it is currently operating in.
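
The difference between tagging and walling off can be shown in a few lines. In this sketch (hypothetical names `MemoryRoot` and `BranchView`, not Kura's actual design), each branch has its own store, and the handle given to the assistant is bound to exactly one of them, so there is no filter to forget and no tag to get wrong.

```python
from typing import Optional


class BranchView:
    """A handle bound to one branch's store; it holds no reference to any other branch."""

    def __init__(self, store: dict[str, str]) -> None:
        self._store = store

    def remember(self, key: str, value: str) -> None:
        self._store[key] = value

    def recall(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def keys(self) -> list[str]:
        return sorted(self._store)


class MemoryRoot:
    """Owns one separate store per branch (hypothetical top-level container)."""

    def __init__(self) -> None:
        self._branches: dict[str, dict[str, str]] = {}

    def open(self, branch: str) -> BranchView:
        # Each branch gets its own dict; views never share a store across branches.
        return BranchView(self._branches.setdefault(branch, {}))


root = MemoryRoot()
work = root.open("work")
fiction = root.open("fiction")

fiction.remember("dragon", "Veyra guards the northern pass")
work.remember("trip", "flight to London on Monday")

# The work view cannot reach the dragon: it is absent from its store, not filtered out.
dragon_in_work = work.recall("dragon")
```

In a tagged design, every query has to remember to filter by tag, and one missing filter leaks the novel into the roadmap. Here the work view has nothing to filter, because the fiction store was never reachable from it.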

This sounds like an edge case. It is not. Anyone who uses AI seriously for more than one domain of their life has already felt this problem. Most have just accepted it as an unavoidable limitation of the tools.

Memory That Grows Without Breaking

There is one more problem that every serious memory system eventually confronts: scale.

Conversations accumulate. Topics expand. A discussion that started as a single thread about authentication architecture eventually contains hundreds of exchanges covering dozens of decisions, reversals, and refinements. At some point the context becomes too large and too tangled to reason about coherently — and the AI starts to struggle.

Kura handles this through what we call topic mitosis. When a topic grows beyond the point where it supports clear reasoning, the system splits it — preserving the parent as a summary and spawning a focused child that carries forward only what is relevant to the current thread. The history is not lost. The working context is kept sharp.
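
A simplified sketch of such a split follows. The details are assumptions made for illustration: `Topic` and `split_topic` are hypothetical names, the size threshold is a plain exchange count, "what is relevant" is approximated by the most recent exchanges, and the summary is a placeholder string where a real system would presumably generate one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    title: str
    exchanges: list[str] = field(default_factory=list)
    summary: str = ""
    parent: Optional["Topic"] = None


def split_topic(topic: Topic, max_exchanges: int, keep_recent: int) -> Topic:
    """Return the topic to keep working in, splitting it if it has outgrown max_exchanges."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(topic.exchanges) <= max_exchanges:
        return topic  # still small enough to reason about as one unit

    # The parent keeps its full history and gains a summary (placeholder text here).
    topic.summary = f"{len(topic.exchanges)} exchanges on {topic.title}"

    # The child carries forward only the recent exchanges and links back to the parent.
    return Topic(
        title=f"{topic.title} (continued)",
        exchanges=list(topic.exchanges[-keep_recent:]),
        parent=topic,
    )


parent = Topic("auth architecture", [f"exchange {i}" for i in range(10)])
child = split_topic(parent, max_exchanges=6, keep_recent=3)
```

After the split, the working context is the child's three exchanges plus the parent's summary, while the parent's ten exchanges remain available if the thread ever needs to reach back.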

This is not a workaround for a limitation. It is how good memory actually works. You do not carry every detail of every conversation you have ever had into every new conversation. You carry a distilled model of what matters, updated continuously, organized by domain and purpose. Kura does the same thing — automatically, invisibly, without requiring you to manage it.

What This Looks Like in Practice

You open Omni Core and start working. You do not configure a memory system. You do not tag your messages or maintain a separate knowledge base. You do not remind the AI what you discussed last week.

The AI already knows. Not because it searched for something similar to what you just said, but because it has been maintaining a structured model of your work, your projects, your decisions, and your context — and that model is current, organized, and immediately available the moment it becomes relevant.

This is what memory is supposed to feel like. Not a feature you configure. Not a retrieval layer you tune. Just an AI that knows what you are working on, where you left off, and what you need — because the architecture was built to make that possible.

We spent a long time building it. We think you will notice immediately that something is different.
