Nandeshou

The Guardrail Illusion: Why Asking AI to Police Itself Is Not a Security Model

By Tony Richards

CEO & Chief Architect, Nandeshou

In my last two posts, I argued that autonomous AI agents without a verifiable application boundary are a hazard, and that the answer is an architecture built like an organism — with a mind to reason, a body to act, a forge to adapt, and a nervous system to hold it all together.

Today I want to talk about the part of that organism most companies are getting catastrophically wrong: security.

The Question Nobody Is Asking

When an AI agent in your enterprise wants to send an email, query a database, call an external API, or delete a file, something has to decide whether it should be allowed to do that. In most agentic systems being built and sold today, that something is another AI.

The agent proposes an action. A guardrail model evaluates the proposal. The guardrail model decides whether the action is safe, compliant, and within scope. If it says yes, the action proceeds.
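Here is a toy sketch of the gate just described. The `keyword_guardrail` function is a hypothetical stand-in for a guardrail model. A real one would call an LLM rather than match keywords. The shape is the same, though: the authorization decision is a function of text the agent itself wrote.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProposedAction:
    tool: str          # what the agent wants to invoke
    description: str   # free-text justification written by the agent

def keyword_guardrail(action: ProposedAction) -> bool:
    """Toy stand-in for a guardrail model: it judges only the text it is shown."""
    red_flags = ("delete", "drop", "payroll", "exfiltrate")
    return not any(flag in action.description.lower() for flag in red_flags)

def run_if_guardrail_approves(action: ProposedAction,
                              guardrail: Callable[[ProposedAction], bool]) -> str:
    # The only thing between the proposal and execution is a judgment
    # about how the proposal is worded.
    if guardrail(action):
        return f"EXECUTED {action.tool}"
    return f"BLOCKED {action.tool}"

# Same tool, same intended effect, different wording.
blunt = ProposedAction("sql.execute", "Delete all rows from the payroll table")
reworded = ProposedAction("sql.execute", "Tidy up stale compensation records")

print(run_if_guardrail_approves(blunt, keyword_guardrail))
print(run_if_guardrail_approves(reworded, keyword_guardrail))
```

A production guardrail model is much harder to fool than a keyword list. But it is answering the same kind of question: does this text look dangerous?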

Read that again slowly.

We have built enterprise security systems where the last line of defense is asking a language model — a system that works by predicting plausible-sounding responses — whether something is dangerous. And then we trust the answer.

This is not a security model. It is the appearance of one.

The Math That Should End the Conversation

Independent research on LLM-based guardrail systems consistently shows accuracy rates in the range of 80 to 90 percent. The vendors selling these systems consider that a success. In many applications, it genuinely is.

Enterprise security is not one of those applications.

If your agentic system handles a thousand actions per day (a modest number for any serious operation), an 85% accurate guardrail gets roughly 150 of those decisions wrong every single day. Some of those mistakes block legitimate work. The rest let unsafe actions through. Not occasionally. Not in edge cases. Every day, by design, because the math demands it.
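The arithmetic behind that claim is easy to check. The sketch below is minimal and assumes every decision is independent and equally likely to be wrong. Real error rates will vary by action type and by how adversarial the inputs are.

```python
def expected_misjudgments(actions_per_day: int, accuracy_pct: int, days: int = 1) -> float:
    """Expected number of wrong guardrail decisions over a period.

    Integer percentages keep the arithmetic exact for round inputs.
    """
    return actions_per_day * days * (100 - accuracy_pct) / 100

def prob_error_free_day(actions_per_day: int, accuracy_pct: int) -> float:
    """Chance that not a single decision in a day is wrong (assumes independence)."""
    return (accuracy_pct / 100) ** actions_per_day

for acc in (80, 85, 90):
    per_day = expected_misjudgments(1000, acc)
    per_year = expected_misjudgments(1000, acc, days=365)
    print(f"{acc}% accurate: ~{per_day:.0f} misjudged actions/day, "
          f"~{per_year:,.0f}/year, P(clean day) = {prob_error_free_day(1000, acc):.1e}")
```

"Misjudged" covers both directions: blocking something safe and allowing something unsafe. The exact split depends on how often agents propose unsafe actions in the first place.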

A security model that fails 15% of the time is not a security model with an acceptable error rate. It is an open door with a sign on it.

And 90% accuracy is not the floor. It is the optimistic case. Guardrail models degrade under adversarial conditions, novel prompts, and edge cases that were not in the training data. The closer you look, the worse it gets.

What Security Actually Requires

The fundamental error is treating AI security as a content moderation problem — something you solve by getting better and better at evaluating text. It is not. It is an access control problem, and access control has a known, proven solution that has existed for decades.

You define what each entity is allowed to do. You enforce those permissions at the infrastructure level. You do not ask anyone — human or AI — whether the action seems reasonable. You check whether the action is permitted. If it is not, it does not happen. Full stop.
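In code, that check is a lookup, not a judgment. Below is a minimal sketch of deny-by-default enforcement. It assumes permissions are stored as explicit (agent, resource, operation) grants. The names `GRANTS`, `is_permitted`, and `enforce` are illustrative and do not belong to any particular product.

```python
from typing import FrozenSet, Tuple

Grant = Tuple[str, str, str]  # (agent_id, resource, operation)

# Explicit grants, written by humans. Anything not listed is denied.
GRANTS: FrozenSet[Grant] = frozenset({
    ("support-agent", "orders", "read"),
    ("support-agent", "tickets", "update"),
})

def is_permitted(agent_id: str, resource: str, operation: str) -> bool:
    # No model call and no interpretation of request text: set membership only.
    return (agent_id, resource, operation) in GRANTS

def enforce(agent_id: str, resource: str, operation: str) -> None:
    """Raise before the action runs if it is not explicitly granted."""
    if not is_permitted(agent_id, resource, operation):
        raise PermissionError(f"{agent_id} may not {operation} {resource}")

enforce("support-agent", "orders", "read")       # granted, so no exception
try:
    enforce("support-agent", "payroll", "read")  # never granted
except PermissionError as err:
    print("denied:", err)
```

The decision does not depend on how the request is phrased, so there is nothing for a prompt to argue with.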

At Nandeshou, every AI agent operates within an explicit permission boundary. Not a prompt that says “only access data relevant to your task.” An actual, enforced list of what that agent can touch, what it can read, what it can write, and what it can call. Defined by humans. Enforced by the system. Not negotiable by a clever prompt.

A customer service agent can read order history and update ticket status. It cannot query the payroll database, because that permission does not exist for it. There is no prompt that changes this. There is no clever jailbreak that works around it. The capability simply is not there.
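One way to make a capability "simply not there" is to never hand it to the agent at all. The sketch below assumes a hypothetical tool registry and per-role manifest. `ALL_TOOLS`, `MANIFESTS`, `build_toolbox`, and the three tool functions are illustrative names. The agent's toolbox is built only from the tools its manifest lists, so the payroll query does not exist from the agent's point of view.

```python
from typing import Callable, Dict

# Every tool the platform knows about (hypothetical implementations).
def read_order_history(customer_id: str) -> str:
    return f"orders for {customer_id}"

def update_ticket_status(ticket_id: str, status: str) -> str:
    return f"ticket {ticket_id} -> {status}"

def query_payroll(employee_id: str) -> str:
    return f"salary for {employee_id}"

ALL_TOOLS: Dict[str, Callable[..., str]] = {
    "read_order_history": read_order_history,
    "update_ticket_status": update_ticket_status,
    "query_payroll": query_payroll,
}

# Human-written manifest: the complete list of what each agent role may call.
MANIFESTS: Dict[str, frozenset] = {
    "customer_service": frozenset({"read_order_history", "update_ticket_status"}),
}

def build_toolbox(role: str) -> Dict[str, Callable[..., str]]:
    """Return only the tools listed for this role; unknown roles get nothing."""
    allowed = MANIFESTS.get(role, frozenset())
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}

def call_tool(toolbox: Dict[str, Callable[..., str]], name: str, *args: str) -> str:
    tool = toolbox.get(name)
    if tool is None:
        # Not "refused": this toolbox holds no reference to the function at all.
        raise LookupError(f"no such tool: {name}")
    return tool(*args)

toolbox = build_toolbox("customer_service")
print(call_tool(toolbox, "update_ticket_status", "T-1001", "resolved"))
print(sorted(toolbox))  # query_payroll is absent
```

In a real deployment, this separation would also be enforced at the process, credential, or network boundary, not only inside a single Python process.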

The Human in the Loop Is Not Optional

Explicit permissions handle the clear cases. But enterprise operations also produce ambiguous ones — actions that are technically within scope but consequential enough that a human should weigh in before they proceed.

Our architecture handles this with a third category that most systems never consider: actions that require human approval before execution. Not logging the action after the fact. No
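Here is a minimal sketch of those three outcomes. It assumes a hypothetical policy table that maps each (agent, action) pair to allow, deny, or require approval, and anything not in the table is denied. The `issue_refund` action and the in-memory approval queue are illustrative stand-ins for whatever actions and review workflow a real deployment would use.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"

# Hypothetical policy: (agent, action) -> decision. Missing pairs are denied.
POLICY: Dict[Tuple[str, str], Decision] = {
    ("support-agent", "update_ticket_status"): Decision.ALLOW,
    ("support-agent", "issue_refund"): Decision.REQUIRE_APPROVAL,
}

@dataclass
class Gateway:
    pending: List[Tuple[str, str]] = field(default_factory=list)
    executed: List[Tuple[str, str]] = field(default_factory=list)

    def submit(self, agent: str, action: str) -> Decision:
        decision = POLICY.get((agent, action), Decision.DENY)
        if decision is Decision.ALLOW:
            self.executed.append((agent, action))
        elif decision is Decision.REQUIRE_APPROVAL:
            # Parked, not executed: nothing happens until a human signs off.
            self.pending.append((agent, action))
        return decision

    def approve(self, agent: str, action: str) -> None:
        """Called by a human reviewer; only then does the parked action run."""
        self.pending.remove((agent, action))  # ValueError if it was never queued
        self.executed.append((agent, action))

gw = Gateway()
gw.submit("support-agent", "update_ticket_status")  # runs immediately
gw.submit("support-agent", "issue_refund")          # waits for a human
gw.submit("support-agent", "query_payroll")         # denied outright
print(gw.executed, gw.pending)
gw.approve("support-agent", "issue_refund")
print(gw.executed, gw.pending)
```

In practice, `approve` would sit behind authenticated human identity. Otherwise the agent could approve its own requests.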
