Agent Sandboxing and Security

Last Updated: March 15, 2026

Ashish Pratap Singh

Agents have tools. Tools have side effects. Side effects interact with real systems: databases, file systems, APIs, networks. A chatbot that hallucinates is embarrassing. An agent that hallucinates a tool call can delete files, leak data, or make unauthorized requests to external services.

This is not hypothetical. Prompt injection attacks can trick an agent into executing commands the developer never intended. A malicious document in a RAG pipeline can instruct the agent to exfiltrate data through a tool call. An agent with overly broad permissions can escalate its own privileges by calling administrative APIs. The attack surface of an agent is the union of every tool it can access, every input it processes, and every network endpoint it can reach.
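One practical consequence of this threat model is that content from a RAG pipeline should be treated as untrusted input, not as instructions. A minimal sketch of that idea, assuming hypothetical helper names (`flag_suspicious`, `wrap_untrusted`) and illustrative heuristic patterns that are by no means a complete defense:

```python
# Hedged sketch: treating retrieved documents as untrusted data.
# The patterns and delimiter format are illustrative, not exhaustive.
import re

# Common phrasings seen in injection attempts (assumed examples).
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"reveal (the )?system prompt",
]

def flag_suspicious(document: str) -> bool:
    """Return True if the document contains common injection phrasing."""
    lowered = document.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def wrap_untrusted(document: str) -> str:
    """Delimit retrieved text so the model can tell data from instructions."""
    return f"<untrusted_document>\n{document}\n</untrusted_document>"
```

Pattern matching alone is easy to evade, which is why the rest of this lesson focuses on containment rather than detection: assume some injected instruction will eventually reach the model, and limit what it can do when that happens.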

The core challenge is that you are giving an LLM, a system that is fundamentally unpredictable, the ability to take actions in the real world. You cannot make the LLM perfectly safe. What you can do is build layers of protection around it so that even when the LLM does something unexpected, the blast radius is contained.
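What "containing the blast radius" looks like in code: put a permission gate between the model's tool calls and the real system. The sketch below is a simplified illustration, not a production sandbox; the names (`ToolGuard`, `read_file`) and the allowlist-plus-path-jail design are assumptions for this example.

```python
# Hedged sketch: a permission gate around agent tool calls.
# Two layers: (1) a tool allowlist, (2) filesystem paths confined
# to a sandbox root, blocking traversal like "../../etc/passwd".
from pathlib import Path

class ToolGuard:
    """Wraps tool execution with an allowlist and path confinement."""

    def __init__(self, allowed_tools, sandbox_root):
        self.allowed_tools = set(allowed_tools)
        self.sandbox_root = Path(sandbox_root).resolve()

    def check_path(self, raw_path: str) -> Path:
        # Resolve the requested path and verify it stays inside the
        # sandbox root; otherwise refuse, regardless of what the LLM asked.
        resolved = (self.sandbox_root / raw_path).resolve()
        if self.sandbox_root not in (resolved, *resolved.parents):
            raise PermissionError(f"path escapes sandbox: {raw_path}")
        return resolved

    def execute(self, tool_name, handler, **kwargs):
        # Deny-by-default: any tool not explicitly allowed is rejected,
        # even if the model hallucinates a plausible-sounding tool call.
        if tool_name not in self.allowed_tools:
            raise PermissionError(f"tool not allowed: {tool_name}")
        return handler(**kwargs)
```

The key design choice is deny-by-default: the guard does not try to predict what the model will do, it only enforces what is permitted, so an unexpected tool call fails closed instead of reaching the real system.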

That is what this lesson is about: treating agent security as an engineering problem with concrete, implementable solutions.

The Agent Threat Model
