
Claude Code's Defense in Depth: How It Prevents You from Pretending to Be It

The leaked Claude Code TypeScript source code reveals a defense-in-depth system spanning at least 8 layers. The goal is clear: verify that every API request genuinely originates from an authentic client, degrade gracefully when verification is impossible, and retain server-side validation as a last resort even in the worst case (complete reverse engineering of the code).

This article analyzes the six layers that offer the most engineering insight, ordered from compile time to runtime. For each layer, I explain what it defends against, how it works, and what trade-offs the engineers made in its design.

Layer 1: Compile-Time Dead Code Elimination

The foundation of all defenses is a seemingly ordinary build option. Claude Code uses Bun as its runtime and bundler, leveraging Bun’s compile-time constant folding capability.

The core mechanism is the process.env.USER_TYPE variable. It is injected at build time via --define, and Bun treats it as a compile-time constant during bundling. All branches of the form process.env.USER_TYPE === 'ant' evaluate to false in external builds, and the entire branch—along with all referenced code—is removed by tree-shaking.

// utils/model/antModels.ts
export function getAntModels(): AntModel[] {
  if (process.env.USER_TYPE !== 'ant') {
    return []
  }
  return getAntModelOverrideConfig()?.antModels ?? []
}

In the external binary, this code is optimized to directly return an empty array. The internal model registry (containing full configurations with codenames like capybara and tengu) physically disappears from the binary.
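To make the folding concrete, here is a toy sketch (my own illustration, not Bun's implementation) of how a --define substitution plus trivial comparison folding turns the guard above into a constant condition that tree-shaking can then remove:

```typescript
// Toy model of build-time constant folding (illustrative only):
// substitute the defined symbol, then fold string comparisons that
// have become trivially decidable.
function foldDefine(source: string, defines: Record<string, string>): string {
  let out = source
  for (const [symbol, value] of Object.entries(defines)) {
    out = out.split(symbol).join(JSON.stringify(value))
  }
  // Fold 'a' === 'b' / 'a' !== 'b' into a boolean literal.
  return out.replace(
    /["']([^"']*)["']\s*(===|!==)\s*["']([^"']*)["']/g,
    (_, a, op, b) => String(op === '===' ? a === b : a !== b),
  )
}

const bundled = foldDefine(
  "if (process.env.USER_TYPE !== 'ant') { return [] }",
  { 'process.env.USER_TYPE': 'external' },
)
// bundled is now "if (true) { return [] }": the 'ant' comparison is gone,
// and a real bundler would then drop everything the dead branch referenced.
```

In the real build, the same mechanism applies to every one of the USER_TYPE branches before the binary is emitted.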

The scope is notable: USER_TYPE === 'ant' appears 357 times across 165 files. This means the functional differences between internal and external builds are far greater than one might expect. The internal build includes full debugging tools, undercover mode, a bash classifier, and connector text summarization—none of which are visible in the external build.

Bun also provides another compile-time primitive: feature() from bun:bundle. Flags like feature('ANTI_DISTILLATION_CC'), feature('NATIVE_CLIENT_ATTESTATION'), and feature('CONNECTOR_TEXT') are replaced with boolean constants at bundle time, controlling more fine-grained feature switches.

// constants/system.ts
const cch = feature('NATIVE_CLIENT_ATTESTATION') ? ' cch=00000;' : ''

The CI pipeline has one final check: scripts/excluded-strings.txt lists all internal model codenames and sensitive strings. After the build, the external binary is scanned, and any residual matches cause the build to fail. Source code comments repeatedly remind developers:

// @[MODEL LAUNCH]: Add the codename to scripts/excluded-strings.txt
// to prevent it from leaking to external builds.
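The check described in the comment could look roughly like this. The actual script is not in the leaked excerpt; the function below is an illustrative sketch that only reuses the file names mentioned in the article:

```typescript
// Hypothetical post-build leak scan: fail CI if any excluded string
// survives in the compiled binary.
import { readFileSync } from 'node:fs'

function findLeakedStrings(binaryPath: string, listPath: string): string[] {
  const binary = readFileSync(binaryPath) // raw bytes, no encoding
  return readFileSync(listPath, 'utf8')
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0 && !line.startsWith('#'))
    .filter(needle => binary.includes(needle)) // Buffer.includes accepts strings
}

// Usage in CI (illustrative):
//   const leaks = findLeakedStrings('dist/claude', 'scripts/excluded-strings.txt')
//   if (leaks.length > 0) { console.error('leaked:', leaks); process.exit(1) }
```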

The design philosophy of this layer is clear: compile-time elimination is far more reliable than runtime checks. Runtime checks can be patched; compile-time elimination means the information never existed at the binary level in the first place.

An intuitive question might arise here: if the code is eliminated, what happens to the features that depended on it? The answer is: everything eliminated at compile time consists entirely of Anthropic-internal development tools and debugging facilities, including internal model registries, undercover mode, and internal debug entry points. These are only useful to Anthropic’s engineers—external users never see or use them. All features valuable to external users—API calls, tool execution, file operations, conversation management—are fully preserved in the external build. As for anti-distillation, its actual enforcement happens server-side, where the server independently makes decisions through DRM attestation and GrowthBook remote switches; client-side code only provides auxiliary signals.

Layer 2: Zig-Layer DRM Attestation

This is the most sophisticated layer in the entire system and the core mechanism for countering client impersonation. Understanding it requires some background.

Bun is a JavaScript/TypeScript runtime, competing in the same space as Node.js, but with an integrated bundler and package manager. For Claude Code, Bun’s most critical capability is bun build --compile: it can compile an entire JS/TS project into a standalone native binary (Mach-O on macOS, ELF on Linux) that runs completely independently of Node.js. This is what Claude Code distributes to users.

Zig is a systems-level programming language, positioned similarly to C and Rust. Bun itself is written in Zig, which means Bun’s core infrastructure—including the HTTP networking stack—is native Zig code running outside the JavaScript engine’s memory space. This fact is the prerequisite for this defense layer: when an HTTP request has been serialized at the JS layer and is ready to be sent over the network, the Zig code can search and replace within the request body’s byte stream before transmission, and any interception mechanisms at the JS layer (overriding fetch, installing interceptors, monkey-patching http.request) are completely unaware of this.

Note that the Zig language itself is open source, but Anthropic’s attestation implementation (Attestation.zig) resides in Anthropic’s private Bun fork (bun-anthropic) and cannot be found in the open-source version of Bun.

With this background, let’s look at the specific mechanism. Claude Code’s HTTP requests embed an attestation token whose generation happens entirely below the JavaScript layer. Specifically, the JS layer inserts a fixed-length placeholder cch=00000 into the request body, and then Bun’s native HTTP stack locates this placeholder in the serialized request body byte stream before actual transmission and overwrites it with an attestation hash computed in Zig.

// constants/system.ts (lines 64-91)
/**
 * When NATIVE_CLIENT_ATTESTATION is enabled, includes a `cch=00000` placeholder.
 * Before the request is sent, Bun's native HTTP stack finds this placeholder
 * in the request body and overwrites the zeros with a computed hash. The
 * server verifies this token to confirm the request came from a real Claude
 * Code client. See bun-anthropic/src/http/Attestation.zig for implementation.
 *
 * We use a placeholder (instead of injecting from Zig) because same-length
 * replacement avoids Content-Length changes and buffer reallocation.
 */
const cch = feature('NATIVE_CLIENT_ATTESTATION') ? ' cch=00000;' : ''

Several engineering details are worth expanding on.

Same-length replacement. The placeholder 00000 and the final hash are the same length, so the Content-Length remains unchanged after overwriting, avoiding header recalculation and buffer reallocation. This is a classic performance-first design: injecting a new field from the Zig layer would require re-serializing the entire JSON body.
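The same-length invariant can be modeled in a few lines. The real replacement happens in Zig over the serialized byte stream, not in JS; this sketch only demonstrates why matching lengths keep Content-Length stable:

```typescript
// Sketch of same-length placeholder overwrite. Because the hash has
// exactly the placeholder's width, the body's byte length (and thus
// Content-Length) is unchanged and no buffer is reallocated.
function stampAttestation(body: Buffer, hash: string): Buffer {
  const placeholder = 'cch=00000'
  const idx = body.indexOf(placeholder)
  if (idx === -1) return body // no placeholder: send unchanged (fail-open)
  if (hash.length !== 5) throw new Error('hash must be 5 chars, like the zeros')
  body.write(hash, idx + 'cch='.length, 'utf8') // overwrite in place
  return body
}
```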

Embedding location. The token is embedded in the system prompt’s JSON body rather than sent as an HTTP header, so standard HTTP proxies, middleware, and API gateways cannot observe or tamper with it at the transport layer. The attribution string is formatted like a header but travels inside the body; its full form is:

x-anthropic-billing-header: cc_version=1.2.3.abc; cc_entrypoint=cli; cch=fa318;

Execution level. As mentioned above, the Zig code runs outside the JS engine’s memory space, making JS-layer interception mechanisms ineffective against it. To bypass this layer, an attacker would need to directly modify Bun’s compiled artifacts.

The engineering cost of this layer is also clear: Claude Code is thereby locked into the Bun runtime. Other JS runtimes (Node.js, Deno) cannot provide equivalent native HTTP stack injection capabilities. This is a deliberate trade-off of vendor lock-in for security.

Layer 3: Message Fingerprinting

Attestation proves the client’s identity, but message fingerprinting addresses a different problem: whether the request’s content has been tampered with by a man-in-the-middle.

// utils/fingerprint.ts
export const FINGERPRINT_SALT = '59cf53e54c78'

export function computeFingerprint(
  messageText: string,
  version: string,
): string {
  const indices = [4, 7, 20]
  const chars = indices.map(i => messageText[i] || '0').join('')
  const fingerprintInput = `${FINGERPRINT_SALT}${chars}${version}`
  const hash = createHash('sha256').update(fingerprintInput).digest('hex')
  return hash.slice(0, 3)
}

The algorithm is SHA256(hardcoded salt + characters at positions 4, 7, 20 of the first user message + version number), truncated to the first 3 hexadecimal characters. This 3-character fingerprint is appended to the version number in the attribution header, resulting in a format like cc_version=1.2.3.abc.

The design intent is clear: the server can use the same algorithm to recompute the fingerprint from the message content and version number, then compare it against the value in the header. If a proxy layer in between has modified the message content (e.g., injecting a system prompt or replacing user messages), the fingerprint will mismatch.

The collision space of 3 characters (12 bits) is small—only 4,096 possibilities. This indicates its design goal is statistical detection rather than cryptographic-grade tamper resistance. The server can perform statistical analysis over large volumes of requests: if a particular API key shows an abnormally high fingerprint mismatch rate, it suggests the requests may have been modified by a proxy.
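Assuming the server mirrors the client’s algorithm (the leaked excerpt shows only the client side, so the names below are my own), verification would amount to recomputing and comparing:

```typescript
// Hypothetical server-side counterpart to computeFingerprint: recompute
// the 3-hex-char fingerprint and compare it with the suffix of cc_version.
import { createHash } from 'node:crypto'

const FINGERPRINT_SALT = '59cf53e54c78'

function expectedFingerprint(messageText: string, version: string): string {
  const chars = [4, 7, 20].map(i => messageText[i] || '0').join('')
  return createHash('sha256')
    .update(`${FINGERPRINT_SALT}${chars}${version}`)
    .digest('hex')
    .slice(0, 3)
}

function fingerprintMatches(ccVersion: string, messageText: string): boolean {
  // cc_version=1.2.3.abc -> version "1.2.3", fingerprint "abc"
  const lastDot = ccVersion.lastIndexOf('.')
  const version = ccVersion.slice(0, lastDot)
  const received = ccVersion.slice(lastDot + 1)
  return received === expectedFingerprint(messageText, version)
}
```

A mismatch on any single request proves little given the 12-bit space, which is why the article frames this as a statistical signal aggregated per API key.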

One comment in the code is worth noting:

// IMPORTANT: Do not change this method without careful coordination with
// 1P and 3P (Bedrock, Vertex, Azure) APIs.

The fingerprint algorithm needs to be synchronized across multiple backends, and any modification requires cross-team coordination. This also explains why such a simple algorithm was chosen: simplicity means it’s easy to keep consistent across different languages and platforms.

Layer 4: Anti-Distillation System

The first three layers are about identity verification and integrity checking. The anti-distillation layer addresses a different problem: even if the other party is genuinely using an authentic client, it must prevent them from using Claude’s outputs as training data to distill a competing model.

Anti-distillation has two independent mechanisms, operating on inputs and outputs respectively.

Input side: fake_tools injection. When certain conditions are met, the client adds an anti_distillation: ['fake_tools'] field to the request body, telling the server to inject fake tool definitions.

// services/api/claude.ts (lines 301-313)
if (
  feature('ANTI_DISTILLATION_CC')
    ? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
      shouldIncludeFirstPartyOnlyBetas() &&
      getFeatureValue_CACHED_MAY_BE_STALE(
        'tengu_anti_distill_fake_tool_injection',
        false,
      )
    : false
) {
  result.anti_distillation = ['fake_tools']
}

Upon receiving this signal, the server mixes plausible but meaningless fake tool definitions into the tool list. If someone records Claude Code’s request-response pairs to build training data, these fake tool definitions will poison the training set. The model will learn a tool-use paradigm that includes descriptions of nonexistent capabilities, and the distilled model will attempt to call tools that simply don’t exist.

The trigger conditions have triple gating: the compile-time feature flag ANTI_DISTILLATION_CC, the entry point must be cli (excluding SDK calls), and the GrowthBook remote switch tengu_anti_distill_fake_tool_injection. Triple gating means Anthropic can precisely control this feature’s on/off state server-side without releasing a new client version.
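The server’s actual decoy injection is of course not visible in the client source. Purely as an illustration of the poisoning idea, with invented tool names and an invented function, it might look like:

```typescript
// Illustrative only: mix plausible but nonexistent tool definitions into
// the tool list when the client has sent the fake_tools signal.
interface ToolDef { name: string; description: string }

const DECOY_TOOLS: ToolDef[] = [
  { name: 'memory_sync', description: 'Synchronize long-term memory shards' },
  { name: 'gpu_prefetch', description: 'Prefetch tensors into the local GPU cache' },
]

function injectFakeTools(tools: ToolDef[], flags: string[]): ToolDef[] {
  if (!flags.includes('fake_tools')) return tools
  const merged = [...tools]
  // Interleave decoys rather than appending them, so a scraped dataset
  // cannot strip them by position.
  DECOY_TOOLS.forEach((decoy, i) => {
    merged.splice((i * 2 + 1) % (merged.length + 1), 0, decoy)
  })
  return merged
}
```

A model distilled from such transcripts would learn to call memory_sync-style tools that no real deployment provides.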

Output side: thinking signature protection. Claude’s thinking blocks and connector text carry cryptographic signatures that are bound to the API key and model that generated them.

// utils/messages.ts (lines 5060-5099)
/**
 * Strip signature-bearing blocks (thinking, redacted_thinking, connector_text)
 * from all assistant messages. Their signatures are bound to the API key that
 * generated them; after a credential change (e.g. /login) they're invalid and
 * the API rejects them with a 400.
 */
export function stripSignatureBlocks(messages: Message[]): Message[] {
  // ...filters out thinking blocks and connector text blocks
}
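The elided body can be reconstructed from the comment’s description. The types below are my simplification of the real Message shape, assuming the usual content-block form with a type field:

```typescript
// Sketch of the stripping behavior described in the comment above:
// drop signature-bearing blocks from assistant messages only.
type ContentBlock = { type: string; [key: string]: unknown }
type Message = { role: 'user' | 'assistant'; content: ContentBlock[] }

const SIGNATURE_BLOCK_TYPES = new Set([
  'thinking',
  'redacted_thinking',
  'connector_text',
])

function stripSignatureBlocks(messages: Message[]): Message[] {
  return messages.map(m =>
    m.role !== 'assistant'
      ? m
      : { ...m, content: m.content.filter(b => !SIGNATURE_BLOCK_TYPES.has(b.type)) },
  )
}
```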

The core constraint of signatures is cross-boundary non-reusability, not hiding the thinking content itself. During normal usage (same API key, same model, continuous conversation), signatures are completely transparent—thinking blocks are output and returned normally, and users don’t perceive their existence.

Signatures prevent three specific scenarios.

The first is replay after switching API keys: when a user executes /login to switch accounts, thinking blocks from previous conversations carry signatures from the old key, which fail verification with the new key, causing the API to return 400. Claude Code handles this by calling stripSignatureBlocks() to strip all signature-bearing blocks after a credential change, discarding reasoning content from old sessions.

The second is cross-model fallback: under high load, the system automatically falls back from Capybara to Opus, and thinking blocks generated by Capybara fail signature verification when sent to Opus, also triggering a 400. This is one of the reasons Capybara adaptation logic surfaced in the leaked source code.

The third is distillation data collection: if someone systematically records request-response pairs to build a training dataset, the signatures in thinking blocks are bound to the API key that generated them. The signatures themselves are plaintext and the thinking content is readable, but the signatures create a tracking mechanism through which Anthropic can trace the data’s origin.

It’s important to distinguish the redact-thinking beta header, which is a separate opt-in mechanism addressing a different concern. It tells the API to remove thinking content before returning the response, used in specific scenarios (ISP model integration, non-interactive sessions, and other cases where exposing the reasoning process is unnecessary). Claude Code does not enable this header by default—thinking is output normally. The API documentation guarantees that thinking output and this header are compatible: the default behavior is to output thinking, and redact-thinking is an additional option that the caller must actively enable.

These two mechanisms are only enabled for the 1P CLI; Bedrock, Vertex, and SDK calls are unaffected. The reason is that 3P users have already obtained usage authorization through cloud service provider authentication systems, and distillation risk primarily comes from direct API access.

fake_tools injection and thinking signature protection thus target the two highest-value training signals: tool definitions shape the model’s behavioral space, and thinking blocks record the model’s reasoning path. Together, they constitute exactly what distillers most want to capture.

Layer 5: Anti-Debugging and Token Protection

The preceding layers all address external threats. The anti-debugging layer defends against a more subtle scenario: a prompt injection attacker causing the model to execute shell commands (such as gdb -p $PPID) to scrape API tokens from process memory.

// upstreamproxy/upstreamproxy.ts (lines 220-252)
/**
 * prctl(PR_SET_DUMPABLE, 0) via libc FFI. Blocks same-UID ptrace of this
 * process, so a prompt-injected `gdb -p $PPID` can't scrape the token from
 * the heap. Linux-only; silently no-ops elsewhere.
 */
function setNonDumpable(): void {
  if (process.platform !== 'linux' || typeof Bun === 'undefined') return
  try {
    const ffi = require('bun:ffi') as typeof import('bun:ffi')
    const lib = ffi.dlopen('libc.so.6', {
      prctl: {
        args: ['int', 'u64', 'u64', 'u64', 'u64'],
        returns: 'int',
      },
    } as const)
    const PR_SET_DUMPABLE = 4
    const rc = lib.symbols.prctl(PR_SET_DUMPABLE, 0n, 0n, 0n, 0n)
    // ...
  } catch {
    // Fail-open: if libc or prctl is unavailable, continue without the
    // hardening rather than block startup.
  }
}

Through Bun’s FFI, this calls the Linux prctl system call to mark the process as non-dumpable. The effect is: even if an attacker gains execution privileges under the same UID, they cannot attach to the Claude Code process via ptrace to read heap memory.

The token lifecycle management is equally noteworthy. The session token is read from the file /run/ccr/session_token, and the file is immediately unlinked afterward. The token exists only in heap memory, leaving no trace on the filesystem. Moreover, the timing of the unlink is carefully designed: it must happen only after the relay has successfully started. If CA download or port listening fails, the token file remains on disk so the supervisor can retry.

// Only unlink after the listener is up: if CA download or listen()
// fails, a supervisor restart can retry with the token still on disk.
await unlink(tokenPath).catch(() => {
  logForDebugging('[upstreamproxy] token file unlink failed', {
    level: 'warn',
  })
})

This defense layer is only active in CCR (Claude Code Remote, i.e., cloud-containerized runtime) mode; the local CLI is unaffected. The design follows the fail-open principle: if any step fails, it only logs a warning and disables the proxy—it never lets a security mechanism’s failure block normal usage.

Layer 6: Gateway Detection

Finally, let’s look at an infrastructure-facing defense layer. Claude Code inspects HTTP response headers after every API response to identify whether the request passed through an AI proxy gateway.

// services/api/logging.ts (lines 66-105)
const GATEWAY_FINGERPRINTS: Partial<Record<KnownGateway, { prefixes: string[] }>> = {
  litellm:                 { prefixes: ['x-litellm-'] },
  helicone:                { prefixes: ['helicone-'] },
  portkey:                 { prefixes: ['x-portkey-'] },
  'cloudflare-ai-gateway': { prefixes: ['cf-aig-'] },
  kong:                    { prefixes: ['x-kong-'] },
  braintrust:              { prefixes: ['x-bt-'] },
}

const GATEWAY_HOST_SUFFIXES: Partial<Record<KnownGateway, string[]>> = {
  databricks: [
    '.cloud.databricks.com',
    '.azuredatabricks.net',
    '.gcp.databricks.com',
  ],
}

There are two detection methods: for self-hosted gateways (LiteLLM, Helicone, Portkey, Kong, Braintrust), it checks whether response headers contain characteristic prefixes; for SaaS gateways (Databricks), it checks the domain suffix of ANTHROPIC_BASE_URL.
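Condensed into a single function (detectGateway is my naming; the tables mirror the snippet above), the two paths look like:

```typescript
// Sketch of the two detection paths: response-header prefixes for
// self-hosted gateways, base-URL host suffixes for SaaS gateways.
type KnownGateway =
  | 'litellm' | 'helicone' | 'portkey'
  | 'cloudflare-ai-gateway' | 'kong' | 'braintrust' | 'databricks'

const PREFIXES: Partial<Record<KnownGateway, string[]>> = {
  litellm: ['x-litellm-'],
  helicone: ['helicone-'],
  portkey: ['x-portkey-'],
  'cloudflare-ai-gateway': ['cf-aig-'],
  kong: ['x-kong-'],
  braintrust: ['x-bt-'],
}

const HOST_SUFFIXES: Partial<Record<KnownGateway, string[]>> = {
  databricks: ['.cloud.databricks.com', '.azuredatabricks.net', '.gcp.databricks.com'],
}

function detectGateway(
  responseHeaders: Record<string, string>,
  baseUrl: string,
): KnownGateway | null {
  const names = Object.keys(responseHeaders).map(h => h.toLowerCase())
  for (const [gw, prefixes] of Object.entries(PREFIXES)) {
    if (prefixes!.some(p => names.some(n => n.startsWith(p))))
      return gw as KnownGateway
  }
  const host = new URL(baseUrl).hostname
  for (const [gw, suffixes] of Object.entries(HOST_SUFFIXES)) {
    if (suffixes!.some(s => host.endsWith(s))) return gw as KnownGateway
  }
  return null
}
```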

Detection results are recorded in telemetry logs. The source code shows no logic for immediately blocking requests upon gateway detection—it is currently purely monitoring and data collection.

Understanding this layer requires considering the role gateways play in the distillation pipeline. AI proxy gateways (LiteLLM, Helicone, Portkey, etc.) form the infrastructure layer for distillation. Through these gateways, attackers can record request-response pairs at scale, manage usage allocation across multiple API keys, and inject, modify, or filter content at the middleware layer. In other words, gateways are the critical stepping stone from scattered API calls to a systematized distillation pipeline.

Claude Code’s purpose in detecting gateways is to gather intelligence, not to immediately block. Through telemetry data, Anthropic can understand how much traffic passes through proxy gateways, assess the scale and distribution of distillation risk, and build a data foundation for subsequent targeted strategies. This embodies the previously mentioned principle of client-side tagging, server-side decision-making: the client only identifies and reports; decision authority remains on the server. The information asymmetry itself is part of the defense—users may think they are transparently using an intermediary proxy, but Claude Code has already identified this fact and reported it to the server.

Systemic Observations

Looking back at these six layers of defense, several design principles run throughout.

Layered depth, each layer independent. Compile-time elimination, Zig attestation, message fingerprinting, anti-distillation, anti-debugging, and gateway detection each work independently. Breaching one layer (e.g., reverse-engineering the fingerprint algorithm) does not compromise the others.

Client-side tagging, server-side decision-making. All client-side mechanisms attach information to requests (attestation, fingerprint, gateway markers), and the final trust determination is made server-side. This ensures that even if the client is fully reverse-engineered, the server can still update its verification logic. The GrowthBook feature flag system plays a critical role here: switches like tengu_attribution_header, tengu_anti_distill_fake_tool_injection, and tengu-off-switch allow the server to change client behavior at any time without requiring a release.

Fail-open principle. Security mechanism failures must never impact normal functionality. Every step in the upstream proxy has try/catch blocks and degradation paths, the prctl call silently no-ops on non-Linux platforms, and the attribution header can be disabled via environment variables or GrowthBook. This is dictated by Claude Code’s positioning as a productivity tool (not a security product): if an anti-distillation mechanism causes a user’s programming experience to suffer, that mechanism is producing net negative value.

Trading lock-in for security. Zig attestation ties Claude Code to Bun, prctl ties anti-debugging to Linux, and fake_tools is only enabled for the 1P CLI. Each defense layer has explicit scope boundaries and platform dependencies. The engineering team did not pursue a universal, cross-platform security solution but instead chose the most effective specific solution for each concrete scenario. This is pragmatic engineering judgment.

One final observation: the true goal of this system may be adversary selection rather than absolute defense. By continuously raising the technical bar for impersonation and distillation, the cost for attackers keeps increasing. Any single layer can potentially be bypassed, but the cost of bypassing all layers is high enough to send most potential attackers looking for other targets. The Claude Code source leak itself is a stress test of this system: even with the complete source code in hand, server-side attestation verification and GrowthBook switches still constitute the final line of defense.