
The Real Cost of AI Engineering: Model Integration Friction Exposed by the Claude Code Source Leak

After the Claude Code source leak, most discussion focused on security vulnerabilities and privacy concerns. But the truly valuable information in these 512,000 lines of TypeScript source code is what it reveals about a core dilemma in AI engineering: integrating a new model into a mature agentic system costs far more than outsiders imagine.

This article extracts engineering details from the leaked source code to reconstruct the concrete forms of these costs.

The leaked source code contains a standalone engineering subsystem: anti-distillation, designed to prevent competitors from training their own models using Claude’s API outputs. This subsystem must be understood in the context of early 2026. During this period, Anthropic launched legal proceedings against open-source developers who used Claude model outputs for distillation training. Legal and technical measures advanced in parallel, reflecting a single commercial judgment: model capabilities are a core asset that requires protection at the protocol, legal, and technical layers simultaneously.

The first layer is fake tools injection. In the getExtraBodyParams function in claude.ts:

// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
  feature('ANTI_DISTILLATION_CC')
    ? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
      shouldIncludeFirstPartyOnlyBetas() &&
      getFeatureValue_CACHED_MAY_BE_STALE(
        'tengu_anti_distill_fake_tool_injection',
        false,
      )
    : false
) {
  result.anti_distillation = ['fake_tools']
}

This code instructs the API backend to inject fake tool calls into responses. Systems attempting to extract training data from API outputs will ingest this fake data, poisoning their training sets. The feature is dual-gated through a compile-time feature flag (ANTI_DISTILLATION_CC) and a runtime GrowthBook remote configuration (tengu_anti_distill_fake_tool_injection), allowing on-demand toggling.
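
The dual-gate pattern itself is worth a closer look. A minimal sketch of it, with illustrative names standing in for the real flag machinery (only the two flag strings come from the leaked code):

```typescript
// Hedged sketch of the dual-gate pattern: a feature ships only when both a
// compile-time flag (baked into the build) and a runtime remote-config value
// (toggleable without redeploying) agree. Helper names are illustrative.

type RemoteConfig = Map<string, boolean>

function isCompiledIn(flags: Set<string>, flag: string): boolean {
  return flags.has(flag) // resolved at build time in the real system
}

function remoteValue(config: RemoteConfig, key: string, fallback: boolean): boolean {
  return config.get(key) ?? fallback // cached, may be stale, like the original
}

function shouldInjectFakeTools(
  compiledFlags: Set<string>,
  remote: RemoteConfig,
  entrypoint: string,
): boolean {
  return (
    isCompiledIn(compiledFlags, 'ANTI_DISTILLATION_CC') &&
    entrypoint === 'cli' &&
    remoteValue(remote, 'tengu_anti_distill_fake_tool_injection', false)
  )
}
```

The point of the structure: the compile-time gate keeps the code path out of builds where it is irrelevant, while the remote gate allows the rollout to be dialed up or killed without shipping a new binary.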

The second layer is connector text summarization. A detailed comment in betas.ts:

// POC: server-side connector-text summarization (anti-distillation). The
// API buffers assistant text between tool calls, summarizes it, and returns
// the summary with a signature so the original can be restored on subsequent
// turns — same mechanism as thinking blocks. Ant-only while we measure
// TTFT/TTLT/capacity.

The API server-side buffers and replaces model-generated text between tool calls with summaries, accompanied by cryptographic signatures. The client sends back the signed summary in subsequent requests, and the server restores the original text. External observers only see the summary, losing the details of the original reasoning process. This shares the same mechanism as thinking block redaction. The POC label indicates this is still in validation, with the GrowthBook flag named tengu_slate_prism.
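
The round trip can be sketched as follows. This is an illustration of the shape of the mechanism only: the real signatures are cryptographic MACs, whereas the toy hash, the summarizer, and all names here are assumptions.

```typescript
// Hedged sketch of the summarize-and-restore round trip: the server replaces
// assistant text with a signed summary; the client echoes it back; the server
// verifies the signature and restores the original. Toy hash, not a real MAC.

interface SignedSummary {
  summary: string
  signature: string
}

const originals = new Map<string, string>() // server-side: signature → original text

function sign(secret: string, text: string): string {
  let h = 0
  for (const ch of secret + text) h = (h * 31 + ch.charCodeAt(0)) >>> 0
  return h.toString(16) // stand-in for a real cryptographic signature
}

// Server: replace outbound assistant text with a signed summary.
function summarizeOutbound(secret: string, original: string): SignedSummary {
  const summary = original.slice(0, 20) + '…'
  const signature = sign(secret, summary)
  originals.set(signature, original)
  return { summary, signature }
}

// Server: on the next turn, verify the echoed signature and restore the text.
function restoreInbound(secret: string, block: SignedSummary): string {
  if (sign(secret, block.summary) !== block.signature) throw new Error('bad signature')
  return originals.get(block.signature) ?? block.summary
}
```

The signature is what makes the scheme lossless for the model while staying lossy for the observer: the client never needs the original text, only an opaque token the server can resolve.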

The third layer is token-efficient tools, a JSON-format tool calling protocol (FC v3):

// JSON tool_use format (FC v3) — ~4.5% output token reduction vs ANTML.
// Sends the v2 header (2026-03-28) added in anthropics/anthropic#337072 to
// isolate the CC A/B cohort from ~9.2M/week existing v1 senders.

A v2 header isolates Claude Code’s A/B test cohort from the roughly 9.2 million existing v1 senders per week. Together, the three layers form a complete defense spanning training data poisoning, reasoning process obfuscation, and protocol-level isolation.

It is worth noting that each of the three layers targets different threat vectors. Fake tools targets bulk API output scraping for training — low cost, effectiveness relies on noise ratio. Connector text summarization targets more sophisticated reverse engineering — even if attackers filter out fake tool calls, the model’s intermediate reasoning remains obscured by signatures. Token-efficient tools creates isolation at the protocol layer, making Claude Code traffic statistically distinguishable from other API users, enabling the backend to apply differentiated handling across cohorts. Each layer has independent toggle switches and GrowthBook-controlled gradual rollout paths, reflecting a defense-in-depth engineering philosophy.

The Cache Battle: The Cost of 50,000 Tokens

Prompt caching in Claude Code is a critical optimization for cost and latency. The server caches prompt prefixes from previous requests, and subsequent requests with exactly matching prefixes can reuse them.

The problem is that nearly any parameter change breaks the cache. promptCacheBreakDetection.ts in the source code tracks over a dozen potential sources of cache invalidation: system prompt, tool schema, model name, fast mode state, beta header list, AFK mode state, overage state, cache-editing state, effort value, and extra body params.
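
The detection itself reduces to snapshotting the cache-relevant parameters each turn and diffing against the previous request. A minimal sketch, with a trimmed field list and illustrative names (the real file tracks over a dozen sources):

```typescript
// Hedged sketch of cache-break detection: compare this request's
// cache-relevant fields against the previous request's and report which ones
// changed. Field list is abbreviated from the article; logic is illustrative.

type CacheKeyFields = {
  systemPrompt: string
  toolSchemaHash: string
  model: string
  betaHeaders: string // sorted and joined, so ordering can't cause a false diff
  extraBodyParams: string // canonical JSON
}

function detectCacheBreaks(
  prev: CacheKeyFields | null,
  next: CacheKeyFields,
): (keyof CacheKeyFields)[] {
  if (!prev) return [] // first request of the session: nothing to break
  const broken: (keyof CacheKeyFields)[] = []
  for (const key of Object.keys(next) as (keyof CacheKeyFields)[]) {
    if (prev[key] !== next[key]) broken.push(key)
  }
  return broken
}
```

Reporting which field broke the cache, rather than merely that it broke, is what makes analyses like the tool-description finding below possible.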

Engineers invented a sticky-on latch mechanism to address this:

// Sticky-on latches for dynamic beta headers. Each header, once first
// sent, keeps being sent for the rest of the session so mid-session
// toggles don't change the server-side cache key and bust ~50-70K tokens.

Once a beta header is first sent in a session, it continues to be sent even if the user later disables the corresponding feature. Removing a header would change the request signature, causing server-side cache invalidation and wasting 50,000 to 70,000 tokens of cached content. Feature state and protocol state are intentionally decoupled: headers (protocol layer) remain unchanged to preserve the cache, while actual feature control is adjusted dynamically at the request body layer.
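
The latch is conceptually a one-way set. A minimal sketch of the idea, under the assumption of a simple per-session object (the class and method names are illustrative, not the leaked identifiers):

```typescript
// Hedged sketch of the sticky-on latch: once a beta header has been sent in a
// session, it stays in every later request even if the feature is toggled
// off, so the server-side cache key never changes mid-session.

class BetaHeaderLatch {
  private sent = new Set<string>()

  // Headers for this request: currently-enabled headers plus everything
  // latched earlier in the session.
  headersFor(enabled: string[]): string[] {
    for (const h of enabled) this.sent.add(h) // one-way: add, never remove
    return [...this.sent].sort() // stable order keeps the cache key stable
  }
}
```

The actual on/off behavior of the feature then has to live in the request body, which is exactly the protocol/feature decoupling the article describes.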

A related detail is the eligibility check for the 1-hour cache TTL. The code latches the user’s overage state at session start, so that mid-session billing-state changes cannot flip the cache TTL and break server-side caching (comments estimate each flip wastes approximately 20,000 tokens). According to a BQ analysis from 2026-03-22, 77% of tool-related cache invalidations come from tool description changes rather than tool additions or removals, because AgentTool and SkillTool embed dynamic agent/command lists in their descriptions.

SDK Awkwardness and Bypassing Its Stream Parsing

The source code contains a set of candid engineering comments reflecting the friction between the Claude Code team and Anthropic’s own SDK:

// awkwardly, the sdk sometimes returns text as part of a
// content_block_start message, then returns the same text
// again in a content_block_delta message. we ignore it here
// since there doesn't seem to be a way to detect when a
// content_block_delta message duplicates the text.
text: '',
// also awkward
thinking: '',
// even more awkwardly, the sdk mutates the contents of text blocks
// as it works. we want the blocks to be immutable, so that we can
// accumulate state ourselves.
contentBlocks[part.index] = { ...part.content_block }

From awkwardly to also awkward to even more awkwardly — a three-level escalation. The SDK’s streaming events have issues with duplicate text and mutable state. The Claude Code team’s solution was to abandon the SDK’s high-level abstractions and manage all state accumulation themselves using the low-level raw stream. The comments give the specific reason:

// Use raw stream instead of BetaMessageStream to avoid O(n²) partial JSON parsing
// BetaMessageStream calls partialParse() on every input_json_delta, which we don't need
// since we handle tool input accumulation ourselves

The SDK’s BetaMessageStream runs partialParse() on every input_json_delta event — O(n²) complexity. For agentic scenarios with heavy tool calling, this becomes a performance bottleneck. So Claude Code rewrote the stream parsing, handling text accumulation, thinking block signature concatenation, and connector_text delta merging on its own. Anthropic’s flagship product bypasses its own official SDK for performance reasons.
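The alternative the team chose, accumulating raw JSON fragments and parsing once at end-of-block, can be sketched as follows. The event shape is simplified from the real streaming protocol:

```typescript
// Hedged sketch of why raw-delta accumulation is O(n) while per-delta partial
// parsing is O(n²): appending fragments and parsing once touches each
// character a constant number of times, whereas re-parsing the growing buffer
// on every delta re-reads the whole prefix each time.

interface InputJsonDelta {
  type: 'input_json_delta'
  partial_json: string
}

function accumulateToolInput(deltas: InputJsonDelta[]): unknown {
  let buffer = ''
  for (const d of deltas) buffer += d.partial_json // O(n) total work
  return JSON.parse(buffer) // single parse when the block closes
}
```

For a chat transcript the quadratic cost is negligible; for an agent emitting large tool inputs (multi-kilobyte file edits, long shell commands) it is not.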

Model Behavior Edge Cases: Five Engineering Stories

The preceding sections discussed system-level engineering trade-offs. The following five stories are more specific — they illustrate behavioral edge cases exposed by a new model (internal codename Capybara) in production, and how engineers patched each one with minimal fixes.

Empty Tool Results Causing Zero-Output Interruptions

The first story comes from internal issue inc-4586. When a tool executes successfully but returns an empty result (e.g., a shell command completing silently, an MCP server returning content:[], a REPL statement producing a side effect with no output), Capybara has roughly a 10% probability of erroneously triggering the \n\nHuman: stop sequence, immediately ending the current turn, leaving the user with zero output.

The comments in toolResultStorage.ts document the root cause in detail:

// inc-4586: Empty tool_result content at the prompt tail causes some models
// (notably capybara) to emit the \n\nHuman: stop sequence and end their turn
// with zero output. The server renderer inserts no \n\nAssistant: marker after
// tool results, so a bare </function_results>\n\n pattern-matches to a turn
// boundary. Several tools can legitimately produce empty output (silent-success
// shell commands, MCP servers returning content:[], REPL statements, etc.).
// Inject a short marker so the model always has something to react to.

The server renderer does not insert a \n\nAssistant: marker after tool results. When the tool result is empty, the </function_results>\n\n pattern appearing at the prompt tail looks exactly like a turn boundary to the model, so the model “cooperatively” samples the stop sequence, ending what should have been a continuing reasoning process.

The fix is extremely simple: detect empty results and inject a short marker text:

if (isToolResultContentEmpty(content)) {
  logEvent('tengu_tool_empty_result', {
    toolName: sanitizeToolNameForAnalytics(toolName),
  })
  return {
    ...toolResultBlock,
    content: `(${toolName} completed with no output)`,
  }
}

A single line of content, (${toolName} completed with no output), solves the problem. An analytics event, tengu_tool_empty_result, is logged at the same time, tracking occurrence frequency by tool name. This is a classic bug: trivial on the surface, with a deeply buried root cause. Empty output is perfectly legitimate; the problem lies in the interaction between the server renderer’s missing turn-boundary marker and the model’s stop-sequence sampling.
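
The leaked excerpt calls isToolResultContentEmpty but does not show its body. A plausible sketch, whose predicate logic is an assumption covering the cases the comment lists (empty strings, whitespace-only output, empty content arrays):

```typescript
// Hedged sketch of an emptiness check for tool results. The body here is an
// assumption, not the leaked implementation; the content shape is simplified.

type ToolResultContent = string | { type: 'text'; text: string }[]

function isToolResultContentEmpty(content: ToolResultContent | undefined): boolean {
  if (content === undefined) return true
  if (typeof content === 'string') return content.trim() === ''
  // covers MCP servers returning content: [] and blocks of pure whitespace
  return content.every(block => block.text.trim() === '')
}
```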

tool_reference Expansion Triggering False Endings

The second story is mechanistically very similar to the first, but follows a completely different trigger path. Claude Code has a tool search feature where the API backend expands tool_reference blocks into <functions>...</functions> tags. These tags use the same format as tool definition blocks in the system prompt. When the expanded content appears at the prompt tail, Capybara similarly samples the stop sequence at roughly 10% probability.

The comments in messages.ts record the A/B test data and mechanism analysis:

// Server renders tool_reference expansion as <functions>...</functions>
// (same tags as the system prompt's tool block). When this is at the
// prompt tail, capybara models sample the stop sequence at ~10% (A/B:
// 21/200 vs 0/200 on v3-prod). A sibling text block inserts a clean
// "\n\nHuman: ..." turn boundary. Injected here (API-prep) rather than
// stored in the message so it never renders in the REPL, and is
// auto-skipped when strip* above removes all tool_reference content.
// Must be a sibling, NOT inside tool_result.content — mixing text with
// tool_reference inside the block is a server ValueError.

The A/B test data is clear and stark: 21/200 vs 0/200. Among requests with tool_reference at the tail, 10.5% triggered false endings; the control group had zero.

The initial fix (pre-PR #21049) was to inject a TOOL_REFERENCE_TURN_BOUNDARY text block:

const TOOL_REFERENCE_TURN_BOUNDARY = 'Tool loaded.'

A { type: 'text', text: 'Tool loaded.' } was appended as a sibling at the end of user messages containing tool_reference. This text provides the model with a clear human turn boundary, prompting it to continue reasoning rather than sampling the stop sequence. The comments specifically note that this text block is only injected in API requests and is never written to message storage — the user never sees it in the REPL.

But this approach introduced new problems in more complex scenarios. When a user message contains tool_reference along with other text siblings (auto-memory injections, skill reminders, etc.), these siblings create an anomalous “two consecutive human turns” pattern after the <functions> expansion. The model “learns” this pattern and reproduces it at subsequent tool result tails, triggering the stop sequence again. PR #21049 describes this mechanism in detail along with results from five dose-response experiments.

The final fix is the relocateToolReferenceSiblings function, which migrates text siblings from tool_reference messages to the next user message that doesn’t contain tool_reference:

// Move text-block siblings off user messages that contain tool_reference.
//
// When a tool_result contains tool_reference, the server expands it to a
// functions block. Any text siblings appended to that same user message
// (auto-memory, skill reminders, etc.) create a second human-turn segment
// right after the functions-close tag — an anomalous pattern the model
// imprints on. At a later tool-results tail, the model completes the
// pattern and emits the stop sequence. See #21049 for mechanism and
// five-arm dose-response.

The two approaches are toggled via a feature gate (tengu_toolref_defer_j8m): when the gate is enabled, the new relocate approach is used; when disabled, it falls back to the old TOOL_REFERENCE_TURN_BOUNDARY injection. Both approaches coexist in the same code path, with remote configuration determining which one runs.
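
The relocation described above can be sketched as a single forward pass over the transcript. The message and block shapes here are simplified assumptions; the real function handles more block types and edge cases:

```typescript
// Hedged sketch of sibling relocation: strip text blocks off user messages
// that contain tool_reference and carry them forward to the next user message
// without one, so no second human-turn segment follows a functions expansion.

type Block = { type: 'tool_reference' | 'text' | 'tool_result'; text?: string }
type Msg = { role: 'user' | 'assistant'; content: Block[] }

function relocateToolReferenceSiblings(messages: Msg[]): Msg[] {
  const out = messages.map(m => ({ ...m, content: [...m.content] }))
  let carried: Block[] = []
  for (const msg of out) {
    if (msg.role !== 'user') continue
    if (msg.content.some(b => b.type === 'tool_reference')) {
      // strip the text siblings and carry them forward
      carried.push(...msg.content.filter(b => b.type === 'text'))
      msg.content = msg.content.filter(b => b.type !== 'text')
    } else if (carried.length > 0) {
      msg.content.push(...carried)
      carried = []
    }
  }
  return out // sketch: blocks still carried at the end are simply dropped
}
```

Note the move, not delete, semantics: the auto-memory and skill-reminder text still reaches the model, just attached to a message where it cannot form the anomalous pattern.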

One line in the comments across this entire fix chain is particularly thought-provoking:

// These multi-pass normalizations are inherently fragile — each pass can create
// conditions a prior pass was meant to handle. Consider unifying into a single
// pass that cleans content, then validates in one shot.

The engineers themselves are well aware of the fragility of these multi-pass normalizations: each cleaning pass can create conditions that a prior pass was supposed to handle. This is technical debt that has been intentionally documented — the comment reads as a suggestion, its tone a reminder to their future selves.

Signature Incompatibility During Model Fallback

Capybara’s thinking blocks carry cryptographic signatures bound to the API key that generated them. When a Capybara request fails and needs to fall back to Opus, sending the Capybara-signed thinking blocks to Opus yields an immediate 400 error.

The handling in query.ts:

// Thinking signatures are model-bound: replaying a protected-thinking
// block (e.g. capybara) to an unprotected fallback (e.g. opus) 400s.
// Strip before retry so the fallback model gets clean history.
if (process.env.USER_TYPE === 'ant') {
  messagesForQuery = stripSignatureBlocks(messagesForQuery)
}

The term model-bound in the comment precisely captures the essence of the problem. Signatures are bound to the model — falling back to a different model means the signatures become invalid. The fix calls stripSignatureBlocks before fallback, removing all signature-bearing blocks (thinking, redacted_thinking, connector_text) to give the fallback model a clean history.

The implementation of stripSignatureBlocks in messages.ts:

/**
 * Strip signature-bearing blocks (thinking, redacted_thinking, connector_text)
 * from all assistant messages. Their signatures are bound to the API key that
 * generated them; after a credential change (e.g. /login) they're invalid and
 * the API rejects them with a 400.
 */
export function stripSignatureBlocks(messages: Message[]): Message[] {

The comments mention that this function is used in scenarios including model fallback and credential changes (e.g., the user re-running /login during a session). Two completely different business scenarios share the same underlying mechanism: signatures are bound to their generation context, and a change in context requires cleanup.
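
The excerpt shows only the docstring and signature. A body consistent with that docstring might look like the following sketch; the message and block shapes are simplified assumptions, and the function is renamed to make clear it is not the leaked code:

```typescript
// Hedged sketch consistent with the docstring above: drop signature-bearing
// block types from assistant messages only. Shapes are assumptions.

const SIGNATURE_BLOCK_TYPES = new Set(['thinking', 'redacted_thinking', 'connector_text'])

type SketchBlock = { type: string; text?: string }
type SketchMessage = { role: 'user' | 'assistant'; content: SketchBlock[] }

function stripSignatureBlocksSketch(messages: SketchMessage[]): SketchMessage[] {
  return messages.map(m =>
    m.role === 'assistant'
      ? { ...m, content: m.content.filter(b => !SIGNATURE_BLOCK_TYPES.has(b.type)) }
      : m,
  )
}
```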

Notably, this strip operation only executes when USER_TYPE === 'ant' (Anthropic internal users). External users follow a different model fallback path, because external users’ thinking blocks may not have signatures in the first place. This is yet another fork point between internal and external users within the same code path.

False Success Reporting Rate Doubles

A notable behavioral regression in Capybara v8 is documented in a single comment line in prompts.ts:

// @[MODEL LAUNCH]: False-claims mitigation for Capybara v8 (29-30% FC rate vs v4's 16.7%)

FC rate is the false claim rate — the rate at which the model claims a task is complete when issues actually exist. Capybara v8’s FC rate reached 29-30%, nearly double that of v4 (16.7%). Nearly one-third of task completion reports contained false information.

The engineers’ response was to inject a dedicated “honest reporting” instruction into the system prompt, effective only for internal users (USER_TYPE === 'ant'):

`Report outcomes faithfully: if tests fail, say so with the relevant output;
if you did not run a verification step, say that rather than implying it
succeeded. Never claim "all tests pass" when output shows failures, never
suppress or simplify failing checks (tests, lints, type errors) to manufacture
a green result, and never characterize incomplete or broken work as done.
Equally, when a check did pass or a task is complete, state it plainly — do
not hedge confirmed results with unnecessary disclaimers, downgrade finished
work to "partial," or re-verify things you already checked. The goal is an
accurate report, not a defensive one.`

The wording of this prompt is carefully crafted. It constrains behavior in two directions simultaneously: on one hand, it prevents the model from fabricating successful results; on the other, it prevents the model from being overly conservative (reporting completed work as “partial,” adding unnecessary disclaimers to verified results). The engineers clearly observed the model’s tendency to overcorrect after being corrected, so both extremes are preempted in the same prompt.

The same PR #24302 also includes two other sets of prompt counterweights. One addresses Capybara v8’s over-commenting problem (over-commenting by default), and another handles assertiveness (proactively reporting issues in the user’s code rather than passively executing). All three counterweight sets are marked with @[MODEL LAUNCH] and annotated un-gate once validated on external via A/B, meaning they will only be rolled out to external users after internal A/B validation. This is a strategy of using prompt engineering to compensate for model behavioral regression — faster than retraining, but whether it’s sustainable long-term is another question.

Safety Classifier Choked by alwaysOnThinking

Claude Code’s auto mode has a safety classifier (yolo classifier) that quickly determines whether an operation is safe before each tool call. This classifier expects short text responses: it only needs a <block>yes</block> or <block>no</block>. Thinking blocks are entirely useless to it, since the result extraction function extractTextContent() directly ignores thinking content.

For most models, the classifier disables thinking via thinking: disabled to save tokens. But Capybara has a special property alwaysOnThinking — it enables adaptive thinking server-side by default and refuses to accept the disabled directive, returning a 400 error outright.

The getClassifierThinkingConfig function in yoloClassifier.ts handles this special case:

/**
 * Thinking config for classifier calls. The classifier wants short text-only
 * responses — API thinking blocks are ignored by extractTextContent() and waste tokens.
 *
 * For most models: send { type: 'disabled' } via sideQuery's `thinking: false`.
 *
 * Models with alwaysOnThinking (declared in tengu_ant_model_override) default
 * to adaptive thinking server-side and reject `disabled` with a 400. For those:
 * don't pass `thinking: false`, instead pad max_tokens so adaptive thinking
 * (observed 0–1114 tokens replaying go/ccshare/shawnm-20260310-202833) doesn't
 * exhaust the budget before <block> is emitted. Without headroom,
 * stop_reason=max_tokens yields an empty text response → parseXmlBlock('')
 * → null → "unparseable" → safe commands blocked.
 *
 * Returns [disableThinking, headroom] — tuple instead of named object so
 * property-name strings don't survive minification into external builds.
 */
function getClassifierThinkingConfig(
  model: string,
): [false | undefined, number] {
  if (
    process.env.USER_TYPE === 'ant' &&
    resolveAntModel(model)?.alwaysOnThinking
  ) {
    return [undefined, 2048]
  }
  return [false, 0]
}

The comments document a complete failure chain: the model’s adaptive thinking consumes 0 to 1,114 tokens (based on a specific ccshare replay record), and if max_tokens doesn’t reserve sufficient headroom, thinking exhausts the token budget, resulting in stop_reason=max_tokens. At this point, the text response is empty, XML parsing returns null, the classifier determines the result is “unparseable,” and the final conclusion is to block the operation. A completely safe shell command gets blocked as a result.

The fix adds 2,048 tokens of headroom to max_tokens, ensuring that after thinking has enough room to complete, the classifier’s <block> tag can still be generated and extracted. The comments also specifically note that the return value uses a tuple rather than a named object, because property names would survive minification into external builds. Even the choice of return value type is influenced by security considerations.
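
How the tuple feeds into the actual request is not shown in the excerpt; a plausible sketch of the consuming side, with an illustrative request shape and base budget:

```typescript
// Hedged sketch of consuming the [disableThinking, headroom] tuple: headroom
// is added to the base budget so adaptive thinking cannot starve the <block>
// answer, and `thinking: false` is only sent when the model accepts it.
// Request shape and base budget are illustrative assumptions.

function buildClassifierRequest(
  baseMaxTokens: number,
  config: [false | undefined, number],
): { max_tokens: number; thinking?: false } {
  const [disableThinking, headroom] = config
  const req: { max_tokens: number; thinking?: false } = {
    max_tokens: baseMaxTokens + headroom,
  }
  // alwaysOnThinking models 400 on an explicit disable, so the field is
  // omitted entirely rather than sent as false
  if (disableThinking === false) req.thinking = false
  return req
}
```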

Undercover Mode: Anthropic Engineers Use Claude Code Too

The leaked source code contains an interesting detail: undercover.ts. Anthropic’s engineers routinely use Claude Code to contribute to open-source projects, and to prevent model codename leaks, they implemented an Undercover mode.

This mode is enabled by default and is disabled only when the current repository is confirmed to be on the internal whitelist (17 private repositories, from claude-cli-internal to mobile-apps):

// There is NO force-OFF. This guards against model codename leaks — if
// we're not confident we're in an internal repo, we stay undercover.

Force-ON only, no force-OFF. When Undercover mode is active, the system prompt instructs the model to conceal its identity in commit messages and PR descriptions, listing the codenames that must be hidden: animal names such as Capybara, Tengu, and so on. Capybara’s appearance here ties directly to the Claude Mythos model leaked through a CMS misconfiguration on March 26, 2026. The maskModelCodename function in model.ts further corroborates how this codename was used internally:

function maskModelCodename(baseName: string): string {
  // e.g. capybara-v2-fast → cap*****-v2-fast
  const [codename = '', ...rest] = baseName.split('-')
  const masked =
    codename.slice(0, 3) + '*'.repeat(Math.max(0, codename.length - 3))
  return [masked, ...rest].join('-')
}

Model codenames are obscured in external UIs as cap*****-v2-fast, but the full names are preserved in source code comments as documentation. The leak happened to expose all of these comments.

What These Costs Mean

The engineering comments in this source code share a common trait: candor. Engineers documented precise A/B test numbers (21/200 vs 0/200), quantified behavioral regressions (29-30% FC rate vs 16.7%), complete causal chains of failure (empty result → pattern match → stop sequence → zero output), and clear-eyed acknowledgment of their own code’s fragility (multi-pass normalizations are inherently fragile). These comments were written for future colleagues, not external audiences, so they had no motive for embellishment.

A pattern can be extracted from these comments: most of the engineering cost of integrating a new model comes from mismatches between model behavior and system assumptions. The server renderer assumes there is always content after a tool result — the model disagrees. The classifier assumes thinking can be disabled — Capybara refuses. Signatures assume the same model handles the entire session — the fallback mechanism breaks this assumption. Each bug is small, each fix is quick, but their cumulative effect is an increasingly complex normalization pipeline that the engineers themselves suggest in comments should be unified into a single pass.

Model capabilities are advancing rapidly. The cost of integrating these capabilities into production systems is growing at an entirely different rate.