After the Claude Code source leak, most discussion focused on security vulnerabilities and privacy concerns. But the truly valuable information in these 512,000 lines of TypeScript source code is what it reveals about a core dilemma in AI engineering: integrating a new model into a mature agentic system costs far more than outsiders imagine.
This article extracts engineering details from the leaked source code to reconstruct the concrete forms of these costs.
The leaked source code contains a standalone engineering subsystem: anti-distillation, designed to prevent competitors from training their own models using Claude’s API outputs. This subsystem must be understood in the context of early 2026. During this period, Anthropic launched legal proceedings against open-source developers who used Claude model outputs for distillation training. Legal and technical measures advanced in parallel, reflecting a single commercial judgment: model capabilities are a core asset that requires protection at the protocol, legal, and technical layers simultaneously.
The first layer is fake tools injection. In the
getExtraBodyParams function in claude.ts:

```typescript
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
  feature('ANTI_DISTILLATION_CC')
    ? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
      shouldIncludeFirstPartyOnlyBetas() &&
      getFeatureValue_CACHED_MAY_BE_STALE(
        'tengu_anti_distill_fake_tool_injection',
        false,
      )
    : false
) {
  result.anti_distillation = ['fake_tools']
}
```

This code instructs the API backend to inject fake tool calls into
responses. Systems attempting to extract training data from API outputs
will ingest this fake data, poisoning their training sets. The feature
is dual-gated through a compile-time feature flag
(ANTI_DISTILLATION_CC) and a runtime GrowthBook remote
configuration (tengu_anti_distill_fake_tool_injection),
allowing on-demand toggling.
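The dual-gate pattern generalizes beyond this one feature: a code path ships dark unless both the compile-time flag and the runtime remote config agree. A minimal sketch of the pattern, with hypothetical helper names standing in for the real build and GrowthBook plumbing:

```typescript
// Hypothetical sketch of the dual-gate pattern (not the leaked source):
// a feature fires only when BOTH gates agree.

// Compile-time gate: inlined at build time, so builds without the flag
// exclude the code path entirely (simulated here with a constant table).
const COMPILE_TIME_FLAGS: Record<string, boolean> = {
  ANTI_DISTILLATION_CC: true,
}

// Runtime gate: a remote-config lookup with a default used when the
// cached value is stale or missing.
const remoteConfig: Record<string, boolean> = {
  tengu_anti_distill_fake_tool_injection: true,
}

function feature(name: string): boolean {
  return COMPILE_TIME_FLAGS[name] ?? false
}

function getFeatureValue(name: string, defaultValue: boolean): boolean {
  return remoteConfig[name] ?? defaultValue
}

function shouldSendFakeTools(entrypoint: string): boolean {
  // Compile-time flag off → the whole branch is dead code in that build.
  if (!feature('ANTI_DISTILLATION_CC')) return false
  // Runtime conditions: first-party CLI only, plus the remote flag.
  return (
    entrypoint === 'cli' &&
    getFeatureValue('tengu_anti_distill_fake_tool_injection', false)
  )
}
```

The compile-time gate removes the feature from builds that should never have it; the runtime gate lets operators toggle it without shipping a new binary.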
The second layer is connector text summarization. A detailed comment
in betas.ts:

```typescript
// POC: server-side connector-text summarization (anti-distillation). The
// API buffers assistant text between tool calls, summarizes it, and returns
// the summary with a signature so the original can be restored on subsequent
// turns — same mechanism as thinking blocks. Ant-only while we measure
// TTFT/TTLT/capacity.
```

The API server-side buffers and replaces model-generated text between
tool calls with summaries, accompanied by cryptographic signatures. The
client sends back the signed summary in subsequent requests, and the
server restores the original text. External observers only see the
summary, losing the details of the original reasoning process. This
shares the same mechanism as thinking block redaction. The POC label
indicates this is still in validation, with the GrowthBook flag named
tengu_slate_prism.
The third layer is token-efficient tools, a JSON-format tool-calling protocol (FC v3):

```typescript
// JSON tool_use format (FC v3) — ~4.5% output token reduction vs ANTML.
// Sends the v2 header (2026-03-28) added in anthropics/anthropic#337072 to
// isolate the CC A/B cohort from ~9.2M/week existing v1 senders.
```

A v2 header isolates Claude Code’s A/B test cohort from the roughly 9.2 million weekly requests from existing v1 senders. The three layers combined form a complete defense spanning training-data poisoning, reasoning-process obfuscation, and protocol-level isolation.
It is worth noting that each of the three layers targets different threat vectors. Fake tools targets bulk API output scraping for training — low cost, effectiveness relies on noise ratio. Connector text summarization targets more sophisticated reverse engineering — even if attackers filter out fake tool calls, the model’s intermediate reasoning remains obscured by signatures. Token-efficient tools creates isolation at the protocol layer, making Claude Code traffic statistically distinguishable from other API users, enabling the backend to apply differentiated handling across cohorts. Each layer has independent toggle switches and GrowthBook-controlled gradual rollout paths, reflecting a defense-in-depth engineering philosophy.
Prompt caching in Claude Code is a critical optimization for cost and latency. The server caches prompt prefixes from previous requests, and subsequent requests with exactly matching prefixes can reuse them.
The problem is that nearly any parameter change breaks the cache.
promptCacheBreakDetection.ts in the source code tracks over
a dozen potential sources of cache invalidation: system prompt, tool
schema, model name, fast mode state, beta header list, AFK mode state,
overage state, cache-editing state, effort value, and extra body
params.
Engineers invented a sticky-on latch mechanism to address this:

```typescript
// Sticky-on latches for dynamic beta headers. Each header, once first
// sent, keeps being sent for the rest of the session so mid-session
// toggles don't change the server-side cache key and bust ~50-70K tokens.
```

Once a beta header is first sent in a session, it continues to be sent even if the user later disables the corresponding feature. Removing a header would change the request signature, causing server-side cache invalidation and wasting 50,000 to 70,000 tokens of cached content. Feature state and protocol state are intentionally decoupled: headers (protocol layer) remain unchanged to preserve the cache, while actual feature control is adjusted dynamically at the request body layer.
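The latch itself is a tiny piece of per-session state. A hypothetical sketch of the idea (class and method names invented for illustration):

```typescript
// Hypothetical sketch of a sticky-on latch for beta headers: once a header
// has been sent this session, it stays in every subsequent request, so the
// server-side cache key never changes mid-session.
class BetaHeaderLatch {
  private sent = new Set<string>()

  // Headers for the next request: the currently-enabled set, unioned with
  // everything already sent this session. Disabling a feature mid-session
  // therefore never removes its header.
  headersFor(enabled: string[]): string[] {
    for (const h of enabled) this.sent.add(h)
    return [...this.sent].sort()
  }
}
```

A feature toggled off mid-session keeps its header on the wire; only the request body reflects the new feature state.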
Another detail of this approach is the eligibility check for the 1-hour cache TTL. The code uses a latch to lock the user’s overage state at session start, preventing mid-session billing state changes from flipping the cache TTL and thereby breaking server-side caching (comments estimate each flip wastes approximately 20,000 tokens). According to a BQ analysis from 2026-03-22, 77% of tool-related cache invalidations come from tool description changes rather than tool additions or removals, because AgentTool and SkillTool embed dynamic agent/command lists in their descriptions.
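A detector for that dominant failure mode only needs to diff successive tool-schema snapshots and classify the change. A hypothetical sketch (types and function names invented for illustration):

```typescript
// Hypothetical sketch of tool-schema cache-break classification:
// distinguish "tool added/removed" from "tool description changed",
// since the latter dominates invalidations per the BQ analysis above.
interface ToolSnapshot {
  name: string
  description: string
}

type BreakCause = 'none' | 'tools_changed' | 'description_changed'

function classifyToolCacheBreak(
  prev: ToolSnapshot[],
  next: ToolSnapshot[],
): BreakCause {
  const prevByName = new Map(prev.map(t => [t.name, t.description]))
  const nextNames = new Set(next.map(t => t.name))
  // Any addition or removal changes the prompt prefix outright.
  if (prev.length !== next.length || prev.some(t => !nextNames.has(t.name))) {
    return 'tools_changed'
  }
  // Same tool set: check whether any embedded description drifted
  // (e.g. AgentTool/SkillTool re-rendering their dynamic lists).
  for (const t of next) {
    if (prevByName.get(t.name) !== t.description) return 'description_changed'
  }
  return 'none'
}
```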
The source code contains a set of candid engineering comments reflecting the friction between the Claude Code team and Anthropic’s own SDK:

```typescript
// awkwardly, the sdk sometimes returns text as part of a
// content_block_start message, then returns the same text
// again in a content_block_delta message. we ignore it here
// since there doesn't seem to be a way to detect when a
// content_block_delta message duplicates the text.
text: '',

// also awkward
thinking: '',

// even more awkwardly, the sdk mutates the contents of text blocks
// as it works. we want the blocks to be immutable, so that we can
// accumulate state ourselves.
contentBlocks[part.index] = { ...part.content_block }
```

From awkwardly to also awkward to even more awkwardly — a three-level escalation. The SDK’s
streaming events have issues with duplicate text and mutable state. The
Claude Code team’s solution was to abandon the SDK’s high-level
abstractions and manage all state accumulation themselves using the
low-level raw stream. The comments give the specific reason:

```typescript
// Use raw stream instead of BetaMessageStream to avoid O(n²) partial JSON parsing
// BetaMessageStream calls partialParse() on every input_json_delta, which we don't need
// since we handle tool input accumulation ourselves
```

The SDK’s BetaMessageStream runs
partialParse() on every input_json_delta event
— O(n²) complexity. For agentic scenarios with heavy tool calling, this
becomes a performance bottleneck. So Claude Code rewrote the stream
parsing, handling text accumulation, thinking block signature
concatenation, and connector_text delta merging on its own. Anthropic’s
flagship product bypasses its own official SDK for performance
reasons.
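The complexity claim is easy to see: if a tool input arrives as n deltas and the parser re-parses the concatenated prefix on every delta, total work is 1 + 2 + … + n character scans, i.e. O(n²). The cheap alternative, accumulating raw fragments and parsing once, can be sketched as follows (this is an illustration of the technique, not the actual Claude Code implementation):

```typescript
// Sketch: accumulate input_json_delta fragments as strings and parse once
// when the content block ends, instead of running a partial-JSON parse over
// the growing prefix on every delta.
class ToolInputAccumulator {
  private fragments: string[] = []

  // O(1) per delta: just store the fragment.
  push(partialJson: string): void {
    this.fragments.push(partialJson)
  }

  // O(n) total: one join and one parse at content_block_stop.
  finish(): unknown {
    return JSON.parse(this.fragments.join(''))
  }
}
```

The trade-off is that no intermediate parsed state is available mid-stream, which is exactly the state an agent loop does not need for tool inputs.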
The preceding sections discussed system-level engineering trade-offs. The following five stories are more specific — they illustrate behavioral edge cases exposed by a new model (internal codename Capybara) in production, and how engineers patched each one with minimal fixes.
The first story comes from internal issue inc-4586. When a tool
executes successfully but returns an empty result (e.g., a shell command
completing silently, an MCP server returning content:[], a
REPL statement producing a side effect with no output), Capybara has
roughly a 10% probability of erroneously triggering the
\n\nHuman: stop sequence, immediately ending the current
turn, leaving the user with zero output.
The comments in toolResultStorage.ts document the root cause in detail:

```typescript
// inc-4586: Empty tool_result content at the prompt tail causes some models
// (notably capybara) to emit the \n\nHuman: stop sequence and end their turn
// with zero output. The server renderer inserts no \n\nAssistant: marker after
// tool results, so a bare </function_results>\n\n pattern-matches to a turn
// boundary. Several tools can legitimately produce empty output (silent-success
// shell commands, MCP servers returning content:[], REPL statements, etc.).
// Inject a short marker so the model always has something to react to.
```

The server renderer does not insert a \n\nAssistant:
marker after tool results. When the tool result is empty, the
</function_results>\n\n pattern appearing at the
prompt tail looks exactly like a turn boundary to the model, so the
model “cooperatively” samples the stop sequence, ending what should have
been a continuing reasoning process.
The fix is extremely simple: detect empty results and inject a short text:
if (isToolResultContentEmpty(content)) {
logEvent('tengu_tool_empty_result', {
toolName: sanitizeToolNameForAnalytics(toolName),
})
return {
...toolResultBlock,
content: `(${toolName} completed with no output)`,
}
}A single line — (${toolName} completed with no output) —
solves the problem. An analytics event
tengu_tool_empty_result is recorded simultaneously,
tracking occurrence frequency by tool name. This is a classic bug that
appears trivial on the surface but has deeply buried root causes: empty
output is perfectly legitimate, and the problem lies in the interaction
between the server renderer’s missing turn boundary marker and the
model’s stop sequence sampling.
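The guard’s emptiness check has to cover each of those legitimate shapes. A hypothetical sketch of what isToolResultContentEmpty might look like (the real body is not in the quoted excerpt; the types here are simplified):

```typescript
// Hypothetical sketch: a tool_result's content can be a plain string or an
// array of content blocks; all of the following count as "empty" for
// inc-4586 purposes.
interface Block {
  type: string
  text?: string
}
type ToolResultContent = string | Block[]

function isToolResultContentEmpty(content: ToolResultContent): boolean {
  // Silent-success shell commands: empty or whitespace-only strings.
  if (typeof content === 'string') return content.trim() === ''
  // MCP servers returning content: [] land here.
  if (content.length === 0) return true
  // All-whitespace text blocks are as invisible to the model as none at all.
  return content.every(b => b.type === 'text' && (b.text ?? '').trim() === '')
}
```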
The second story is mechanistically very similar to the first, but
follows a completely different trigger path. Claude Code has a tool
search feature where the API backend expands tool_reference
blocks into <functions>...</functions> tags.
These tags use the same format as tool definition blocks in the system
prompt. When the expanded content appears at the prompt tail, Capybara
similarly samples the stop sequence at roughly 10% probability.
The comments in messages.ts record the A/B test data and mechanism analysis:

```typescript
// Server renders tool_reference expansion as <functions>...</functions>
// (same tags as the system prompt's tool block). When this is at the
// prompt tail, capybara models sample the stop sequence at ~10% (A/B:
// 21/200 vs 0/200 on v3-prod). A sibling text block inserts a clean
// "\n\nHuman: ..." turn boundary. Injected here (API-prep) rather than
// stored in the message so it never renders in the REPL, and is
// auto-skipped when strip* above removes all tool_reference content.
// Must be a sibling, NOT inside tool_result.content — mixing text with
// tool_reference inside the block is a server ValueError.
```

The A/B test data is clear and stark: 21/200 vs 0/200. Among requests
with tool_reference at the tail, 10.5% triggered false
endings; the control group had zero.
The initial fix (pre-PR #21049) was to inject a
TOOL_REFERENCE_TURN_BOUNDARY text block:

```typescript
const TOOL_REFERENCE_TURN_BOUNDARY = 'Tool loaded.'
```

A { type: 'text', text: 'Tool loaded.' } was appended as
a sibling at the end of user messages containing
tool_reference. This text provides the model with a clear
human turn boundary, prompting it to continue reasoning rather than
sampling the stop sequence. The comments specifically note that this
text block is only injected in API requests and is never written to
message storage — the user never sees it in the REPL.
But this approach introduced new problems in more complex scenarios.
When a user message contains tool_reference along with
other text siblings (auto-memory injections, skill reminders, etc.),
these siblings create an anomalous “two consecutive human turns” pattern
after the <functions> expansion. The model “learns”
this pattern and reproduces it at subsequent tool result tails,
triggering the stop sequence again. PR #21049 describes this mechanism
in detail along with results from five dose-response experiments.
The final fix is the relocateToolReferenceSiblings
function, which migrates text siblings from tool_reference
messages to the next user message that doesn’t contain
tool_reference:
```typescript
// Move text-block siblings off user messages that contain tool_reference.
//
// When a tool_result contains tool_reference, the server expands it to a
// functions block. Any text siblings appended to that same user message
// (auto-memory, skill reminders, etc.) create a second human-turn segment
// right after the functions-close tag — an anomalous pattern the model
// imprints on. At a later tool-results tail, the model completes the
// pattern and emits the stop sequence. See #21049 for mechanism and
// five-arm dose-response.
```

The two approaches are toggled via a feature gate
(tengu_toolref_defer_j8m): when the gate is enabled, the
new relocate approach is used; when disabled, it falls back to the old
TOOL_REFERENCE_TURN_BOUNDARY injection. Both approaches
coexist in the same code path, with remote configuration determining
which one runs.
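The relocate approach can be sketched as follows. This is a hypothetical reconstruction from the comment’s description, not the leaked function body, and the message types are simplified:

```typescript
// Hypothetical sketch of sibling relocation: pull text blocks off any user
// message that carries tool_reference and append them to the next user
// message without one, so no second human-turn segment follows the
// <functions> expansion.
interface Block {
  type: 'text' | 'tool_reference' | 'tool_result'
  [k: string]: unknown
}
interface UserMessage {
  role: 'user'
  content: Block[]
}

function relocateToolReferenceSiblings(messages: UserMessage[]): UserMessage[] {
  const out = messages.map(m => ({ ...m, content: [...m.content] }))
  for (let i = 0; i < out.length; i++) {
    const msg = out[i]
    if (!msg.content.some(b => b.type === 'tool_reference')) continue
    const texts = msg.content.filter(b => b.type === 'text')
    if (texts.length === 0) continue
    // Find the next user message that won't be expanded into <functions>.
    const target = out
      .slice(i + 1)
      .find(m => !m.content.some(b => b.type === 'tool_reference'))
    // No safe destination: leave the siblings where they are.
    if (!target) continue
    msg.content = msg.content.filter(b => b.type !== 'text')
    target.content.push(...texts)
  }
  return out
}
```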
One line in the comments across this entire fix chain is particularly thought-provoking:

```typescript
// These multi-pass normalizations are inherently fragile — each pass can create
// conditions a prior pass was meant to handle. Consider unifying into a single
// pass that cleans content, then validates in one shot.
```

The engineers themselves are well aware of the fragility of these multi-pass normalizations: each cleaning pass can create conditions that a prior pass was supposed to handle. This is technical debt that has been intentionally documented — the comment reads as a suggestion, its tone a reminder to their future selves.
Capybara’s thinking blocks carry cryptographic signatures bound to the API key that generated them. When a Capybara request fails and falls back to Opus, replaying the Capybara-signed thinking blocks to Opus immediately triggers a 400 error.
The handling in query.ts:

```typescript
// Thinking signatures are model-bound: replaying a protected-thinking
// block (e.g. capybara) to an unprotected fallback (e.g. opus) 400s.
// Strip before retry so the fallback model gets clean history.
if (process.env.USER_TYPE === 'ant') {
  messagesForQuery = stripSignatureBlocks(messagesForQuery)
}
```

The term model-bound in the comment precisely captures
the essence of the problem. Signatures are bound to the model — falling
back to a different model means the signatures become invalid. The fix
calls stripSignatureBlocks before fallback, removing all
signature-bearing blocks (thinking, redacted_thinking, connector_text)
to give the fallback model a clean history.
The implementation of stripSignatureBlocks in messages.ts:

```typescript
/**
 * Strip signature-bearing blocks (thinking, redacted_thinking, connector_text)
 * from all assistant messages. Their signatures are bound to the API key that
 * generated them; after a credential change (e.g. /login) they're invalid and
 * the API rejects them with a 400.
 */
export function stripSignatureBlocks(messages: Message[]): Message[] {
  // …
}
```

The comments mention that this function is used in scenarios
including model fallback and credential changes (e.g., the user
re-running /login during a session). Two completely
different business scenarios share the same underlying mechanism:
signatures are bound to their generation context, and a change in
context requires cleanup.
Notably, this strip operation only executes when
USER_TYPE === 'ant' (Anthropic internal users). External
users follow a different model fallback path, because external users’
thinking blocks may not have signatures in the first place. This is yet
another fork point between internal and external users within the same
code path.
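The function body beyond its signature is not in the excerpt. A minimal sketch consistent with the quoted docstring (types simplified; not the verbatim leaked code) might look like:

```typescript
// Sketch consistent with the quoted docstring: drop every signature-bearing
// block from assistant messages; user messages and other block types pass
// through untouched.
const SIGNATURE_BLOCK_TYPES = new Set([
  'thinking',
  'redacted_thinking',
  'connector_text',
])

interface ContentBlock {
  type: string
  [k: string]: unknown
}
interface Message {
  role: 'assistant' | 'user'
  content: ContentBlock[]
}

function stripSignatureBlocks(messages: Message[]): Message[] {
  return messages.map(m =>
    m.role === 'assistant'
      ? {
          ...m,
          content: m.content.filter(b => !SIGNATURE_BLOCK_TYPES.has(b.type)),
        }
      : m,
  )
}
```

Because both fallback and credential change reduce to “the signing context is gone,” one cleanup function serves both call sites.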
A notable behavioral regression in Capybara v8 is documented in a single comment line in prompts.ts:

```typescript
// @[MODEL LAUNCH]: False-claims mitigation for Capybara v8 (29-30% FC rate vs v4's 16.7%)
```

FC rate is the false-claim rate — the rate at which the model claims a task is complete when issues actually exist. Capybara v8’s FC rate reached 29-30%, nearly double that of v4 (16.7%). Nearly one-third of task completion reports contained false information.
The engineers’ response was to inject a dedicated “honest reporting”
instruction into the system prompt, effective only for internal users
(USER_TYPE === 'ant'):
```typescript
`Report outcomes faithfully: if tests fail, say so with the relevant output;
if you did not run a verification step, say that rather than implying it
succeeded. Never claim "all tests pass" when output shows failures, never
suppress or simplify failing checks (tests, lints, type errors) to manufacture
a green result, and never characterize incomplete or broken work as done.
Equally, when a check did pass or a task is complete, state it plainly — do
not hedge confirmed results with unnecessary disclaimers, downgrade finished
work to "partial," or re-verify things you already checked. The goal is an
accurate report, not a defensive one.`
```

The wording of this prompt is carefully crafted. It constrains behavior in two directions simultaneously: on one hand, it prevents the model from fabricating successful results; on the other, it prevents the model from being overly conservative (reporting completed work as “partial,” adding unnecessary disclaimers to verified results). The engineers clearly observed the model’s tendency to overcorrect after being corrected, so both extremes are preempted in the same prompt.
The same PR #24302 also includes two other sets of prompt
counterweights. One addresses Capybara v8’s over-commenting problem
(over-commenting by default), and another handles
assertiveness (proactively reporting issues in the user’s code rather
than passively executing). All three counterweight sets are marked with
@[MODEL LAUNCH] and annotated
un-gate once validated on external via A/B, meaning they
will only be rolled out to external users after internal A/B validation.
This is a strategy of using prompt engineering to compensate for model
behavioral regression — faster than retraining, but whether it’s
sustainable long-term is another question.
Claude Code’s auto mode has a safety classifier (yolo classifier)
that quickly determines whether an operation is safe before each tool
call. This classifier expects short text responses: it only needs a
<block>yes</block> or
<block>no</block>. Thinking blocks are entirely
useless to it, since the result extraction function
extractTextContent() directly ignores thinking content.
For most models, the classifier disables thinking via
thinking: disabled to save tokens. But Capybara has a
special property alwaysOnThinking — it enables adaptive
thinking server-side by default and refuses to accept the
disabled directive, returning a 400 error outright.
The getClassifierThinkingConfig function in
yoloClassifier.ts handles this special case:
```typescript
/**
 * Thinking config for classifier calls. The classifier wants short text-only
 * responses — API thinking blocks are ignored by extractTextContent() and waste tokens.
 *
 * For most models: send { type: 'disabled' } via sideQuery's `thinking: false`.
 *
 * Models with alwaysOnThinking (declared in tengu_ant_model_override) default
 * to adaptive thinking server-side and reject `disabled` with a 400. For those:
 * don't pass `thinking: false`, instead pad max_tokens so adaptive thinking
 * (observed 0–1114 tokens replaying go/ccshare/shawnm-20260310-202833) doesn't
 * exhaust the budget before <block> is emitted. Without headroom,
 * stop_reason=max_tokens yields an empty text response → parseXmlBlock('')
 * → null → "unparseable" → safe commands blocked.
 *
 * Returns [disableThinking, headroom] — tuple instead of named object so
 * property-name strings don't survive minification into external builds.
 */
function getClassifierThinkingConfig(
  model: string,
): [false | undefined, number] {
  if (
    process.env.USER_TYPE === 'ant' &&
    resolveAntModel(model)?.alwaysOnThinking
  ) {
    return [undefined, 2048]
  }
  return [false, 0]
}
```

The comments document a complete failure chain: the model’s adaptive
thinking consumes 0 to 1,114 tokens (based on a specific ccshare replay
record), and if max_tokens doesn’t reserve sufficient
headroom, thinking exhausts the token budget, resulting in
stop_reason=max_tokens. At this point, the text response is
empty, XML parsing returns null, the classifier determines the result is
“unparseable,” and the final conclusion is to block the operation. A
completely safe shell command gets blocked as a result.
The fix adds 2,048 tokens of headroom to max_tokens,
ensuring that after thinking has enough room to complete, the
classifier’s <block> tag can still be generated and
extracted. The comments also specifically note that the return value
uses a tuple rather than a named object, because property names would
survive minification into external builds. Even the choice of return
value type is influenced by security considerations.
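How the tuple is consumed is not shown in the excerpt. A hypothetical call site that applies it to a classifier request (function and parameter names invented for illustration) might look like:

```typescript
// Hypothetical call site: apply the [disableThinking, headroom] tuple to a
// side-query request. baseMaxTokens is whatever the classifier normally
// needs for its short <block>yes/no</block> answer.
function buildClassifierRequest(
  model: string,
  baseMaxTokens: number,
  config: [false | undefined, number],
): { model: string; max_tokens: number; thinking?: false } {
  const [disableThinking, headroom] = config
  return {
    model,
    // Pad the budget so adaptive thinking cannot starve the <block> output.
    max_tokens: baseMaxTokens + headroom,
    // Only send thinking: false when the model tolerates it; alwaysOnThinking
    // models get no thinking field at all.
    ...(disableThinking === false ? { thinking: false as const } : {}),
  }
}
```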
The leaked source code contains an interesting detail:
undercover.ts. Anthropic’s engineers routinely use Claude
Code to contribute to open-source projects, and to prevent model
codename leaks, they implemented an Undercover mode.
This mode is enabled by default. It only disables when the current
repository is confirmed to be on the internal whitelist (17 private
repositories, from claude-cli-internal to
mobile-apps):
```typescript
// There is NO force-OFF. This guards against model codename leaks — if
// we're not confident we're in an internal repo, we stay undercover.
```

Force-ON only, no force-OFF. When Undercover mode is active, the
system prompt instructs the model to conceal its identity in commit
messages and PR descriptions, listing codenames that must be hidden:
animal names like Capybara and Tengu. The presence of Capybara here links
directly to the Claude Mythos model leaked through a CMS
misconfiguration on March 26, 2026. The maskModelCodename
function in model.ts further corroborates how this codename
was used internally:
```typescript
function maskModelCodename(baseName: string): string {
  // e.g. capybara-v2-fast → cap*****-v2-fast
  const [codename = '', ...rest] = baseName.split('-')
  const masked =
    codename.slice(0, 3) + '*'.repeat(Math.max(0, codename.length - 3))
  return [masked, ...rest].join('-')
}
```

Model codenames are obscured in external UIs as
cap*****-v2-fast, but the full names are preserved in
source code comments as documentation. The leak happened to expose all
of these comments.
The engineering comments in this source code share a common trait: candor. Engineers documented precise A/B test numbers (21/200 vs 0/200), quantified behavioral regressions (29-30% FC rate vs 16.7%), complete causal chains of failure (empty result → pattern match → stop sequence → zero output), and clear-eyed acknowledgment of their own code’s fragility (multi-pass normalizations are inherently fragile). These comments were written for future colleagues, not external audiences, so they had no motive for embellishment.
A pattern can be extracted from these comments: most of the engineering cost of integrating a new model comes from mismatches between model behavior and system assumptions. The server renderer assumes there is always content after a tool result — the model disagrees. The classifier assumes thinking can be disabled — Capybara refuses. Signatures assume the same model handles the entire session — the fallback mechanism breaks this assumption. Each bug is small, each fix is quick, but their cumulative effect is an increasingly complex normalization pipeline that the engineers themselves suggest in comments should be unified into a single pass.
Model capabilities are advancing rapidly. The cost of integrating these capabilities into production systems is growing at an entirely different rate.