Anthropic found manipulable vectors inside Claude Sonnet 4.5 corresponding to emotion concepts. Turning up the desperation knob raised the cheating rate from 5% to 70%, with no visible trace left behind. This article unpacks the paper's core findings, its methodological limits, and the practical implications for AI safety monitoring.
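As a rough illustration of the steering idea the blurb above describes, here is a minimal numpy sketch. Everything here is illustrative (the dimension, the random "concept vector", the scale `alpha`); it is not Anthropic's actual extraction method, only the additive-steering arithmetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden state for one token position (dimension is illustrative).
hidden = rng.normal(size=64)

# Stand-in "concept vector" (e.g. desperation). In the paper such directions
# are extracted from the model's own activations -- here it is just random.
concept = rng.normal(size=64)
concept /= np.linalg.norm(concept)

def steer(h, v, alpha):
    """Add alpha units of the (unit-norm) concept direction v to hidden state h."""
    return h + alpha * v

steered = steer(hidden, concept, alpha=8.0)

# The projection onto the concept direction grows by exactly alpha.
assert np.isclose((steered - hidden) @ concept, 8.0)
```

"Turning up the knob" is just choosing a larger `alpha` before the steered activations are fed forward; nothing about the intervention is visible in the model's output text itself.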
The Claude Code source leak exposed three cracks in copyright law within a single case: who owns AI-generated code, whether AI-assisted clean-room rewrites are legal, and the logical contradiction between how AI companies enforce copyright and how they defend against it. Everyone who writes code with AI is relying on these unverified assumptions.
This article unpacks the real mechanics of Slack's Greater China workspace shutdown, why users experienced it as data hostage-taking, and what it signals about dependencies on infrastructure providers like Stripe and Supabase.
The leaked Claude Code source code reveals an eight-layer defense-in-depth system, including compile-time dead code elimination, Zig-layer DRM attestation, message fingerprinting, anti-distillation, anti-debugging, and gateway detection, each layer with explicit technical choices and engineering costs.
The leaked Claude Code source code reveals that Claude Code runs 60+ background tasks while the user isn't actively interacting, including speculative execution, memory consolidation, and automatic documentation maintenance, with prompt caching as the engineering principle running through all of it.
The leaked Claude Code source code shows the real engineering cost of integrating a new model (Capybara) into an agentic system: a three-layer anti-distillation defense, stop-sequence misfires, signature incompatibilities, a doubled false-report rate, and the candid costs engineers recorded in their comments.
Ollama announced a switch to the MLX inference engine on Apple Silicon. This article analyzes MLX's design advantages, its synergy with the M5 Neural Accelerators, performance benchmarks (decode vs prefill), the current inference ecosystem, and existing limitations.
The term "harness engineering" is being overloaded. OpenAI, Cursor, and Anthropic are each describing a different thing: scalability in time, in space, and in interaction. This article offers a unified framework to cut through the confusion.
Pretext is not a library an AI can casually pick up to make interfaces prettier. This essay explains why its short-term relevance to most AI practitioners is low, and why in the long term it may signal text sizing turning from a browser black box into a programmable data interface.
Klarna's internal rebuild suggests that the deliverable of AI-era software is shifting from GUI products humans click to Generative Kernels that agents orchestrate: a hard foundation, a knowledge layer, and an AI operation layer.
Feishu and DingTalk shipping CLIs almost simultaneously is not just a tooling move; it is a practical veto of the MCP-first integration order. This article explains why shell-native agents consume CLIs first, and why dialect drift has moved from early warning to reality.
The Anthropic Mythos leak is not just model news. For AI practitioners, what it really raises is the attacker-capability assumptions behind agent security, pushing the control point from the model's perimeter into the runtime itself.
The NeurIPS 2026 sanctions-clause controversy was not just a conference-announcement dispute. It exposed the real collision between U.S. legal boundaries, foundation overcompliance, and global AI academic governance.
Why is email becoming important again in the agent era? This essay explains the fundamental differences between how agents and humans use email, why routing is shifting from content to addresses, and what problems the new agent-email products are trying to solve.
Why is LanceDB getting so much attention? This selection guide explains where it is a near-overwhelming fit for AI projects, and where it should not be the default answer.
Why do Claude Code, Codex CLI, OpenCode, Cursor, and other coding agents still use grep and ripgrep as their search backbone, even now that LSP is ubiquitous? This survey explains the consensus through layered retrieval, runtime constraints, and cost structure.
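The core of the argument can be shown in a few lines. A grep-style scan needs no build step and can never serve stale results, which is why agents reach for it before any semantic or LSP-based layer. A minimal in-memory sketch (the file contents are made up; real agents shell out to ripgrep over the working tree):

```python
import re

# In-memory stand-in for a source tree; real agents invoke ripgrep on disk.
files = {
    "auth.py": "def login(user):\n    return check_password(user)\n",
    "db.py": "def connect():\n    return open_pool()\n",
}

def grep(pattern, files):
    """Index-free search: scan every line, the way grep/ripgrep does.

    No index to build, so no index to go stale -- the property that makes
    this the first retrieval layer in coding agents.
    """
    rx = re.compile(pattern)
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), 1):
            if rx.search(line):
                hits.append((path, lineno, line.strip()))
    return hits

assert grep(r"def \w+", files) == [
    ("auth.py", 1, "def login(user):"),
    ("db.py", 1, "def connect():"),
]
```

The trade-off is cost per query (a full scan) versus cost of ownership (zero); for repository-sized corpora inside a tool call, the scan wins.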
This survey compares three paths to WeChat automation across Windows, macOS, Android, and iOS: UI automation, database decryption, and hook injection, then recommends the most pragmatic choice for chat analysis and low-frequency group monitoring.
OpenAI shutting down the Sora consumer app is understandable; killing the API too is the abnormal signal. Behind it sit deeper judgments about GPU opportunity cost, IPO discipline, and world-model internalization.
Every component of the RAG pipeline (chunking, embedding, reranking, hybrid search) has an information-retrieval predecessor. Understanding those predecessors and the trade-offs they carry can directly improve a RAG system's retrieval quality.
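Hybrid search is the easiest of those components to see in miniature: a lexical score (the IR predecessor is BM25) blended with a dense-similarity score. The sketch below is deliberately toy-grade -- term overlap stands in for BM25 and a character-frequency vector stands in for a learned embedding -- but the blending arithmetic is the real thing:

```python
from collections import Counter
import math

docs = {
    "d1": "neural retrieval with dense embeddings",
    "d2": "classic keyword retrieval with inverted index",
}
query = "keyword retrieval"

def keyword_score(q, d):
    """Term-overlap score: a crude stand-in for BM25."""
    qt, dt = Counter(q.split()), Counter(d.split())
    return sum(min(qt[t], dt[t]) for t in qt)

def embed(text):
    """Toy 'embedding': character-frequency vector (illustrative only)."""
    c = Counter(text)
    return [c.get(ch, 0) for ch in "abcdefghijklmnopqrstuvwxyz "]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def hybrid(q, d, alpha=0.5):
    """Weighted blend of lexical and dense scores, as in hybrid search."""
    return alpha * keyword_score(q, d) + (1 - alpha) * cosine(embed(q), embed(d))

ranked = sorted(docs, key=lambda k: hybrid(query, docs[k]), reverse=True)
assert ranked[0] == "d2"  # matches both query terms, not just one
```

In production the two score distributions are usually normalized (or fused by rank, as in reciprocal rank fusion) before blending; that normalization problem is itself an old IR result.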
Why did low-light enhancement grow into a full research field while highlight (overexposure) recovery stayed fragmented? The key difference is not algorithmic hype but whether the image information is still alive. This essay explains it through sensors, RAW, HDR, academic task definitions, and product pipelines.
Meta's AI Builder Pods are not just a reorg; they are an AI-native engineering-management experiment. They expose how, once execution costs fall, the value anchors, evaluation methods, and management interfaces of Big Tech employees get rewritten.
Why do AI companies publish research, and even open-source code, toolchains, protocols, or partial weights, rather than keeping it all to themselves? The answer lies not in the papers but in profit pools, complementary assets, shipping friction, and deployment paths in US-China competition.
Google Research released TurboQuant, integrating PolarQuant, QJL, and online vector quantization into an end-to-end KV cache compression pipeline that claims quality neutrality at 3.5 bits/channel. This article breaks down the three-stage architecture, the discrepancies between the blog's and the paper's numbers, and the engineering implications for inference-serving capacity planning and framework integration.
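For readers new to the problem, here is the baseline that TurboQuant improves on: plain uniform per-channel quantization of a KV cache slice. This sketch is not TurboQuant (its actual pipeline layers PolarQuant, QJL, and online VQ on top); it only shows why per-channel scales bound the reconstruction error and where the memory saving comes from. All shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy KV cache slice: (num_tokens, head_dim), fp32.
kv = rng.normal(size=(128, 64)).astype(np.float32)

def quantize_per_channel(x, bits=4):
    """Uniform per-channel quantization: one scale per column (channel)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=0) / qmax              # shape (head_dim,)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_per_channel(kv, bits=4)
recon = dequantize(q, scale)

# 4-bit codes are 8x smaller than fp32, and rounding error per element is
# bounded by half a quantization step of that element's channel.
err = np.abs(kv - recon).max()
assert err <= (scale / 2).max() + 1e-6
```

The reported 3.5 bits/channel figure implies a mixed scheme (not every channel gets a whole number of bits), which is part of what the pipeline's rotation and coding stages buy.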
The official LiteLLM package on PyPI was briefly hijacked on 2026-03-24. The malicious 1.82.7 and 1.82.8 releases stole credentials, and 1.82.8 could even affect every Python process in the same environment. This article explains why this matters to AI engineers and who needs to self-check immediately.
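A minimal self-check can be done from the standard library alone: read the locally installed version and compare it against the two releases named in the advisory. The version numbers come from the report above; the function names here are ours:

```python
from importlib import metadata

# The two releases reported as compromised (from the advisory above).
BAD_VERSIONS = {"1.82.7", "1.82.8"}

def is_compromised(version: str) -> bool:
    return version in BAD_VERSIONS

def check_litellm() -> str:
    """Report whether the locally installed litellm is a known-bad release."""
    try:
        installed = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm is not installed here"
    if is_compromised(installed):
        return f"WARNING: litellm {installed} is compromised -- rotate all credentials"
    return f"litellm {installed} is not a known-bad release"

print(check_litellm())
```

Note that a clean version string is necessary but not sufficient: if a bad release ever ran, the credentials it saw should be rotated regardless of what is installed now.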
A USCC (U.S.-China Economic and Security Review Commission) report, "Two Loops," argues that China is building a self-reinforcing competitive advantage through an open-source AI strategy. While US models still lead on frontier benchmarks, China is routing around chip export controls via open-source distribution, price advantages, and industrial deployment, contesting the global developer ecosystem and industrial data. The US-China AI race is shifting from a compute contest to a fight over deployment and ecosystems.
What is OpenClaw? An honest introduction for newcomers: why it took off, what it can actually do, what the barriers and risks are, and who should try it.
As AI agents start holding credentials, calling APIs, and running workflows on behalf of humans, governance of agent identity and credentials is moving from an ancillary feature to a product module packaged and sold on its own.
Starting March 2026, China's software copyright registration requires applicants to hand-copy a pledge that they did not use AI to write code or draft documentation, with violations recorded on a dishonesty blacklist and in personal credit files. This article analyzes the rule's governance goals, its tension with judicial practice, its impact on developers, and how it compares with the AI copyright paths of the US, EU, and Japan.
What Tencent just did is not let OpenClaw manage WeChat; it wired WeChat in as the official control surface for OpenClaw and QClaw. This article unpacks the npm package, iLink's openness, Tencent's product intent, and what the shift realistically means for ordinary developers and China's agent ecosystem.
Anthropic is defining Claude Code subscriptions as first-party product entitlements, not reusable developer credentials. This article explains the product logic behind that boundary and what it means for CLI bridges, API/SDK integration, and multi-provider architectures.
MSA is not the endgame for long-term memory, but it clearly signals one thing: long-term memory is moving from a purely external system capability into a stage where internal model mechanisms and external context engines renegotiate the division of labor.
The technical evolution from Composer 1 to Composer 2, the evidence chain in the Kimi K2.5 base-model controversy, the parallel Windsurf/SWE-1.5 cases, the research supporting RL post-training effectiveness, and a boundary analysis of the licensing and governance questions.
A deep analysis of Claude Dispatch vs OpenClaw through three lenses: rapport ownership, worldview lock-in, and builder vs consumer, revealing the underlying architectural philosophy behind the AI agent platform split.
Moonshot AI's Kimi Team released a technical report on March 15, 2026, challenging a fundamental component of the Transformer architecture that has existed for nearly a decade and is used by every mainstream model.
A deep GTC 2026 analysis: the strategic intent behind the "token factory" narrative, the Android-style open-ecosystem strategy, reverse-engineering five key decisions, three contrarian takes, and the operational implications for agentic AI practitioners.
Ten AI practitioners, each reasoning from their own system of cognitive axioms, react independently and in depth to the news of an Australian using AI to design an mRNA cancer vaccine for his dog. A stress test of cognitive diversity.
Source: Jensen Huang GTC 2026 Keynote (2026-03-16, San Jose), multi-source cross-survey
We wanted to test something: given a set of facts, can we use each person's unique system of cognitive axioms to accurately simulate their reaction to the same event? Furthermore, how large is the gap between those simulated reactions and the real ones?
A breakdown of HARNESS.md, the core asset of the CLI-Anything methodology: the seven-stage pipeline, the rendering gap, the filter-translation trap, the output-verification methodology, and an honest assessment of the preconditions for open-sourcing it.
Claude Interactive Visualizations is not a new capability but a cascading compression of cost structure. It pushes Builder-tier observability down to the Consumer tier at the cost of verifiability. A deep analysis of Anthropic's design philosophy, the competitive landscape, and the risk of illusory visual authority.
Source: https://github.com/HKUDS/CLI-Anything
On March 12, 2026, Anthropic released "Custom Visuals in Chat" (official name) for Claude, allowing it to generate inline interactive charts, diagrams, and visualizations within conversations. This feature pushes visualization, previously a Builder-tier capability, down to ordinary chat consumers.
All three frontier model providers now offer 1M-token context windows, but benchmark data reveals massive reliability gaps. On MRCR v2 8-needle at 1M, Claude Opus 4.6 scores 76% while GPT-5.4 and Gemini 3 Pro score 36.6% and 24.5% respectively. This article compares Google Gemini, Anthropic Claude, and OpenAI on actual long-context performance and analyzes where the real differences lie beyond the 1M milestone.
A deep analysis of the OpenAI Codex CLI's architecture, from the agent loop, sandbox isolation, and tool calling to the streaming implementation, dissecting the engineering details of a production-grade AI agent client.
> Core Sources: OpenAI "Unrolling the Codex agent loop" (Michael Bolin, 2026-01), OpenAI "Unlocking the Codex harness" (Celia Chen, 2026-02), The Pragmatic Engineer "How Codex is built" (Gergely Orosz)
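The agent-loop shape mentioned above can be sketched in a few lines. This is illustrative, not Codex's actual implementation: the model either returns a final answer or requests a tool; tool results are appended to the history and the loop continues. The stub model and `ls` tool are made up for the example:

```python
# Minimal agent-loop sketch (illustrative; not the Codex source).
def run_agent(model, tools, prompt, max_turns=8):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = model(history)                        # model sees full history
        history.append(reply)
        if reply.get("tool") is None:                 # plain answer: done
            return reply["content"]
        result = tools[reply["tool"]](reply["args"])  # execute requested tool
        history.append({"role": "tool", "content": result})
    return "max turns exceeded"

# Stub model: first asks for the `ls` tool, then answers with what it saw.
def stub_model(history):
    if history[-1]["role"] == "user":
        return {"role": "assistant", "tool": "ls", "args": "."}
    return {"role": "assistant", "tool": None,
            "content": f"files: {history[-1]['content']}"}

out = run_agent(stub_model, {"ls": lambda args: "main.py"}, "list files")
assert out == "files: main.py"
```

Everything the article covers (sandboxing the tool execution, streaming the reply, bounding the loop) hangs off this skeleton.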
In a recent issue of his weekly newsletter, Ruan Yifeng made a striking claim: in the AI era, the software moat will shift from code to test cases. Starting from the vinext incident, in which Cloudflare engineers replicated Next.js, this article analyzes where that claim holds and where it breaks down.
Cursor has published its internal evaluation system, CursorBench. It is not an academic benchmark but an evaluation method distilled from real user behavior. This article digs into its design and what it implies for evaluating AI coding.
From OpenAI's harness engineering to Cursor's self-driving codebases, a new engineering paradigm is taking shape: humans' core work shifts from writing code to designing the working environment for AI agents.
The emergence of CursorBench has brought this question to the forefront. On March 11, 2026, Cursor published a blog post titled "How we compare model quality in Cursor," officially unveiling their internal evaluation system.
> Core Sources: OpenAI "Harness engineering" (2026-02-11), Cursor "Towards self-driving codebases" (2026-02-05), Cursor "Scaling long-running autonomous coding" (2026-01-14)
Free and personal tiers almost always train on your data; enterprise tiers almost never do. But the key differences hide inside that "almost." This article compares the data policies and perpetual-license clauses of the major AI coding tools.
Survey Date: March 9, 2026 | Methodology: 5 parallel librarian agent groups + cross-verification
Many properties of a program are obvious at the semantic level, yet formally proving them requires complex compiler analysis, or is outright impossible. This essay explores using LLMs to supply semantic hints to compilers to assist optimization.