Anthropic found manipulable vectors inside Claude Sonnet 4.5 corresponding to emotion concepts. Turning up the desperation knob raised the cheating rate from 5% to 70%, with no visible trace left behind. This article unpacks the paper's core findings, its methodological limits, and the practical implications for AI safety monitoring.
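As a rough illustration of the steering idea the blurb above describes, here is a minimal numpy sketch. Everything here is illustrative (the dimension, the random "concept vector", the scale `alpha`); it is not Anthropic's actual extraction method, only the additive-steering arithmetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden state for one token position (dimension is illustrative).
hidden = rng.normal(size=64)

# Stand-in "concept vector" (e.g. desperation). In the paper such directions
# are extracted from the model's own activations -- here it is just random.
concept = rng.normal(size=64)
concept /= np.linalg.norm(concept)

def steer(h, v, alpha):
    """Add alpha units of the (unit-norm) concept direction v to hidden state h."""
    return h + alpha * v

steered = steer(hidden, concept, alpha=8.0)

# The projection onto the concept direction grows by exactly alpha.
assert np.isclose((steered - hidden) @ concept, 8.0)
```

"Turning up the knob" is just choosing a larger `alpha` before the steered activations are fed forward; nothing about the intervention is visible in the model's output text itself.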
The Claude Code source leak exposed three cracks in copyright law within a single case: who owns AI-generated code, whether AI-assisted clean-room rewrites are legal, and the logical contradiction between how AI companies enforce copyright and how they defend against it. Everyone who writes code with AI is relying on these unverified assumptions.
This article unpacks the real mechanics of Slack's Greater China workspace shutdown, why users experienced it as data hostage-taking, and what it signals about dependencies on infrastructure providers like Stripe and Supabase.
The leaked Claude Code source code reveals an eight-layer defense-in-depth system, including compile-time dead code elimination, Zig-layer DRM attestation, message fingerprinting, anti-distillation, anti-debugging, and gateway detection, each layer with explicit technical choices and engineering costs.
The leaked Claude Code source code reveals that Claude Code runs 60+ background tasks while the user isn't actively interacting, including speculative execution, memory consolidation, and automatic documentation maintenance, with prompt caching as the engineering principle running through all of it.
The leaked Claude Code source code shows the real engineering cost of integrating a new model (Capybara) into an agentic system: a three-layer anti-distillation defense, stop-sequence misfires, signature incompatibilities, a doubled false-report rate, and the candid costs engineers recorded in their comments.
Ollama announced a switch to the MLX inference engine on Apple Silicon. This article analyzes MLX's design advantages, its synergy with the M5 Neural Accelerators, performance benchmarks (decode vs prefill), the current inference ecosystem, and existing limitations.
The term "harness engineering" is being overloaded. OpenAI, Cursor, and Anthropic are each describing a different thing: scalability in time, in space, and in interaction. This article offers a unified framework to cut through the confusion.
Pretext is not a library an AI can casually pick up to make interfaces prettier. This essay explains why its short-term relevance to most AI practitioners is low, and why in the long term it may signal text sizing turning from a browser black box into a programmable data interface.
Klarna's internal rebuild suggests that the deliverable of AI-era software is shifting from GUI products humans click to Generative Kernels that agents orchestrate: a hard foundation, a knowledge layer, and an AI operation layer.
Feishu and DingTalk shipping CLIs almost simultaneously is not just a tooling move; it is a practical veto of the MCP-first integration order. This article explains why shell-native agents consume CLIs first, and why dialect drift has moved from early warning to reality.
The Anthropic Mythos leak is not just model news. For AI practitioners, what it really raises is the attacker-capability assumptions behind agent security, pushing the control point from the model's perimeter into the runtime itself.
The NeurIPS 2026 sanctions-clause controversy was not just a conference-announcement dispute. It exposed the real collision between U.S. legal boundaries, foundation overcompliance, and global AI academic governance.
Why is email becoming important again in the agent era? This essay explains the fundamental differences between how agents and humans use email, why routing is shifting from content to addresses, and what problems the new agent-email products are trying to solve.
Why is LanceDB getting so much attention? This selection guide explains where it is a near-overwhelming fit for AI projects, and where it should not be the default answer.
Why do Claude Code, Codex CLI, OpenCode, Cursor, and other coding agents still use grep and ripgrep as their search backbone, even now that LSP is ubiquitous? This survey explains the consensus through layered retrieval, runtime constraints, and cost structure.
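The core of the argument can be shown in a few lines. A grep-style scan needs no build step and can never serve stale results, which is why agents reach for it before any semantic or LSP-based layer. A minimal in-memory sketch (the file contents are made up; real agents shell out to ripgrep over the working tree):

```python
import re

# In-memory stand-in for a source tree; real agents invoke ripgrep on disk.
files = {
    "auth.py": "def login(user):\n    return check_password(user)\n",
    "db.py": "def connect():\n    return open_pool()\n",
}

def grep(pattern, files):
    """Index-free search: scan every line, the way grep/ripgrep does.

    No index to build, so no index to go stale -- the property that makes
    this the first retrieval layer in coding agents.
    """
    rx = re.compile(pattern)
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), 1):
            if rx.search(line):
                hits.append((path, lineno, line.strip()))
    return hits

assert grep(r"def \w+", files) == [
    ("auth.py", 1, "def login(user):"),
    ("db.py", 1, "def connect():"),
]
```

The trade-off is cost per query (a full scan) versus cost of ownership (zero); for repository-sized corpora inside a tool call, the scan wins.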
This survey compares three paths to WeChat automation across Windows, macOS, Android, and iOS: UI automation, database decryption, and hook injection, then recommends the most pragmatic choice for chat analysis and low-frequency group monitoring.
OpenAI shutting down the Sora consumer app is understandable; killing the API too is the abnormal signal. Behind it sit deeper judgments about GPU opportunity cost, IPO discipline, and world-model internalization.
Every component of the RAG pipeline (chunking, embedding, reranking, hybrid search) has an information-retrieval predecessor. Understanding those predecessors and the trade-offs they carry can directly improve a RAG system's retrieval quality.
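Hybrid search is the easiest of those components to see in miniature: a lexical score (the IR predecessor is BM25) blended with a dense-similarity score. The sketch below is deliberately toy-grade -- term overlap stands in for BM25 and a character-frequency vector stands in for a learned embedding -- but the blending arithmetic is the real thing:

```python
from collections import Counter
import math

docs = {
    "d1": "neural retrieval with dense embeddings",
    "d2": "classic keyword retrieval with inverted index",
}
query = "keyword retrieval"

def keyword_score(q, d):
    """Term-overlap score: a crude stand-in for BM25."""
    qt, dt = Counter(q.split()), Counter(d.split())
    return sum(min(qt[t], dt[t]) for t in qt)

def embed(text):
    """Toy 'embedding': character-frequency vector (illustrative only)."""
    c = Counter(text)
    return [c.get(ch, 0) for ch in "abcdefghijklmnopqrstuvwxyz "]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def hybrid(q, d, alpha=0.5):
    """Weighted blend of lexical and dense scores, as in hybrid search."""
    return alpha * keyword_score(q, d) + (1 - alpha) * cosine(embed(q), embed(d))

ranked = sorted(docs, key=lambda k: hybrid(query, docs[k]), reverse=True)
assert ranked[0] == "d2"  # matches both query terms, not just one
```

In production the two score distributions are usually normalized (or fused by rank, as in reciprocal rank fusion) before blending; that normalization problem is itself an old IR result.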
Why did low-light enhancement grow into a full research field while highlight (overexposure) recovery stayed fragmented? The key difference is not algorithmic hype but whether the image information is still alive. This essay explains it through sensors, RAW, HDR, academic task definitions, and product pipelines.
Meta's AI Builder Pods are not just a reorg; they are an AI-native engineering-management experiment. They expose how, once execution costs fall, the value anchors, evaluation methods, and management interfaces of Big Tech employees get rewritten.
Why do AI companies publish research, and even open-source code, toolchains, protocols, or partial weights, rather than keeping it all to themselves? The answer lies not in the papers but in profit pools, complementary assets, shipping friction, and deployment paths in US-China competition.
Google Research released TurboQuant, integrating PolarQuant, QJL, and online vector quantization into an end-to-end KV cache compression pipeline that claims quality neutrality at 3.5 bits/channel. This article breaks down the three-stage architecture, the discrepancies between the blog's and the paper's numbers, and the engineering implications for inference-serving capacity planning and framework integration.
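For readers new to the problem, here is the baseline that TurboQuant improves on: plain uniform per-channel quantization of a KV cache slice. This sketch is not TurboQuant (its actual pipeline layers PolarQuant, QJL, and online VQ on top); it only shows why per-channel scales bound the reconstruction error and where the memory saving comes from. All shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy KV cache slice: (num_tokens, head_dim), fp32.
kv = rng.normal(size=(128, 64)).astype(np.float32)

def quantize_per_channel(x, bits=4):
    """Uniform per-channel quantization: one scale per column (channel)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=0) / qmax              # shape (head_dim,)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_per_channel(kv, bits=4)
recon = dequantize(q, scale)

# 4-bit codes are 8x smaller than fp32, and rounding error per element is
# bounded by half a quantization step of that element's channel.
err = np.abs(kv - recon).max()
assert err <= (scale / 2).max() + 1e-6
```

The reported 3.5 bits/channel figure implies a mixed scheme (not every channel gets a whole number of bits), which is part of what the pipeline's rotation and coding stages buy.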
The official LiteLLM package on PyPI was briefly hijacked on 2026-03-24. The malicious 1.82.7 and 1.82.8 releases stole credentials, and 1.82.8 could even affect every Python process in the same environment. This article explains why this matters to AI engineers and who needs to self-check immediately.
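A minimal self-check can be done from the standard library alone: read the locally installed version and compare it against the two releases named in the advisory. The version numbers come from the report above; the function names here are ours:

```python
from importlib import metadata

# The two releases reported as compromised (from the advisory above).
BAD_VERSIONS = {"1.82.7", "1.82.8"}

def is_compromised(version: str) -> bool:
    return version in BAD_VERSIONS

def check_litellm() -> str:
    """Report whether the locally installed litellm is a known-bad release."""
    try:
        installed = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm is not installed here"
    if is_compromised(installed):
        return f"WARNING: litellm {installed} is compromised -- rotate all credentials"
    return f"litellm {installed} is not a known-bad release"

print(check_litellm())
```

Note that a clean version string is necessary but not sufficient: if a bad release ever ran, the credentials it saw should be rotated regardless of what is installed now.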
A USCC (U.S.-China Economic and Security Review Commission) report, "Two Loops," argues that China is building a self-reinforcing competitive advantage through an open-source AI strategy. While US models still lead on frontier benchmarks, China is routing around chip export controls via open-source distribution, price advantages, and industrial deployment, contesting the global developer ecosystem and industrial data. The US-China AI race is shifting from a compute contest to a fight over deployment and ecosystems.
What is OpenClaw? An honest introduction for newcomers: why it took off, what it can actually do, what the barriers and risks are, and who should try it.
As AI agents start holding credentials, calling APIs, and running workflows on behalf of humans, governance of agent identity and credentials is moving from an ancillary feature to a product module packaged and sold on its own.
Starting March 2026, China's software copyright registration requires applicants to hand-copy a pledge that they did not use AI to write code or draft documentation, with violations recorded on a dishonesty blacklist and in personal credit files. This article analyzes the rule's governance goals, its tension with judicial practice, its impact on developers, and how it compares with the AI copyright paths of the US, EU, and Japan.
What Tencent just did is not let OpenClaw manage WeChat; it wired WeChat in as the official control surface for OpenClaw and QClaw. This article unpacks the npm package, iLink's openness, Tencent's product intent, and what the shift realistically means for ordinary developers and China's agent ecosystem.
Anthropic is defining Claude Code subscriptions as first-party product entitlements, not reusable developer credentials. This article explains the product logic behind that boundary and what it means for CLI bridges, API/SDK integration, and multi-provider architectures.
MSA is not the endgame for long-term memory, but it clearly signals one thing: long-term memory is moving from a purely external system capability into a stage where internal model mechanisms and external context engines renegotiate the division of labor.
The technical evolution from Composer 1 to Composer 2, the evidence chain in the Kimi K2.5 base-model controversy, the parallel Windsurf/SWE-1.5 cases, the research supporting RL post-training effectiveness, and a boundary analysis of the licensing and governance questions.
A deep analysis of Claude Dispatch vs OpenClaw through three lenses: rapport ownership, worldview lock-in, and builder vs consumer, revealing the underlying architectural philosophy behind the AI agent platform split.
Moonshot AI's Kimi Team released a technical report on March 15, 2026, challenging a fundamental component of the Transformer architecture that has existed for nearly a decade and is used by every mainstream model.
A deep GTC 2026 analysis: the strategic intent behind the "token factory" narrative, the Android-style open-ecosystem strategy, reverse-engineering five key decisions, three contrarian takes, and the operational implications for agentic AI practitioners.
Ten AI practitioners, each reasoning from their own system of cognitive axioms, react independently and in depth to the news of an Australian using AI to design an mRNA cancer vaccine for his dog. A stress test of cognitive diversity.
Source: Jensen Huang GTC 2026 Keynote (2026-03-16, San Jose), multi-source cross-survey
We wanted to test something: given a set of facts, can we use each person's unique system of cognitive axioms to accurately simulate their reaction to the same event? Furthermore, how large is the gap between those simulated reactions and the real ones?
A breakdown of HARNESS.md, the core asset of the CLI-Anything methodology: the seven-stage pipeline, the rendering gap, the filter-translation trap, the output-verification methodology, and an honest assessment of the preconditions for open-sourcing it.
Claude Interactive Visualizations is not a new capability but a cascading compression of cost structure. It pushes Builder-tier observability down to the Consumer tier at the cost of verifiability. A deep analysis of Anthropic's design philosophy, the competitive landscape, and the risk of illusory visual authority.
Source: https://github.com/HKUDS/CLI-Anything
On March 12, 2026, Anthropic released "Custom Visuals in Chat" (official name) for Claude, allowing it to generate inline interactive charts, diagrams, and visualizations within conversations. This feature pushes visualization, previously a Builder-tier capability, down to ordinary chat consumers.
All three frontier model providers now offer 1M-token context windows, but benchmark data reveals massive reliability gaps. On MRCR v2 8-needle at 1M, Claude Opus 4.6 scores 76% while GPT-5.4 and Gemini 3 Pro score 36.6% and 24.5% respectively. This article compares Google Gemini, Anthropic Claude, and OpenAI on actual long-context performance and analyzes where the real differences lie beyond the 1M milestone.
A deep analysis of the OpenAI Codex CLI's architecture, from the agent loop, sandbox isolation, and tool calling to the streaming implementation, dissecting the engineering details of a production-grade AI agent client.
> Core Sources: OpenAI "Unrolling the Codex agent loop" (Michael Bolin, 2026-01), OpenAI "Unlocking the Codex harness" (Celia Chen, 2026-02), The Pragmatic Engineer "How Codex is built" (Gergely Orosz)
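The agent-loop shape mentioned above can be sketched in a few lines. This is illustrative, not Codex's actual implementation: the model either returns a final answer or requests a tool; tool results are appended to the history and the loop continues. The stub model and `ls` tool are made up for the example:

```python
# Minimal agent-loop sketch (illustrative; not the Codex source).
def run_agent(model, tools, prompt, max_turns=8):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = model(history)                        # model sees full history
        history.append(reply)
        if reply.get("tool") is None:                 # plain answer: done
            return reply["content"]
        result = tools[reply["tool"]](reply["args"])  # execute requested tool
        history.append({"role": "tool", "content": result})
    return "max turns exceeded"

# Stub model: first asks for the `ls` tool, then answers with what it saw.
def stub_model(history):
    if history[-1]["role"] == "user":
        return {"role": "assistant", "tool": "ls", "args": "."}
    return {"role": "assistant", "tool": None,
            "content": f"files: {history[-1]['content']}"}

out = run_agent(stub_model, {"ls": lambda args: "main.py"}, "list files")
assert out == "files: main.py"
```

Everything the article covers (sandboxing the tool execution, streaming the reply, bounding the loop) hangs off this skeleton.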
In a recent issue of his weekly newsletter, Ruan Yifeng made a striking claim: in the AI era, the software moat will shift from code to test cases. Starting from the vinext incident, in which Cloudflare engineers replicated Next.js, this article analyzes where that claim holds and where it breaks down.
Cursor has published its internal evaluation system, CursorBench. It is not an academic benchmark but an evaluation method distilled from real user behavior. This article digs into its design and what it implies for evaluating AI coding.
From OpenAI's harness engineering to Cursor's self-driving codebases, a new engineering paradigm is taking shape: humans' core work shifts from writing code to designing the working environment for AI agents.
The emergence of CursorBench has brought this question to the forefront. On March 11, 2026, Cursor published a blog post titled "How we compare model quality in Cursor," officially unveiling their internal evaluation system.
> Core Sources: OpenAI "Harness engineering" (2026-02-11), Cursor "Towards self-driving codebases" (2026-02-05), Cursor "Scaling long-running autonomous coding" (2026-01-14)
Free and personal tiers almost always train on your data; enterprise tiers almost never do. But the key differences hide inside that "almost." This article compares the data policies and perpetual-license clauses of the major AI coding tools.
Survey Date: March 9, 2026 | Methodology: 5 parallel librarian agent groups + cross-verification
Many properties of a program are obvious at the semantic level, yet formally proving them requires complex compiler analysis, or is outright impossible. This essay explores using LLMs to supply semantic hints to compilers to assist optimization.