On June 1, 2026, Google pushed Gemini Spark to Google AI Ultra subscribers in the US. Per Google’s description, it’s a “24/7 personal AI agent” that continuously monitors user-defined conditions and executes multi-step workflows across Gmail, Calendar, and Drive. Users can set rules (“every Monday morning, send a summary of last week’s emails to the team”) or give ad-hoc instructions (“look up our average monthly grocery spending this year and draft an email to my wife”).
The Verge’s hands-on captures the paradox: Spark is stunning on simple tasks. Author Jay Peters asked Spark to extract monthly grocery spending from his Drive budget spreadsheet, compute the average, find his wife’s email address, and draft an email. Spark nailed all of it, even using their private sign-off. Peters’ reaction: “Wow, that’s actually nuts.” But the same Spark fell apart on more open-ended tasks: asked to plan a block party, it fabricated a non-existent shared sign-up sheet, generated an ugly deck about city permits, and referenced things that didn’t exist. Peters’ verdict: “impressive, but it’s not worth the cost just yet.”
Evaluating Spark’s quality is beside the point. Spark is the first consumer-facing agent product from a major platform that runs as a persistent background daemon rather than a chat window. Its execution isn’t polished, but its product form points to something larger. Zoom out, and a fuller picture emerges: the product form of AI agents is migrating systematically, and Spark is simply the most visible signal.
To understand this migration, it helps to see where we came from.
Generation one is the chat window. ChatGPT, Claude.ai, Gemini — open a page or app, type something, AI responds. The session ends and everything resets to zero. The core constraint isn’t that the model isn’t smart enough. It’s that the model has no memory and no hands. It can only answer within the boundaries of the conversation. The world outside the chat doesn’t exist to it.
Generation two is agentic tools. Cursor Agent Mode, Claude Code, Codex, Devin — these products gave AI hands: read and write files, run commands, execute tests, observe results, and adjust based on what it sees. They still run inside a chat window, but the conversation becomes an iteration loop: AI takes a step, sees the feedback, decides the next step. The breakthrough of this generation is embedding the observe-execute-correct loop into the product. We analyzed the core tension of this generation in the Agentic AI Crisis: when AI can execute but can’t perceive its own results, the self-correction loop breaks.
Generation three is background agents. Starting in late 2025, coding tools began liberating agents from the chat window. Cursor Background Agents (October 2025) can work independently in the cloud, up to 8 in parallel, each in an isolated git worktree. OpenAI Codex Automations (February 2026) supports cron scheduling with results going to a Triage inbox. Anthropic Claude Code Routines (April 2026) supports three trigger types — scheduled, API webhook, and GitHub events — all running in the cloud without requiring a local machine. The core change of this generation: agents no longer wait for you to start a conversation. They run like cron jobs, executing on a schedule or in response to events.
But they’re still developer tools. The audience writes code. The context is a code repository.
Generation four is the consumer ambient agent, represented by Spark. It takes the daemon model from developer tools and brings it to consumer products. Spark isn’t an enhanced version of the Gemini chat window — it’s a different product category. It runs on dedicated Google Cloud VMs. You close your laptop or lock your phone, and it keeps working. It doesn’t wait for you to initiate a conversation; it monitors the conditions you set, or receives your instructions, and acts on its own.
This isn’t a “Gemini got better” story. The product form has turned a page.
To make this page-turn concrete, look at a reference point: Manus. Manus also runs in the cloud, also executes multi-step tasks across apps, also keeps working after you close your computer. At the infrastructure level, Manus and Spark do similar things. But their interaction models differ. Manus is still session-based: you open it, give it a task, it executes, delivers the result, and the session ends. If you don’t open Manus, it does nothing. It doesn’t know what emails you received, and it won’t proactively notify you when a condition is met. Spark is a daemon: once you set rules or triggers, you don’t need to open it again. It runs in the background, monitoring your inbox, calendar, and Drive, acting automatically when conditions are met, and optionally surfacing things it thinks you should know even without explicit triggers. In one sentence: Manus does the work, but needs you to tell it when to start. Spark keeps watch, and you only need to tell it what to watch for.
This difference is lightweight in engineering — it’s essentially just a scheduling layer and an event listener. But it’s heavy in product experience: it turns the agent from a tool into an environment. This is why Digital Applied’s analysis calls Spark “the first ambient personal agent from a Big Three lab.”
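To see how thin that layer can be, here is a minimal sketch of a scheduling layer plus an event listener. It is an illustration only, not how Spark or Manus is built: the `AmbientAgent` class, its method names, and the `email_received` event type are all hypothetical, and time is passed in as a plain integer so the example stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: an action takes a context dict and returns a log line.
Action = Callable[[dict], str]


@dataclass
class ScheduledRule:
    interval_s: int   # run every interval_s seconds
    action: Action
    next_run: int     # next due time, in seconds since the agent started


@dataclass
class AmbientAgent:
    """Illustrative daemon core: periodic rules plus event listeners."""
    schedules: list[ScheduledRule] = field(default_factory=list)
    listeners: dict[str, list[Action]] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def every(self, interval_s: int, action: Action) -> None:
        if interval_s <= 0:
            raise ValueError("interval_s must be positive")
        self.schedules.append(ScheduledRule(interval_s, action, next_run=interval_s))

    def on(self, event_type: str, action: Action) -> None:
        self.listeners.setdefault(event_type, []).append(action)

    def tick(self, now: int) -> None:
        # Scheduling layer: run every rule that has come due by `now`.
        for rule in self.schedules:
            while rule.next_run <= now:
                self.log.append(rule.action({"time": rule.next_run}))
                rule.next_run += rule.interval_s

    def dispatch(self, event_type: str, payload: dict) -> None:
        # Event listener: run every action registered for this event type.
        for action in self.listeners.get(event_type, []):
            self.log.append(action(payload))


def demo() -> list[str]:
    agent = AmbientAgent()
    agent.every(60, lambda ctx: f"summary at t={ctx['time']}")
    agent.on("email_received", lambda ev: f"draft reply to {ev['sender']}")
    agent.tick(now=120)  # two scheduled runs are due: t=60 and t=120
    agent.dispatch("email_received", {"sender": "alice@example.com"})
    return agent.log
```

A session-based agent would only have something like `dispatch`, called by the user. The daemon adds `tick` and lets the outside world call `dispatch`, which is the whole difference at this level of abstraction.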
Step back, and what these products are doing falls into two categories.
The first is periodic tasks. You set a schedule, and the agent executes on time. Claude Code Routines runs code reviews every morning at 3 AM. Codex Automations checks test coverage hourly. Spark sends a weekly email summary to the team every Monday morning. The reliability requirement for these tasks is relatively predictable: if one run is wrong, the next one can fix it; if results are off, you can intervene before the next execution.
The second is reactive tasks. An event triggers the agent’s action. Someone opens a PR on GitHub, and a Claude Code Routine automatically checks code style. Betterment sends you an email about opting out of an arbitration clause, and the agent reads it and drafts a reply. Someone mentions you on Slack with a bug report, and the agent looks up the relevant code and files an issue. These tasks demand higher immediacy, but the cost of failure is also higher. An email sent to the wrong person by mistake is far worse than a weekly report that didn’t run correctly.
The two modes aren’t mutually exclusive. Most useful real-world automations combine both: an event triggers the agent to start a multi-step workflow, and some of those steps enter a periodic monitoring state (checking CI status every 30 minutes after a PR is submitted, for example). Christopher Meiklejohn’s Caucus V1 built a concrete implementation of this loop: agent implements code, opens a PR, another agent reviews, and if the review requests changes, it loops back to implementation. He uses a small vector clock primitive to track how many times each stage has been executed, ensuring the agent knows where it is in the cycle.
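The counting idea can be sketched in a few lines. This is not the Caucus V1 code: the stage names, the `StageClock` class, and the `max_rounds` cap are assumptions made for illustration. The only part borrowed from vector clocks proper is the merge rule, which takes the element-wise maximum of two counters.

```python
from dataclasses import dataclass, field

# Hypothetical stage names for an implement -> PR -> review loop.
STAGES = ("implement", "open_pr", "review")


@dataclass
class StageClock:
    """Per-stage execution counter, merged like a vector clock."""
    counts: dict[str, int] = field(default_factory=lambda: {s: 0 for s in STAGES})

    def tick(self, stage: str) -> None:
        self.counts[stage] += 1

    def merge(self, other: "StageClock") -> "StageClock":
        # Element-wise max: the standard vector clock merge rule.
        return StageClock({s: max(self.counts[s], other.counts[s]) for s in STAGES})


def run_cycle(review_outcomes: list[str], max_rounds: int = 3) -> StageClock:
    """Replay a list of review outcomes and count how often each stage ran."""
    clock = StageClock()
    for outcome in review_outcomes:
        clock.tick("implement")
        if clock.counts["open_pr"] == 0:
            clock.tick("open_pr")  # the PR is opened only once
        clock.tick("review")
        # Stop on approval, or when the (assumed) round cap is reached.
        if outcome == "approve" or clock.counts["review"] >= max_rounds:
            break
    return clock
```

With outcomes `["changes_requested", "approve"]`, the clock ends at two implementation passes, one PR, and two reviews, so any agent reading it can tell it is in the second round of the cycle.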
The two modes carry different trust thresholds. Periodic tasks, with their human review window between executions, have more room for error. Reactive tasks have a higher bar. You’re essentially authorizing the agent to independently judge and act when events occur, with no review window in between.
This is why Google’s safety design for Spark is deliberately conservative: all Workspace connections are off by default, spending money or sending email requires prior confirmation, and the rollout goes trusted testers → US-only Ultra-gated beta → broader expansion. Digital Applied’s analysis uses a precise framework to understand this strategy: blast radius control. An ambient agent’s failure impact is far wider than a session-based agent’s, because it works without supervision.
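The two defaults described here, connections off until enabled and confirmation before side effects, can be expressed as a small gate. This is a sketch under stated assumptions, not Google’s implementation: the `PermissionGate` class, the action names, and the `confirm` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumption: these action names stand in for "sending email" and "spending money".
CONFIRM_REQUIRED = {"send_email", "spend_money"}


@dataclass
class PermissionGate:
    """Illustrative blast radius control for an ambient agent."""
    # Empty by default: every connection is off until the user enables it.
    enabled_connections: set[str] = field(default_factory=set)

    def enable(self, connection: str) -> None:
        self.enabled_connections.add(connection)

    def authorize(
        self, connection: str, action: str, confirm: Callable[[str], bool]
    ) -> str:
        if connection not in self.enabled_connections:
            return "blocked: connection disabled"
        # Side-effecting actions need an explicit yes from the user first.
        if action in CONFIRM_REQUIRED and not confirm(f"{connection}:{action}"):
            return "blocked: user declined"
        return "allowed"
```

The design choice worth noticing is that read-only actions pass once a connection is enabled, while anything in `CONFIRM_REQUIRED` pulls the user back into the loop. Shrinking or growing that set is one way to widen the trust radius in stages.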
The first reaction usually focuses on who shipped this feature first. But “first” doesn’t matter much. Cursor’s background agents shipped eight months before Spark, and no one treated them as an industry inflection point, because they served a much narrower audience.
What actually matters is that the product form migration changes the competitive dynamics on three levels.
First, usage frequency changes. The upper bound for chat window usage is the number of times you remember to ask. The lower bound for a background daemon is the number of times you forget to turn it off. These aren’t in the same order of magnitude. This also means the definition of “good” differs: chat windows have room for error — if one answer is wrong, you can follow up. Daemons have less room, because you’re not watching them. Peters’ line in The Verge review — “I found myself constantly watching it or checking the notifications it sent to my phone” — captures this tension precisely. If a background agent’s reliability forces you back into monitoring mode, it hasn’t delivered on the daemon’s promise.
Second, platform lock-in changes. For chat products, the switching cost is exporting conversation history. For daemon products, the switching cost is rebuilding all cross-app workflows, rules, trigger conditions, and accumulated personal context. Google has an asymmetric advantage here: it owns the Workspace ecosystem (Gmail, Calendar, Drive, Docs), and Spark can read and write these services natively, without third-party API bridges. Anthropic and OpenAI lack consumer productivity suites of comparable depth; their main advantage is on the developer tools side. Apple is taking a different path — on-device plus privacy-first — but this path requires accepting a narrower functional scope.
Third, this form factor addresses a fundamental problem agents have always had: context window timeliness. In a chat window, an agent’s context is a static snapshot — the information you fed it when you opened the conversation. In a daemon, the agent’s context updates continuously — new emails, new events, new files keep flowing in. This shifts the agent from “answering what you want to know right now” to “watching what you always need to keep an eye on.”
The migration also leaves three open questions. Reliability is the most obvious. But reliability isn’t just a model capability problem; it’s a system design problem. Meiklejohn put this precisely in the Caucus V1 article: “If the models are inconsistent, the surrounding system has to get stronger.” An agent’s execution environment needs isolation boundaries, failure recovery, state tracking, and observability. These are infrastructure problems, not model problems.
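As a rough illustration of what “the surrounding system” means in practice, the sketch below wraps an unreliable step with retries and keeps a record of every attempt. It covers only failure recovery, state tracking, and a basic form of observability, not isolation. The `RunRecord` and `run_with_recovery` names and the three-attempt default are assumptions, not taken from any product discussed here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunRecord:
    """State tracked for one step, kept so a human can audit it later."""
    step: str
    attempts: int = 0
    status: str = "pending"  # pending -> succeeded | failed
    errors: list[str] = field(default_factory=list)


def run_with_recovery(
    step: str, fn: Callable[[], object], max_attempts: int = 3
) -> RunRecord:
    """Run fn, retrying on failure and recording each error."""
    record = RunRecord(step)
    while record.attempts < max_attempts:
        record.attempts += 1
        try:
            fn()
            record.status = "succeeded"
            return record
        except Exception as exc:  # broad on purpose: any failure is recorded
            record.errors.append(f"attempt {record.attempts}: {exc}")
    record.status = "failed"
    return record
```

Nothing in this wrapper makes the model more consistent. It only ensures that an inconsistent step either succeeds eventually or fails loudly, with a trail of what happened.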
Trust is the second. A session-level chatbot’s trust model is “I decide whether to trust it each time I have a conversation.” A persistent background agent’s trust model is “I authorize it to make judgments while I’m not watching.” The former is point-in-time and reviewable; the latter is continuous and expensive to audit. Google’s phased rollout strategy — from trusted testers to Ultra-only beta to future Pro users and international markets — is essentially an experiment in gradually expanding the trust radius.
The third question is more fundamental: do users actually need a 24/7 agent? The use cases Spark currently demonstrates — checking expenses, writing emails, organizing information — can mostly be done in a traditional chat window. They just take more time. The agent only provides genuinely irreplaceable value when it handles things that happen while you’re sleeping: an urgent email that arrives at 3 AM, cross-timezone team communication, continuous monitoring of an external condition. This value proposition is still at the demo stage, not yet validated by usage data at scale.
Several signals will determine the actual trajectory.
Antigravity 2.0 SDK adoption. Google launched Antigravity 2.0 on the same day as Spark — it’s the developer-facing version of Spark’s underlying execution stack (VM execution + MCP tool surface). If third-party developers start building ambient agents on Google’s harness, it means Google is doing more than shipping a product. It’s defining the runtime for a new class of software. The Firebase/Android history offers a useful reference point here.
Whether coding agents and consumer agents converge. Claude Code has Routines, Codex has Automations, Cursor has Background Agents — all daemons in coding contexts. Google took the same logic and ported it to consumer contexts with Spark. If these two lines eventually merge (e.g., Claude Code Routines start supporting Gmail and Calendar access), the category boundary between “coding agent” and “consumer agent” dissolves, leaving only one dimension: what systems your agent can access.
Standardization of trust mechanisms. Every platform currently defines its own agent permission model and data boundaries. As cross-platform agent operations grow (Spark is already connecting to Canva, OpenTable, and Instacart via MCP), the lack of a universal agent permission standard increasingly becomes a bottleneck. Agent2Agent and MCP complement each other on this layer, but they solve the problem of agent-to-agent communication, not the trust contract between agents and users.
Gemini Spark being “first” doesn’t matter. What matters is that it changed the question from “can AI work in the background” to “what product form, trust framework, and execution environment does background-working AI need.” Every major platform is answering this question, just from different starting points. Google starts from Workspace, Anthropic from code repositories, OpenAI from developer tools, Apple from device privacy. Their endpoints may converge on the same place: an agent you don’t need to actively open, but that knows when you need it.