Gemini 3.5 Flash Is Free and Fast. The Agent Stack Is Still the Moat.

Google shipped Gemini 3.5 Flash this week — fast, free, and pitched as a Cursor and Claude Code killer when bundled with Antigravity 2.0. Within hours the YouTube thumbnails were already calling Claude Code "destroyed."

I have been running ~70 MCP tools across five channel servers in production for a few months. My answer to "would you switch?" is no, and the reason isn't loyalty to Anthropic. It is that the model stopped being the moat about a year ago, and almost nobody in the model-launch discourse has updated for it.

The thing the headline is selling

The pitch on Gemini 3.5 Flash is real. It is fast — Google quotes around 300 tokens per second on the public benchmarks. It is free inside Antigravity. It is "intelligent enough" for vibe coding, which is the framing the launch posts lean on hardest. If you are doing single-shot prompts from an IDE plugin, this is a meaningful upgrade.

That is also where the value ends.

A "vibe coding" session is one model, one prompt, one file edit, maybe a follow-up. The model is the whole stack. Of course the fastest cheapest model wins.

That is not how I work, and increasingly it is not how anyone shipping production AI works.

What I would have to rip out to switch

When I run a Claude Code session, the model is one input. Around it sits:

Five channel servers, each registered as its own MCP server with its own brain prompt — one for chat, one for content publishing, one for routine automation, one for job-application autopilot, one for code work
~70 MCP tools across those servers, namespaced and dedup-checked so the model picks the right one when two descriptions could overlap
Custom hooks on SessionStart and UserPromptSubmit that load skills, inject the right brain, and enforce permission allowlists
A persistent memory directory the agent reads and writes across sessions
Subagents wired to wake on external events — a cron tick, a chat broadcast, a captured job posting — not just on my prompt
A whole .claude/skills/ directory of domain playbooks the model loads on demand

None of that is the model. All of it is the agent stack.

If I move to Antigravity tomorrow, none of those tools come with me. I have to redesign the orchestration layer for whatever protocol Google ships, find the equivalent of skills and subagents, rebuild the channel-broadcast wake flow, and re-test 800 production calls against a different model's tool-calling behaviour. Several weeks of work, minimum. For what — a model that is faster and free?

The math does not work. Token cost is a rounding error compared to the time it took to build the system around the model.

Why model-only comparisons keep mispredicting the winner

Every model-launch cycle has the same shape now. Benchmark wins go viral, the YouTube hot-takes call the incumbent dead, and three months later the user numbers barely move. Cursor still has its users. Claude Code still has its users. The "destroyed" model from six months ago is still alive.

The reason is mundane. Once a developer wires their workflow into a coding agent — system prompts, project-specific context files, custom tools, branching habits, the way they prompt — the switching cost is the workflow, not the API bill. The model is shockingly easy to swap. The workflow is not.

This is the same shape as every prior tooling lock-in. People do not stay on JetBrains because the JVM is faster than the alternatives. They stay because every keybinding and plugin and live template is muscle memory.

Coding agents are getting that same kind of lock-in, and it is happening faster than the IDE generation because the agent stack is doing more of the work. The more the agent does, the more painful the rewire.

What would actually make me switch

A faster, free model alone — no. A faster, free model that natively reads my existing MCP server and runs my existing skills and respects my existing hooks — instantly. That is the bar.

This is also why MCP becoming a real cross-vendor standard matters more than any individual model launch. The moment the agent stack is portable, model switching becomes cheap again, and the model layer is back in competition on raw performance. Until then, "fastest free model" and "best agent system to actually ship with" are different rankings, and the gap between them is widening.

Why this matters

If you are picking a coding agent today, do not pick on model benchmarks. Pick on the surrounding system — tool model, memory model, hook model, multi-agent orchestration, how easy it is to add your own MCP server. Those are what you will be living inside in six months.

And if you are building an AI dev tool right now, the same thing flipped: the model is the commodity. The agent stack you build around it is the product. Treat it that way.