Antigravity CLI First Impressions: Fast, Rough, Not Ready

Google has officially replaced Gemini CLI with the new Antigravity CLI and launched it alongside Gemini 3.5 Flash, which became the default model for the new CLI experience. That made the launch more than a simple rebrand: it was also the first real chance to see whether Google’s new default coding-agent stack actually felt better in practice.

To test that, I ran Ship Bench, the real-coding-workflow benchmark I built to evaluate how models and coding agents behave in practical development tasks, rather than toy prompts or isolated code snippets. This was not a full benchmark write-up; it was a quick first pass meant to capture what it felt like to use Antigravity CLI as a working developer tool while exercising a realistic repo workflow through Ship Bench.

What I tested

I used Antigravity CLI on Windows in the context of a Ship Bench run, which meant the CLI was being pushed through a practical coding loop rather than a curated demo. The goal was not just to test whether the agent could answer prompts, but whether it could survive the kind of environment, permissions, command execution, and iteration flow that real coding work demands.

I could have switched to Gemini Pro, but I intentionally stayed on Gemini 3.5 Flash. As the default model, it was the one whose promised speed and quota efficiency I wanted to judge as an everyday option. In other words, the real question was: can it stretch my usage quota further than a larger Pro model while delivering the same quality?

First-run impressions

The first impression was mixed. Gemini 3.5 Flash is genuinely fast, and the agent feels quick and responsive, but the surrounding CLI experience was rough enough that it overshadowed most of the upside.

On the first development iteration, the model decided it wanted a different (older) Node version and used nvm to install an older one. After that, it seemed to lose track of Node on PATH entirely. It tried to recover, failed to reload the environment cleanly, then started searching the file system for node.exe and dynamically re-adding that location to PATH on each command run. That behavior appears to be what triggered repeated permission prompts on every command. A pretty miserable experience. Once the terminal and CLI were restarted, that specific problem cleared up and normal command execution returned.
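For contrast, the recovery I would have expected is to locate Node once and reuse that result, rather than searching the file system and rewriting PATH on every command. Here is a minimal Python sketch of that idea. It is not how Antigravity CLI is implemented, and `resolve_node_dir` and `env_with_node` are hypothetical helper names I made up for illustration.

```python
import os
import shutil
from typing import Optional

# Hypothetical module-level cache: a successful lookup is remembered for the session.
_cached_node_dir: Optional[str] = None


def resolve_node_dir(search_path: Optional[str] = None) -> Optional[str]:
    """Find the directory holding the node executable, looking it up only once."""
    global _cached_node_dir
    if _cached_node_dir is None:
        # shutil.which honors PATHEXT on Windows, so "node" also matches node.exe.
        found = shutil.which("node", path=search_path)
        if found:
            _cached_node_dir = os.path.dirname(found)
    return _cached_node_dir


def env_with_node(base_env: dict[str, str], node_dir: str) -> dict[str, str]:
    """Return a copy of base_env with node_dir on PATH exactly once."""
    env = dict(base_env)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if node_dir not in parts:
        parts.insert(0, node_dir)
    env["PATH"] = os.pathsep.join(parts)
    return env
```

A caller would build the environment once, for example `env = env_with_node(dict(os.environ), node_dir)`, and pass the same `env` to every subprocess, so the environment stays stable from one command to the next.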

That first issue felt like a bad transient state rather than the main product problem. The more important issue showed up in a more normal run: Antigravity CLI would not remember conversation-scoped permission grants, even after they had already been approved. That made the workflow feel fragmented and repetitive, because the tool kept asking for approval where the session context suggested it should already know the answer. Frustrating.
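To make the expectation concrete, this is roughly what I mean by a conversation-scoped grant: approve a command prefix once, and later commands with the same prefix in that conversation should not prompt again. The Python sketch below is purely illustrative. `ConversationPermissions` and `needs_prompt` are hypothetical names, and I have no knowledge of how Antigravity CLI actually stores approvals.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationPermissions:
    """Hypothetical store of command prefixes approved in one conversation."""

    approved_prefixes: set[str] = field(default_factory=set)

    def approve(self, prefix: str) -> None:
        # Ignore blank prefixes so an empty grant cannot approve everything.
        if prefix.strip():
            self.approved_prefixes.add(prefix.strip())

    def is_approved(self, command: str) -> bool:
        # Compare whole tokens so "npm run" does not also match "npmx run".
        tokens = command.split()
        for prefix in self.approved_prefixes:
            prefix_tokens = prefix.split()
            if tokens[: len(prefix_tokens)] == prefix_tokens:
                return True
        return False


def needs_prompt(perms: ConversationPermissions, command: str) -> bool:
    """Prompt only for commands that no earlier grant in this conversation covers."""
    return not perms.is_approved(command)
```

With this shape, approving `npm run` once would cover `npm run build` and `npm run test` for the rest of the conversation, while `npm install` would still trigger a prompt.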

The handling of background commands also felt awkward. When running npm work in the background, the CLI shifted into a wait-timer style interaction instead of just naturally waiting on the task, which made the orchestration feel more mechanical than smooth. I suppose it could end up beneficial when it decides to run parallel tool calls.

I also hit the same class of Windows terminal issues other users have been discussing, including hanging or inconsistent command execution in terminal sessions as well as terminal resize problems. Reliable command execution and resizing are basic expectations, really. On Windows especially, the CLI still feels unstable enough that the shell layer becomes part of the story rather than disappearing into the background.

Flash versus the CLI

One important distinction is that not every failure belonged to the CLI itself. For example, failing to add a .gitignore file on the first run feels more like a Gemini 3.5 Flash planning/execution miss than a shell-wrapper problem.

In that sense, the experience split into two separate judgments. Gemini 3.5 Flash felt fast and promising as a coding model, but Antigravity CLI felt rough as the environment wrapped around it. The difficult part is that, from a user perspective, those layers blur together fast when the default workflow is what you are actually evaluating.

Quota and value

The quota behavior ended up being the biggest practical negative. I chose to test Gemini 3.5 Flash specifically because it was the new default and because one of the appealing ideas behind Flash was that it could extend usable quota while still feeling fast enough for real work. Instead, I burned through quota shockingly quickly.

In practice, I could not get through even two meaningful iterations before hitting quota limits, and at one point the interface reported about 20% quota remaining while still refusing to continue. That mismatch made the product feel unreliable in exactly the area where a coding agent has to be predictable. In contrast, I was able to complete a seven-iteration run with Claude Code Sonnet within its 5-hour quota, which made Antigravity’s current usage story feel much worse by comparison.

That is probably the biggest reason this left such a negative impression. Google AI Pro had started to look like one of the better-value options in the coding-agent space, but if the default Antigravity CLI plus Gemini 3.5 Flash path burns quota this fast while also failing to carry work forward smoothly, the value proposition drops hard.

Recommendation

Right now, the fairest read is that Antigravity CLI ships with a promising engine but an unstable developer experience. Gemini 3.5 Flash is fast enough to make the launch interesting, but the combination of permission persistence problems, Windows terminal roughness, odd environment recovery behavior, and unexpectedly harsh quota limits makes the overall package hard to recommend.

For a quick Ship Bench-driven first impression, this lands as a strong “not recommended” for me. The model may be improving, but the CLI needs to stabilize before it feels like a real replacement for the more mature Gemini CLI experience.
