Update: May 20, 2026

OpenCode, Kilo, Replicate, and Mastra support Gemini 3.5 Flash on day one

OpenCode, Kilo, Replicate, and Mastra exposed Gemini 3.5 Flash on launch day across coding agents, routers, and hosted APIs. The fast uptake gives engineers multiple harnesses to test Google's 1M-context model despite mixed first-party app reports.

TL;DR

Google shipped Gemini 3.5 Flash as a generally available model on day one, with rollout across the Gemini app, AI Mode in Search, the Gemini API, AI Studio, Antigravity, Android Studio, and enterprise agent products, according to GoogleDeepMind's availability post and Google's launch thread.
The first independent tooling wave landed immediately: OpenCode added the model, kilocode exposed it in Kilo CLI, replicate put it on Replicate's API, and mastra added it to Mastra's router.
Google's launch claim centered on speed plus coding and agentic gains, with GoogleDeepMind's benchmark thread and Google's speed chart post framing 1M context, stronger tool use, and roughly 4x faster output than comparable frontier models.
Day-one access was messy but broad: GeminiApp said the consumer app rollout was free and global, while testingcatalog's AI/ML API post, Warp's support post, and Cognition's Windsurf repost showed third-party surfaces turning the model on fast.

You can read Google's launch post, browse Mastra's Google provider page, hit the new Replicate model endpoint, and then jump straight into the weirder part, Antigravity's OS-building demo, where Google says 93 subagents burned through 2.6B tokens to build a working OS.

What shipped

Google positioned Gemini 3.5 Flash as its strongest coding and agentic model so far, not a cheap lite tier. The public launch materials paired that claim with 1,048,576 token input context, 65,536 max output tokens, and pricing of $1.50 per million input tokens and $9 per million output tokens in the API, as noted in Google's launch post and Simon Willison's launch writeup.
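To make those list prices concrete, here is a minimal sketch that estimates the cost of a single request from the figures quoted above. The helper name `estimate_cost_usd` is hypothetical, and the calculation assumes plain list pricing with no caching discounts, tiers, or free quotas, none of which the launch materials cited here spell out.

```python
# List prices quoted in the launch materials (USD per million tokens).
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 9.00

# Limits quoted in the launch materials.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of one request (hypothetical helper)."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError("input_tokens is outside the quoted context window")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens is outside the quoted output limit")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Worst case under these assumptions: a full context window in, a full response out.
full_request = estimate_cost_usd(MAX_INPUT_TOKENS, MAX_OUTPUT_TOKENS)
print(f"Full-window request: ${full_request:.2f}")
```

Under these assumptions a request that fills both limits costs about $2.16, with roughly $1.57 of that coming from input tokens.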

In Google's own comparison, Gemini 3.5 Flash beat Gemini 3.1 Pro on three flagship evals (scores read Gemini 3.1 Pro to Gemini 3.5 Flash):

Coding, Terminal-Bench 2.1: 70.3% to 76.2%, per GoogleDeepMind's benchmark thread
Real-world agentic work, GDPval-AA Elo: 1314 to 1656, per GoogleDeepMind's benchmark thread
Scaled tool use, MCP Atlas: 78.2% to 83.6%, per GoogleDeepMind's benchmark thread
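The size of those gaps is easier to compare once computed. The sketch below assumes the first number in each pair is Gemini 3.1 Pro and the second is Gemini 3.5 Flash, which follows from the statement that Flash won all three. The Elo conversion further assumes GDPval-AA uses the conventional 400-point logistic scale; the source does not confirm that, so treat the resulting figure as illustrative only.

```python
# (Gemini 3.1 Pro, Gemini 3.5 Flash) as quoted above; the ordering is an assumption.
SCORES = {
    "Terminal-Bench 2.1 (%)": (70.3, 76.2),
    "GDPval-AA (Elo)": (1314, 1656),
    "MCP Atlas (%)": (78.2, 83.6),
}

# Absolute gain per benchmark. Percentage points and Elo points are not comparable.
GAINS = {name: round(flash - pro, 1) for name, (pro, flash) in SCORES.items()}


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (assumes a 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


pro_elo, flash_elo = SCORES["GDPval-AA (Elo)"]
flash_vs_pro = elo_expected_score(flash_elo, pro_elo)

for name, gain in GAINS.items():
    print(f"{name}: +{gain}")
print(f"Expected score, Flash vs Pro (assumed scale): {flash_vs_pro:.2f}")
```

The two percentage benchmarks move by 5.9 and 5.4 points, while the 342-point Elo gap would correspond to an expected score of roughly 0.88 for Flash in a head-to-head comparison if the conventional scale applies.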

The catch is price. Before launch, scaling01's pricing post and AiBattle_'s price comparison both clocked the new Flash tier at triple the price of Gemini 3 Flash.

OpenCode and Kilo

The fast part of this story is not the model card; it is how quickly coding tools started wiring the slug in. OpenCode advertised Gemini 3.5 Flash as live with 1M context and pricing comparable to GLM, Kimi, and DeepSeek Pro.

Kilo moved just as fast: kilocode's launch post said Gemini 3.5 Flash was live in Kilo before I/O ended, with an initial 74.2% result on PinchBench and a claim of roughly 4x faster performance than comparable frontier models.

Kilo's follow-up posts matter because they show the harness Google seems to want for this model family, not just a dropdown entry. kilocode's Cloud Agent post pitched remote repo sessions with auto-commits and idle spin-down, while kilocode's Code Reviewer post pushed free PR review with inline comments.

Replicate and Mastra

Hosted API and router support also appeared immediately: replicate exposed Gemini 3.5 Flash through Replicate's API, and mastra added google/gemini-3.5-flash to Mastra's router the same day.

That gives the model three very different day-one entry points:

Managed inference via Replicate's model page
Framework routing via Mastra's Google models page
Terminal-agent access via Kilo and OpenCode, according to OpenCode and kilocode

The ecosystem kept expanding through the evening: warpdotdev added Gemini 3.5 Flash to Warp Agent, Cognition's Windsurf repost signaled Windsurf support, and testingcatalog's AI/ML API post pointed to a 24-hour free testing window through AI/ML API.

Antigravity teamwork

Google's most concrete demo for why it thinks 3.5 Flash belongs in agent harnesses was not inside Gemini the app. It was inside Antigravity 2.0, where Google's OS demo thread said agents built a functioning operating system from a single prompt in 12 hours using 93 parallel subagents, more than 15,000 model requests, and 2.6B processed tokens for less than $1,000 in API credits.
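The headline numbers from that demo can be turned into rough per-unit figures. This is back-of-the-envelope arithmetic rather than anything Google published: because the thread says "more than 15,000" requests and "less than $1,000", each derived value is a bound, and the throughput figure assumes the 12 hours were wall-clock time.

```python
# Figures quoted in Google's OS demo thread.
SUBAGENTS = 93
MIN_REQUESTS = 15_000          # "more than 15,000 model requests"
TOKENS = 2_600_000_000         # 2.6B processed tokens
MAX_COST_USD = 1_000           # "less than $1,000 in API credits"
HOURS = 12                     # assumed to be wall-clock time

# Upper bound: more requests than MIN_REQUESTS would lower this average.
tokens_per_request = TOKENS / MIN_REQUESTS

# Lower bound: more requests than MIN_REQUESTS would raise this average.
requests_per_subagent = MIN_REQUESTS / SUBAGENTS

# Upper bound on the blended price across all processed tokens.
blended_usd_per_mtok = MAX_COST_USD / (TOKENS / 1_000_000)

# Average aggregate throughput across all subagents.
tokens_per_second = TOKENS / (HOURS * 3600)

print(f"Tokens per request (at most): {tokens_per_request:,.0f}")
print(f"Requests per subagent (at least): {requests_per_subagent:.0f}")
print(f"Blended USD per million tokens (at most): {blended_usd_per_mtok:.3f}")
print(f"Aggregate tokens per second: {tokens_per_second:,.0f}")
```

That works out to at most about 173,000 tokens per request, at least about 161 requests per subagent, and a blended rate below roughly $0.39 per million tokens. The blended rate sits under the $1.50 input list price quoted earlier; the thread does not explain the gap, so any cause, such as cached or discounted tokens, is speculation.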

Mirrokni's thread broke the coordination pattern into four roles for /teamwork-preview, which he said is now in research preview:

Orchestrators
Explorers
Workers
Critics

According to mirrokni's access note, that preview is currently limited to Google AI Ultra subscribers. That makes the day-one story two-tiered: broad model availability almost everywhere, and the most aggressive multi-agent harness still behind a research-preview gate.
