A Google bot-authored LiteRT-LM pull request references Gemma4 and AIcore NPU support, while multiple posts claim the largest version will have around 120B total and 15B active parameters. Engineers targeting on-device inference should wait for a formal model card before locking plans.

The concrete signal is narrow but real. Screenshots shared in the original post and a second capture show an open PR in google-ai-edge/LiteRT-LM titled "Add NPU support for AIcore for Gemma4 model," with a comment from copybara-service[bot], Google's internal sync bot, repeating the same text. The image OCR in both posts identifies Copybara-Service as "an helper app for Google Copybara, synchronizing repositories maintained by Google," which makes this look like an internal-to-public repo sync rather than a random third-party fork.
For engineers, the interesting part is not just the string "Gemma4." It is the coupling of Gemma4 with LiteRT-LM, NPU support, and AIcore in the PR title itself. That suggests Google is plumbing runtime support for a new model family into its lightweight inference stack before, or alongside, a public release.
The parameter details are still rumor, not announcement. In one supporting post, the claim is that Gemma 4's biggest model will be "around 120B total" with "15B active" parameters; another post repeats "120b in total, 15b active parameters." If accurate, that would imply an MoE-style architecture where only a subset of parameters is active per token.
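If the 120B-total / 15B-active claim holds, the practical consequence for deployment is that memory footprint is driven by total parameters (every expert may be routed to), while per-token compute is driven by active parameters only. A back-of-envelope sketch makes the asymmetry concrete; all numbers here are illustrative assumptions based on the rumor, not official Gemma 4 specs:

```python
# Back-of-envelope sizing for a hypothetical 120B-total / 15B-active MoE model.
# These are rumor-derived illustrations, not official Gemma 4 figures.

def model_bytes_gib(total_params_b: float, bytes_per_param: float) -> float:
    """Memory needed to hold all weights, in GiB."""
    return total_params_b * 1e9 * bytes_per_param / 2**30

def flops_per_token(active_params_b: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token."""
    return 2 * active_params_b * 1e9

# Weights must all be resident, regardless of how few are active per token.
for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{model_bytes_gib(120, bpp):.0f} GiB of weights")

# Per-token compute, by contrast, scales with the 15B active subset:
saving = flops_per_token(120) / flops_per_token(15)
print(f"compute vs. a dense 120B model: {saving:.0f}x cheaper per token")
```

Even at int4, roughly 56 GiB of weights would far exceed typical phone memory, which is why the coupling with AIcore and NPU offload in the PR title is worth watching rather than assuming the full model targets handsets.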
What is missing matters just as much. None of the evidence includes an official Google post, model card, context window, tokenizer details, benchmark table, quantization guidance, license update, or API availability. So the sizing rumor is useful as an early planning signal, but it does not yet answer deployability questions.
The strongest timing hint comes from a launch-week comment in which Logan Kilpatrick said it is "going to be a fun week of launches." Read together with the LiteRT-LM PR, that makes an imminent Gemma 4 reveal plausible.
But the evidence still describes a pre-release state. There are no published weights, no serving endpoints, and no reproducible evals attached to the leak. Right now the actionable facts are limited to a Google-linked LiteRT-LM PR mentioning "Gemma4" and "AIcore" NPU support, plus an unverified large-model sizing claim circulating in social posts.
The underlying posts, quoted for reference:

"Gemma 4 has been spotted on GitHub. The PR appears to be from Google's bot account"

"Google Gemma 4 incoming! Let's go!"

"Yes, Gemma 4 will soon be released this time their largest size might be around 120B total with 15B active"

"Google DeepMind's Logan Kilpatrick says its 'going to be a fun week of launches' We have already spotted Gemma 4 (see below), so we could potentially also see Gemini 3.1 Pro GA, Gemini Flash 3.1 GA, Gemini Pro 3.2 preview and more What AI models are you hoping to see this week?"