Google DeepMind releases Gemini Robotics-ER 1.6 with 93% instrument-reading accuracy
Google DeepMind shipped Gemini Robotics-ER 1.6 to the Gemini API and AI Studio with better visual-spatial reasoning, multi-view success detection, and gauge reading. Its headline 93% instrument-reading score signals a model aimed at robots that must reason over cluttered scenes and physical constraints.
TL;DR
- Logan Kilpatrick's launch post and the official DeepMind announcement both frame Gemini Robotics-ER 1.6 as a reasoning-first robotics model, now available in the Gemini API and Google AI Studio.
- In Google's benchmark card, Gemini Robotics-ER 1.6 beats both Robotics-ER 1.5 and Gemini 3.0 Flash on pointing and counting, multi-view success detection, and especially instrument reading, where it reaches 93%.
- According to DeepMind's thread, the model can identify and count tools in cluttered scenes while declining to point at objects that are not there, a small detail that matters more than another demo reel.
- Phil Schmid's summary and the DeepMind blog post both say the model can natively call tools, including Google Search, vision-language-action models, and third-party functions.
- The official blog post says safety gains include better compliance with physical constraints, plus better hazard identification on text and video tasks, while Rohan Paul's writeup highlights demos where the system refuses risky actions.
You can read the launch post, check the Gemini API models page, and see that Google is already pitching this as a developer product, not just a lab video. The oddest detail is the feedback loop: the blog asks robotics teams to submit 10 to 50 labeled images of failure cases for future releases. There is also a robotics overview page and a mention of a starter Colab in the launch post, which suggests Google wants people prompting this thing before it wants them fine-tuning anything.
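The native tool calling mentioned above can be sketched with the Gemini API's standard function-declaration schema. A minimal illustration follows; the `stop_motion` function and its parameters are hypothetical examples, not from DeepMind's post, and the lowercase OpenAPI-style type names follow the public function-calling docs.

```python
# Hypothetical third-party function a robotics stack might expose to
# the model; the name and parameters are illustrative only.
stop_motion_tool = {
    "function_declarations": [{
        "name": "stop_motion",
        "description": "Immediately halt the robot arm.",
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": "Why motion was stopped.",
                },
            },
            "required": ["reason"],
        },
    }]
}

# Declarations like this travel alongside the prompt in a
# generateContent request body under the "tools" key.
request_tools = {"tools": [stop_motion_tool]}
print(request_tools["tools"][0]["function_declarations"][0]["name"])
```

The same `tools` list is where built-ins like Google Search would sit, which is what makes the "natively call tools" claim a developer feature rather than a demo.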
Pointing and counting
Pointing is still the core primitive here. In the official blog post, DeepMind describes it as the base layer for counting, relational logic, grasp planning, and constraint checks like whether an object can fit inside a cup.
The concrete improvement is not just better localization. The launch post says ER 1.6 can use points as intermediate reasoning steps, then chain them into counting or geometric estimates. DeepMind's thread shows the practical version: identify the right tools in a cluttered workshop, count them, and ignore requested items that do not exist.
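The points-to-counts chain is easy to picture in code. A small sketch, assuming the `[{"point": [y, x], "label": ...}]` JSON schema documented for ER 1.5 (coordinates normalized to 0-1000) carries over to 1.6; the response text here is invented for illustration:

```python
import json
from collections import Counter

# Hypothetical response to a pointing prompt such as
# "Point to every wrench, hammer, and screwdriver in the image."
response_text = """
[
  {"point": [412, 180], "label": "wrench"},
  {"point": [455, 610], "label": "wrench"},
  {"point": [700, 333], "label": "hammer"}
]
"""

def count_points(raw: str, requested: list[str]) -> dict[str, int]:
    """Turn a pointing response into per-label counts.

    Items the model declined to point at (because they are not in
    the scene) simply come back with a count of 0.
    """
    points = json.loads(raw)
    counts = Counter(p["label"] for p in points)
    return {label: counts.get(label, 0) for label in requested}

# "screwdriver" was requested but absent, so no point was returned.
print(count_points(response_text, ["wrench", "hammer", "screwdriver"]))
# → {'wrench': 2, 'hammer': 1, 'screwdriver': 0}
```

The zero for the absent item is the code-level view of the "declining to point at objects that are not there" behavior from DeepMind's thread.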
Google's benchmark chart puts numbers on the claim: 80% on pointing and counting for ER 1.6, up from 61% for Robotics-ER 1.5 and 72% for Gemini 3.0 Flash.
Multi-view success detection
The second upgrade is more robotics-specific than it sounds. The blog says modern robot setups often mix overhead and wrist cameras, so the hard part is deciding whether those views add up to a finished task under occlusion, bad lighting, or awkward angles.
In the pen-and-holder demo that Phil Schmid calls out, the model uses multiple camera views to judge whether "put the blue pen into the black pen holder" is actually complete. The official announcement says ER 1.6 reaches 84% on multi-view success detection, versus 74% for both ER 1.5 and Gemini 3.0 Flash, while single-view success detection moves to 90%.
The fine print matters. DeepMind notes in the same benchmark caption that its single-view and multi-view success-detection evals use different example sets, so those two numbers should not be compared to each other as if they were one scale.
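From a developer's seat, multi-view success detection is mostly a prompt-assembly problem: interleave labeled camera frames with the completion question. A sketch, using the `parts`/`inline_data` shape of the public Gemini REST API; the view labels and question wording are illustrative, not DeepMind's:

```python
import base64

def multi_view_success_request(views: dict[str, bytes], task: str) -> dict:
    """Build a generateContent-style body that pairs each camera view
    with a label, then asks for a success verdict over all views."""
    parts = []
    for name, jpeg in views.items():
        parts.append({"text": f"View: {name}"})
        parts.append({"inline_data": {
            "mime_type": "image/jpeg",
            "data": base64.b64encode(jpeg).decode("ascii"),
        }})
    parts.append({"text": (
        f'Task: "{task}". Considering all views together, '
        "is the task complete? Answer yes or no with a reason."
    )})
    return {"contents": [{"role": "user", "parts": parts}]}

body = multi_view_success_request(
    {"overhead": b"\xff\xd8...", "wrist": b"\xff\xd8..."},  # stand-in JPEG bytes
    "put the blue pen into the black pen holder",
)
print(len(body["contents"][0]["parts"]))  # 2 views × 2 parts + 1 question → 5
```

The judging itself happens in the model; the client's only job is making sure the views arrive labeled so occlusion in one camera can be resolved by another.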
Instrument reading
Instrument reading is the new capability, and it is the part most likely to get bookmarked by people building inspection systems. The blog says the feature came out of work with Boston Dynamics, where Spot captures images of pressure gauges, thermometers, and sight glasses around industrial facilities.
DeepMind's description is more interesting than the headline score. The model first zooms into the image, then uses pointing plus code execution to estimate proportions and intervals, then applies world knowledge to interpret the unit and final reading. The gauge demo tweet condenses that into one line: pointing and code execution are used to get the gauge reading down to sub-tick accuracy.
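The proportion step is ordinary geometry once the points exist. A sketch of the kind of arithmetic the blog describes, with pointed pixel coordinates for the gauge center, needle tip, and the min/max scale ticks; the actual code the model emits is not published, so this is an illustration (coordinates use x-right, y-up convention):

```python
import math

def gauge_reading(center, needle_tip, min_tick, max_tick,
                  min_value, max_value):
    """Estimate an analog gauge reading by linear interpolation of the
    needle's angular sweep between the min and max tick marks."""
    def angle(p):
        # Angle of a point around the gauge center, in radians.
        return math.atan2(p[1] - center[1], p[0] - center[0])

    def sweep_cw(a, b):
        # Clockwise angular distance from a to b (needles sweep
        # clockwise from min to max on a typical gauge face).
        return (a - b) % (2 * math.pi)

    full = sweep_cw(angle(min_tick), angle(max_tick))
    part = sweep_cw(angle(min_tick), angle(needle_tip))
    return min_value + (part / full) * (max_value - min_value)

# Needle pointing straight up on a 0-10 bar gauge whose scale spans
# 270° from lower-left (min) to lower-right (max).
reading = gauge_reading(
    center=(100, 100),
    needle_tip=(100, 200),
    min_tick=(29, 29),     # 225° position
    max_tick=(171, 29),    # -45° position
    min_value=0.0, max_value=10.0,
)
print(round(reading, 2))
# → 5.0
```

Interpolating between tick angles rather than snapping to the nearest tick is what makes sub-tick accuracy possible in the first place.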
On Google's chart, instrument reading jumps to 93% for ER 1.6. That same benchmark card lists 72% for Gemini 3.0 Flash and 23% for ER 1.5, with the blog noting that agentic vision was enabled for the instrument-reading evals except on ER 1.5, which does not support it.
Safety constraints and developer access
The safety section is more concrete than the usual launch boilerplate. In the official blog post, DeepMind says ER 1.6 improves on adversarial spatial reasoning tasks, follows physical constraints more reliably, and does better at identifying injury risks, with gains of 6% on text scenarios and 10% on video versus Gemini 3.0 Flash.
Some of those constraints are mundane in exactly the right way. The blog's examples include refusing to handle liquids or items heavier than 20 kg, while Rohan Paul's post also points to demos involving door checks, distorted camera angles, and risky-object avoidance.
Google is also exposing the model as a normal developer surface. The Gemini API models page lists Gemini Robotics-ER 1.6 Preview under specialized task models, the pricing page places AI Studio in the free tier and higher-rate-limit access in paid tiers, and the launch post says developers get a starter Colab. The last interesting detail is the request for practitioners to submit 10 to 50 labeled images of failure modes, which is about as direct a roadmap signal as launch posts ever get.
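Since the model ships as a normal API surface, a call is just a standard generateContent request. A stdlib-only sketch of the request shape: the endpoint pattern and `x-goog-api-key` header follow the public Gemini REST API, but the model id string here is inferred from the models page's "Gemini Robotics-ER 1.6 Preview" listing and should be checked against the API docs before use:

```python
import json
import urllib.request

# Assumed model id -- verify the exact string on the Gemini API models page.
MODEL_ID = "gemini-robotics-er-1.6-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Assemble (but do not send) a text-only generateContent request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
    )

req = build_request("YOUR_API_KEY", "Point to every visible valve.")
print(req.full_url.rsplit("/", 1)[-1])  # model:method segment of the URL
# with urllib.request.urlopen(req) as resp:   # needs a real key
#     print(json.load(resp))
```

For real use the google-genai SDK is the cleaner path, but the raw shape makes clear there is nothing robotics-special about the plumbing: same endpoint family, same free-tier AI Studio key.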