Google DeepMind tests Gemini pointer control on PDFs and charts
Google DeepMind showed an experimental pointer that lets Gemini act directly on screen elements with motion, speech, and shorthand commands. The demos move assistance from chat into live workspace control, but the feature was presented as an experiment rather than a shipped product.

TL;DR
- Google DeepMind's demo thread showed an experimental Gemini pointer that combines motion, speech, and shorthand commands so people can act on screen elements directly.
- In the official examples, the pointer targets PDFs, tables, and recipes, then turns those gestures into actions like email bullet points, pie charts, and ingredient scaling.
- Linus Ekenstam's read framed the setup as a simple interface layer (mouse plus voice plus a screen snapshot), with most of the work happening behind the scenes.
- The footage in Google DeepMind's video moves assistance out of a separate chat window and into the cursor itself, which is the real design shift here.
You can watch Google DeepMind's main demo turn a selected passage into a tweet draft, then jump to the same thread's examples of recipe scaling and PDF summarization in context. Linus Ekenstam immediately reduced the interaction model to mouse, voice, and screen vision, while minchoi pushed the bolder claim that this kind of interface makes the prompt box feel dated.
Pointer control
Google DeepMind pitched the idea as "reimagining" the mouse pointer with AI. The core move is simple: point at something already on screen, speak a short instruction, and let Gemini infer the action from the local context.
The official examples in Google DeepMind's thread break down into a few concrete verbs:
- Point at a PDF, ask for bullet points for an email.
- Hover over a table, ask for a pie chart.
- Highlight a recipe, say "double these ingredients."
- Select text, then say "Make it a tweet."
The interesting part is where the help appears. In the demo, the assistance lives beside the cursor and the active document, not in a separate app.
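Taken together, those verbs suggest a small dispatch step: the element under the cursor plus a short utterance resolve to one action. The TypeScript sketch below illustrates that shape; every type and name here is a guess for illustration, not anything Google DeepMind has published, and the keyword matching stands in for grounding the real system would presumably do with the model itself.

```typescript
// Hypothetical sketch of the interaction model implied by the demo:
// the element under the cursor plus a short spoken command resolve to
// an action. None of these names come from Google DeepMind.

type ScreenElement =
  | { kind: "pdf"; text: string }
  | { kind: "table"; rows: string[][] }
  | { kind: "recipe"; ingredients: string[] }
  | { kind: "selection"; text: string };

interface PointerCommand {
  element: ScreenElement; // what the cursor is over or has highlighted
  utterance: string;      // the short spoken instruction
}

// Toy resolver: real grounding would come from the model reading the
// screenshot, not from keyword matching like this.
function resolveAction(cmd: PointerCommand): string {
  const said = cmd.utterance.toLowerCase();
  if (cmd.element.kind === "pdf" && said.includes("bullet")) {
    return "summarize-to-email-bullets";
  }
  if (cmd.element.kind === "table" && said.includes("pie chart")) {
    return "render-pie-chart";
  }
  if (cmd.element.kind === "recipe" && said.includes("double")) {
    return "scale-ingredients-2x";
  }
  if (cmd.element.kind === "selection" && said.includes("tweet")) {
    return "draft-tweet";
  }
  return "ask-model-to-infer"; // fall through to the model's judgment
}
```

The fallthrough branch is the important part of the toy: when no local rule matches, the decision goes back to the model, which is where most of the demo's actual work seems to happen.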
The interface layer
Linus Ekenstam's summary, "Mouse + Voice + Snapshot of Screen = Action," is probably the cleanest shorthand for what the demo is doing. It describes the system less like a new app and more like a thin control layer sitting on top of whatever is already open.
That reading fits the footage in Google DeepMind's video, where the cursor, local selection, and spoken command seem to provide enough grounding for Gemini to infer the task without a long typed prompt. The result feels closer to direct manipulation than chatbot prompting.
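Read that way, the client's only job is to bundle three inputs and ship them somewhere smarter. A minimal sketch of that framing follows, assuming a hypothetical endpoint and payload shape; nothing here is a real Gemini API.

```typescript
// Minimal sketch of Ekenstam's "Mouse + Voice + Snapshot of Screen = Action"
// framing. The endpoint and payload shape are assumptions; the point is how
// little client-side state the model needs.

interface ActionRequest {
  cursor: { x: number; y: number }; // where the user is pointing
  transcript: string;               // the short spoken command
  screenshotPng: Uint8Array;        // one snapshot of the visible screen
}

// Hypothetical service that grounds the screenshot, locates the element
// under the cursor, and returns an action or a rendered result.
async function requestAction(req: ActionRequest): Promise<string> {
  const res = await fetch("https://example.invalid/pointer-agent", {
    method: "POST",
    headers: { "Content-Type": "application/octet-stream" },
    body: encodeRequest(req),
  });
  return res.text(); // e.g. "draft-tweet"
}

// Toy framing: a JSON header line followed by the raw screenshot bytes.
function encodeRequest(req: ActionRequest): Uint8Array {
  const header = new TextEncoder().encode(
    JSON.stringify({ cursor: req.cursor, transcript: req.transcript }) + "\n"
  );
  const out = new Uint8Array(header.length + req.screenshotPng.length);
  out.set(header, 0);
  out.set(req.screenshotPng, header.length);
  return out;
}
```

If the demo works anything like this, the "interface layer" label holds: the client captures context and the model does the interpretation, which is exactly why the help can live beside the cursor instead of in a separate app.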
Experimental demos
Google DeepMind's thread called these "experimental demos," not a product launch. There was no product name, no availability note, and no linked signup or docs in the evidence shown here.
That matters because the thread is presenting an interaction model, not a shipped feature set. The examples are polished, but the public material stops at concept footage.
Prompt-box reactions
The immediate reaction posts treated the demo as a possible shift away from both chat panes and conventional clicking. In one reaction, minchoi called it a moment where "the prompt box is dying," while danshipper said he was excited for a future where clicking a mouse feels "old school."
Those reactions are speculative, but they capture what made the thread travel: Google was not showing a smarter chatbot. It was showing a cursor that behaves like an agent.