AI Primer
TOPIC 50 stories

Computer Use

Agents that click, type, browse, and operate software directly.

RELEASE 25th June
Google opens Gemini 3.5 Flash Computer Use in Gemini API with explicit confirmations

A day after Gemini 3.5 Flash Computer Use surfaced as a launch story, Google formally opened it through the Gemini API and Enterprise Agent Platform. Explicit user confirmation, automated task stopping, and an Android adb quickstart make the rollout concrete for agent builders.

RELEASE 24th June
Gemini 3.5 Flash adds Computer Use with 78.4 OSWorld score

Google released built-in Computer Use for Gemini 3.5 Flash across browser, mobile, and desktop. Try it for agent workflows, but watch for timeout issues on long design-from-scratch runs.

RELEASE 22nd June
Hermes Agent adds Windows and Linux GUI computer use via TryCua

Hermes Agent added GUI computer-use support for Windows and Linux through TryCua drivers, extending beyond existing macOS support. Teams running desktop automation across mixed operating systems should test the new coverage.

RELEASE 1w ago
OpenAI Codex adds Record & Replay for reusable workflow skills

OpenAI added Record & Replay to Codex so users can demonstrate a repetitive computer task once and save it as a reusable skill. The initial release is Mac-only and unavailable in the EEA, UK, and Switzerland, so teams should check access before planning adoption.

RELEASE 1w ago
TryCua launches Cua Driver Linux with background computer use and Wayland preview

TryCua brought Cua Driver to Linux, letting Claude Code, Codex, Hermes, and custom agents control real desktop apps via CLI or MCP without taking over the main terminal. The release also adds headless SSH execution and a preview of multi-window Wayland control across supported distros.

RELEASE 1w ago
OpenAI opens Codex Computer Use and Chrome extension in the EEA, UK, and Switzerland

OpenAI expanded Codex in Europe with Computer Use, the Chrome extension, Memory, and Chronicle. The rollout broadens browser and desktop automation outside the U.S., though some memory features remain opt-in or preview-only.

NEWS 1w ago
ENPIRE launches 8-agent Codex robot fleet for physical autoresearch

ENPIRE launched a physical autoresearch setup that gives eight Codex agents robots, GPUs, and real-world APIs for tasks such as handling zip ties and sorting parts. It matters because it moves long-horizon agent evaluation from browser-only loops into embodied experimentation with explicit safety controls.

RELEASE 1w ago
TryCua launches Cua-Bench for KiCad; GPT-5.5 clears 6 of 25 tasks

TryCua and Snorkel opened Cua-Bench, a computer-use benchmark with 25 expert-authored KiCad tasks graded by exact netlist matches. The early results show frontier models still struggle with GUI execution, wiring completion, and self-checking, so treat benchmark wins as incomplete evidence of readiness for real computer-use work.

WORKFLOW 2w ago
Codex users report Appshots, browser control, and parallel PR workflows

OpenAI shipped a docs agent that can hand off guides to Codex, and users published Appshots, browser-control, parallel PR, and multi-tree workflows. Watch the examples for ways to structure Codex around orchestrated tasks, while code-review and plugin gaps remain.

RELEASE 2w ago
Perplexity Computer adds native Deep Research with Search as Code

Perplexity made Deep Research a native skill inside Computer and tied it to the same harness, long-running sandboxes, tools, connectors, and licensed data. The update collapses multi-step research into one persistent agent interface instead of a separate mode.

WORKFLOW 3w ago
Codex users compare iOS dictation, multi-thread UX, and long-context prompts

Codex usage moved further into phone-first workflows, with iOS dictation loops, background voice capture, and app updates like searchable settings and restored state. The comparisons still flag rough spots in multi-thread UX, Windows support, and cases where CLI tabs or cloud agents are easier to manage.

NEWS 3w ago
Browser Use adds cloud profiles and geo proxies, with 484 browsers in <2s

Browser Use launched synced cloud profiles for logged-in sessions, added geo-targeted proxies, and showed a 484-browser startup demo that finished in under two seconds. The update matters because hosted browser agents can now keep authenticated state and regional routing without custom session-management work.

RELEASE 3w ago
Personal Computer opens Windows waitlist for Max and Enterprise Max users

Perplexity opened Personal Computer for Windows to Max and Enterprise Max users on a waitlist. The rollout widens its local agent surface beyond earlier releases, and users should watch for the preview of local-cloud task splitting for private or heavier workloads.

RELEASE 3w ago
H Company launches Holo 3.1 with local computer use and 79.3% AndroidWorld

H Company released Holo 3.1, a local computer-use VLM family with function calling and AndroidWorld scores reaching 79.3% on the 35B model. The update pushes computer-use agents toward local and mobile deployment instead of cloud-only runtimes.

RELEASE 4w ago
OpenAI Codex adds Windows computer use and ChatGPT mobile remote control

OpenAI added computer use to Codex on Windows and let ChatGPT mobile steer tasks running on Windows PCs. The update extends Codex to existing Windows dev machines and adds remote review and debugging from mobile.

RELEASE 4w ago
Cua Driver supports Windows background computer use over MCP and CLI

Cua Driver said its Windows backend is now stable, letting Claude Code, Codex, Hermes, or custom agents drive real Windows apps through MCP or CLI. The release targets Windows-only line-of-business software while keeping the desktop usable with multi-pointer support.

WORKFLOW 1mo ago
Codex users share /goal audits, mobile delegation, and Raspberry Pi workflows

Practitioners published reusable Codex workflows for project audits, memory-driven skill packaging, mobile delegation, and remote computer use. Try the prompt-and-steps patterns if you want to adapt Codex across repos and devices.

WORKFLOW 1mo ago
Codex users report iPhone simulator bug-bashes, Appshots form fills, and locked-Mac runs

Two days after Codex added locked-Mac control and Appshots, users posted end-to-end iPhone simulator debugging, Safari form-filling, and remote-control workflows. That matters because the feature is moving from launch copy into concrete computer-use tasks that can replace manual QA and repetitive UI work.

RELEASE 1mo ago
OpenAI updates Codex with locked-Mac control and Appshots

OpenAI shipped a Codex update that lets the mobile app control a locked Mac, adds Appshots for screen context, and graduates /goal. It also adds browser annotation tools, team plugin sharing, and expanded analytics for business users.

RELEASE 1mo ago
Cognition adds Windows VMs to Devin for MSBuild, IIS, and .NET migrations

Cognition added native Windows VMs to Devin so it can build, run, and test Windows applications with MSBuild, IIS, PowerShell, and SQL Server. The rollout lets Devin handle enterprise codebases where Linux sandboxes are not enough.

NEWS 1mo ago
Gemini desktop leaks Stream to Cursor, Spark local files, and Omni ahead of I/O

Leak videos and tester reports pointed to a larger Gemini desktop app with Stream to Cursor, Spark local-file access, Live, and Omni ahead of I/O. Independent testers also reported faster 3.2 and 3.5 Flash checkpoints, but Google had not announced the features publicly.

WORKFLOW 1mo ago
Codex adds remote connections for Mac mini devboxes in the ChatGPT app

OpenAI documented Codex remote connections, letting the ChatGPT app point at a separate Codex host such as a Mac mini or rented VPS. Try it for long runs that need to stay alive off-device or for phone-first coding sessions.

NEWS 1mo ago
Google introduces Gemini Intelligence on Android with browser use, AppFunctions, and Rambler

Google unveiled Gemini Intelligence at the Android Show with cross-app task automation, Gemini in Chrome, Rambler voice cleanup, custom widgets, and AppFunctions. The rollout moves Gemini into core Android workflows on Pixel and Galaxy devices this summer.

NEWS 1mo ago
OpenAI Codex supports background computer use with Mac app control and Telegram BotFather setup

OpenAI showed Codex working across apps in the background without taking over the Mac, and early users applied it to Telegram BotFather setup and front-end testing. That matters because Codex is moving from repo-only work into authenticated desktop workflows and UI-driven task loops.

RELEASE 1mo ago
Nous Research adds CUA computer use to Hermes Agent for desktop control

Nous Research added early computer-use support to Hermes Agent through CUA, enabling background desktop control without taking over keyboard, mouse, or screen input. The feature opens computer use to local or alternative models instead of tying the workflow to frontier-only modes.

RELEASE 1mo ago
Zyphra releases ZAYA1-VL-8B with 700M active params and Apache 2.0

Zyphra released its first vision-language model, an 8B MoE with 700M active parameters and visual LoRA adapters. The model matters because it targets OCR, document reasoning, GUI interaction, and computer-use workloads under an Apache 2.0 license.

RELEASE 1mo ago
Perplexity releases Personal Computer Mac app for local files and native app control

Perplexity released a new Mac app centered on Personal Computer, a local-first agent that works across local files, native Mac apps, and the web. It also supports remote control from iPhone and an always-on Mac mini setup paired with Comet.

RELEASE 1mo ago
Navigator n1.5 claims web computer-use Pareto gains on accuracy, latency, and cost

Yutori rolled out Navigator n1.5 as a web computer-use model and said it improves the tradeoff between accuracy, latency, and cost for browser tasks. The launch matters because related environment-generation work is aimed at the long-horizon web workflows that make computer-use agents expensive and brittle.

NEWS 1mo ago
Perplexity Computer launches Professional Finance with 35 workflows and licensed data

Perplexity launched Professional Finance for Computer with licensed Morningstar, PitchBook, Daloopa, and Carbon Arc data plus 35 analyst workflows. The release matters because outputs are now designed to stay traceable to source documents instead of behaving like opaque chat answers.

RELEASE 1mo ago
DeepSeek removes visual-primitives repo after 90-KV vision details

DeepSeek briefly published a paper and threads on point-and-bbox reasoning, about 90 KV entries per 800×800 image, and RL-trained vision experts, then removed the repo and related mentions. The technique looked like a low-token path to computer use and multimodal reasoning in V4-Flash, but availability and reproducibility are now unclear.

RELEASE 2mo ago
Codex adds macOS computer use, in-app browser, and artifact previews

Codex gained background macOS control, page inspection, image generation, plugins, artifacts, and follow-up automations. That gives it one agent thread for desktop apps, frontend debugging, and recurring work.

RELEASE 2mo ago
Claude Connectors add Blender and Autodesk Fusion control via MCP

Anthropic released Claude Connectors for Blender, Autodesk Fusion, and other creative apps, exposing commands and file actions through MCP. That lets Claude operate inside existing desktop tools instead of only returning chat instructions.

RELEASE 2mo ago
Browser Use launches Browser Use Box with persistent logins and Telegram control

Browser Use launched Browser Use Box, a 24/7 Browser Harness environment with persistent logins and Telegram control. It moves browser agents off laptops and into always-on remote sessions for long-running web tasks.

NEWS 2mo ago
Pi ecosystem ships computer use, /parallel-review, and Chrome extension templates

Independent builders shipped Pi-GUI computer use, pi-subagents parallel review, and starter templates for extensions, Docker workers, and voice add-ons. The releases add reusable computer-use, subagent, and local-runtime building blocks around the base Pi harness.

RELEASE 2mo ago
Cua Driver opens macOS background app control with multi-cursor support for Claude Code and Codex

Cua Driver open-sourced a macOS driver that lets agents control apps in the background with multi-player and multi-cursor support. It matters because it turns background computer use from an app-specific feature into a reusable primitive that any agent loop can adopt.

WORKFLOW 2mo ago
Codex users report subagent, MCP, and canary deploy workflows

Practitioners shared repeatable Codex workflows for long-lived threads, background subagents, computer-use access through MCP, and canary rollouts. Codex is being used less as a one-shot assistant and more as a persistent automation harness.

WORKFLOW 2mo ago
Codex supports hidden-app control on macOS as users report 38-hour computer-use sessions

Fresh hands-on reports show Codex controlling minimized apps via macOS APIs, using a DOM-aware browser comment mode, and running for day-long sessions in the desktop app. That gives OpenAI stronger evidence that computer use is usable for daily development, though the rollout remains macOS-first and brittle around working-state changes.

RELEASE 2mo ago
Codex adds background computer use on macOS with 90+ plugins and SSH devboxes

OpenAI expanded Codex with background Mac computer use, an in-app browser, image generation, memory preview, automations, and 90+ plugins. The release moves Codex from terminal coding toward long-running UI and ops workflows, though some features remain macOS-first or alpha.

RELEASE 2mo ago
Perplexity launches Personal Computer for Mac with local file and app control

Perplexity launched Personal Computer for Mac, giving its desktop agent access to local folders, native apps, and the browser from one orchestration layer. It also supports Mac mini setups controlled from iPhone, pushing the product toward an always-on desktop agent.

RELEASE 2mo ago
Z.ai launches GLM-5V-Turbo for screenshot coding and GUI-agent tasks

Z.ai released GLM-5V-Turbo, a multimodal coding model for screenshots, video, design drafts, and GUI-agent tasks. It keeps text-coding performance steady while adding native vision support, so teams can test visual workflows without swapping models.

RELEASE 2mo ago
H Company launches Holo3 with 78.9% on OSWorld-Verified

H Company introduced Holo3, a computer-use model family with a 122B API model and an Apache 2.0 35B release on Hugging Face. Check the benchmark and pricing claims before assuming the model is ready for field deployment.

RELEASE 3mo ago
Claude Code adds computer use in research preview for Pro and Max

Anthropic put computer use directly into Claude Code, letting the CLI open apps, click through GUIs, and verify work on screen. Try it if you want Claude Code to handle end-to-end UI tasks beyond file edits, but note it is rolling out as a research preview on Pro and Max plans.

RELEASE 3mo ago
Expect launches CLI to QA apps in a real browser and record bug videos

Expect wraps browser QA for Claude Code, Codex, or Cursor into a CLI that records bug videos and feeds failures back into a fix loop. It gives coding agents a tighter UI validation cycle without requiring a custom browser harness.

RELEASE 3mo ago
Firecrawl launches /interact for natural-language browser actions

Firecrawl’s new /interact endpoint lets agents click, fill, scroll, and keep live browser sessions right after /scrape. It shortens the path from page extraction to web automation, but Playwright remains the better fit when you need deterministic full-session control.

NEWS 3mo ago
Claude Code adds macOS computer use with app control and permission prompts

Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.

RELEASE 3mo ago
Agent Computer launches cloud computers in under 0.5s with SSH access

Agent Computer launched cloud desktops that boot in under half a second and expose persistent disks, shared credentials, SSH access, and ACP control for agents. It gives coding agents a faster place to run tools and reuse auth, but teams still need to design safe session and credential boundaries.

WORKFLOW 3mo ago
Claude tests 25 Capacitor screens daily through Android CDP and iOS accessibility

A solo developer wired Claude into emulators and simulators to inspect 25 Capacitor screens daily and file bugs across web, Android, and iOS. The writeup is a solid template for unattended QA, but it also shows where iOS tooling and agent reliability still crack.

RELEASE 3mo ago
OpenClaw 3.13 supports Chrome 146 via MCP for signed-in browser control

OpenClaw 3.13 now connects to a real Chrome 146 session over MCP so agents can drive your signed-in browser instead of a separate bot context. Update if captchas or auth state were blocking your web automation flows.

RELEASE 3mo ago
Hermes Agent releases v0.3.0 with plugins, live Chrome CDP, and ACP IDE support

Hermes Agent v0.3.0 added a first-class plugin system, live browser attach via CDP, real-time streaming, and VS Code, Zed, and JetBrains integration through ACP. Update if you want shareable skills, browser control, and a more stable long-running agent setup.

RELEASE 3mo ago
H Company releases Holotron-12B: 8.9k tok/s on H100 and 80.5% WebVoyager

H Company launched Holotron-12B, an open multimodal model for computer-use agents built on a hybrid SSM-attention stack that targets KV-cache bottlenecks. Benchmark it if you need high-concurrency browser agents and want better throughput without giving up web-task accuracy.

© 2026 AI Primer. All rights reserved.