Computer Use
Agents that click, type, browse, and operate software directly.
Stories
Google unveiled Gemini Intelligence at the Android Show with cross-app task automation, Gemini in Chrome, Rambler voice cleanup, custom widgets, and AppFunctions. The rollout moves Gemini into core Android workflows on Pixel and Galaxy devices this summer.
OpenAI showed Codex working across apps in the background without taking over the Mac, and early users applied it to Telegram BotFather setup and front-end testing. That matters because Codex is moving from repo-only work into authenticated desktop workflows and UI-driven task loops.
Nous Research added early computer-use support to Hermes Agent through CUA, enabling background desktop control without taking over keyboard, mouse, or screen input. The feature opens computer use to local or alternative models instead of tying the workflow to frontier-only modes.
Zyphra released its first vision-language model, an 8B MoE with 700M active parameters and visual LoRA adapters. The model matters because it targets OCR, document reasoning, GUI interaction, and computer-use workloads under an Apache 2.0 license.
Perplexity released a new Mac app centered on Personal Computer, a local-first agent that works across local files, native Mac apps, and the web. It also supports remote control from iPhone and an always-on Mac mini setup paired with Comet.
Yutori rolled out Navigator n1.5, a web computer-use model it says improves the tradeoff between accuracy, latency, and cost for browser tasks. The launch matters because Yutori's related environment-generation work targets the long-horizon web workflows that make computer-use agents expensive and brittle.
Perplexity launched Professional Finance for Computer with licensed Morningstar, PitchBook, Daloopa, and Carbon Arc data plus 35 analyst workflows. The release matters because outputs are now designed to stay traceable to source documents instead of behaving like opaque chat answers.
DeepSeek briefly published a paper and threads on point-and-bbox reasoning, about 90 KV entries per 800² image, and RL-trained vision experts, then removed the repo and related mentions. The technique looked like a low-token path to computer use and multimodal reasoning in V4-Flash, but availability and reproducibility are now unclear.
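For scale, a back-of-envelope comparison shows why roughly 90 KV entries per 800² image is striking. This is my own arithmetic, not from the paper, and the 14-pixel patch size is an assumed ViT-style baseline:

```python
# Back-of-envelope KV-cache comparison for an 800x800 screenshot.
# The 14px patch size is an assumed ViT-style baseline, not from the paper.
side = 800
patch = 14
patch_tokens = (side // patch) ** 2   # conventional patch tokenization
kv_entries = 90                       # figure reported before the repo vanished

print(patch_tokens)                   # 3249 tokens under plain patching
print(patch_tokens / kv_entries)      # ~36x fewer cache entries claimed
```

If the claim held, that gap is what would make long multi-screenshot computer-use sessions cheap enough to run.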
Codex gained background macOS control, page inspection, image generation, plugins, artifacts, and follow-up automations. That gives it one agent thread for desktop apps, frontend debugging, and recurring work.
Anthropic released Claude Connectors for Blender, Autodesk Fusion, and other creative apps, exposing commands and file actions through MCP. That lets Claude operate inside existing desktop tools instead of only returning chat instructions.
Browser Use launched Browser Use Box, a 24/7 Browser Harness environment with persistent logins and Telegram control. It moves browser agents off laptops and into always-on remote sessions for long-running web tasks.
Independent builders shipped Pi-GUI computer use, pi-subagents parallel review, and starter templates for extensions, Docker workers, and voice add-ons. The releases add reusable computer-use, subagent, and local-runtime building blocks around the base Pi harness.
Cua Driver open-sourced a macOS driver that lets agents control apps in the background with multi-player and multi-cursor support. It matters because it turns background computer use from an app-specific feature into a reusable primitive that any agent loop can adopt.
Practitioners shared repeatable Codex workflows for long-lived threads, background subagents, computer-use access through MCP, and canary rollouts. Codex is being used less as a one-shot assistant and more as a persistent automation harness.
Fresh hands-on reports show Codex controlling minimized apps via macOS APIs, using a DOM-aware browser comment mode, and running for day-long sessions in the desktop app. That gives OpenAI stronger evidence that computer use is usable for daily development, though the rollout remains macOS-first and brittle around working-state changes.
OpenAI expanded Codex with background Mac computer use, an in-app browser, image generation, memory preview, automations, and 90+ plugins. The release moves Codex from terminal coding toward long-running UI and ops workflows, though some features remain macOS-first or alpha.
Perplexity launched Personal Computer for Mac, giving its desktop agent access to local folders, native apps, and the browser from one orchestration layer. It also supports Mac mini setups controlled from iPhone, pushing the product toward an always-on desktop agent.
Z.ai released GLM-5V-Turbo, a multimodal coding model for screenshots, video, design drafts, and GUI-agent tasks. It keeps text-coding performance steady while adding native vision support, so teams can test visual workflows without swapping models.
H Company introduced Holo3, a computer-use model family with a 122B API model and an Apache 2.0 35B release on Hugging Face. Check the benchmark and pricing claims before assuming the model is ready for field deployment.
Anthropic put computer use directly into Claude Code, letting the CLI open apps, click through GUIs, and verify work on screen. Try it if you want Claude Code to handle end-to-end UI tasks beyond file edits, but note it is rolling out as a research preview on Pro and Max plans.
Firecrawl’s new /interact endpoint lets agents click, fill, scroll, and keep live browser sessions right after /scrape. It shortens the path from page extraction to web automation, but Playwright remains the better fit when you need deterministic full-session control.
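A rough sketch of what an action list for such an endpoint could look like. The field names and action schema here are illustrative assumptions, not Firecrawl's documented API; check the /interact docs before relying on them:

```python
import json

# Hypothetical action list for an /interact-style endpoint.
# Field names ("type", "selector", "text") are illustrative assumptions,
# not Firecrawl's documented schema.
actions = [
    {"type": "click", "selector": "#login"},
    {"type": "fill", "selector": "input[name=q]", "text": "computer use"},
    {"type": "scroll", "direction": "down"},
]
payload = json.dumps({"url": "https://example.com", "actions": actions})
print(payload)
```

The appeal is that a scrape result and a follow-up interaction share one session, rather than re-launching a browser per step.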
Expect wraps browser QA for Claude Code, Codex, or Cursor into a CLI that records bug videos and feeds failures back into a fix loop. It gives coding agents a tighter UI validation cycle without requiring a custom browser harness.
Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.
Agent Computer launched cloud desktops that boot in under half a second and expose persistent disks, shared credentials, SSH access, and ACP control for agents. It gives coding agents a faster place to run tools and reuse auth, but teams still need to design safe session and credential boundaries.
A solo developer wired Claude into emulators and simulators to inspect 25 Capacitor screens daily and file bugs across web, Android, and iOS. The writeup is a solid template for unattended QA, but it also shows where iOS tooling and agent reliability still crack.
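The core loop in that template is simple enough to sketch. Everything here, including the hash-based diff, is an illustrative stand-in for the author's actual emulator tooling:

```python
import hashlib

def screen_changed(shot: bytes, baselines: dict, name: str) -> bool:
    """Hash-compare a screenshot against a stored baseline; record it on
    first sight. Illustrative stand-in for real emulator QA tooling."""
    digest = hashlib.sha256(shot).hexdigest()
    if name not in baselines:
        baselines[name] = digest      # first run: record the baseline
        return False
    return baselines[name] != digest

baselines = {}
print(screen_changed(b"pixels-v1", baselines, "home"))  # False (recorded)
print(screen_changed(b"pixels-v2", baselines, "home"))  # True  (flag for review)
```

In practice the "changed" branch is where the agent files the bug with the screenshot attached; exact-hash matching is brittle against rendering noise, which is one of the reliability cracks the writeup describes.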
OpenClaw 3.13 now connects to a real Chrome 146 session over MCP so agents can drive your signed-in browser instead of a separate bot context. Update if captchas or auth state were blocking your web automation flows.
Hermes Agent v0.3.0 added a first-class plugin system, live browser attach via CDP, real-time streaming, and VS Code, Zed, and JetBrains integration through ACP. Update if you want shareable skills, browser control, and a more stable long-running agent setup.
OpenAI shipped GPT-5.4 mini to ChatGPT, Codex, and the API, and GPT-5.4 nano to the API, with 400K context, lower prices, and stronger coding and computer-use scores. Route subagents and high-volume tasks to the smaller tiers to cut spend without giving up much capability.
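That routing advice can be sketched as a simple dispatcher. The mini and nano tier names follow the release; the task categories, thresholds, and the full-model fallback name are illustrative assumptions:

```python
# Route work to the cheapest tier that can handle it.
# Tier names follow the release; the heuristics and the fallback
# model name are illustrative assumptions.
def pick_model(task: str, is_subagent: bool, high_volume: bool) -> str:
    if is_subagent or high_volume:
        return "gpt-5.4-nano"      # cheapest tier: fan-out and bulk work
    if task in {"coding", "computer-use"}:
        return "gpt-5.4-mini"      # strong coding/computer-use scores
    return "gpt-5.4"               # assumed full-model name for the rest

print(pick_model("coding", is_subagent=True, high_volume=False))   # gpt-5.4-nano
print(pick_model("coding", is_subagent=False, high_volume=False))  # gpt-5.4-mini
```

The point of the pattern is that subagents rarely need frontier capability, so tiering them down is nearly free savings.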
H Company launched Holotron-12B, an open multimodal model for computer-use agents built on a hybrid SSM-attention stack that targets KV-cache bottlenecks. Benchmark it if you need high-concurrency browser agents and want better throughput without giving up web-task accuracy.
Perplexity shipped an enterprise version of Comet with admin controls, silent deployment via MDM, telemetry, audit logs, and CrowdStrike Falcon integration. Test it if your team wants browser-native agents without giving up endpoint management and security review.
Manus moved from a cloud sandbox onto local machines with My Computer, a desktop app that can organize files, run commands, and build apps on macOS and Windows. Use it if you want agent workflows over private local data and hardware instead of a remote browser sandbox.
Perplexity expanded Computer to Android and added control of a local Comet browser session, including logged-in sites, from the agent. Try it if you want one agent workflow across mobile and browser surfaces without per-site connectors or custom MCP glue.
Chrome DevTools MCP now lets agents attach to an existing signed-in browser session, and companion tools added one-command auto-connect flows. Use it to debug and automate in the tabs you already use instead of setting up separate logins or headless sessions.
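Attaching to an existing session rides on the Chrome DevTools Protocol. A minimal sketch of discovering an already-running, signed-in browser, assuming Chrome was launched with `--remote-debugging-port=9222`:

```python
import json
import urllib.request

def devtools_endpoint(port: int = 9222) -> str:
    """Discovery URL Chrome exposes when started with --remote-debugging-port."""
    return f"http://localhost:{port}/json/list"

def list_tabs(port: int = 9222):
    """Return open tabs with their CDP websocket URLs (requires a live Chrome)."""
    with urllib.request.urlopen(devtools_endpoint(port)) as resp:
        return json.loads(resp.read())

print(devtools_endpoint())  # http://localhost:9222/json/list
```

Each entry returned by `/json/list` includes a `webSocketDebuggerUrl`, which is what an MCP server or automation client connects to; the one-command auto-connect tools are wrappers around this handshake.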
Markov AI released Computer Use Large on Hugging Face with 48,478 screen recordings spanning about 12,300 hours across six professional apps. Use it to train and evaluate GUI agents on real software workflows with a large CC-BY dataset.
Perplexity brought Computer to iOS with cross-device sync so multi-step cloud tasks can keep running after you leave the screen. Try it if you want to start agent workflows from a phone instead of a desktop-only session.
The OpenClaw-RL paper proposes training agents continuously from normal interactions by turning user corrections, logs, and next-state feedback into rewards and word-level supervision. Watch it if you build persistent agents and want adaptation to come from live deployment traces instead of offline labeling.
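One way to read "word-level supervision from user corrections" is a token diff that marks which words the agent already had right versus which ones the user had to change. This is my own illustrative sketch, not the paper's algorithm:

```python
import difflib

def word_supervision(agent_out: str, user_fix: str):
    """Label each word of the correction: +1 if the agent already produced
    it, -1 if the user had to change it. An illustrative reward scheme,
    not the paper's exact formulation."""
    sm = difflib.SequenceMatcher(a=agent_out.split(), b=user_fix.split())
    labels = []
    for op, _, _, b1, b2 in sm.get_opcodes():
        for word in user_fix.split()[b1:b2]:
            labels.append((word, 1 if op == "equal" else -1))
    return labels

print(word_supervision("open the folder", "open the directory"))
# [('open', 1), ('the', 1), ('directory', -1)]
```

The attraction of this framing is that every routine correction in deployment becomes a labeled training example, with no separate annotation pass.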
OpenAI published runtime details for the Responses API computer environment, including shell loops, capped output, automatic compaction, proxied outbound traffic, and reusable skills folders. Use it as a reference architecture for hosted agents that need state, safety controls, and tool execution patterns.
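Two of those runtime patterns, capped tool output and transcript compaction, are easy to reproduce in any harness. A minimal sketch, where the limits and the compaction strategy are my assumptions rather than OpenAI's published values:

```python
# Minimal versions of two hosted-agent runtime patterns: capping tool
# output and compacting an overlong transcript. Limits are illustrative
# assumptions, not OpenAI's published values.
MAX_TOOL_OUTPUT = 2_000   # chars kept per tool call (assumed cap)
MAX_TRANSCRIPT = 10       # messages kept before compaction (assumed)

def cap_output(text: str) -> str:
    """Truncate oversized tool output so one command can't flood context."""
    if len(text) <= MAX_TOOL_OUTPUT:
        return text
    return text[:MAX_TOOL_OUTPUT] + f"\n[truncated {len(text) - MAX_TOOL_OUTPUT} chars]"

def compact(transcript: list[str]) -> list[str]:
    """Replace the oldest messages with a single summary placeholder."""
    if len(transcript) <= MAX_TRANSCRIPT:
        return transcript
    keep_from = len(transcript) - MAX_TRANSCRIPT + 1
    summary = f"[compacted {keep_from} earlier messages]"
    return [summary] + transcript[keep_from:]

print(len(compact([f"msg {i}" for i in range(25)])))  # 10
```

In a real harness the summary placeholder would be a model-written digest of the dropped turns, but the shape of the loop is the same.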