Computer Use
Agents that click, type, browse, and operate software directly.
Stories
Google unveiled Gemini Intelligence at the Android Show with cross-app task automation, Gemini in Chrome, Rambler voice cleanup, custom widgets, and AppFunctions. The rollout moves Gemini into core Android workflows on Pixel and Galaxy devices this summer.
OpenAI showed Codex working across apps in the background without taking over the Mac, and early users applied it to Telegram BotFather setup and front-end testing. That matters because Codex is moving from repo-only work into authenticated desktop workflows and UI-driven task loops.
Nous Research added early computer-use support to Hermes Agent through CUA, enabling background desktop control without taking over keyboard, mouse, or screen input. The feature opens computer use to local or alternative models instead of tying the workflow to frontier-only modes.
Zyphra released its first vision-language model, an 8B MoE with 700M active parameters and visual LoRA adapters. The model matters because it targets OCR, document reasoning, GUI interaction, and computer-use workloads under an Apache 2.0 license.
Perplexity released a new Mac app centered on Personal Computer, a local-first agent that works across local files, native Mac apps, and the web. It also supports remote control from iPhone and an always-on Mac mini setup paired with Comet.
Yutori rolled out Navigator n1.5, a web computer-use model it says improves the tradeoff between accuracy, latency, and cost for browser tasks. The launch matters because Yutori's related environment-generation work targets the long-horizon web workflows that make computer-use agents expensive and brittle.
Perplexity launched Professional Finance for Computer with licensed Morningstar, PitchBook, Daloopa, and Carbon Arc data plus 35 analyst workflows. The release matters because outputs are now designed to stay traceable to source documents instead of behaving like opaque chat answers.
DeepSeek briefly published a paper and threads on point-and-bbox reasoning, about 90 KV entries per 800² image, and RL-trained vision experts, then removed the repo and related mentions. The technique looked like a low-token path to computer use and multimodal reasoning in V4-Flash, but availability and reproducibility are now unclear.
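For scale, a back-of-envelope comparison shows why roughly 90 KV entries per 800² image is striking. This is my own arithmetic, not from the paper, and the 14-pixel patch size is an assumed ViT-style baseline:

```python
# Back-of-envelope KV-cache comparison for an 800x800 screenshot.
# The 14px patch size is an assumed ViT-style baseline, not from the paper.
side = 800
patch = 14
patch_tokens = (side // patch) ** 2   # conventional patch tokenization
kv_entries = 90                       # figure reported before the repo vanished

print(patch_tokens)                   # 3249 tokens under plain patching
print(patch_tokens / kv_entries)      # ~36x fewer cache entries claimed
```

If the claim held, that gap is what would make long multi-screenshot computer-use sessions cheap enough to run.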
Codex gained background macOS control, page inspection, image generation, plugins, artifacts, and follow-up automations. That gives it one agent thread for desktop apps, frontend debugging, and recurring work.
Anthropic released Claude Connectors for Blender, Autodesk Fusion, and other creative apps, exposing commands and file actions through MCP. That lets Claude operate inside existing desktop tools instead of only returning chat instructions.
Browser Use launched Browser Use Box, a 24/7 Browser Harness environment with persistent logins and Telegram control. It moves browser agents off laptops and into always-on remote sessions for long-running web tasks.
Independent builders shipped Pi-GUI computer use, pi-subagents parallel review, and starter templates for extensions, Docker workers, and voice add-ons. The releases add reusable computer-use, subagent, and local-runtime building blocks around the base Pi harness.
Cua Driver open-sourced a macOS driver that lets agents control apps in the background with multi-player and multi-cursor support. It matters because it turns background computer use from an app-specific feature into a reusable primitive that any agent loop can adopt.
Practitioners shared repeatable Codex workflows for long-lived threads, background subagents, computer-use access through MCP, and canary rollouts. Codex is being used less as a one-shot assistant and more as a persistent automation harness.
Fresh hands-on reports show Codex controlling minimized apps via macOS APIs, using a DOM-aware browser comment mode, and running for day-long sessions in the desktop app. That gives OpenAI stronger evidence that computer use is usable for daily development, though the rollout remains macOS-first and brittle around working-state changes.
OpenAI expanded Codex with background Mac computer use, an in-app browser, image generation, memory preview, automations, and 90+ plugins. The release moves Codex from terminal coding toward long-running UI and ops workflows, though some features remain macOS-first or alpha.
Perplexity launched Personal Computer for Mac, giving its desktop agent access to local folders, native apps, and the browser from one orchestration layer. It also supports Mac mini setups controlled from iPhone, pushing the product toward an always-on desktop agent.
Z.ai released GLM-5V-Turbo, a multimodal coding model for screenshots, video, design drafts, and GUI-agent tasks. It keeps text-coding performance steady while adding native vision support, so teams can test visual workflows without swapping models.
H Company introduced Holo3, a computer-use model family with a 122B API model and an Apache 2.0 35B release on Hugging Face. Check the benchmark and pricing claims before assuming the model is ready for field deployment.
Anthropic put computer use directly into Claude Code, letting the CLI open apps, click through GUIs, and verify work on screen. Try it if you want Claude Code to handle end-to-end UI tasks beyond file edits, but note it is rolling out as a research preview on Pro and Max plans.
Firecrawl’s new /interact endpoint lets agents click, fill, scroll, and keep live browser sessions right after /scrape. It shortens the path from page extraction to web automation, but Playwright remains the better fit when you need deterministic full-session control.
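A rough sketch of what an action list for such an endpoint could look like. The field names and action schema here are illustrative assumptions, not Firecrawl's documented API; check the /interact docs before relying on them:

```python
import json

# Hypothetical action list for an /interact-style endpoint.
# Field names ("type", "selector", "text") are illustrative assumptions,
# not Firecrawl's documented schema.
actions = [
    {"type": "click", "selector": "#login"},
    {"type": "fill", "selector": "input[name=q]", "text": "computer use"},
    {"type": "scroll", "direction": "down"},
]
payload = json.dumps({"url": "https://example.com", "actions": actions})
print(payload)
```

The appeal is that a scrape result and a follow-up interaction share one session, rather than re-launching a browser per step.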
Expect wraps browser QA for Claude Code, Codex, or Cursor into a CLI that records bug videos and feeds failures back into a fix loop. It gives coding agents a tighter UI validation cycle without requiring a custom browser harness.
Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.
Agent Computer launched cloud desktops that boot in under half a second and expose persistent disks, shared credentials, SSH access, and ACP control for agents. It gives coding agents a faster place to run tools and reuse auth, but teams still need to design safe session and credential boundaries.
A solo developer wired Claude into emulators and simulators to inspect 25 Capacitor screens daily and file bugs across web, Android, and iOS. The writeup is a solid template for unattended QA, but it also shows where iOS tooling and agent reliability still crack.
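The core loop in that template is simple enough to sketch. Everything here, including the hash-based diff, is an illustrative stand-in for the author's actual emulator tooling:

```python
import hashlib

def screen_changed(shot: bytes, baselines: dict, name: str) -> bool:
    """Hash-compare a screenshot against a stored baseline; record it on
    first sight. Illustrative stand-in for real emulator QA tooling."""
    digest = hashlib.sha256(shot).hexdigest()
    if name not in baselines:
        baselines[name] = digest      # first run: record the baseline
        return False
    return baselines[name] != digest

baselines = {}
print(screen_changed(b"pixels-v1", baselines, "home"))  # False (recorded)
print(screen_changed(b"pixels-v2", baselines, "home"))  # True  (flag for review)
```

In practice the "changed" branch is where the agent files the bug with the screenshot attached; exact-hash matching is brittle against rendering noise, which is one of the reliability cracks the writeup describes.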
OpenClaw 3.13 now connects to a real Chrome 146 session over MCP so agents can drive your signed-in browser instead of a separate bot context. Update if captchas or auth state were blocking your web automation flows.
Hermes Agent v0.3.0 added a first-class plugin system, live browser attach via CDP, real-time streaming, and VS Code, Zed, and JetBrains integration through ACP. Update if you want shareable skills, browser control, and a more stable long-running agent setup.
OpenAI shipped GPT-5.4 mini to ChatGPT, Codex, and the API, and GPT-5.4 nano to the API, with 400K context, lower prices, and stronger coding and computer-use scores. Route subagents and high-volume tasks to the smaller tiers to cut spend without giving up much capability.
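That routing advice can be sketched as a simple dispatcher. The mini and nano tier names follow the release; the task categories, thresholds, and the full-model fallback name are illustrative assumptions:

```python
# Route work to the cheapest tier that can handle it.
# Tier names follow the release; the heuristics and the fallback
# model name are illustrative assumptions.
def pick_model(task: str, is_subagent: bool, high_volume: bool) -> str:
    if is_subagent or high_volume:
        return "gpt-5.4-nano"      # cheapest tier: fan-out and bulk work
    if task in {"coding", "computer-use"}:
        return "gpt-5.4-mini"      # strong coding/computer-use scores
    return "gpt-5.4"               # assumed full-model name for the rest

print(pick_model("coding", is_subagent=True, high_volume=False))   # gpt-5.4-nano
print(pick_model("coding", is_subagent=False, high_volume=False))  # gpt-5.4-mini
```

The point of the pattern is that subagents rarely need frontier capability, so tiering them down is nearly free savings.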
H Company launched Holotron-12B, an open multimodal model for computer-use agents built on a hybrid SSM-attention stack that targets KV-cache bottlenecks. Benchmark it if you need high-concurrency browser agents and want better throughput without giving up web-task accuracy.
Perplexity shipped an enterprise version of Comet with admin controls, silent deployment via MDM, telemetry, audit logs, and CrowdStrike Falcon integration. Test it if your team wants browser-native agents without giving up endpoint management and security review.
Manus moved from a cloud sandbox onto local machines with My Computer, a desktop app that can organize files, run commands, and build apps on macOS and Windows. Use it if you want agent workflows over private local data and hardware instead of a remote browser sandbox.
Perplexity expanded Computer to Android and added control of a local Comet browser session, including logged-in sites, from the agent. Try it if you want one agent workflow across mobile and browser surfaces without per-site connectors or custom MCP glue.
Chrome DevTools MCP now lets agents attach to an existing signed-in browser session, and companion tools added one-command auto-connect flows. Use it to debug and automate in the tabs you already use instead of setting up separate logins or headless sessions.
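Attaching to an existing session rides on the Chrome DevTools Protocol. A minimal sketch of discovering an already-running, signed-in browser, assuming Chrome was launched with `--remote-debugging-port=9222`:

```python
import json
import urllib.request

def devtools_endpoint(port: int = 9222) -> str:
    """Discovery URL Chrome exposes when started with --remote-debugging-port."""
    return f"http://localhost:{port}/json/list"

def list_tabs(port: int = 9222):
    """Return open tabs with their CDP websocket URLs (requires a live Chrome)."""
    with urllib.request.urlopen(devtools_endpoint(port)) as resp:
        return json.loads(resp.read())

print(devtools_endpoint())  # http://localhost:9222/json/list
```

Each entry returned by `/json/list` includes a `webSocketDebuggerUrl`, which is what an MCP server or automation client connects to; the one-command auto-connect tools are wrappers around this handshake.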
Markov AI released Computer Use Large on Hugging Face with 48,478 screen recordings spanning about 12,300 hours across six professional apps. Use it to train and evaluate GUI agents on real software workflows with a large CC-BY dataset.
Perplexity brought Computer to iOS with cross-device sync so multi-step cloud tasks can keep running after you leave the screen. Try it if you want to start agent workflows from a phone instead of a desktop-only session.
The OpenClaw-RL paper proposes training agents continuously from normal interactions by turning user corrections, logs, and next-state feedback into rewards and word-level supervision. Watch it if you build persistent agents and want adaptation to come from live deployment traces instead of offline labeling.
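One way to read "word-level supervision from user corrections" is a token diff that marks which words the agent already had right versus which ones the user had to change. This is my own illustrative sketch, not the paper's algorithm:

```python
import difflib

def word_supervision(agent_out: str, user_fix: str):
    """Label each word of the correction: +1 if the agent already produced
    it, -1 if the user had to change it. An illustrative reward scheme,
    not the paper's exact formulation."""
    sm = difflib.SequenceMatcher(a=agent_out.split(), b=user_fix.split())
    labels = []
    for op, _, _, b1, b2 in sm.get_opcodes():
        for word in user_fix.split()[b1:b2]:
            labels.append((word, 1 if op == "equal" else -1))
    return labels

print(word_supervision("open the folder", "open the directory"))
# [('open', 1), ('the', 1), ('directory', -1)]
```

The attraction of this framing is that every routine correction in deployment becomes a labeled training example, with no separate annotation pass.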
OpenAI published runtime details for the Responses API computer environment, including shell loops, capped output, automatic compaction, proxied outbound traffic, and reusable skills folders. Use it as a reference architecture for hosted agents that need state, safety controls, and tool execution patterns.
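Two of those runtime patterns, capped tool output and transcript compaction, are easy to reproduce in any harness. A minimal sketch, where the limits and the compaction strategy are my assumptions rather than OpenAI's published values:

```python
# Minimal versions of two hosted-agent runtime patterns: capping tool
# output and compacting an overlong transcript. Limits are illustrative
# assumptions, not OpenAI's published values.
MAX_TOOL_OUTPUT = 2_000   # chars kept per tool call (assumed cap)
MAX_TRANSCRIPT = 10       # messages kept before compaction (assumed)

def cap_output(text: str) -> str:
    """Truncate oversized tool output so one command can't flood context."""
    if len(text) <= MAX_TOOL_OUTPUT:
        return text
    return text[:MAX_TOOL_OUTPUT] + f"\n[truncated {len(text) - MAX_TOOL_OUTPUT} chars]"

def compact(transcript: list[str]) -> list[str]:
    """Replace the oldest messages with a single summary placeholder."""
    if len(transcript) <= MAX_TRANSCRIPT:
        return transcript
    keep_from = len(transcript) - MAX_TRANSCRIPT + 1
    summary = f"[compacted {keep_from} earlier messages]"
    return [summary] + transcript[keep_from:]

print(len(compact([f"msg {i}" for i in range(25)])))  # 10
```

In a real harness the summary placeholder would be a model-written digest of the dropped turns, but the shape of the loop is the same.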