Multimodal computer-use agent
ByteDance's multimodal computer-use agent (UI-TARS) and its accompanying software for interacting with desktop and web UIs.
UI-TARS resurfaced as an open-source desktop-control stack, while Opendesk described an approach that uses accessibility APIs and marked UI elements rather than guessing raw pixel coordinates. Targeting elements this way makes computer-use workflows more repeatable, but it still depends on interfaces designed for humans.
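To show why marked elements are more repeatable than pixel guesses, here is a minimal, hypothetical sketch. It is not Opendesk's or UI-TARS's implementation. It assumes an accessibility tree has already been flattened into `Element` records (role, accessible name, bounding box). The helper names `assign_marks`, `find_mark` and `click_point` are invented for illustration. Interactive elements get numbered marks, and an action names its target by role and name. The click point is recomputed from the current layout, so it stays correct after the layout shifts. A click coordinate recorded from an earlier screen would not.

```python
from dataclasses import dataclass

# Hypothetical element record; real accessibility APIs expose richer data.
@dataclass(frozen=True)
class Element:
    role: str
    name: str
    bbox: tuple  # (left, top, width, height) in screen pixels

# Assumed set of roles treated as actionable for this sketch.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "menuitem"}

def assign_marks(elements):
    """Number interactive elements 1..n, top-to-bottom then left-to-right."""
    interactive = [e for e in elements if e.role in INTERACTIVE_ROLES]
    interactive.sort(key=lambda e: (e.bbox[1], e.bbox[0]))
    return {i: e for i, e in enumerate(interactive, start=1)}

def find_mark(marks, role, name):
    """Resolve a target by semantic identity instead of screen position."""
    for mark_id, element in marks.items():
        if element.role == role and element.name == name:
            return mark_id
    raise LookupError(f"no {role} named {name!r}")

def click_point(element):
    """Center of the element's bounding box in the current layout."""
    left, top, width, height = element.bbox
    return (left + width // 2, top + height // 2)

# Two layouts of the same form: in layout_b a help link was added and
# the form moved down, so a pixel coordinate from layout_a goes stale.
layout_a = [
    Element("heading", "Sign in", (0, 0, 400, 40)),
    Element("textbox", "Email", (20, 60, 300, 30)),
    Element("button", "Submit", (20, 110, 100, 30)),
]
layout_b = [
    Element("link", "Help", (300, 10, 60, 20)),
    Element("heading", "Sign in", (0, 30, 400, 40)),
    Element("textbox", "Email", (20, 90, 300, 30)),
    Element("button", "Submit", (20, 140, 100, 30)),
]

marks_a = assign_marks(layout_a)
marks_b = assign_marks(layout_b)
point_a = click_point(marks_a[find_mark(marks_a, "button", "Submit")])
point_b = click_point(marks_b[find_mark(marks_b, "button", "Submit")])
print(point_a, point_b)
```

The mark numbers themselves change between layouts (Submit is mark 2, then mark 3). This is why the sketch resolves targets by role and name and treats the numbers only as per-screenshot labels. Like any approach built on accessibility data, it works only as well as the application's own accessibility metadata.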