Release | June 2, 2026

Perplexity Computer adds hybrid agentic inference with local-cloud model splits

Perplexity said Computer will split tasks between on-device models and frontier cloud models, keeping some data on the local machine while escalating harder work remotely. That matters for privacy-sensitive workflows and for reducing token-heavy cloud usage on laptop-class hardware.

TL;DR

Perplexity's announcement says Computer will be able to split a task between an on-device model and frontier cloud models, with private data staying local when possible.
The official blog post frames the feature as a three-way tradeoff between accuracy, privacy, and cost, and calls it the "first hybrid local-server inference" setup for Personal Computer.
According to Perplexity's Search as Code launch, Computer already defaults to a search stack that generates Python and calls Perplexity's search primitives directly instead of stepping through one tool call at a time.
Perplexity's sandbox docs, which the company also pointed to in a separate post, show the execution layer behind that push: isolated containers, background runs, and a separate $0.03 per-session charge on top of model and tool usage.

You can read Perplexity's hybrid inference pitch in the launch post, trace the search layer in the Search as Code architecture writeup, and check the operational details in the sandbox docs. Aravind Srinivas added one rollout detail the main announcement skipped: Windows laptops are next.

Hybrid local-server inference

Perplexity's core claim is simple: Computer will route easy or sensitive work to a local model, then escalate harder steps to cloud models. Perplexity's launch video pitches that as a privacy and token-efficiency win, while Aravind Srinivas' follow-up adds "token efficiency per watt" and says the feature is coming to Windows laptops.

The official post makes the tradeoff explicit:

frontier models for accuracy
local execution for privacy
smaller local models to avoid spending remote compute on work they can already handle

That is more specific than the tweet. Perplexity is not just adding an offline fallback; it is describing orchestration across local and server inference inside one agent run.
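To make that three-way tradeoff concrete, here is a minimal routing sketch. Perplexity has not published its routing logic, so everything in it is an assumption for illustration: the `Step` structure, the `difficulty` score, the `LOCAL_DIFFICULTY_LIMIT` cutoff, and the rule that private steps always stay local.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One unit of work inside an agent run (hypothetical structure)."""
    description: str
    touches_private_data: bool
    difficulty: float  # assumed score from 0.0 (trivial) to 1.0 (hard)


# Assumed cutoff above which a step is treated as too hard for the local model.
LOCAL_DIFFICULTY_LIMIT = 0.5


def route(step: Step) -> str:
    """Return "local" or "cloud" for a single step."""
    # Privacy: in this sketch, sensitive steps never leave the device.
    if step.touches_private_data:
        return "local"
    # Cost: easy steps do not spend remote compute.
    if step.difficulty <= LOCAL_DIFFICULTY_LIMIT:
        return "local"
    # Accuracy: harder steps escalate to a frontier cloud model.
    return "cloud"


def plan(steps: list[Step]) -> list[tuple[str, str]]:
    """Pair each step description with its routing decision."""
    return [(s.description, route(s)) for s in steps]


run = [
    Step("summarize local tax documents", True, 0.4),
    Step("rename downloaded files", False, 0.1),
    Step("multi-source market analysis", False, 0.9),
]
for description, target in plan(run):
    print(f"{target:5}  {description}")
```

The point of the sketch is that routing happens per step, inside a single run, which is what separates orchestration from a plain offline fallback. A real router would likely weigh more signals than two fields and a threshold.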

Search as Code

Perplexity shipped the underlying search architecture one day earlier. In the Search as Code launch, the company said agents now write Python that talks to its search stack directly, and that this path is already the default in Computer.

The corresponding research post says Search as Code exposes retrieval, ranking, filtering, fanout, and rendering primitives through an SDK, then lets the model assemble a custom pipeline at runtime. That matters here because hybrid inference is arriving on top of a product that already moved away from serial tool-calling toward generated code plus search primitives.
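The difference from serial tool-calling is easier to see in code. The sketch below is not Perplexity's SDK: `retrieve`, `rank`, `filter_docs`, `render`, and the in-memory `CORPUS` are hypothetical stand-ins, used only to show how a model-written program could compose such primitives into one pipeline instead of issuing one tool call at a time.

```python
# Hypothetical stand-ins for search primitives; none of these names come from
# Perplexity's SDK, and the corpus is a toy in-memory list.
CORPUS = [
    {"title": "Hybrid inference on laptops", "year": 2026,
     "text": "local models and cloud models split agent work"},
    {"title": "Container sandboxes", "year": 2025,
     "text": "isolated containers run generated code"},
    {"title": "Older routing survey", "year": 2019,
     "text": "routing between local and cloud models"},
]


def _overlap(doc: dict, terms: set[str]) -> int:
    """Count query terms that appear in the document text."""
    return len(terms & set(doc["text"].lower().split()))


def retrieve(query: str) -> list[dict]:
    """Return documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [d for d in CORPUS if _overlap(d, terms) > 0]


def rank(docs: list[dict], query: str) -> list[dict]:
    """Order documents by term overlap with the query, best first."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: _overlap(d, terms), reverse=True)


def filter_docs(docs: list[dict], min_year: int) -> list[dict]:
    """Keep documents published in or after min_year."""
    return [d for d in docs if d["year"] >= min_year]


def render(docs: list[dict]) -> str:
    """Format results as one line per document."""
    return "\n".join(f"{d['year']}  {d['title']}" for d in docs)


def pipeline(queries: list[str], min_year: int) -> str:
    """Fan out over several queries, then merge, filter, rank, and render."""
    merged: dict[str, dict] = {}
    for q in queries:  # fanout
        for d in retrieve(q):
            merged[d["title"]] = d  # dedupe by title
    kept = filter_docs(list(merged.values()), min_year)
    return render(rank(kept, " ".join(queries)))


print(pipeline(["local models", "cloud models"], min_year=2020))
```

In a serial tool-calling loop, each of those stages would be a separate round trip through the model. Here the whole pipeline runs as one generated program.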

Perplexity also paired that launch with a benchmark tease. Perplexity's WANDR post said its in-house WANDR benchmark scored Search as Code at 0.386 versus 0.152 for the next best system, with a public release promised in the coming weeks.

Sandbox sessions and billing

The docs linked in Perplexity's Agent API post fill in the execution model behind Search as Code. The sandbox page says the model can run code in isolated containers during an Agent API request, leave long jobs running in the background, and return structured execution results with stdout, stderr, exit codes, and duration.
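The result shape described there can be approximated locally. The sketch below runs a snippet in a child process with Python's standard `subprocess` module and captures the same four fields. The `ExecutionResult` class, its field names, and `run_snippet` are assumptions, not the Agent API schema, and a child process is not an isolated container.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    """Assumed field names mirroring the four items the docs describe."""
    stdout: str
    stderr: str
    exit_code: int
    duration_s: float


def run_snippet(code: str, timeout_s: float = 10.0) -> ExecutionResult:
    """Run a Python snippet in a child process and capture its outcome.

    This only mimics the shape of a sandbox result; it offers no isolation.
    """
    start = time.perf_counter()
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output as str
        timeout=timeout_s,    # raises subprocess.TimeoutExpired if exceeded
    )
    elapsed = time.perf_counter() - start
    return ExecutionResult(proc.stdout, proc.stderr, proc.returncode, elapsed)


ok = run_snippet("print(6 * 7)")
failed = run_snippet("import sys; sys.exit(3)")
print(ok.stdout.strip(), ok.exit_code, failed.exit_code)
```

Returning the exit code and stderr as data, rather than raising, lets the calling agent decide whether to retry, repair the code, or report the failure.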

The pricing details are unusually concrete for a preview feature:

token usage is billed per model
each sandbox session costs $0.03
tool calls made from inside the sandbox are billed separately
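Those three line items add up as in the rough estimator below. Only the $0.03 session fee comes from the docs. The model name, the per-million-token rate, and the per-tool-call price in the example are placeholders, not published prices.

```python
from decimal import Decimal

SANDBOX_SESSION_FEE = Decimal("0.03")  # per sandbox session, per the docs


def estimate_cost(
    sessions: int,
    tokens_by_model: dict[str, int],
    usd_per_million_tokens: dict[str, Decimal],
    tool_calls: int,
    usd_per_tool_call: Decimal,
) -> Decimal:
    """Sum session fees, per-model token charges, and tool-call charges."""
    token_cost = sum(
        (
            Decimal(count) / Decimal(1_000_000) * usd_per_million_tokens[model]
            for model, count in tokens_by_model.items()
        ),
        Decimal("0"),
    )
    return sessions * SANDBOX_SESSION_FEE + token_cost + tool_calls * usd_per_tool_call


# Placeholder inputs: "model-a", $2.00 per million tokens, and $0.005 per
# tool call are illustrative values only.
total = estimate_cost(
    sessions=2,
    tokens_by_model={"model-a": 500_000},
    usd_per_million_tokens={"model-a": Decimal("2.00")},
    tool_calls=4,
    usd_per_tool_call=Decimal("0.005"),
)
print(total)
```

With these placeholder inputs, the two sessions contribute $0.06, the tokens $1.00, and the tool calls $0.02, for $1.08 in total. `Decimal` is used instead of floats so small per-session charges do not pick up rounding noise.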

Those mechanics sit a layer below the hybrid inference announcement, but they explain how Perplexity is building Computer: generated code, containerized execution, and then local-versus-cloud model routing on top.

Windows laptops

One deployment detail came from Aravind Srinivas' post, not the main product account: Srinivas says hybrid local-model support is "coming soon to Windows laptops."

That matters because the official materials describe the architecture in general terms, but the platform rollout still looks staged. As of these posts, Perplexity has announced the capability, published the architectural framing, and pointed to the supporting search and sandbox stack, but has not given a broader availability timeline beyond "coming soon."
