OpenAI adds WebSocket mode to Responses API for 40% faster Codex loops
OpenAI added WebSocket mode to the Responses API and says it cuts repeated work across Codex tool loops, improving end-to-end speed by up to 40%. The change reduces runtime overhead for agent workflows, not just base-model latency.

TL;DR
- OpenAIDevs announced WebSocket mode for the Responses API, and OpenAI says it makes Codex-style agent loops up to 40% faster end to end.
- In pierceboggan's VS Code note, GitHub says the same transport shift makes supported OpenAI models 12% faster inside @code, with no user configuration.
- The technical change in OpenAI's engineering post is straightforward: keep the connection open, retain response state server-side, and stop resending full context on every tool turn.
- According to pierceboggan's thread, the WebSocket work ships alongside a broader token-efficiency push that also includes prompt caching improvements, a tool search tool, and new search and execution tools.
You can read OpenAI's writeup, check the exact VS Code 1.118 changelog note, browse the open source Codex repo, and see how the Codex app server docs already spell out WebSocket transport in OpenAI's broader coding stack.
WebSocket mode
The core change is replacing per-turn HTTP calls with a persistent WebSocket connection. In the VS Code 1.118 update, GitHub says the client now sends only new input items plus the previous response ID, while the server keeps the conversation state.
That makes this a harness optimization, not a new base-model speed claim. OpenAIDevs framed the bottleneck as API overhead that started to dominate once Codex inference got faster.
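To make the payload difference concrete, here is a minimal sketch of what a persistent tool loop looks like when the client ships only new items plus the previous response ID. The endpoint URL, field names, and model string are illustrative assumptions, not OpenAI's documented wire format.

```typescript
// Hedged sketch of a persistent tool loop. The endpoint URL, field names,
// and model string are illustrative assumptions, not OpenAI's documented
// wire format.
import WebSocket from "ws";

type InputItem =
  | { type: "message"; role: "user"; content: string }
  | { type: "function_call_output"; call_id: string; output: string };

// One connection for the whole loop, instead of one HTTP request per turn.
const ws = new WebSocket("wss://api.example.com/v1/responses"); // assumed URL

let previousResponseId: string | null = null;

function sendTurn(items: InputItem[]): void {
  // Only new items cross the wire; the server holds the conversation state
  // from earlier turns, keyed by previous_response_id.
  ws.send(JSON.stringify({
    model: "gpt-5-codex", // assumed model name
    previous_response_id: previousResponseId,
    input: items,
  }));
}

ws.on("open", () => {
  sendTurn([{ type: "message", role: "user", content: "Fix the failing test" }]);
});

ws.on("message", (raw) => {
  const response = JSON.parse(raw.toString());
  previousResponseId = response.id; // the next turn references this ID, not full history

  for (const item of response.output ?? []) {
    if (item.type === "function_call") {
      const result = runTool(item.name, item.arguments); // tool execution elided
      sendTurn([{ type: "function_call_output", call_id: item.call_id, output: result }]);
    }
  }
});

function runTool(name: string, args: string): string {
  return `stub result for ${name}(${args})`; // placeholder tool runner
}
```

The point of the sketch: after the first turn, every message is just the delta, which is why the win scales with the number of tool calls in a loop rather than with base-model speed.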
Warm state and fewer hops
OpenAI's engineering post says the latency work landed in four places:
- in-memory caching of rendered tokens and model configuration
- fewer intermediate network hops
- faster safety checks
- persistent connections that keep response state warm across tool calls (sketched below)
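The last point is the one that matters most for tool loops. Here is a hedged sketch of the warm-state idea: a server-side cache keyed by response ID, so a follow-up turn renders only the new items instead of the whole conversation. Every name here is illustrative; OpenAI has not published its internals.

```typescript
// Hypothetical server-side sketch of "warm state": cache rendered context
// per response ID so a follow-up tool turn skips re-rendering the prefix.
// All names are illustrative, not OpenAI internals.

interface WarmState {
  renderedTokens: number[]; // token IDs already rendered for the prompt prefix
  modelConfig: { model: string; temperature: number };
}

const warmCache = new Map<string, WarmState>(); // keyed by response ID

function handleTurn(previousResponseId: string | null, newItems: string[]): string {
  const prior = previousResponseId ? warmCache.get(previousResponseId) : undefined;

  // Cache hit: render only the new items and append to the warm prefix.
  // Cache miss: fall back to rendering the full conversation from scratch.
  const renderedTokens = prior
    ? [...prior.renderedTokens, ...render(newItems)]
    : render(newItems);

  const responseId = `resp_${Date.now()}`;
  warmCache.set(responseId, {
    renderedTokens,
    modelConfig: prior?.modelConfig ?? { model: "default", temperature: 0 },
  });
  return responseId;
}

function render(items: string[]): number[] {
  // Stand-in for tokenization; real rendering is the expensive step being cached.
  return items.flatMap((s) => Array.from(s, (c) => c.charCodeAt(0)));
}
```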
pierceboggan's thread puts that in the context of usage-based billing. GitHub says its @code team is also working on prompt caching improvements, a tool search tool, and new tools for search and execution to cut token waste without changing agent quality.
Codex app server
One extra tell is in the Codex app server docs. The server already supports stdio plus an experimental ws:// listener, uses JSON-RPC 2.0 messages, exposes /readyz and /healthz probes, and documents an overload error for full ingress queues.
That does not prove the Responses API feature was implemented through the same stack. It does show WebSocket transport was already a first-class idea in OpenAI's coding tooling, with connection management and backpressure behavior documented before this speedup post landed.
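For flavor, here is a minimal sketch of a client speaking JSON-RPC 2.0 to that experimental ws:// listener, with a readiness probe first. The host, port, method name, and error code handling are assumptions layered on the shapes the docs describe.

```typescript
// Hedged sketch of a JSON-RPC 2.0 client for an experimental ws:// listener,
// in the shape the Codex app server docs describe. The host, port, and
// method name are assumptions, not values from the docs.
import WebSocket from "ws";

async function main(): Promise<void> {
  // Probe readiness over plain HTTP before opening the socket.
  const ready = await fetch("http://localhost:8080/readyz"); // assumed host/port
  if (!ready.ok) throw new Error(`server not ready: ${ready.status}`);

  const ws = new WebSocket("ws://localhost:8080"); // experimental ws:// listener

  ws.on("open", () => {
    // JSON-RPC 2.0 request frame: versioned, with an id for matching replies.
    ws.send(JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "ping", // assumed method name
      params: {},
    }));
  });

  ws.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.error) {
      // The docs describe an overload error when ingress queues fill;
      // a client should back off rather than hammer the socket.
      console.error("rpc error (possibly overload):", msg.error);
    } else {
      console.log("result for request", msg.id, msg.result);
    }
    ws.close();
  });
}

main().catch(console.error);
```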