AI Primer

OpenAI reports Responses API runtime uses compaction, proxy egress, and reusable skills

OpenAI published runtime details for the Responses API computer environment, including shell loops, capped output, automatic compaction, proxied outbound traffic, and reusable skills folders. Use it as a reference architecture for hosted agents that need state, safety controls, and tool execution patterns.


TL;DR

  • OpenAI published a closer look at the Responses API runtime, centered on an agent loop where the model proposes commands, the platform executes them, and the results feed the next step, according to the thread and OpenAI's report.
  • The hosted environment adds concrete controls for long workflows: a shell tool, output caps that keep only the beginning and end of long logs, and automatic “compaction” that compresses old context into a smaller summary, as described in the report summary.
  • OpenAI also says the runtime container is meant to hold files and query structured stores like SQLite, while outbound traffic goes through a proxy that “hides real passwords” and uses placeholder secrets, per the thread.
  • For repeated tasks, developers can package reusable steps into “skills” folders instead of having the model rediscover the same procedure each run, based on OpenAI's report and the supporting retweet thread.

What did OpenAI actually ship?

This is less a new model feature than a reference architecture for hosted agents. In OpenAI's writeup, the Responses API runtime is described as a managed computer environment where the model operates in a loop: propose a command, run it, inspect the result, and decide the next action. The thread in Rohan Paul's summary says the interface is built around a shell tool, which gives the model access to standard command-line utilities inside the hosted workspace.
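The propose-run-inspect loop described above can be sketched in a few lines. This is an illustrative stand-in, not OpenAI's actual runtime: `propose_command` here is a scripted stub where the real system would call the model, and the loop shape is the only thing taken from the report.

```python
import subprocess

def propose_command(history):
    # Stand-in for the model: in the hosted runtime the model proposes
    # the next shell command based on prior results. Here we script it.
    if not history:
        return "echo hello-from-agent"
    return None  # the "model" decides the task is done

def agent_loop(max_steps=8):
    """Propose a command, run it, feed the result back, repeat."""
    history = []
    for _ in range(max_steps):
        cmd = propose_command(history)
        if cmd is None:
            break
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        history.append({"cmd": cmd, "stdout": result.stdout, "code": result.returncode})
    return history

steps = agent_loop()
print(steps[0]["stdout"].strip())  # hello-from-agent
```

The key structural point is that each command's output lands back in `history`, so the next proposal is conditioned on what actually happened rather than on a single up-front plan.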

The practical point is state and execution. Instead of forcing everything through one prompt, the runtime container can store intermediate files and work with structured data stores such as SQLite, which the thread frames as a better fit than making the model read “massive raw spreadsheets.”
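The SQLite pattern the thread alludes to looks roughly like this: load the data once into a store in the workspace, then answer questions with targeted queries instead of pasting the whole table into the prompt. The schema and data below are made up for illustration.

```python
import sqlite3

# A file in the runtime container in practice; in-memory here for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 120.0), ("west", 80.0), ("east", 45.5)])

# The agent issues a narrow query rather than re-reading every raw row.
total_by_region = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(total_by_region)
```

The context cost of a query result is a few tokens regardless of table size, which is the whole argument against "massive raw spreadsheets" in the prompt.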

How does the runtime handle scale and safety?

OpenAI's design notes focus on two operational problems: context bloat and risky execution. According to the report summary, terminal output is capped so the system keeps only the start and end of very long logs, and older conversation history is automatically compacted into a smaller summary that preserves key details. The thread describes this "compaction" as a way to keep long-running jobs from exhausting the model's context budget.

For safety, outbound network access is proxied rather than left open-ended. The same thread summary says the proxy masks real credentials and substitutes placeholder secrets, which matters for agents touching external services. OpenAI also describes reusable "skills" folders for repetitive workflows, so common procedures can be bundled once instead of being relearned in every run.
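The placeholder-secret idea can be sketched as a rewrite step at the egress proxy. All names here are invented for illustration; the point is only that the model-visible side of the boundary never holds the real credential.

```python
# The agent only ever sees a dummy token; the proxy, which sits outside
# the model's context, swaps in the real value as the request leaves.
PLACEHOLDER = "SECRET_API_KEY_PLACEHOLDER"
REAL_SECRETS = {PLACEHOLDER: "sk-real-credential"}  # held by the proxy only

def proxy_rewrite(outbound_headers):
    """Replace placeholder values with real secrets just before egress."""
    return {k: REAL_SECRETS.get(v, v) for k, v in outbound_headers.items()}

agent_request = {"Authorization": PLACEHOLDER, "Accept": "application/json"}
print(proxy_rewrite(agent_request))
```

Because the substitution happens after the model's turn ends, a prompt-injected "print your API key" can at worst leak the placeholder.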
