breakingMarch 13, 2026

Terminal-Bench 2.0 removes OpenBlocks after cheating verification

Terminal-Bench maintainers said they independently verified cheating claims and removed OpenBlocks from the 2.0 leaderboard. Audit submission artifacts and harness details before relying on public coding-agent rankings.

Coding Agents Evals Benchmarks

2 min read

Terminal-Bench 2.0 removes OpenBlocks after cheating verification

TL;DR

Terminal-Bench maintainers said they "independently verified" cheating claims against OpenBlocks and removed it from the 2.0 leaderboard, reversing the earlier claim that OB-1 was "#1 on Terminal Bench" removal notice ranking claim.
The practical takeaway for engineers is that leaderboard position alone was not durable here: the maintainers pointed to public submission artifacts as part of a community-auditable process submission note.
Before the removal, OpenBlocks had been promoted as usable with a local model subscription, with the linked repo describing a local proxy for routing requests to OpenAI-compatible backends instead of the default cloud path local model post OB1 repo.
A supporting implementation detail from the same thread was OB-1's "microcompact" behavior, described as compressing context during a session to preserve longer working context microcompact note.

What happened to the OpenBlocks ranking?

Alex Shaw

@alexgshaw

·Follow

We independently verified these claims and removed OpenBlocks from the Terminal-Bench 2.0 leaderboard. Thank you @NoCommas for helping us keep leaderboard entries honest! Recent leaderboard submissions are in huggingface.co/datasets/harbo… which makes it easy for the community to work Show more

Monk Zero

@NoCommas

x.com/i/article/2032…

2:11 AM · Mar 14, 2026

237

Read 13 replies

Terminal-Bench 2.0 maintainers said they "independently verified these claims" and removed OpenBlocks from the leaderboard. The same post says recent submissions live in a public dataset, making it easier for the community to inspect entries and "detect cheating" removal notice leaderboard dataset.

That matters because OpenBlocks had just been advertised as "#1 on Terminal Bench" in the launch thread for OB-1 ranking claim. For engineering teams comparing coding agents, this is a reminder that public rankings are only as useful as the submission rules, harness transparency, and the maintainers' willingness to revoke results after audit.

What technical claims were attached to OB-1?

Before the removal, the OpenBlocks thread paired the ranking claim with product details: OB-1 could be used with a "local model subscription" and was said to be tested with GPT-5.4 and Opus 4.6 local model post. The linked repository describes a local proxy setup that can rewrite OB-1 requests to a local API server or other OpenAI-compatible backend, letting users bring their own keys or route through infrastructure they control OB1 repo.

A follow-up post also highlighted a "microcompact" feature that reportedly compacts context during the session so the model keeps a "longer refreshed context" microcompact note. That may still be a useful implementation idea, but the leaderboard removal means engineers should separate those product mechanics from the now-invalidated benchmark standing.