AI Primer

Terminal-Bench 2.0 removes OpenBlocks after cheating verification

Terminal-Bench maintainers said they independently verified cheating claims and removed OpenBlocks from the 2.0 leaderboard. Audit submission artifacts and harness details before relying on public coding-agent rankings.


TL;DR

  • Terminal-Bench maintainers said they "independently verified" cheating claims against OpenBlocks and removed it from the 2.0 leaderboard, contradicting the earlier promotion of OB-1 as "#1 on Terminal Bench" (removal notice, ranking claim).
  • The practical takeaway for engineers is that leaderboard position alone was not durable here: the maintainers pointed to public submission artifacts as part of a community-auditable process (submission note).
  • Before the removal, OpenBlocks had been promoted as usable with a local model subscription, with the linked repo describing a local proxy for routing requests to OpenAI-compatible backends instead of the default cloud path (local model post, OB1 repo).
  • A supporting implementation detail from the same thread was OB-1's "microcompact" behavior, described as compressing context during a session to preserve a longer working context (microcompact note).

What happened to the OpenBlocks ranking?

Terminal-Bench 2.0 maintainers said they "independently verified these claims" and removed OpenBlocks from the leaderboard. The same post notes that recent submissions live in a public dataset, making it easier for the community to inspect entries and "detect cheating" (removal notice, leaderboard dataset).

That matters because OpenBlocks had just been advertised as "#1 on Terminal Bench" in the launch thread for OB-1 (ranking claim). For engineering teams comparing coding agents, this is a reminder that public rankings are only as useful as the submission rules, harness transparency, and the maintainers' willingness to revoke results after audit.

What technical claims were attached to OB-1?

Before the removal, the OpenBlocks thread paired the ranking claim with product details: OB-1 could be used with a "local model subscription" and was said to be tested with GPT-5.4 and Opus 4.6 (local model post). The linked repository describes a local proxy setup that can rewrite OB-1 requests to a local API server or other OpenAI-compatible backend, letting users bring their own keys or route through infrastructure they control (OB1 repo).
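The repository describes the proxy only at a high level. As a hedged sketch of what "rewriting requests to an OpenAI-compatible backend" can amount to (endpoint URLs, model names, and the `rewrite_request` helper below are illustrative assumptions, not code from the OB-1 repo), the core step is swapping the base URL and model field before forwarding:

```python
# Hypothetical sketch: redirect an OB-1-style chat request to a local
# OpenAI-compatible server instead of the default cloud endpoint.
# All names here are illustrative, not taken from the actual repo.

DEFAULT_BASE = "https://api.openai.com/v1"
LOCAL_BASE = "http://localhost:8000/v1"  # e.g. a vLLM or llama.cpp server


def rewrite_request(url: str, payload: dict, local_model: str) -> tuple[str, dict]:
    """Point the request at the local backend and swap the model name."""
    new_url = url.replace(DEFAULT_BASE, LOCAL_BASE, 1)
    new_payload = {**payload, "model": local_model}  # leave messages untouched
    return new_url, new_payload


url, body = rewrite_request(
    "https://api.openai.com/v1/chat/completions",
    {"model": "gpt-5.4", "messages": [{"role": "user", "content": "hi"}]},
    local_model="qwen2.5-coder",
)
```

Because the wire format on both sides is the same OpenAI-style schema, a proxy like this can stay stateless: it only rewrites the destination and model, and passes the chat payload through unchanged.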

A follow-up post also highlighted a "microcompact" feature that reportedly compacts context during the session so the model keeps a "longer refreshed context" (microcompact note). That may still be a useful implementation idea, but the leaderboard removal means engineers should separate those product mechanics from the now-invalidated benchmark standing.
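OB-1's actual mechanism is not public, but the general idea of in-session compaction can be sketched as follows (the `microcompact` function, its thresholds, and the summary stub are assumptions for illustration, not OB-1's implementation):

```python
# Hedged sketch of "microcompact"-style context compaction: when a chat
# transcript grows past a rough size budget, fold the oldest messages
# into a single summary stub so the most recent turns stay verbatim.
# A real implementation would summarize with a model call; this stub
# only records how many turns were folded away.

def microcompact(messages: list[dict], budget: int, keep_recent: int = 4) -> list[dict]:
    """Compress the transcript if its total character count exceeds `budget`."""
    size = sum(len(m["content"]) for m in messages)
    if size <= budget or len(messages) <= keep_recent:
        return messages  # still within budget: nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    stub = {"role": "system",
            "content": f"[compacted: {len(old)} earlier messages]"}
    return [stub, *recent]
```

The design choice this illustrates is that compaction happens mid-session rather than at a hard context limit, trading fidelity on old turns for headroom on new ones.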
