SubQ launches 12M-token SSA model with SubQ Code early access
Subquadratic launched SubQ, a sub-quadratic sparse-attention model with a 12 million token context window, and opened early access alongside SubQ Code. The company claims attention 52x faster than FlashAttention at 1M tokens and pricing under 5% of Opus's cost, putting long-context coding workflows into a new price and latency band.

TL;DR
- Subquadratic launched SubQ, a new model the company says is built on fully sub-quadratic sparse attention, with alex_whedon's launch thread and the official post both centering a 12 million token context window.
- The company says SubQ is 52x faster than FlashAttention at 1M tokens, cuts attention compute by nearly 1,000x at 12M tokens, and lands at roughly one-fifth the cost of other leading models, according to alex_whedon's launch thread and the SSA explainer.
- Early access opened for three surfaces at once: the API, SubQ Code in the CLI, and SubQ Search, with alex_whedon's follow-up post linking to the request form and the launch post spelling out the private beta lineup.
- The pitch is less about a bigger number than fewer workarounds: hasantoxr's commentary thread frames it as a break from chunking, summaries, and sliding windows, while the official site says SubQ Code can load an entire repository into one context window.
You can read the official launch post, skim the SSA architecture explainer, and poke through the homepage examples, which size the Python 3.13 standard library at about 5.1M tokens and six months of React PRs at about 7.5M. There is already a small Hacker News thread, though the official materials carry most of the actual detail for now.
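Those repo-sizing figures are easy to sanity-check yourself. Here is a minimal sketch using a crude four-characters-per-token heuristic rather than a real tokenizer; the function name and the 4:1 ratio are our assumptions, not anything Subquadratic publishes:

```python
from pathlib import Path

def rough_token_count(root: str, exts=(".py",)) -> int:
    """Estimate tokens in a source tree with a ~4 chars/token heuristic.

    A real tokenizer (e.g. tiktoken) would give exact counts; this is
    only a ballpark check for figures like "~5.1M tokens of stdlib".
    """
    chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in exts
    )
    return chars // 4
```

Pointed at a CPython checkout's Lib/ tree, a heuristic like this should land in the same few-million-token ballpark as the homepage figure.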
What shipped
Subquadratic came out of stealth with one model and two product wrappers around it. The company is calling the model SubQ 1M-Preview, while the public-facing pitch keeps stressing the larger 12M-token research result.
According to the launch post, early access covers:
- API: full-context access for developers and enterprise teams.
- SubQ Code: a CLI coding agent that loads an entire codebase into one context window.
- SubQ Search: a long-context search product pitched as deep research with chatbot speed.
The homepage adds a few more operating details: 150 tokens per second, streaming and tool use for the API, and examples built around full repositories, long PR histories, and persistent agent state.
The SSA architecture
The architectural claim is the whole story here. In the SSA explainer, Subquadratic says dense attention compares every token with every other token, while SSA uses content-dependent selection to route attention only to the positions that matter.
That is why the company keeps talking about scaling behavior, not just context length. The official materials say compute grows linearly with context length under this design, and the launch post says the research model cuts attention compute by almost 1,000x at 12M tokens.
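The two claims fit together arithmetically. If dense attention does roughly n² pairwise score computations and a fixed per-query budget of k selected positions costs roughly n·k, the reduction factor is n/k. Inverting the company's stated ratio gives an implied budget; that implied number is our inference, not a disclosed figure:

```python
n = 12_000_000             # 12M-token context
claimed_reduction = 1_000  # "~1,000x less attention compute at 12M"

# dense ~ n*n  vs.  sparse ~ n*k   =>   reduction = n / k
implied_budget = n // claimed_reduction
print(implied_budget)  # each query would attend to ~12,000 positions
```

Under that reading, "linear scaling" just means the per-query work stays pinned to the budget while context length grows.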
A lot of the community reaction immediately mapped this to workflow pain. LinusEkenstam's post ties the announcement to teams that spent months building around context collapse with chunking and retrieval, which is exactly the stack Subquadratic is trying to make obsolete.
Benchmarks and economics
Subquadratic is pairing the architecture pitch with a benchmark sheet that aims straight at coding and long-context retrieval. In the launch post, the company reports:
- RULER 128K: 95.0% for SubQ 1M-Preview, versus 94.8% for Claude Opus 4.6.
- MRCR v2: 65.9 for the production model, with an 83 score for a research result.
- SWE-Bench Verified: 81.8, versus 80.8 for Opus 4.6 and 80.0 for DeepSeek 4.0 Pro.
- Architecture-level speed: 52x faster than FlashAttention at 1M tokens, with 63% less compute.
The cost pitch is just as aggressive, if not entirely consistent. alex_whedon's launch thread says SubQ runs at less than 5% of Opus's cost, while the homepage gives the looser figure of about one-fifth the cost of other leading LLMs; the two numbers don't reconcile, and both are company claims. Either way, they are the reason the launch reads like a coding-workflow story rather than a pure model-paper story.
SubQ Code
SubQ Code is the clearest creative-tool angle in the launch. The official announcement says the CLI agent can plan, execute, and review across a full repository in a single pass, without the coordination overhead of multi-agent systems.
The product summary attached to the subq.ai homepage sketches the intended workflow more concretely:
- map a full codebase into one context window,
- answer token-heavy questions faster,
- plug into existing coding surfaces like Claude Code, Codex, and Cursor,
- auto-redirect expensive turns, and
- install as a one-line layer on top of existing agent setups.
That makes the launch feel closer to an infrastructure layer for coding agents than a standalone chatbot. The model announcement and the coding agent announcement are effectively the same thing.
Team and backing
The company also used the launch post to disclose a $29 million seed round. The announcement names investors including Javier Villamizar, Justin Mateen of JAM Fund, Grant Gittlin of Lasagna, and Jaclyn Rice Nelson of Coalition Operators.
The post anchors the team around Justin Dangel, co-founder and CEO of Subquadratic, and Alex Whedon, CTO at Subquadratic, and says the research group includes 11 PhD researchers and engineers from Meta, Google, Oxford, Cambridge, ByteDance, Adobe, and Microsoft. For a model company making a first-principles architecture claim, that team section is doing almost as much persuasion work as the benchmark table.