Release · June 24, 2026

Baidu releases Unlimited OCR with 3B params for single-pass long documents

Baidu released Unlimited OCR as an open-source long-document OCR model with 3B total parameters and 500M active at inference. Early ParseBench testing says it is strong on tables and reading order but weaker on semantic formatting and charts, giving teams a new open-weight OCR option with clear tradeoffs.


TL;DR

According to WesRoth's launch post, Baidu shipped Unlimited OCR as an open-source long-document OCR model with 3 billion total parameters and about 500 million active during inference.
According to aibuilderclub_'s architecture summary, the key trick is Reference Sliding Window Attention, or R-SWA, which keeps KV cache size constant while parsing dozens of pages in one pass.
Jerry Liu's ParseBench check found strong table parsing and reading order, but weaker semantic formatting and chart handling than PaddleOCR-VL-1.6.
The release already has the usual open model distribution path, with aibuilderclub_ linking the repo and Hugging Face and _akhaliq pointing to docs for people who want to inspect or run it.

You can jump from the repo to the Hugging Face page and skim the docs. The demo attached to _akhaliq's post makes the pitch obvious fast: this model is aimed at the messy part of OCR, multi-page documents where layout, tables, and continuity usually fall apart.

R-SWA

The headline feature is not raw parameter count. It is the attention pattern.

According to aibuilderclub_'s summary, Unlimited OCR uses R-SWA to hold a constant KV cache while keeping enough reference context to parse long documents in a single pass. The accompanying diagram shows the split between a deep encoder and a 3B MoE decoder with a sliding reference window.

That targets a specific failure mode in document OCR: page boundaries. aibuilderclub_ frames the win as better continuity across long PDFs, where tables lose structure, captions drift, and reading order breaks.
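To make the "constant KV cache" claim concrete, here is a minimal Python sketch of a bounded cache. It is an illustration of the general idea only, not Baidu's implementation: the sources above do not describe how R-SWA picks its reference context, so pinning the first few entries as "reference" is an assumption, and the names `BoundedKVCache`, `num_reference`, `window`, and `simulate` are hypothetical.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: a few pinned reference entries plus a sliding window.

    Assumption for illustration: the first `num_reference` entries are
    pinned as reference context. The real R-SWA selection rule is not
    described in the sources and may differ.
    """

    def __init__(self, num_reference, window):
        self.num_reference = num_reference
        self.reference = []                 # pinned entries, never evicted
        self.recent = deque(maxlen=window)  # oldest entry drops when full

    def append(self, entry):
        if len(self.reference) < self.num_reference:
            self.reference.append(entry)
        else:
            self.recent.append(entry)

    def visible(self):
        """Entries a new token could attend to."""
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


def simulate(num_tokens, num_reference=4, window=8):
    """Feed integer stand-ins for key/value pairs and record cache size."""
    cache = BoundedKVCache(num_reference, window)
    sizes = []
    for token in range(num_tokens):
        cache.append(token)
        sizes.append(len(cache))
    return cache, sizes


if __name__ == "__main__":
    cache, sizes = simulate(100)
    print("largest cache size:", max(sizes))
    print("visible entries:", cache.visible())
```

The point of the sketch is that the cache size is capped at `num_reference + window` no matter how many tokens are fed in, which is the property that would let a decoder keep going across many pages without memory growing with document length.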

ParseBench tradeoffs

Early benchmarking says the model is good in exactly the places Baidu emphasized, and shaky in a few others.

Jerry Liu, founder of LlamaIndex, wrote in his ParseBench post that Unlimited OCR is strong on table parsing and proper reading order, but struggles more on semantic formatting and charts. The accompanying benchmark table shows that split clearly:

Tables: Unlimited OCR 70.2 vs. PaddleOCR-VL-1.6 67.8
Text faithfulness: 86.8 vs. 82.7
Semantic formatting: 1.0 vs. 54.6
Charts: 1.3 vs. 54.2
Layout element rule pass rate: 71.5 vs. 77.8

That does not look like a general replacement for every OCR workload. It looks like a new open-weight option with a sharp profile: better on long-document structure, less convincing on formatting-heavy or chart-heavy parsing.
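The size of each gap is easier to see as a difference. The short Python sketch below takes the five score pairs quoted above and computes Unlimited OCR minus PaddleOCR-VL-1.6 per category. The variable and function names are hypothetical, and the numbers are copied from the list as reported, not re-measured.

```python
# (Unlimited OCR, PaddleOCR-VL-1.6) scores as quoted from the ParseBench post.
SCORES = {
    "Tables": (70.2, 67.8),
    "Text faithfulness": (86.8, 82.7),
    "Semantic formatting": (1.0, 54.6),
    "Charts": (1.3, 54.2),
    "Layout element rule pass rate": (71.5, 77.8),
}


def deltas(scores):
    """Return Unlimited OCR minus PaddleOCR-VL-1.6 per category, to one decimal."""
    return {name: round(unlimited - paddle, 1)
            for name, (unlimited, paddle) in scores.items()}


def leads(scores):
    """Categories where Unlimited OCR scores higher than PaddleOCR-VL-1.6."""
    return [name for name, diff in deltas(scores).items() if diff > 0]


if __name__ == "__main__":
    for name, diff in deltas(SCORES).items():
        print(f"{name}: {diff:+.1f}")
    print("Unlimited OCR leads on:", ", ".join(leads(SCORES)))
```

Read this way, the leads on tables and text faithfulness are a few points each, while the deficits on semantic formatting and charts are more than fifty points.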

Open weights and early hacking

The fastest signal that a model has landed is usually someone trying to wrap it in a faster local stack before the day is over.

In doodlestein's post, the plan is already a Rust library and CLI called FrankenOCR, with specialized kernels for Apple Silicon and x86 CPUs. The follow-up dashboard screenshot shows parallel agents pulling a Baidu dossier, kernel research, and a crate plan at the same time.

That makes this release more interesting than a plain paper drop. Baidu shipped an open long-document OCR model, and the first community response was immediate systems work around packaging, kernels, and local inference ergonomics.
