AI Primer

Baidu releases Unlimited OCR with 3B params for single-pass long documents

Baidu released Unlimited OCR as an open-source long-document OCR model with 3B total parameters and 500M active at inference. Early ParseBench testing says it is strong on tables and reading order but weaker on semantic formatting and charts, giving teams a new open-weight OCR option with clear tradeoffs.


TL;DR

You can jump from the repo to the Hugging Face page and skim the docs. The demo attached to _akhaliq's post makes the pitch obvious fast: this model is aimed at the messy part of OCR, multi-page documents where layout, tables, and continuity usually fall apart.

R-SWA

The headline feature is not raw parameter count. It is the attention pattern.

According to aibuilderclub_'s summary, Unlimited OCR uses R-SWA to hold a constant KV cache while keeping enough reference context to parse long documents in a single pass. The accompanying diagram shows the split between a deep encoder and a 3B MoE decoder with a sliding reference window.

That targets a specific failure mode in document OCR: page boundaries. aibuilderclub_ frames the win as better continuity across long PDFs, where tables lose structure, captions drift, and reading order breaks.
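The constant-cache idea can be illustrated with a toy sketch. The article does not describe how R-SWA selects its reference context, so everything here is an assumption made for illustration: the class name `BoundedKVCache`, the choice to pin the first few entries as the "reference", and the sizes used. It is not Baidu's implementation. It only shows why a pinned reference plus a sliding window keeps memory flat no matter how long the document gets.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: pinned reference entries plus a sliding window of recent ones.

    Hypothetical illustration only; real R-SWA details are not given in the article.
    """

    def __init__(self, reference_size: int, window_size: int):
        self.reference_size = reference_size
        self.reference = []                       # pinned entries, filled once
        self.window = deque(maxlen=window_size)   # oldest entries drop off automatically

    def append(self, token_id: int) -> None:
        # Assumption: the first `reference_size` tokens act as the reference context.
        if len(self.reference) < self.reference_size:
            self.reference.append(token_id)
        else:
            self.window.append(token_id)

    def visible(self) -> list:
        # Entries the decoder could attend to at the current step.
        return self.reference + list(self.window)

    def __len__(self) -> int:
        return len(self.reference) + len(self.window)


cache = BoundedKVCache(reference_size=4, window_size=8)
sizes = []
for token_id in range(1000):   # stand-in for a long document's token stream
    cache.append(token_id)
    sizes.append(len(cache))

# The cache never grows past reference_size + window_size.
print(max(sizes), cache.visible())
```

With a plain full-attention cache, `sizes` would grow linearly with the document; here it plateaus once the window fills, which is the property the summary attributes to R-SWA.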

ParseBench tradeoffs

Early benchmarking says the model is good in exactly the places Baidu emphasized, and shaky in a few others.

Jerry Liu, founder of LlamaIndex, wrote in his ParseBench post that Unlimited OCR is strong on table parsing and proper reading order, but struggles more on semantic formatting and charts. The accompanying benchmark table shows that split clearly:

  • Tables: Unlimited OCR 70.2 vs. PaddleOCR-VL-1.6 at 67.8
  • Text faithfulness: 86.8 vs. 82.7
  • Semantic formatting: 1.0 vs. 54.6
  • Charts: 1.3 vs. 54.2
  • Layout element rule pass rate: 71.5 vs. 77.8

That does not look like a general replacement for every OCR workload. It looks like a new open-weight option with a sharp profile: better on long-document structure, less convincing on formatting-heavy or chart-heavy parsing.
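To make the profile concrete, the per-category gaps can be computed directly from the figures quoted above. This is a small sketch, assuming that in every row the first number is Unlimited OCR and the second is PaddleOCR-VL-1.6, as the first row states explicitly. The variable names are illustrative.

```python
# (Unlimited OCR, PaddleOCR-VL-1.6) scores as quoted in the article.
# Assumption: the column order of the first row holds for every row.
scores = {
    "Tables": (70.2, 67.8),
    "Text faithfulness": (86.8, 82.7),
    "Semantic formatting": (1.0, 54.6),
    "Charts": (1.3, 54.2),
    "Layout element rule pass rate": (71.5, 77.8),
}

# Positive gap means Unlimited OCR scores higher in that category.
gaps = {name: round(ours - theirs, 1) for name, (ours, theirs) in scores.items()}
leads = [name for name, gap in gaps.items() if gap > 0]

# Print categories from largest lead to largest deficit.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {gap:+.1f}")
```

The leads on tables and text faithfulness are a few points each, while the deficits on semantic formatting and charts exceed fifty points, which is why the model reads as a specialist rather than a drop-in replacement.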

Open weights and early hacking

The fastest signal that a model has landed is usually someone trying to wrap it in a faster local stack before the day is over.

In doodlestein's post, the plan is already a Rust library and CLI called FrankenOCR, with specialized kernels for Apple Silicon and x86 CPUs. The follow-up dashboard screenshot shows parallel agents pulling a Baidu dossier, kernel research, and a crate plan at the same time.

That makes this release more interesting than a plain paper drop. Baidu shipped an open long-document OCR model, and the first community response was immediate systems work around packaging, kernels, and local inference ergonomics.
