Workflow · July 3, 2026

MinerU supports local PDF-to-Markdown OCR with 109 languages and MCP

MinerU was documented as a local OCR pipeline that converts PDF, Office, and image files to Markdown, with LaTeX formulas, tables, and 109-language support. The workflow adds mineru -p, mineru-api, Gradio, and an MCP server for Claude Desktop or Cursor.

TL;DR

The OCR demo framed MinerU as local conversion for PDF, Word, PPT, Excel, and images into Markdown, with formulas kept as LaTeX, tables kept as tables, and images extracted as files.
The pipeline note broke the stack into LayoutLMv3 for layout detection, UniMERNet for formula detection and recognition, PaddleOCR for OCR across 109 languages, and RapidTable for table recognition.
The CLI post gave mineru -p doc.pdf -o ./output --return_images true and said the Markdown references extracted images by relative path.
The Gradio post showed local mineru-api plus mineru-gradio, while the CLI post said MinerU ships an MCP server for Claude Desktop or Cursor.

The repo is on GitHub, the official docs split usage across Quick Usage, CLI tools, output files, and model sources, and the agent-facing integration lives in the MinerU Open MCP README.

Local PDF-to-Markdown

Official project docs describe MinerU as a document parser that converts PDF, image, DOCX, PPTX, and XLSX inputs into Markdown and JSON for retrieval, extraction, and processing.

A reply about running the model locally said the tool can work for IELTS extraction and described it as a small local model without safety guardrails. The Quick Start docs add a caveat from the maintainers: complex layouts, scanned pages, and handwritten content may still produce imperfect parsing results.

Pipeline components

Here is the model stack as the pipeline note broke it down:

Layout detection: LayoutLMv3
Formula detection and recognition: UniMERNet
OCR: PaddleOCR, with 109-language support
Table recognition: RapidTable

The current README frames the broader system as:

VLM+OCR dual engine
109-language OCR
Formulas to LaTeX and tables to HTML
Scanned docs, handwriting, multi-column layouts, and cross-page table merging
Human reading order with automatic header and footer removal

Output files

In the CLI post, --return_images true writes images as separate files and leaves relative references in the generated Markdown.
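Because the Markdown points at images by relative path, anything that moves or post-processes the output has to resolve those references against the Markdown file's own folder. The following is a minimal sketch of that step, not MinerU code: the sample Markdown, the images/ folder name, and the output/doc directory are hypothetical, and the regex only covers the standard ![alt](target) image syntax.

```python
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](target); an optional "title" after the target is ignored.
IMAGE_REF = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)')
# Matches targets that start with a URL scheme such as https://
URL_SCHEME = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*://')


def relative_image_paths(markdown_text: str, markdown_dir: Path) -> list:
    """Return relative image targets resolved against the Markdown file's folder."""
    paths = []
    for target in IMAGE_REF.findall(markdown_text):
        # Only relative references are resolved; URLs and absolute paths are skipped.
        if URL_SCHEME.match(target) or Path(target).is_absolute():
            continue
        paths.append(markdown_dir / target)
    return paths


# Hypothetical sample text, not real MinerU output.
sample = (
    "# Title\n\n"
    "![](images/fig1.jpg)\n\n"
    'Text ![chart](images/fig2.png "Chart")\n'
    "![remote](https://example.com/x.png)\n"
)

found = relative_image_paths(sample, Path("output/doc"))
# Files referenced by the Markdown that are not present on disk.
missing = [path for path in found if not path.exists()]
```

Checking the missing list after copying an output folder is a quick way to catch a Markdown file that was moved without its extracted images.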

The output file docs add the machine-readable side: content_list.json is a flat reading-order block list with page_idx, bbox, type, and content details, while middle.json carries intermediate page, layout, image, table, equation, and discarded-block data.
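A flat, reading-order block list is easy to consume with a few lines of Python. The sketch below assumes only the fields named above (type, page_idx, bbox); the "text" key and the sample values are illustrative assumptions, and real content_list.json files may carry different or additional keys.

```python
import json
from collections import Counter, defaultdict

# Hypothetical sample shaped like the documented fields; not real MinerU output.
sample_json = """
[
  {"type": "text", "page_idx": 0, "bbox": [50, 60, 500, 90], "text": "Introduction"},
  {"type": "table", "page_idx": 0, "bbox": [50, 120, 500, 300]},
  {"type": "equation", "page_idx": 1, "bbox": [80, 40, 420, 80], "text": "E = mc^2"}
]
"""


def summarize(blocks):
    """Count block types and group blocks by page, keeping reading order within each page."""
    type_counts = Counter(block["type"] for block in blocks)
    by_page = defaultdict(list)
    for block in blocks:
        by_page[block["page_idx"]].append(block)
    return type_counts, dict(by_page)


blocks = json.loads(sample_json)
type_counts, by_page = summarize(blocks)

# Print the block types found on each page, in page order.
for page, items in sorted(by_page.items()):
    print(page, [item["type"] for item in items])
```

For a real run, the same function would take json.load() on the content_list.json file from the output folder instead of the inline sample.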

CLI and Gradio

The shortest CLI path from the CLI post is one command:

mineru -p doc.pdf -o ./output --return_images true

The Quick Usage docs say mineru launches a temporary local mineru-api when --api-url is omitted, or connects to an existing local or remote FastAPI service when --api-url is provided. The same docs list POST /tasks, POST /file_parse, GET /tasks/{task_id}, GET /tasks/{task_id}/result, and GET /health for the API surface.
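For scripts that talk to an existing service, a small helper for the listed routes can keep URLs consistent. This is a sketch under assumptions: the base URL and port are placeholders, the helper names are hypothetical, and request or response bodies are deliberately left out because the docs summary above does not describe them.

```python
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Assumed address of a running mineru-api service; replace with your own host and port.
BASE_URL = "http://127.0.0.1:8000"


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join one of the listed routes onto the service base URL."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def task_status_url(task_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /tasks/{task_id}."""
    return endpoint(f"tasks/{quote(task_id, safe='')}", base_url)


def task_result_url(task_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /tasks/{task_id}/result."""
    return endpoint(f"tasks/{quote(task_id, safe='')}/result", base_url)


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True when GET /health answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urlopen(endpoint("health", base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False
```

Calling is_healthy() before submitting work is a cheap way to tell whether a service is already running at the address you would pass to --api-url.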

The Gradio post used a split flow: mineru-api runs locally as the backend, and mineru-gradio provides the browser UI on top of it.

Model sources

MinerU does not require one fixed model download path. The model source docs say the default auto policy checks Hugging Face first and falls back to ModelScope when Hugging Face is unavailable.

The same docs define MINERU_MODEL_SOURCE values for huggingface, modelscope, and local. For local models, mineru-models-download fetches the model files and writes their path into mineru.json, then MINERU_MODEL_SOURCE=local points runtime calls at the local copy.
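When MinerU is driven from a script, the model source can be pinned per call rather than globally. The sketch below is one way to do that, assuming only the variable name and three values listed above plus the -p and -o flags from the CLI post; the function names are hypothetical, and run_mineru needs mineru installed and on PATH.

```python
import os
import subprocess

# Values the model source docs list for MINERU_MODEL_SOURCE.
VALID_SOURCES = {"huggingface", "modelscope", "local"}


def mineru_env(source: str, base_env=None) -> dict:
    """Copy an environment and set MINERU_MODEL_SOURCE to one of the listed values."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unsupported model source: {source!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["MINERU_MODEL_SOURCE"] = source
    return env


def mineru_command(input_path: str, output_dir: str) -> list:
    """Build the argument list for a basic mineru run using the -p and -o flags."""
    return ["mineru", "-p", input_path, "-o", output_dir]


def run_mineru(input_path: str, output_dir: str, source: str) -> None:
    """Run mineru with the chosen model source; raises CalledProcessError on a non-zero exit."""
    subprocess.run(mineru_command(input_path, output_dir), env=mineru_env(source), check=True)


# Build the pieces without executing anything.
env = mineru_env("local", base_env={"PATH": "/usr/bin"})
cmd = mineru_command("doc.pdf", "./output")
```

Passing the environment per call leaves the shell's own settings alone, so the default policy still applies to any run that does not set the variable.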

MCP server

The CLI post said MinerU ships an MCP server for Claude Desktop or Cursor. The official MinerU Open MCP README calls it an official server that exposes MinerU document parsing as MCP tools, converting PDFs, Word docs, PowerPoint files, and images into Markdown.

The MCP README also says Flash mode works without an API key, with lower limits and Markdown-only output. Setting MINERU_API_TOKEN unlocks higher limits and extra output formats, and the README shows uvx mineru-open-mcp setup for stdio MCP clients including Claude Desktop, Cursor, and Windsurf.
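Stdio MCP clients such as Claude Desktop are typically configured with a JSON entry naming the command to launch. The sketch below generates such an entry under assumptions: the mcpServers layout follows the common client convention, the server key "mineru" is a hypothetical label, and the exact command and arguments should be checked against the MinerU Open MCP README before use.

```python
import json


def mcp_server_entry(api_token=None) -> dict:
    """Build a stdio server entry; the token is optional because Flash mode works without one."""
    # Assumed launch command based on the uvx mineru-open-mcp setup named in the README.
    entry = {"command": "uvx", "args": ["mineru-open-mcp"]}
    if api_token:
        # MINERU_API_TOKEN is the variable the README names for higher limits.
        entry["env"] = {"MINERU_API_TOKEN": api_token}
    return entry


# "mineru" is a hypothetical label; clients let you choose the server name.
config = {"mcpServers": {"mineru": mcp_server_entry()}}
print(json.dumps(config, indent=2))
```

Leaving the token out keeps the entry on the keyless path described above; passing one adds the env block and nothing else changes.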