MinerU supports local PDF-to-Markdown OCR with 109 languages and MCP
MinerU was documented as a local OCR pipeline that converts PDF, Office, and image files to Markdown, with LaTeX formulas, tables, and 109-language OCR. The workflow adds mineru -p, mineru-api, Gradio, and an MCP server for Claude Desktop or Cursor.

TL;DR
- The OCR demo framed MinerU as local conversion for PDF, Word, PPT, Excel, and images into Markdown, with formulas kept as LaTeX, tables kept as tables, and images extracted as files.
- The pipeline note broke the stack into LayoutLMv3 layout detection, UniMERNet formula detection and recognition, PaddleOCR OCR across 109 languages, and RapidTable table recognition.
- The CLI post gave mineru -p doc.pdf -o ./output --return_images true and said the Markdown references extracted images by relative path.
- The Gradio post showed local mineru-api plus mineru-gradio, while the CLI post said MinerU ships an MCP server for Claude Desktop or Cursor.
The repo is on GitHub, the official docs split usage across Quick Usage, CLI tools, output files, and model sources, and the agent-facing integration lives in the MinerU Open MCP README.
Local PDF-to-Markdown
Official project docs describe MinerU as a document parser that converts PDF, image, DOCX, PPTX, and XLSX inputs into Markdown and JSON for retrieval, extraction, and processing.
The local-model reply said the tool can handle IELTS extraction and described it as a small local model without safety guardrails. The Quick Start docs add a caveat from the maintainers: complex layouts, scanned pages, and handwritten content may still produce imperfect parsing results.
Pipeline components
Here is the model stack as the pipeline note broke it down:
- Layout detection: LayoutLMv3
- Formula detection and recognition: UniMERNet
- OCR: PaddleOCR, with 109-language support
- Table recognition: RapidTable
The current README frames the broader system as:
- VLM+OCR dual engine
- 109-language OCR
- Formulas to LaTeX and tables to HTML
- Scanned docs, handwriting, multi-column layouts, and cross-page table merging
- Human reading order with automatic header and footer removal
Output files
In the CLI post, --return_images true writes images as separate files and leaves relative references in the generated Markdown.
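To illustrate what relative image references mean for downstream tooling, the sketch below collects Markdown image links and resolves them against the directory that holds the Markdown file. The sample Markdown string, the images/fig1.png path, and the output/doc directory are hypothetical; the post does not say what file names MinerU actually writes.

```python
import re
from pathlib import Path

# Matches standard Markdown image syntax: ![alt](target)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def resolve_image_refs(markdown_text, markdown_dir):
    """Return paths for the relative image targets found in markdown_text."""
    base = Path(markdown_dir)
    resolved = []
    for target in IMAGE_REF.findall(markdown_text):
        # Skip absolute URLs; only relative file references are resolved.
        if "://" in target:
            continue
        resolved.append(base / target)
    return resolved


# Hypothetical Markdown, standing in for a generated file.
sample = "Intro\n\n![](images/fig1.png)\n\n![logo](https://example.com/logo.png)\n"
paths = resolve_image_refs(sample, "output/doc")
print([p.as_posix() for p in paths])
```

Because the references are relative, moving the Markdown file without its image directory would break them, which is why the sketch resolves against the Markdown file's own folder.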
The output file docs add the machine-readable side: content_list.json is a flat reading-order block list with page_idx, bbox, type, and content details, while middle.json carries intermediate page, layout, image, table, equation, and discarded-block data.
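A flat reading-order list is easy to consume with a few lines of code. The sketch below counts blocks per page and type from a hand-written sample. Only page_idx, bbox, and type come from the description above; the sample values and the text key (standing in for the "content details") are assumptions, so check a real content_list.json before relying on any field.

```python
import json
from collections import Counter

# Hypothetical sample data. The "text" key is an assumed stand-in for
# the content details; page_idx, bbox, and type are named in the docs.
sample_json = """
[
  {"type": "text", "page_idx": 0, "bbox": [50, 60, 500, 90], "text": "Introduction"},
  {"type": "table", "page_idx": 0, "bbox": [50, 120, 500, 300]},
  {"type": "equation", "page_idx": 1, "bbox": [80, 40, 420, 80]}
]
"""


def summarize_blocks(blocks):
    """Count blocks per (page_idx, type) pair."""
    return Counter((block["page_idx"], block["type"]) for block in blocks)


blocks = json.loads(sample_json)
summary = summarize_blocks(blocks)
for (page, kind), count in sorted(summary.items()):
    print(f"page {page}: {count} {kind} block(s)")
```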
CLI and Gradio
The shortest CLI path from the CLI post is one command:
mineru -p doc.pdf -o ./output --return_images true
The Quick Usage docs say mineru launches a temporary local mineru-api when --api-url is omitted, or connects to an existing local or remote FastAPI service when --api-url is provided. The same docs list POST /tasks, POST /file_parse, GET /tasks/{task_id}, GET /tasks/{task_id}/result, and GET /health for the API surface.
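The two modes and the endpoint list can be sketched as small helpers that only build strings, without running the CLI or calling the service. The base URL http://127.0.0.1:8000 and the task ID abc123 are hypothetical placeholders. Request and response bodies are left out because the docs summary above does not describe them.

```python
import shlex


def build_mineru_command(pdf_path, output_dir, api_url=None):
    """Build the argv list for the mineru CLI without executing it."""
    argv = ["mineru", "-p", pdf_path, "-o", output_dir]
    if api_url is not None:
        # With --api-url the CLI connects to an existing service;
        # without it, the docs say a temporary local mineru-api is launched.
        argv += ["--api-url", api_url]
    return argv


def task_endpoints(base_url, task_id):
    """Map the documented routes onto a base URL (paths only, no payloads)."""
    base = base_url.rstrip("/")
    return {
        "health": f"{base}/health",                  # GET
        "submit": f"{base}/tasks",                   # POST
        "file_parse": f"{base}/file_parse",          # POST
        "status": f"{base}/tasks/{task_id}",         # GET
        "result": f"{base}/tasks/{task_id}/result",  # GET
    }


# Hypothetical address and task ID, for illustration only.
local_cmd = build_mineru_command("doc.pdf", "./output")
remote_cmd = build_mineru_command("doc.pdf", "./output", "http://127.0.0.1:8000")
endpoints = task_endpoints("http://127.0.0.1:8000/", "abc123")
print(shlex.join(remote_cmd))
print(endpoints["result"])
```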
The Gradio post used a split flow: mineru-api serves the parsing backend and mineru-gradio provides the browser UI.
Model sources
MinerU does not require one fixed model download path. The model source docs say the default auto policy checks Hugging Face first and falls back to ModelScope when Hugging Face is unavailable.
The same docs define MINERU_MODEL_SOURCE values for huggingface, modelscope, and local. For local models, mineru-models-download fetches the model files and writes their path into mineru.json, then MINERU_MODEL_SOURCE=local points runtime calls at the local copy.
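The selection logic described above can be modeled in a few lines. This is not MinerU's code, only a sketch of the documented behavior: an explicit MINERU_MODEL_SOURCE wins, otherwise Hugging Face is tried first with ModelScope as the fallback. The huggingface_reachable flag is a hypothetical stand-in, since the docs summary does not say how availability is checked.

```python
VALID_SOURCES = {"huggingface", "modelscope", "local"}


def resolve_model_source(env, huggingface_reachable):
    """Pick a model source from an environment-style mapping.

    huggingface_reachable is an assumed boolean supplied by the caller;
    how MinerU itself probes availability is not covered here.
    """
    explicit = env.get("MINERU_MODEL_SOURCE")
    if explicit:
        if explicit not in VALID_SOURCES:
            raise ValueError(f"unsupported MINERU_MODEL_SOURCE: {explicit!r}")
        return explicit
    # Auto policy: Hugging Face first, ModelScope as the fallback.
    return "huggingface" if huggingface_reachable else "modelscope"


print(resolve_model_source({}, huggingface_reachable=False))
print(resolve_model_source({"MINERU_MODEL_SOURCE": "local"}, huggingface_reachable=True))
```

Passing a plain dictionary instead of reading os.environ directly keeps the function easy to test; in real use, os.environ can be passed in as env.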
MCP server
The CLI post said MinerU ships an MCP server for Claude Desktop or Cursor. The official MinerU Open MCP README calls it an official server that exposes MinerU document parsing as MCP tools for PDFs, Word docs, PowerPoint files, and images into Markdown.
The MCP README also says Flash mode works without an API key, with lower limits and Markdown-only output. Setting MINERU_API_TOKEN unlocks higher limits and extra output formats, and the README shows uvx mineru-open-mcp setup for stdio MCP clients including Claude Desktop, Cursor, and Windsurf.
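For readers who prefer to see what a stdio client entry might look like, the sketch below generates a configuration in the mcpServers shape that Claude Desktop reads. The server name mineru and the run invocation uvx mineru-open-mcp (without the setup subcommand) are assumptions, not taken from the README; the documented uvx mineru-open-mcp setup command is the supported way to produce the real configuration.

```python
import json


def build_mcp_entry(api_token=None):
    """Build an mcpServers-style config dict for a stdio MCP client."""
    # Assumption: the server starts with "uvx mineru-open-mcp".
    # The README only shows "uvx mineru-open-mcp setup".
    entry = {"command": "uvx", "args": ["mineru-open-mcp"]}
    if api_token:
        # With a token, the README says higher limits and extra formats apply.
        entry["env"] = {"MINERU_API_TOKEN": api_token}
    # "mineru" is a hypothetical server name chosen for this example.
    return {"mcpServers": {"mineru": entry}}


flash_config = build_mcp_entry()                  # Flash mode, no API key
token_config = build_mcp_entry("YOUR_TOKEN_HERE")  # placeholder token
print(json.dumps(token_config, indent=2))
```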