OCR-VLs WebUI
Convert documents to markdown, extract raw text, and locate specific content with bounding boxes.
OCR Engines
- DeepSeekOCR: AI-powered OCR with advanced document understanding and markdown conversion
- PaddleOCR-VL: Document parsing model that converts documents to markdown format (install with: pip install 'paddleocr[doc-parser]')
- Gemini Flash 2.5: Google Gemini model for fast, high-quality Markdown conversion (set GEMINI_API_1..5 in .env)
- olmOCR: Vision-language model for document OCR (requires Python >=3.11)
- dots.ocr: Multilingual document parser with SOTA performance on layout detection and content recognition (install with: pip install qwen-vl-utils)
DeepSeekOCR Modes
- Gundam: 1024 base + 640 tiles with cropping - Best balance
- Tiny: 512×512, no crop - Fastest
- Small: 640×640, no crop - Quick
- Base: 1024×1024, no crop - Standard
- Large: 1280×1280, no crop - Highest quality
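The mode list above can be read as a small lookup table. The sketch below is one possible way to encode it. The field names `base_size`, `image_size`, and `crop_mode` are assumptions chosen for illustration, and `get_mode` is a hypothetical helper, not necessarily how this WebUI stores its settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    """Resolution settings for one DeepSeekOCR mode (field names are assumed)."""
    base_size: int   # side length of the full-page view, in pixels
    image_size: int  # side length of each tile (same as base_size when not cropping)
    crop_mode: bool  # True when the page is also split into tiles


# Values taken from the mode list above; Gundam is the only mode that crops.
MODES = {
    "Gundam": ModeConfig(base_size=1024, image_size=640, crop_mode=True),
    "Tiny": ModeConfig(base_size=512, image_size=512, crop_mode=False),
    "Small": ModeConfig(base_size=640, image_size=640, crop_mode=False),
    "Base": ModeConfig(base_size=1024, image_size=1024, crop_mode=False),
    "Large": ModeConfig(base_size=1280, image_size=1280, crop_mode=False),
}


def get_mode(name: str) -> ModeConfig:
    """Look up a mode by name and fail with a readable error if it is unknown."""
    try:
        return MODES[name]
    except KeyError:
        raise ValueError(
            f"Unknown mode {name!r}; expected one of {sorted(MODES)}"
        ) from None
```

A caller would pick a mode by name, for example `get_mode("Gundam")`, and pass the three values on to whatever inference function the backend exposes.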
DeepSeekOCR Tasks
- Markdown: Convert document to structured markdown (grounding)
- Tables: Extract tables only as Markdown (grounding)
- Locate: Find specific text in image (grounding)
- Describe: General image description
- Custom: Your own prompt (add <|grounding|> for boxes)
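To make the role of the `<|grounding|>` tag concrete, here is a minimal sketch of how a task name might be turned into a prompt. The prompt wordings in `TASK_PROMPTS` are illustrative assumptions and may differ from the prompts this app actually sends; `build_prompt` and `wants_boxes` are hypothetical helpers. Only the tag itself and the set of tasks that use grounding come from the list above.

```python
GROUNDING_TAG = "<|grounding|>"

# (uses_grounding, template) per task. The template wording is assumed for
# illustration; only the grounding flags follow the task list above.
TASK_PROMPTS = {
    "Markdown": (True, "Convert the document to markdown."),
    "Tables": (True, "Extract all tables as markdown."),
    "Locate": (True, "Locate {query} in the image."),
    "Describe": (False, "Describe this image in detail."),
}


def build_prompt(task: str, custom_prompt: str = "", query: str = "") -> str:
    """Return the prompt text for a task, prefixing the grounding tag when needed."""
    if task == "Custom":
        text = custom_prompt.strip()
        if not text:
            raise ValueError("Custom task requires a non-empty prompt")
        # Custom prompts are passed through unchanged; the user adds the tag.
        return text
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task {task!r}")
    if task == "Locate" and not query.strip():
        raise ValueError("Locate task requires the text to search for")
    grounded, template = TASK_PROMPTS[task]
    text = template.format(query=query.strip())
    return f"{GROUNDING_TAG}{text}" if grounded else text


def wants_boxes(prompt: str) -> bool:
    """Bounding boxes are only expected when the grounding tag is present."""
    return GROUNDING_TAG in prompt
```

With this layout, a Custom prompt such as `<|grounding|>Find the invoice number` would request boxes, while the same prompt without the tag would return text only.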
PaddleOCR-VL
- Document parsing model that automatically converts documents to markdown
- Supports both images and PDFs
olmOCR
- Vision-language model based on Qwen2.5-VL-7B-Instruct
- Automatically converts documents to markdown format
- Supports both images and PDFs
- Model: allenai/olmOCR-2-7B-1025
- Requires Python >=3.11
- For Hugging Face Spaces, create a runtime.txt file with python-3.11 or higher
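A startup check can surface the Python requirement before the olmOCR engine is loaded. The sketch below is one way to do that; `python_ok` and `MIN_PYTHON` are hypothetical names and are not part of olmOCR or this WebUI.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the olmOCR notes above


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the interpreter's major.minor version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_ok():
    # Only warn; the other engines may still work on older interpreters.
    print(
        "olmOCR needs Python %d.%d or newer; found %d.%d"
        % (MIN_PYTHON + tuple(sys.version_info[:2]))
    )
```

Warning instead of exiting is a design choice for this sketch, so that an unsupported interpreter disables one engine without blocking the rest.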
dots.ocr
- Multilingual document parser based on a 1.7B-parameter LLM with SOTA performance
- Achieves state-of-the-art results for text, tables, and reading order
- Supports both images and PDFs
- Model: rednote-hilab/dots.ocr
- Requires transformers >= 4.47.0
- Please upgrade transformers if you see import errors
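The transformers requirement can be checked up front instead of waiting for an import error. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the comparison deliberately ignores pre-release suffixes such as `.dev0`, which is a simplification.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 47, 0)  # minimum stated in the dots.ocr notes above


def parse_release(version_string: str) -> tuple:
    """Extract the leading numeric components, e.g. '4.47.0.dev0' -> (4, 47, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Cannot parse version {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: tuple = MIN_TRANSFORMERS) -> bool:
    """Compare release tuples, padding the shorter one with zeros."""
    release = parse_release(installed)
    width = max(len(release), len(minimum))
    pad = lambda parts: parts + (0,) * (width - len(parts))
    return pad(release) >= pad(minimum)


def transformers_ok() -> bool:
    """Return False when transformers is missing or older than the minimum."""
    try:
        return meets_minimum(version("transformers"))
    except PackageNotFoundError:
        return False
```

If `transformers_ok()` returns False, upgrading with `pip install --upgrade transformers` is the usual fix.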
Gemini Flash 2.5
- Google Gemini model for fast, high-quality Markdown conversion
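The engine list mentions setting GEMINI_API_1..5 in .env. The sketch below shows one way to collect those keys. It assumes the range means the variables `GEMINI_API_1` through `GEMINI_API_5`, and that the .env file has already been loaded into the process environment by some other means. `load_gemini_keys` is a hypothetical helper, and how the app uses multiple keys is not stated in the source.

```python
import os


def load_gemini_keys(environ=None, max_keys: int = 5) -> list:
    """Collect non-empty GEMINI_API_1..GEMINI_API_<max_keys> values, in order."""
    env = os.environ if environ is None else environ
    keys = []
    for index in range(1, max_keys + 1):
        value = env.get(f"GEMINI_API_{index}", "").strip()
        if value:  # skip unset or blank slots so gaps are tolerated
            keys.append(value)
    return keys


if not load_gemini_keys():
    print("No Gemini API keys found; set at least GEMINI_API_1 in .env")
```

Accepting an explicit mapping makes the function easy to test without touching the real environment.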