OCR-VLs WebUI

Convert documents to markdown, extract raw text, and locate specific content with bounding boxes.

OCR Engine

Choose between DeepSeekOCR, PaddleOCR-VL, olmOCR, dots.ocr, or Gemini Flash 2.5.


OCR Engines

  • DeepSeekOCR: AI-powered OCR with advanced document understanding and markdown conversion
  • PaddleOCR-VL: Document parsing model that converts documents to markdown format (install with: pip install 'paddleocr[doc-parser]')
  • Gemini Flash 2.5: Google Gemini model for fast, high-quality Markdown conversion (set GEMINI_API_1 through GEMINI_API_5 in .env)
  • olmOCR: Vision-language model for document OCR (requires Python >=3.11)
  • dots.ocr: Multilingual document parser with SOTA performance on layout detection and content recognition (install with: pip install qwen-vl-utils)

DeepSeekOCR Modes

  • Gundam: 1024 base + 640 tiles with cropping - Best balance
  • Tiny: 512×512, no crop - Fastest
  • Small: 640×640, no crop - Quick
  • Base: 1024×1024, no crop - Standard
  • Large: 1280×1280, no crop - Highest quality
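
The modes above can be written down as a small lookup table. This is a minimal sketch: the resolutions and cropping flags come from the list, while the field names (base_size, image_size, crop_mode) and the helper get_mode are assumptions made for illustration, not names confirmed by this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    base_size: int   # resolution of the full-page view, in pixels (assumed name)
    image_size: int  # resolution of each tile, or of the page when not cropping
    crop_mode: bool  # True when the page is split into tiles


# Values copied from the mode list; the field names are hypothetical.
MODES = {
    "Gundam": ModeConfig(base_size=1024, image_size=640, crop_mode=True),
    "Tiny": ModeConfig(base_size=512, image_size=512, crop_mode=False),
    "Small": ModeConfig(base_size=640, image_size=640, crop_mode=False),
    "Base": ModeConfig(base_size=1024, image_size=1024, crop_mode=False),
    "Large": ModeConfig(base_size=1280, image_size=1280, crop_mode=False),
}


def get_mode(name: str) -> ModeConfig:
    """Return the settings for a mode, with a readable error for unknown names."""
    try:
        return MODES[name]
    except KeyError:
        raise ValueError(f"Unknown mode {name!r}; choose from {sorted(MODES)}") from None
```

Keeping the table in one place means the UI dropdown and the inference call can read from the same source.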

DeepSeekOCR Tasks

  • Markdown: Convert document to structured markdown (grounding)
  • Tables: Extract tables only as Markdown (grounding)
  • Locate: Find specific text in image (grounding)
  • Describe: General image description
  • Custom: Your own prompt (add <|grounding|> for boxes)
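
One way the task choice could be turned into a prompt is sketched below. Only the <|grounding|> tag and the task names come from the list above; the prompt wording, the build_prompt helper, and its want_boxes parameter are hypothetical and may differ from what the app actually sends to the model.

```python
GROUNDING = "<|grounding|>"

# Prompt wording is an assumption for illustration only.
TASK_PROMPTS = {
    "Markdown": GROUNDING + "Convert the document to markdown.",
    "Tables": GROUNDING + "Extract all tables as markdown.",
    "Describe": "Describe this image in detail.",
}


def build_prompt(task: str, custom: str = "", want_boxes: bool = False) -> str:
    """Build a prompt string for a task; `custom` carries user-supplied text."""
    text = custom.strip()
    if task == "Locate":
        if not text:
            raise ValueError("Locate needs the text to search for")
        return f"{GROUNDING}Locate {text} in the image."
    if task == "Custom":
        if not text:
            raise ValueError("Custom needs a prompt")
        # Add the grounding tag only when boxes are requested and it is missing.
        if want_boxes and GROUNDING not in text:
            text = GROUNDING + text
        return text
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task {task!r}")
    return TASK_PROMPTS[task]
```

For example, build_prompt("Custom", "Read the title", want_boxes=True) prepends the grounding tag, while the same call without want_boxes returns the prompt unchanged.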

PaddleOCR-VL

  • Document parsing model that automatically converts documents to markdown
  • Supports both images and PDFs

olmOCR

  • Vision-language model based on Qwen2.5-VL-7B-Instruct
  • Automatically converts documents to markdown format
  • Supports both images and PDFs
  • Model: allenai/olmOCR-2-7B-1025
  • Requires Python >=3.11. On Hugging Face Spaces, create a runtime.txt file specifying python-3.11 or higher
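
The Python requirement can be checked before the olmOCR engine is offered. A minimal sketch, assuming a hypothetical helper named python_is_supported; how the app itself performs this check is not stated here.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version from the olmOCR notes above


def python_is_supported(version=None) -> bool:
    """Return True when the interpreter (or the given version tuple) is >= 3.11."""
    current = tuple(version) if version is not None else tuple(sys.version_info[:2])
    # Compare only major and minor, so (3, 11, 0) and (3, 11) behave the same.
    return current[:2] >= MIN_PYTHON
```

Calling python_is_supported() with no argument inspects the running interpreter; passing a tuple makes the check easy to test.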

dots.ocr

  • Multilingual document parser built on a 1.7B-parameter LLM
  • Achieves state-of-the-art results for text, tables, and reading order
  • Supports both images and PDFs
  • Model: rednote-hilab/dots.ocr
  • Requires transformers >= 4.47.0. Upgrade transformers if you see import errors
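
The transformers requirement can be verified up front instead of waiting for an import error. The sketch below uses the standard-library importlib.metadata to read the installed version; the helper names parse_version and transformers_is_recent are hypothetical, and the parser is deliberately simplified (it ignores pre-release suffixes such as "rc1" or ".dev0").

```python
from importlib import metadata

MIN_TRANSFORMERS = (4, 47, 0)  # minimum version from the dots.ocr notes above


def parse_version(text: str) -> tuple:
    """Turn '4.47.1' or '4.48.0.dev0' into a (major, minor, patch) tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_is_recent(installed=None) -> bool:
    """Return True when transformers is installed and at least 4.47.0."""
    if installed is None:
        try:
            installed = metadata.version("transformers")
        except metadata.PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_TRANSFORMERS
```

Passing a version string explicitly, as in transformers_is_recent("4.46.3"), avoids touching the environment and is convenient for tests.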

Gemini Flash 2.5

  • Google Gemini model for fast, high-quality Markdown conversion
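
The engine list mentions up to five key variables, GEMINI_API_1 through GEMINI_API_5. A minimal sketch of collecting them is shown below. It assumes the values are already present in the process environment (loading the .env file itself is not shown), and the round-robin rotation is an assumption about why several keys exist, not documented behavior. The helper names load_gemini_keys and key_rotation are hypothetical.

```python
import os
from itertools import cycle


def load_gemini_keys(env=None, count: int = 5) -> list:
    """Collect non-empty GEMINI_API_1..GEMINI_API_<count> values, in order."""
    env = os.environ if env is None else env
    keys = []
    for i in range(1, count + 1):
        value = env.get(f"GEMINI_API_{i}", "").strip()
        if value:
            keys.append(value)
    return keys


def key_rotation(keys):
    """Return an endless round-robin iterator over the keys (assumed usage)."""
    if not keys:
        raise ValueError("No GEMINI_API_n variables are set")
    return cycle(keys)
```

With two keys set, next() on the iterator returned by key_rotation yields the first key, then the second, then the first again.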