Tesseract Python Packages | PyRank

pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

100.9M 10K 750

pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

5.3M 10K 750

ocrmypdf

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

918K 34K 2K

tesserocr

A Python wrapper for the tesseract-ocr API

472K 2K 260

kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or usable via CLI, REST API, or MCP server.

191K 9K 507

fastmrz

⚡ Extract the Machine Readable Zone (MRZ) from images of passports or other documents

12K 191 39

tagui

Python package for doing RPA

11K 5K 722

rpa

Python package for doing RPA

11K 5K 722

aiopytesseract

A Python asyncio wrapper for Tesseract-OCR.

2K 27 7

tesseract-jax

Tesseract JAX executes Tesseracts as part of JAX programs, with full support for function transformations like JIT, `grad`, and more.

2K 31 2

arabic-pdf2ebook

Convert scanned Arabic PDF books into e-reader friendly EPUBs (OCR + image modes, RTL, local web UI, CrossPoint Wi-Fi send)

2K 0 0

pysseract

Python binding to Tesseract 4.0 API

2K 1 1

vnc-remote-control

Layout-aware VNC/RFB remote control CLI: type, key, click, screenshot, and OCR against any VNC/RFB server.

1K 0 0

pdf2dataset

Easily convert a subdirectory containing a large volume of PDF documents into a dataset; supports extracting text and images

870 19 5

xberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or usable via CLI, REST API, or MCP server.

785 9K 507

mosaic-bench

A benchmark suite for differentiable physics solvers

763 2 0

anyocr

A lightweight, unified OCR toolkit with a one-liner API. Supports Surya, EasyOCR, PaddleOCR, Tesseract, and Vision LLMs through a single interface.

697 0 0

nkocr

🔎📝 A module for running specialized OCR on food products and nutritional tables.

598 40 11

screen-ocr

Library for processing screen contents using OCR

586 50 7

arabic-extract

Clean Arabic text extraction from PDFs and scanned images — OCR + visual-order repair in one pipeline

516 0 0

axa-fr-ocr

AXA France OCR library

516 3 0

wagtail-textract

Text extraction for Wagtail document search

514 34 14

readmrz

Machine readable zone reader on ID cards

480 20 10

tesseract-torch

Execute + differentiate Tesseracts as part of PyTorch programs, with full support for reverse-mode and forward-mode AD. 🔦

449 3 0