PyRank

Tesseract Python Packages

Python packages with the GitHub topic tesseract. Sorted by relevance, with stars and monthly downloads.
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

79.5M 10K 726
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.8M 10K 726
ocrmypdf
ocrmypdf

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

783K 34K 2K
sirfz
tesserocr

A Python wrapper for the tesseract-ocr API

367K 2K 259
kreuzberg-dev
kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.

174K 8K 488
tebelorg
tagui

Python package for doing RPA

12K 5K 727
tebelorg
rpa

Python package for doing RPA

11K 5K 727
sivakumar-mahalingam
fastmrz

⚡ Extracts the Machine Readable Zone (MRZ) from images of passports or other documents

5K 192 41
amenezes
aiopytesseract

asyncio tesseract wrapper for Tesseract-OCR

2K 27 7
xiahongze
pysseract

Python binding to Tesseract 4.0 API

2K 1 1
pasteurlabs
tesseract-jax

Execute and differentiate Tesseracts as part of JAX programs, with full support for function transformations like JIT, grad, and more. ⚡

1K 31 3
vietanhdev
anyocr

A lightweight, unified OCR toolkit with a one-liner API. Supports Surya, EasyOCR, PaddleOCR, Tesseract, and Vision LLMs through a single interface.

851 0 0
nometria
medical-ocr

Multi-engine OCR pipeline for medical and legal documents

696 1 0
icaropires
pdf2dataset

Converts a whole directory of PDF documents, whether the volume is large or small, into a dataset (pandas DataFrame), with error tracking and a choice of features

674 19 5
wolfmanstout
screen-ocr

Easily perform OCR on portions of the screen, choosing from a selection of backends.

599 50 7
AxaFrance
axa-fr-ocr

AXA France OCR library

551 3 0
Lucs1590
nkocr

A module for performing specialized OCR on food products and nutrition tables.

532 39 11
fullstackcrew-alpha
privacy-mask

Automatically redacts sensitive data in screenshots before they are sent to AI agents

509 10 1
tmsincomb
imagetocsv

Converts an image to a CSV. This exists because Chorus 3.0 is bat-shit and only shows images for vital metadata.

453 5 2
egemenzeytinci
readmrz

Machine-readable zone (MRZ) reader for ID cards: https://pypi.org/project/readmrz

336 20 10
fourdigits
wagtail-textract

Text extraction for Wagtail document search

334 34 14
bandrel
ocyara

Performs OCR on image files and scans them for matches to YARA rules

324 42 8
engeir
northern-lights-forecast

A northern lights forecast that automatically sends a Telegram notification during substorm events.

322 1 0
asiff00
bangla-pdf-ocr

A package to extract Bengali text from PDFs using OCR

322 21 3
Data from PyPI, GitHub, ClickHouse, and BigQuery