PyRank

Documents Python Packages

Python packages tagged with the GitHub topic "documents", sorted by relevance and shown with stars and monthly downloads.
h2non
jsonpath-ng

Finally, a JSONPath implementation for Python that aims to be standard compliant. That's all. Enjoy!

72.4M 730 113
docling-project
docling

Get your documents ready for gen AI

7.2M 60K 4K
deeplook
svglib

Read SVG files and convert them to other formats.

4.8M 363 86
docling-project
docling-slim

Get your documents ready for gen AI

1.1M 60K 4K
topk-io
topk-sdk

Provide the right context to your agents.

136K 69 3
signnow
signnow-python-sdk

Official SignNow SDK for Python. Sign documents, request eSignatures, and build role-based multi-signer workflows via REST API.

83K 12 7
konstantint
passporteye

Extraction of machine-readable zone information from passports, visas and id-cards via OCR

17K 450 122
karolzak
boxdetect

BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.

14K 113 21
Aquilesorei
strutex

strutex is a Python library designed to extract JSON from documents.

8K 10 0
seanpedrick-case
doc-redaction

Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface. Demo: https://huggingface.co/spaces/seanpedrickcase/document_redaction or try with VLMs: https://huggingface.co/spaces/seanpedrickcase/document_redaction_vlm

3K 50 10
openlegaldata
oldp

Open Legal Data Platform

3K 143 25
ispras
dedoc

Extract content and logical tree structure from textual documents

2K 681 56
Manu11-Pro
organisingfiles-by-type

File Organiser which organises files by type

2K 1 0
firdausmntp
dokumen-pintar

Universal MCP server for document CRUD across formats (TXT, JSON, YAML, CSV, XML, DOCX, XLSX, PPTX, PDF) — with versioning, sandboxed multi-root workspace, full-text search, batch ops, and optional semantic search.

2K 0 0
phil65
docler

Abstractions & Tools for OCR / document processing

2K 5 2
Hugues-DTANKOUO
olgadoc

Four formats. One engine. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Rust core with strictly-typed Python bindings.

1K 8 0
tboy1337
pr2md

Pull Request Markdown Generator

877 1 0
nometria
medical-ocr

Multi-engine OCR pipeline for medical and legal documents

696 1 0
mouraworks
docowling

SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.

673 3 1
docling-project
docling-sdg

A set of tools to create synthetically-generated data from documents

578 48 18
kreuzberg-dev
kreuzberg-surrealdb

Kreuzberg-to-SurrealDB connector for document ingestion pipelines — schema management, content deduplication, chunk storage, and index configuration

528 13 1
cyrildever
redacted-py

Redacting classified documents

502 3 0
stanford-oval
churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

464 43 4
usercando
pullcite

Evidence-backed structured extraction from documents

427 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery