PyRank

Document Python Packages

Python packages tagged with the GitHub topic "document", sorted by relevance and shown with stars and monthly downloads.
jshwi
docsig

Check Python signature params for proper documentation

439K 42 3
fcakyon
craft-text-detector

This repo is deprecated. Please refer to new up-to-date repo: https://github.com/fcakyon/craft-text-detector

16K 12 1
Byaidu
pdf2zh

[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the layout; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/MCP/Docker/Zotero

16K 34K 3K
sunholo-data
ailang-parse

Universal document parsing and generation in AILANG. Deterministic Office (DOCX/PPTX/XLSX) extraction, AI-powered PDF/image parsing, 9-format document generation.

4K 0 1
moskize91
doc-page-extractor

Document page extraction tool powered by DeepSeek-OCR.

4K 13 7
henrihapponen
docxedit

Edit Word (.docx) documents effortlessly without changing the original formatting.

3K 23 3
osllmai
indox

The Indox Ecosystem offers integrated AI tools for data workflows. Our four components (IndoxArcg, IndoxMiner, IndoxJudge, and IndoxGen) enhance AI applications with advanced retrieval, extraction, evaluation, and generation capabilities, supporting multiple document formats and LLM providers.

2K 19 2
oomol-lab
pdf-craft

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

2K 6K 395
stencila
stencila

Documents with Scientific Intelligence

2K 883 57
emcf
thepipe-api

Get clean data from tricky documents, powered by VLMs.

2K 2K 99
pstwh
docuwarp

Docuwarp is a Python library for unwarping documents

2K 7 0
Michael-A-Kuykendall
contextlite

Database Freedom Platform - Mathematical search optimization for whatever database you already have. 27,000x faster than vector databases with SMT-powered search across 8+ database types. One-time 9-2999 vs 00-500/month recurring.

2K 17 5
farfarfun
funread

Document reading and parsing toolkit - supports reading and parsing multiple document formats

2K 1 0
DavidSchobl
edof

Python library for programmatic document creation, template filling and export (.edof format)

1K 1 0
Sinapsis-AI
sinapsis-bertopic

Package for topic modeling using BERTopic, including templates for fitting models and making predictions.

1K 0 0
retospect
precis-summary

Fast extractive summarization via RAKE keyword extraction

1K 0 0
Sinapsis-AI
sinapsis-langchain-readers

Package that provides support for Langchain community data loaders.

1K 27 7
tenseleyFlow
document-language-model

Document-first local LLM training, preference mining, retraining, and multi-target export from .dlm docs, codebases, and multimodal sources.

1K 0 0
Sinapsis-AI
sinapsis-langchain

Package with sinapsis templates to support langchain functionality

1K 27 7
klich3
rocket-store

Using the filesystem as a searchable database.

999 1 0
rossumai
docile-benchmark

Tools to work with the DocILE dataset and benchmark

882 146 12
tboy1337
pr2md

Pull Request Markdown Generator

877 1 0
onedoclabs
client-onedoc

Onedoc SDK for Python

824 71 2
yushulx
document-scanner-sdk

Document Scanner SDK for document edge detection, border cropping, perspective correction and brightness adjustment

702 2 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery