PyRank

PDF to Markdown Python Packages

Python packages with the GitHub topic pdf-to-markdown. Sorted by relevance, with stars and monthly downloads.
yfedoseev
pdf-oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

127K 756 82
grammy-jiang
research-pipeline

Deterministic stage-based pipeline for searching, screening, downloading, converting, and summarizing academic papers. CLI + MCP server.

15K 1 0
yfedoseev
pdf-oxide-fips

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

4K 756 82
NameetP
pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

3K 63 7
SakuraMathcraft
mathcraft-ocr

A Windows math workspace for screenshot OCR, handwriting-to-LaTeX, editing, preview, and symbolic computation, powered by MathCraft OCR and MathLive.

3K 167 15
iamarunbrahma
vision-parse

Parse PDFs into markdown using Vision LLMs

2K 475 66
nanonets
docstrange

Extract and convert data from any document, image, PDF, Word document, PPT, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

2K 1K 132
Hugues-DTANKOUO
olgadoc

Four formats. One engine. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Rust core with strictly-typed Python bindings.

1K 8 0
nanonets
llm-data-converter

Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.

1K 7 1
muchdogesec
file2txt

Turn supported file types (e.g. .docx) into a Markdown-structured text file. Also optionally defangs indicators and extracts text from images. Built for threat intel use cases.

1K 12 2
altaidevorg
llm-food

Serving files for hungry LLMs

813 26 0
wisupai
wisup-e2m

E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.

728 1K 73
shoryasethia
markdrop

A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients, among other features. Markdrop is available on PyPI.

700 204 18
stanford-oval
churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

464 43 4
herrkaefer
anything2md

Convert documents to Markdown using Cloudflare Workers AI toMarkdown.

374 1 0
nanonets
document-data-extractor

Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.

363 7 1
ignatenkofi
docshelf-mcp

MCP server that turns a folder of PDFs and Markdown into an AI-friendly document shelf. Convert, split by chapter, auto-index — Claude/ChatGPT answers from a 5 KB INDEX over raw URLs.

343 0 0
TylerMorrison21
paperflow-postprocess

PaperFlow markdown post-processing for citations, figures, tables, and frontmatter.

201 22 3
drmingler
smart-llm-loader

smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.

181 76 3
markdownbridge
markdownbridge

Python SDK for the MarkdownBridge OCR API — convert documents, images, and PDFs to Markdown with one line of code

115 0 0
credeed
credeed-pdf-to-markdown

Convert PDF to Markdown using AI so that agents can understand documents.

102 0 0
iamarunbrahma
multimodal-parser

Parse PDFs into markdown using Vision LLMs

1 465 66
    • Data from PyPI, GitHub, ClickHouse, and BigQuery