PyRank

HTML to Markdown Python Packages

Python packages with the GitHub topic html-to-markdown. Sorted by relevance, with stars and monthly downloads.
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

8.9M 6K 371
firecrawl
firecrawl-py

🔥 Search, scrape, and clean the web for AI agents.

6.7M 121K 7K
firecrawl
firecrawl

🔥 Search, scrape, and clean the web for AI agents.

997K 121K 7K
spider-rs
spider-client

Python, Javascript, and Rust libraries for the Spider Cloud API.

412K 25 9
Spenhouet
confluence-markdown-exporter

Export Atlassian Confluence pages as markdown files.

40K 423 109
tim-gromeyer
pyhtml2md

Transform your HTML into clean, easy-to-read markdown with html2md.

17K 81 11
us
crw

Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v1/scrape, /v1/crawl, /v1/search). 2.3x faster than Tavily, 1.5x faster than Firecrawl in 1K-URL benchmarks. 6 MB RAM, single binary. Self-host or use managed cloud.

6K 89 5
pankaj28843
article-extractor

Pure-Python article extraction library and HTTP API - Extract clean content from web pages as Markdown or HTML

2K 0 0
nanonets
llm-data-converter

Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.

1K 7 1
muchdogesec
file2txt

Turn a file of a supported type (e.g. .docx) into a Markdown-structured text file. Optionally defangs indicators and extracts text from images. Built for threat intel use cases.

842 12 2
QuartzUnit
markgrab

Universal web content extraction - any URL to LLM-ready markdown

779 0 0
paulpierre
markdown-crawler

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG

711 442 54
nanonets
document-data-extractor

Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.

359 7 1
renesugar
html2txt

Convert HTML to markdown

318 1 2
mazzasaverio
url2md4ai

Lean Python tool for extracting clean, LLM-optimized markdown from web pages. Handles dynamic content with Playwright + Trafilatura for maximum information extraction efficiency.

221 4 0
yannickperrenet
bookmarkdown

Parse your browser's exported HTML bookmark file to Markdown.

192 18 0
spider-rs
spiderwebai-py

Python, Javascript, and Rust libraries for the Spider Cloud API.

178 25 9
spider-rs
spidercloud-py

Python, Javascript, and Rust libraries for the Spider Cloud API.

32 25 9
spider-rs
spiderclient-py

Python SDK for Spider Cloud API

31 25 9
trubitsyn
bookmarks2markdown

Convert bookmarks to Markdown

3 5 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery