PyRank

Trafilatura Python Packages

Python packages tagged with the GitHub topic trafilatura, sorted by relevance and shown with stars and monthly downloads.
opendatalab
mineru-html

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

714 246 24
tiramisu-sh
lurkers

Convenient API + CLI to fetch web content (HTML, YouTube, Twitter, RSS) for agents

331 0 0
mazzasaverio
url2md4ai

Lean Python tool for extracting clean, LLM-optimized markdown from web pages. Handles dynamic content with Playwright + Trafilatura for maximum information extraction efficiency.

208 4 0
Yasser03
pipescraper

A pipe-based news article scraping and metadata extraction library for Python

89 2 0

Data from PyPI, GitHub, ClickHouse, and BigQuery