PyRank

Article Extractor Python Packages

Python packages with the GitHub topic article-extractor. Sorted by relevance, with stars and monthly downloads.
adbar
trafilatura

Python and command-line tool to gather text and metadata on the web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML.

8.9M 6K 371
myifeng
article-parser

A parser that extracts articles from any URL or HTML.

922 50 6
opendatalab
mineru-html

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

714 246 24
artiomn
markdown-tool

Parses a Markdown article, downloads its images, and replaces the image URLs with local paths.

279 128 28
rexdivakar
llmparser

Turn any website into clean, structured content that language models can actually read.

195 2 0
arachnio
arachnio

Arachnio client library for Python 3.10+

114 0 0
johnbumgarner
newshound

A Python package for systematically extracting multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

101 34 3
Data from PyPI, GitHub, ClickHouse, and BigQuery.