Crawling Python Packages | PyRank

courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

10.3M 176 14
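The kind of URL cleaning, filtering and deduplication courlan describes can be sketched with the standard library alone. This is not courlan's API or algorithm. It is a minimal illustration in which the tracking-parameter set and skipped file extensions are hypothetical choices: it normalizes each URL, drops non-HTTP links and likely binary files, and deduplicates while keeping the first-seen order.

```python
from typing import Iterable, List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters treated as tracking noise.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}
# Hypothetical extensions that rarely point to crawlable HTML.
SKIP_EXTENSIONS = (".jpg", ".png", ".gif", ".pdf", ".zip")


def normalize_url(url: str) -> Optional[str]:
    """Return a canonical form of url, or None if it should be filtered out."""
    parts = urlsplit(url.strip())  # urlsplit already lowercases the scheme
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    if parts.path.lower().endswith(SKIP_EXTENSIONS):
        return None
    # Drop tracking parameters and sort the rest so equivalent URLs compare equal.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    )
    path = parts.path or "/"
    # Lowercase the netloc (host and port) and discard the #fragment,
    # which is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(query), ""))


def clean_and_dedupe(urls: Iterable[str]) -> List[str]:
    """Normalize, filter and deduplicate URLs, preserving first-seen order."""
    seen = set()
    result = []
    for url in urls:
        norm = normalize_url(url)
        if norm is not None and norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result
```

Sorting query parameters trades exact fidelity for better deduplication; sites where parameter order matters would need that step removed.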

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

6.2M 63K 12K
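At its core, a crawling framework such as Scrapy schedules requests, extracts links from responses and avoids revisiting pages. The sketch below shows that loop as a plain breadth-first crawl. It is not Scrapy's API. `fetch` is a hypothetical callable (URL to HTML text, or None on failure) so the logic can run without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so each is fetched at most once
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # treat failures and missing pages as dead ends
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

A real crawler would add a domain allow-list, politeness delays and robots.txt handling; Scrapy provides such features through its settings and middlewares.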

newspaper3k

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

838K 15K 2K
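Full-text extraction of the kind newspaper3k performs usually starts by separating body text from markup, scripts and navigation. The following is a deliberately simple heuristic, not newspaper3k's algorithm: it keeps the text of `<p>` elements, ignores `<script>` and `<style>`, and drops paragraphs shorter than a hypothetical `min_words` threshold, on the assumption that those are captions or menu items.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect text inside <p> elements, ignoring <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._skip_depth = 0
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)


def extract_text(html, min_words=3):
    """Return paragraph text separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    # Short paragraphs are often captions, bylines or navigation; drop them.
    return "\n\n".join(p for p in parser.paragraphs if len(p.split()) >= min_words)
```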

scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

797K 68K 7K

crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

648K 9K 765
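Crawlee lists proxy rotation among its features. The idea can be shown with a small round-robin rotator that skips proxies marked as failed. This is a hypothetical helper, not Crawlee's proxy API, and the proxy addresses used with it are placeholders.

```python
import itertools
from typing import Iterable


class ProxyRotator:
    """Round-robin proxy selection that skips proxies marked as failed."""

    def __init__(self, proxies: Iterable[str]):
        self._proxies = list(proxies)
        if not self._proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(self._proxies)
        self._failed = set()

    def mark_failed(self, proxy: str) -> None:
        """Exclude a proxy from future selection (e.g. after repeated errors)."""
        self._failed.add(proxy)

    def next_proxy(self) -> str:
        # Examine each proxy at most once per call before giving up.
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._failed:
                return proxy
        raise RuntimeError("all proxies have failed")
```

With the standard library, the selected proxy could be passed to `urllib.request.ProxyHandler({"http": proxy, "https": proxy})` and an opener built with `urllib.request.build_opener`.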

abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

124K 128 7

scrapy-selenium

Scrapy middleware to handle JavaScript pages using Selenium

31K 952 352

spidermon

Scrapy extension for monitoring spider execution.

27K 560 102
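Spider monitoring generally means checking a finished run's statistics against expectations. The function below is a hypothetical, framework-free sketch of such checks, not Spidermon's API. The stats keys (`item_scraped_count`, `log_count/ERROR`, `finish_reason`) are modeled on Scrapy's stats collector; treat them as an assumption to verify against your Scrapy version.

```python
def check_crawl_stats(stats, min_items=1, max_errors=0):
    """Return a list of human-readable failures for a finished crawl."""
    failures = []
    items = stats.get("item_scraped_count", 0)
    if items < min_items:
        failures.append(f"only {items} items scraped (expected >= {min_items})")
    errors = stats.get("log_count/ERROR", 0)
    if errors > max_errors:
        failures.append(f"{errors} errors logged (allowed <= {max_errors})")
    # A spider that closed normally is assumed to report "finished".
    reason = stats.get("finish_reason")
    if reason != "finished":
        failures.append(f"spider closed with reason {reason!r}")
    return failures
```

An empty list means the run passed; a monitoring tool would typically turn non-empty results into alerts.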

kreuzcrawl

High-performance web crawling engine with bindings for 11 languages

24K 120 14

arxiv2text

Converts PDF files to text, with a focus on arXiv papers.

13K 25 2

grab

Web Scraping Framework

5K 2K 276

proxycrawl

A Python class that acts as a wrapper for the ProxyCrawl scraping and crawling API

4K 58 19

crawlberg

High-performance web crawling engine with bindings for 11 languages

3K 120 14

crawlbase

Fast Python library for the Crawlbase API

2K 25 2

mlscraper

Scrape HTML automatically with machine learning.

2K 1K 93

newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

2K 15K 2K

scrapyrt

HTTP API for Scrapy spiders

2K 882 162

memorious

Lightweight web scraping toolkit for documents and structured data.

2K 315 64

aioscpy

An asyncio + aiolibs crawler that imitates the Scrapy framework

1K 115 10

maalfrid-toolkit

Toolkit for the Målfrid project

839 1 0

qcrawl

qcrawl - fast async web crawling & scraping framework for Python.

834 110 6

docscrape

Scrape any documentation site to Markdown in seconds

661 0 0

aminer-scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

631 63K 12K

lulu

A simple and clean video/music/image downloader that supports many websites

630 804 139