PyRank

Crawling Python Packages

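Python packages tagged with the GitHub topic "crawling". Sorted by relevance, with stars and monthly downloads. Each entry lists the GitHub owner, the package name, its description, and a line of counts that begins with monthly downloads and stars.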
adbar
courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

8.5M 171 13
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.7M 62K 12K
codelucas
newspaper3k

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

1M 15K 2K
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

881K 50K 5K
apify
crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

536K 9K 742
ArchiveBox
abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

75K 114 6
clemfromspace
scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

27K 952 352
scrapinghub
spidermon

Scrapy Extension for monitoring spiders execution.

23K 558 102
dsdanielpark
arxiv2text

Converting PDF files to text, mainly with a focus on arXiv papers.

13K 24 2
kreuzberg-dev
kreuzcrawl

High-performance web crawling engine with bindings for 11 languages

10K 97 12
lorien
grab

Web Scraping Framework

6K 2K 278
codelucas
newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

3K 15K 2K
scrapinghub
scrapyrt

HTTP API for Scrapy spiders

2K 880 162
alephdata
memorious

A minimalistic, recursive web crawling library for Python.

2K 315 64
lorey
mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

2K 1K 92
ihandmine
aioscpy

An asyncio + aiolibs crawler that imitates the Scrapy framework

2K 115 10
crawlbase-source
crawlbase

Fast python library for the Crawlbase API

2K 25 2
proxycrawl
proxycrawl

ProxyCrawl Python library for scraping and crawling

1K 58 19
zaidkx37
shopscout

Scrape any Shopify store - products, collections, pages, metadata & reviews from the public JSON API. SDK + CLI + REST API.

1K 1 0
bluet
proxybroker2

The New (auto rotate) Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

1K 995 136
NationalLibraryOfNorway
maalfrid-toolkit

Toolkit for the Målfrid project

856 1 0
iawia002
lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

802 805 140
rivermont
spidy-web-crawler

The simple, easy to use command line web crawler.

659 354 69
scrapy
aminer-scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

627 62K 12K

Data from PyPI, GitHub, ClickHouse, and BigQuery.