PyRank

Crawler Python Packages

Python packages with the GitHub topic crawler. Sorted by relevance, with stars and monthly downloads.
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

8.9M 6K 371
adbar
courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

8.5M 171 13
rushter
selectolax

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

7.4M 2K 92
firecrawl
firecrawl-py

🔥 Search, scrape, and clean the web for AI agents.

6.8M 121K 7K
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.7M 62K 12K
codelucas
newspaper3k

newspaper3k: news, full-text, and article metadata extraction in Python 3.

1M 15K 2K
firecrawl
firecrawl

🔥 Search, scrape, and clean the web for AI agents.

971K 121K 7K
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

881K 50K 5K
JoMingyu
google-play-scraper

Google Play scraper for Python inspired by <facundoolano/google-play-scraper>

564K 971 246
apify
crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

536K 9K 742
spider-rs
spider-client

Python, Javascript, and Rust libraries for the Spider Cloud API.

417K 25 9
Luqman-Ud-Din
random-user-agent

A package to get a list of user agents based on filters such as operating system, software name, etc.

397K 103 12
0x676e67
rnet

An ergonomic Python HTTP Client with TLS fingerprint

241K 1K 105
moskrc
crawlerdetect

🕷CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.

150K 44 11
scrapy-plugins
scrapy-zyte-api

Zyte API integration for Scrapy

106K 41 22
jpramosi
geckordp

A client implementation of Firefox DevTools over remote debug protocol in python

90K 38 14
hellock
icrawler

A multi-threaded crawler framework with many built-in image crawlers.

79K 924 179
0x676e67
wreq

An ergonomic Python HTTP Client with TLS fingerprint

68K 1K 105
hect0x7
jmcomic

Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile versions | JMComic GitHub Actions downloader 🚀

41K 6K 11K
rmax
scrapy-redis

Redis-based components for Scrapy.

39K 6K 2K
tobocop2
lilbee

Terminal-first local search and AI chat over your documents, code, and crawled websites. Semantic + hybrid search, vision OCR, auto-built wiki, browsable GGUF model catalog. Works as CLI, TUI, MCP server, REST API, or Python library. Offline by default, no sidecar services.

37K 18 3
yuanxu-li
html-table-extractor

Extract data from HTML tables.

29K 88 22
scrapy-plugins
scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

25K 365 91
scrapy-plugins
scrapy-crawlera

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

23K 365 91
    • Data from PyPI, GitHub, ClickHouse, and BigQuery