PyRank

Web Scraping Python Packages

Python packages with the GitHub topic web-scraping. Sorted by relevance, with stars and monthly downloads.
lexiforest
curl-cffi

Python binding for a curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

27.4M 6K 486
adbar
htmldate

Fast and robust date extraction from web pages, with Python or on the command line.

11.2M 149 30
deedy5
primp

HTTP client that can impersonate web browsers

10.8M 523 54
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

8.9M 6K 371
rushter
selectolax

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

7.4M 2K 92
firecrawl
firecrawl-py

🔥 Search, scrape, and clean the web for AI agents.

6.8M 121K 7K
Kaliiiiiiiiii-Vinyzu
patchright

Undetected Python version of the Playwright testing and automation library.

5.3M 1K 99
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.7M 62K 12K
seleniumbase
seleniumbase

APIs for browser automation, testing, and bypassing bot-detection.

3.5M 13K 2K
serpapi
google-search-results

Google Search results via SERP API, as a pip-installable Python package.

1.7M 737 121
firecrawl
firecrawl

🔥 Search, scrape, and clean the web for AI agents.

971K 121K 7K
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

881K 50K 5K
apify
crawlee

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

536K 9K 742
spider-rs
spider-client

Python, JavaScript, and Rust libraries for the Spider Cloud API.

417K 25 9
ScrapeGraphAI
scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

318K 76 14
scrapfly
scrapfly-sdk

Official Python SDK for the Scrapfly platform: web scraping, screenshots, AI extraction, crawling, and a remote anti-bot browser. Integrates with Scrapy, LlamaIndex, and LangChain.

256K 55 15
0x676e67
rnet

An ergonomic Python HTTP client with TLS fingerprinting.

241K 1K 105
CloakHQ
cloakbrowser

Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.

238K 13K 989
lit26
finvizfinance

Finviz analysis Python library.

134K 1K 231
rebrowser
rebrowser-playwright

A drop-in replacement for playwright-python patched with rebrowser-patches. It lets you pass modern automation detection tests.

100K 99 10
steel-dev
steel-sdk

The official Python library for the Steel API

97K 32 5
ArchiveBox
abx-plugins

🧩 Plugins and extractors that ArchiveBox + abx-dl use: chrome, ytdlp, wget, singlefile, readability, forum-dl, gallery-dl, papers-dl, and more...

82K 5 0
0x676e67
wreq

An ergonomic Python HTTP client with TLS fingerprinting.

68K 1K 105
deedy5
pyreqwest-impersonate

HTTP client that can impersonate web browsers

53K 523 54
Data from PyPI, GitHub, ClickHouse, and BigQuery.