Web Scraping Python Packages

curl-cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

34.9M 6K 513

htmldate

Fast and robust date extraction from web pages, with Python or on the command line

12.4M 153 31

primp

HTTP client that can impersonate web browser TLS/HTTP2 fingerprints

10.9M 551 58

trafilatura

Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, XML

10.5M 6K 392

firecrawl-py

The API to search, scrape, and interact with the web at scale. 🔥

7.2M 144K 8K

selectolax

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

6.3M 2K 91

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

6.2M 63K 12K

patchright

Undetected Python version of the Playwright testing and automation library.

3.6M 1K 107

seleniumbase

📊 APIs for web automation, testing, and bypassing bot detection.

3.1M 13K 2K

apify-client

Apify API client for Python—Programmatically run Actors, manage and stream data from storages (datasets, key-value stores, request queues), schedule and monitor runs, and access the full Apify platform API. Sync and async interfaces with automatic retries and pagination.

2.5M 94 17

google-search-results

Python package, installable with pip, for Google search results via SERP API

1.6M 750 120

rnet

An ergonomic, censorship-resistant Python HTTP Client

1.4M 1K 110

firecrawl

The API to search, scrape, and interact with the web at scale. 🔥

1.3M 144K 8K

httpmorph

httpmorph is a drop-in replacement for Python's requests library that uses a custom C implementation with BoringSSL instead of Python's standard HTTP stack.

1.2M 149 4

scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

797K 68K 7K

crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

648K 9K 765

cloakbrowser

Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.

542K 28K 2K

spider-client

Python, JavaScript, and Rust libraries for the Spider Cloud API.

368K 26 9

scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

292K 80 16

apify

Apify SDK for Python. The official library for building Apify Actors: serverless cloud programs for web scraping, browser automation, data processing, and AI agents. Manages the Actor lifecycle, storages (datasets, key-value stores, request queues), events, proxies, and pay-per-event monetization. Built on top of the Apify API Client.

254K 172 25

scrapfly-sdk

Official Python SDK for the Scrapfly platform: web scraping, screenshots, AI extraction, crawling, and a remote anti-bot browser. Integrates with Scrapy, LlamaIndex, and LangChain.

191K 59 15

finvizfinance

Finviz analysis Python library.

158K 1K 254

steel-sdk

The official Python library for the Steel API

107K 34 5

rebrowser-playwright

A drop-in replacement for playwright-python patched with rebrowser-patches. It allows scripts to pass modern automation detection tests.

104K 101 10