Scraping Python Packages

trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

10.5M 6K 392

fake-useragent

Up-to-date, simple user-agent faker with a real-world database

10.3M 4K 538

firecrawl-py

The API to search, scrape, and interact with the web at scale. 🔥

7.2M 144K 8K

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

6.2M 63K 12K

parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

4.8M 1K 162

browserforge

🎭 Intelligent browser header & fingerprint generator

2.9M 1K 82

apify-client

Apify API client for Python—Programmatically run Actors, manage and stream data from storages (datasets, key-value stores, request queues), schedule and monitor runs, and access the full Apify platform API. Sync and async interfaces with automatic retries and pagination.

2.5M 94 17

camoufox

🦊 Anti-detect browser

2M 10K 835

google-search-results

Google Search Results via SERP API: a pip-installable Python package

1.6M 750 120

apify-fingerprint-datapoints

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

1.6M 2K 254

firecrawl

The API to search, scrape, and interact with the web at scale. 🔥

1.3M 144K 8K

undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

1M 13K 1K

scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

797K 68K 7K

crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

648K 9K 765

fake-http-header

A Python package to generate random request fields for an HTTP header.

627K 44 2

requests-html

Pythonic HTML Parsing for Humans™

565K 327 42

scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

292K 80 16

apify

Apify SDK for Python. The official library for building Apify Actors: serverless cloud programs for web scraping, browser automation, data processing, and AI agents. Manages the Actor lifecycle, storages (datasets, key-value stores, request queues), events, proxies, and pay-per-event monetization. Built on top of the Apify API Client.

254K 172 25

zenrows

SDK to access the ZenRows API directly from Python. We handle proxy rotation, headless browsers, and CAPTCHAs for you.

185K 19 9

twikit

174K 5K 553

socid-extractor

⛏️ The extraction engine behind Maigret: turn any profile URL into a structured OSINT record across 150+ sites

127K 1K 109

oxylabs-mcp

Official Oxylabs MCP integration

126K 99 25

abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

124K 128 7

shot-scraper

A CLI utility for taking screenshots of websites, recording video demos and scraping sites using JavaScript

121K 2K 120