PyRank

Scraping Python Packages

Python packages tagged with the GitHub topic "scraping", sorted by relevance and shown with GitHub stars and monthly downloads.
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

8.9M 6K 371
fake-useragent
fake-useragent

Up-to-date simple useragent faker with real world database

7.4M 4K 536
firecrawl
firecrawl-py

🔥 Search, scrape, and clean the web for AI agents.

6.8M 121K 7K
scrapy
parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

4.2M 1K 161
scrapy
scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

3.7M 62K 12K
apify
apify-client

Apify API client for Python

2M 93 16
serpapi
google-search-results

Google Search Results via SERP API pip Python Package

1.7M 737 121
daijro
browserforge

🎭 Intelligent browser header & fingerprint generator

1.5M 1K 83
apify
apify-fingerprint-datapoints

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

1.5M 2K 223
ultrafunkamsterdam
undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)

999K 13K 1K
firecrawl
firecrawl

🔥 Search, scrape, and clean the web for AI agents.

971K 121K 7K
D4Vinci
scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

881K 50K 5K
kennethreitz
requests-html

Pythonic HTML Parsing for Humans™

827K 326 42
daijro
camoufox

🦊 Anti-detect browser

738K 9K 722
MichaelTatarski
fake-http-header

A Python package to generate random request fields for an HTTP header.

639K 44 2
apify
crawlee

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

536K 9K 742
ScrapeGraphAI
scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

318K 76 14
apify
apify

The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.

234K 168 23
d60
twikit

Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

165K 4K 537
ZenRows
zenrows

SDK to access the ZenRows API directly from Python. We handle proxy rotation, headless browsers, and CAPTCHAs for you.

147K 18 9
simonw
shot-scraper

A command-line utility for taking automated screenshots of websites

118K 2K 116
soxoj
maigret

🕵️‍♂️ Collect a dossier on a person by username from 3000+ sites

108K 29K 2K
scrapy-plugins
scrapy-zyte-api

Zyte API integration for Scrapy

106K 41 22
rebrowser
rebrowser-playwright

A drop-in replacement for playwright-python patched with rebrowser-patches. It allows you to pass modern automation detection tests.

100K 99 10
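
The figures under each entry are abbreviated (for example 8.9M or 6K), which makes them awkward to sort or compare programmatically. The following is a minimal sketch for expanding them into plain integers. It assumes that K, M, and B stand for thousands, millions, and billions, which the page does not state explicitly. The names parse_count and parse_stats_line are hypothetical helpers, not part of PyRank or any listed package.

```python
import re
from decimal import Decimal

# Assumed suffix multipliers; the listing itself does not define them.
SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(token: str) -> int:
    """Expand an abbreviated count such as '8.9M' into an integer."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMB]?)", token.strip())
    if match is None:
        raise ValueError(f"unrecognized count: {token!r}")
    number, suffix = match.groups()
    # Decimal avoids binary floating-point rounding on values like 8.9.
    return int(Decimal(number) * SUFFIXES[suffix])


def parse_stats_line(line: str) -> list[int]:
    """Parse a whitespace-separated stats line, keeping the original order."""
    return [parse_count(token) for token in line.split()]


if __name__ == "__main__":
    print(parse_stats_line("8.9M 6K 371"))
```

Because the abbreviated values are rounded, the expanded integers are approximations of the underlying counts rather than exact figures.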
Data from PyPI, GitHub, ClickHouse, and BigQuery