Web Crawler Python Packages

firecrawl-py

The API to search, scrape, and interact with the web at scale. 🔥

7.2M 144K 8K

firecrawl

The API to search, scrape, and interact with the web at scale. 🔥

1.3M 144K 8K

crawlee

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

648K 9K 765

scrapegraph-py

Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction

292K 80 16

scrapfly-sdk

Official Python SDK for the Scrapfly platform: web scraping, screenshots, AI extraction, crawling, and a remote anti-bot browser. Integrates with Scrapy, LlamaIndex, and LangChain.

191K 59 15

stealth-requests

Undetected web-scraping & seamless HTML parsing in Python!

46K 526 53

kreuzcrawl

High-performance web crawling engine with bindings for 11 languages

24K 120 14

search-ai-core

Search the web with advanced filters and LLM-friendly output formats!

17K 58 2

basemind

Full AI context and content layer for coding agents over one MCP server — tree-sitter code-map, document RAG, shared memory, multi-agent comms, web crawl, git history + blame. 300+ languages, 10+ agent harnesses, pure Rust.

12K 58 11

contextractor

Crawl any website and extract clean, boilerplate-free main-content text as Markdown, plain text, JSON, HTML, or raw original HTML — ready for LLMs, RAG, and vector databases. Built on rs-trafilatura + Crawlee/Playwright. Ships as a CLI, npm library, Python package, and Apify Actor.
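Extracting boilerplate-free main content can be illustrated with a rough, standard-library-only sketch. This is not rs-trafilatura's algorithm or contextractor's API. It assumes a simple hypothetical heuristic: skip text inside script, style, title, nav, header, footer, and aside elements, split the remaining text into blocks at paragraph-like tags, and drop blocks shorter than a hypothetical `min_words` threshold.

```python
from html.parser import HTMLParser

# Hypothetical heuristic for this sketch: text inside these elements is
# treated as boilerplate or non-content and skipped entirely.
SKIP_TAGS = {"script", "style", "title", "nav", "header", "footer", "aside", "noscript"}
# Elements that start or end a separate text block.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "div",
              "article", "main", "section", "br"}


class MainTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.blocks = []      # finished text blocks, in document order
        self.current = []     # text fragments of the block being built

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)

    def flush(self):
        # Collapse whitespace and keep the block only if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []


def extract_main_text(html: str, min_words: int = 3) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    # Drop very short blocks (menu items, share buttons) as likely boilerplate.
    kept = [b for b in parser.blocks if len(b.split()) >= min_words]
    return "\n\n".join(kept)
```

Real extractors score blocks using link density, text length, and DOM position; the fixed tag list and word-count cutoff here only show the general shape of the problem.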

10K 0 2

crw

Fast, lightweight Firecrawl/Tavily alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/scrape, /crawl, /search). 2.3x faster than Tavily, 1.5x faster than Firecrawl in 1K-URL benchmarks. 6 MB RAM, single binary. Self-host or use managed cloud.

10K 263 22

docpull

Convert the public web into AI-ready Markdown with a local Python CLI/SDK/MCP crawler.

8K 24 2

linktrace

Async web crawler with rate limiting, robots.txt support, and broken link tracking
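Robots.txt support and rate limiting, as listed for linktrace, can be illustrated with the standard library's `urllib.robotparser`. This is a minimal sketch, not linktrace's API: `PolitenessPolicy`, its one-second default delay, and the rule that a host with no loaded robots.txt is allowed are hypothetical choices. The clock and sleep functions are injectable so the spacing logic can be tested without actually waiting.

```python
import time
import urllib.robotparser
from urllib.parse import urlsplit


class PolitenessPolicy:
    """Hypothetical helper: checks robots.txt rules and spaces out requests per host."""

    def __init__(self, user_agent, default_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        self.default_delay = default_delay
        self.clock = clock
        self.sleep = sleep
        self.robots = {}     # host -> RobotFileParser
        self.last_hit = {}   # host -> clock() value of the last request

    def add_robots(self, host, robots_txt):
        # Parse robots.txt text the caller has already downloaded.
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.robots[host] = parser

    def allowed(self, url):
        parser = self.robots.get(urlsplit(url).netloc)
        # Sketch assumption: no robots.txt loaded for this host means allowed.
        return parser is None or parser.can_fetch(self.user_agent, url)

    def wait_turn(self, url):
        host = urlsplit(url).netloc
        delay = self.default_delay
        parser = self.robots.get(host)
        if parser is not None:
            crawl_delay = parser.crawl_delay(self.user_agent)
            if crawl_delay is not None:
                delay = float(crawl_delay)   # a Crawl-delay directive overrides the default
        now = self.clock()
        last = self.last_hit.get(host)
        if last is not None and now - last < delay:
            self.sleep(delay - (now - last))   # wait out the remainder of the delay
            now = self.clock()
        self.last_hit[host] = now
```

Keying the delay on the host rather than globally lets an async crawler keep several sites busy at once while still being gentle with each one.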

7K 1 0

endpointscanner

Website endpoint recon tool and rate limit tester that can bypass simple captchas and WAFs.

6K 3 1

crawlberg

High-performance web crawling engine with bindings for 11 languages

3K 120 14

ghostcrawl

Authentic, reliable web data at scale — real Chrome, Firefox & WebKit browsers in the cloud. Crawl, scrape, extract & automate via API, SDKs, and MCP. Self-host free or use the managed cloud.

3K 0 0

agentcrawl-ai

Self-hosted web scraping and Markdown extraction for AI agents

2K 4 0

xseo

Local-first desktop SEO crawler. Audit your site on your own machine — no cloud, no accounts, no data leaves your computer.

2K 3 0

site-doctor

🩺 Crawl any website and audit SEO, accessibility, performance & broken links from your terminal. Scored report, JSON output, CI-gating. Zero dependencies.
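Broken link tracking, mentioned for both linktrace and site-doctor, amounts to a breadth-first crawl that records which links return error statuses and which page referred to them. This is a minimal sketch, not either tool's implementation: `find_broken_links` and the `fetch(url) -> (status, html)` hook are hypothetical, only same-host http(s) links are followed, and external links are skipped rather than checked.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_broken_links(start_url, fetch, max_pages=100):
    """BFS over same-host pages; fetch(url) -> (status, html) is caller-supplied."""
    host = urlsplit(start_url).netloc
    queue = deque([(start_url, None)])   # (url, referring page)
    seen = {start_url}
    broken = []                          # (url, status, referring page)
    fetched = 0
    while queue and fetched < max_pages:  # max_pages counts every fetch, broken or not
        url, referrer = queue.popleft()
        status, html = fetch(url)
        fetched += 1
        if status >= 400:
            broken.append((url, status, referrer))
            continue
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments so /page#a and /page dedupe.
            absolute, _fragment = urldefrag(urljoin(url, href))
            parts = urlsplit(absolute)
            if parts.scheme not in ("http", "https") or parts.netloc != host:
                continue   # skip mailto:, javascript:, and off-site links in this sketch
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, url))
    return broken
```

Injecting `fetch` keeps the traversal logic separate from HTTP concerns such as timeouts, retries, and the politeness rules a real crawler would apply before each request.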

2K 0 0