Crawler Python Packages

trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

10.5M 6K 392

courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

10.3M 176 14
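
The kind of URL cleaning and deduplication described for courlan can be illustrated with a minimal standard-library sketch. This is not courlan's API or its actual rule set: the function names `normalize_url` and `deduplicate` and the `TRACKING_PARAMS` list are assumptions made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters to drop (not courlan's real list).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def normalize_url(url: str) -> str:
    """Return a canonical form of the URL so that near-duplicates compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    path = parts.path or "/"  # treat an empty path as the root
    # Drop tracking parameters and sort the rest for a stable ordering.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    # The fragment is discarded because it does not change the fetched resource.
    return urlunsplit((scheme, host, path, urlencode(kept), ""))


def deduplicate(urls):
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        normalized = normalize_url(url)
        if normalized not in seen:
            seen.add(normalized)
            unique.append(normalized)
    return unique
```

Calling `deduplicate` on a list that contains the same page with different letter case or tracking parameters keeps only one entry for that page.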

firecrawl-py

The API to search, scrape, and interact with the web at scale. 🔥

7.2M 144K 8K

selectolax

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

6.3M 2K 91

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

6.2M 63K 12K
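
For readers new to the term, the core loop of a web crawler can be sketched with the standard library alone. This is a simplified breadth-first illustration and not Scrapy's API: the `crawl` and `extract_links` functions, the `fetch` callback, and the in-memory `FAKE_SITE` pages are all assumptions introduced here so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for every link found in the HTML."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; `fetch` maps a URL to HTML or None (hypothetical)."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page missing or failed to download
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory site standing in for real HTTP responses.
FAKE_SITE = {
    "https://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.org/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "https://example.org/b": '<a href="/">home</a>',
}

pages = crawl("https://example.org/", FAKE_SITE.get)
```

A framework such as Scrapy layers scheduling, politeness, retries, and item pipelines on top of this basic visit-extract-enqueue loop.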

rnet

An ergonomic, censorship-resistant Python HTTP Client

1.4M 1K 110

firecrawl

The API to search, scrape, and interact with the web at scale. 🔥

1.3M 144K 8K

newspaper3k

News, full-text, and article metadata extraction in Python 3.

838K 15K 2K

scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

797K 68K 7K

crawlee

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

648K 9K 765