PyRank

Web Archiving Python Packages

Python packages tagged with the GitHub topic web-archiving, sorted by relevance and shown with GitHub stars and monthly downloads.
akamhy
waybackpy

Wayback Machine API interface & a command-line tool

2.5M 579 40
webrecorder
warcio

Streaming WARC/ARC library for fast web archive IO

1.3M 457 69
webrecorder
pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

12K 2K 239
ArchiveBox
archivebox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

7K 28K 2K
bellingcat
auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

6K 1K 100
cocrawler
cdx-toolkit

A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine

4K 206 34
oduwsdl
ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

4K 651 41
webrecorder
cdxj-indexer

CDXJ Indexing of WARC/ARCs

4K 34 15
GeiserX
wayback-archive

Download complete websites from the Wayback Machine with full asset preservation for offline viewing

776 10 6
Own-Data-Privateer
hoardy-web

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

496 127 10
caltechlibrary
eprints2archives

Send records from an EPrints server to the Internet Archive and other web archives

388 4 0
internetarchive
fatcat-openapi-client

Perpetual Access To The Scholarly Record

315 121 18
internetarchive
scrapy-warcio

Scrapy WARC I/O

300 24 6
Florents-Tselai
warcdb

WarcDB: Web crawl data as SQLite databases.

269 405 10
eliask
farchive

Local content-addressed archive with observation history. Stores bytes by SHA-256, preserves locator state as contiguous spans, compresses with zstd and corpus-trained dictionaries. SQLite-backed.

266 7 1
GeiserX
wayback-diff

Intelligent web page comparison tool with Wayback Machine support and visual regression testing

208 1 0
Own-Data-Privateer
hoardy-web-sas

A simple archiving server for the `Hoardy-Web` Web Extension browser add-on.

199 127 10
ArchiveBox
archivebox-likn

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

169 28K 2K
ikreymer
pywayback

Core Python Web Archiving Toolkit for replay and recording of web archives

1 2K 239
Data from PyPI, GitHub, ClickHouse, and BigQuery