PyRank

Internet Archiving Python Packages

Python packages with the GitHub topic internet-archiving. Sorted by relevance, with stars and monthly downloads.
akamhy
waybackpy

Wayback Machine API interface & a command-line tool

2.6M 579 40
ArchiveBox
abx-plugins

🧩 Plugins and extractors that ArchiveBox + abx-dl use: chrome, ytdlp, wget, singlefile, readability, forum-dl, gallery-dl, papers-dl, and more...

83K 5 0
ArchiveBox
abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

76K 114 6
ArchiveBox
archivebox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

8K 28K 2K
mikwielgus
forum-dl

Scrape posts, threads from forums, news aggregators, mail archives, export to JSONL, mailbox, WARC

3K 117 6
Own-Data-Privateer
hoardy-web

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

487 127 10
Quoorex
archive-file-urls

Submit URLs listed inside a file to website archival services

324 3 0
Own-Data-Privateer
hoardy-web-sas

A simple archiving server for the `Hoardy-Web` Web Extension browser add-on.

195 127 10
ArchiveBox
archivebox-likn

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

180 28K 2K
Data from PyPI, GitHub, ClickHouse, and BigQuery.