PyRank

Common Crawl Python Packages

Python packages with the GitHub topic common-crawl. Sorted by relevance, with stars and monthly downloads.
KenObata
distributed-curator

Partition-aware MinHash LSH deduplication for large-scale text data curation on Apache Spark

2K 1 0
MigoXLab
dingo-python

Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool

1K 700 72
4thel00z
ccdown

A Rust-based, resumable downloader CLI and Python library for Common Crawl data

479 0 0
connor-marchand
gau-python

This library gets URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl. Inspired by Corben Leo's gau

222 3 0
DataEval
dingo-client

Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool

66 702 72
    • Data from PyPI, GitHub, ClickHouse, and BigQuery