PyRank

Dedupe Python Packages

Python packages with the GitHub topic dedupe. Sorted by relevance, with stars and monthly downloads.
J535D165
recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

4.6M 1K 153
dedupeio
dedupe

🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

76K 4K 570
zinggAI
zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

5K 1K 168
mighty-justice
django-super-deduper

Utilities for de-duping Django model instances

4K 32 9
kdeldycke
mail-deduplicate

📧 CLI to deduplicate mails from mail boxes

976 198 42
knjcode
imgdupes

Identifying and removing near-duplicate images using perceptual hashing.

542 389 24
kdeldycke
maildir-deduplicate

📧 CLI to deduplicate mails from mail boxes

408 198 42
dedupeio
dedupe-fork-eccovia

🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

303 4K 570
laktak
chkbit

Check your files for data corruption and run quick file deduplication

298 176 13
dssg
superdeduper

A simple command line interface to the datamade/dedupe library.

234 43 5
yugn
yadupe

Yet another tool to find and remove duplicate files.

133 0 1
chansooligans
oagdedupe

Developed for Use by NY Office of the Attorney General: A Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, then classify matches

132 2 1
dssg
pgdedupe

A simple command line interface to the datamade/dedupe library.

94 43 5
dedupeio
dedupe-fh

🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

91 4K 570
    • Data from PyPI, GitHub, ClickHouse, and BigQuery