PyRank

Duplicate Detection Python Packages

Python packages with the GitHub topic duplicate-detection. Sorted by relevance, with stars and monthly downloads.
nomic-ai
nomic

Nomic Developer API SDK

37K 2K 197
akamhy
videohash

Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

11K 373 62
AI-team-UoA
pyjedai

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

2K 94 13
justinshenk
simages

Find similar images in a dataset

1K 23 3
akcarsten
duplicate-finder

This Python package identifies duplicate files in a folder of interest.

799 24 5
nmavail
nmavail

Async Python CLI tool to check name availability across domains, GitHub, NPM, PyPI, Crates.io, and system packages. Zero-config tool for startups and developers.

457 0 0
deplicate
deplicate

Advanced Duplicate File Finder for Python. Nothing is impossible to solve.

450 79 17
markusressel
py-image-dedup

A library to find duplicate images and delete unwanted ones

422 170 18
drogers0
clonehunter

Find duplicate code in mixed-language repositories using semantic and lexical similarity

378 2 0
zeronyk
imageduplicatefinder

Simple duplicate finder for images: matches on names, then compares image hashes.

358 0 1
erikreed
pydupes

A duplicate file finder that may be faster in environments with millions of files and terabytes of data.

323 4 2
exponential-decay
demystify-digipres

Engine for the analysis of DROID and Siegfried file format reports

259 33 5
KeyWeeUsr
thebear

Bear - the decluttering deduplicator

242 4 1
ChuckNorrison
imgdups

Very fast image duplicate finder with pickle and cv2

220 2 0
elcorto
findsame

Find duplicate files and directories using hashes and a Merkle tree

203 6 1
MarcinOrlowski
dhunter

Fast, content based duplicate file detector with cache and more!

197 0 0
sophiaconsulting
fast-suffix-array

O(n) suffix array construction (SA-IS) with LCP arrays, BWT, FM-index, and pattern search. Rust-powered Python bindings.

142 0 0
vuolter
deplicate-cli

Command Line Interface for deplicate.

142 3 1
giosali
dupeutil

A command-line program written in Python for detecting and removing duplicate files.

96 0 0
NicolasBi
dupe-eraser

A command-line tool that automates the deletion of duplicate files based on their hash or perceptual hash.

75 13 0
callforpapers-source
doc2term

A fast sentence/word tokenizer, and punctuation remover.

42 2 1
dealfonso
searchdups

Search for duplicate files

35 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery