PyRank

Fuzzy Matching Python Packages

Python packages tagged with the GitHub topic fuzzy-matching, sorted by relevance and shown with star counts and monthly downloads.
taleinat
fuzzysearch

Find parts of long text or data, allowing for some changes/typos.

786K 342 27
moj-analytical-services
splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

781K 2K 236
mammothb
symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

414K 870 126
matchms
matchms

Python library for processing (tandem) mass spectrometry data and for computing spectral similarities.

54K 256 79
chrislit
abydos

Abydos NLP/IR library for Python

49K 194 43
gandersen101
spaczz

Fuzzy matching and more functionality for spaCy.

15K 259 31
RobinL
fuzzymatcher

Record linkage package that fuzzy matches two pandas DataFrames using SQLite's FTS4 full-text search extension

12K 286 60
benzsevern
goldenmatch

Polyglot entity-resolution + data-quality toolkit. Zero-config auto-config (negative-evidence + Path Y) hits DQbench composite 91.04 (T3 53.8% → 85.5%). Holds 0.96 DBLP-ACM, 0.94 Febrl3, 0.97 NCVR. GoldenCheck → GoldenFlow → GoldenMatch → GoldenPipe. MCP per package, multi-arch containers, Airflow DAGs, browser workbench.

9K 44 6
maxharlow
csvmatch

🔎 Finds fuzzy matches between CSV files

8K 191 21
proycon
analiticcl

An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction (mirror of https://codeberg.org/proycon/analiticcl)

7K 39 4
yymao
fuzzyname

A simple Python class for fuzzy name matching

5K 2 0
zinggAI
zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

5K 1K 168
cangyuanli
floof

Fuzzy matching made easy

4K 5 0
Christopher-Thornton
hmni

📛 Fuzzy Name Matching with Machine Learning

4K 268 51
austinv11
prefixtrie

This is a high-performance implementation of a Prefix Trie to perform efficient fuzzy string matches.

2K 1 0
iomega
spec2vec

Word2Vec based similarity measure of mass spectrometry data.

2K 84 20
rosette-api
rosette-api

Babel Street Analytics Client Library for Python

2K 38 37
fritshermans
deduplipy

End-to-end deduplication solution

2K 82 8
benzsevern
goldenpipe

Polyglot entity-resolution + data-quality toolkit. Zero-config auto-config (negative-evidence + Path Y) hits DQbench composite 91.04 (T3 53.8% → 85.5%). Holds 0.96 DBLP-ACM, 0.94 Febrl3, 0.97 NCVR. GoldenCheck → GoldenFlow → GoldenMatch → GoldenPipe. MCP per package, multi-arch containers, Airflow DAGs, browser workbench.

2K 0 0
AI-team-UoA
pyjedai

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

2K 94 13
benzsevern
goldenflow

Polyglot entity-resolution + data-quality toolkit. Zero-config auto-config (negative-evidence + Path Y) hits DQbench composite 91.04 (T3 53.8% → 85.5%). Holds 0.96 DBLP-ACM, 0.94 Febrl3, 0.97 NCVR. GoldenCheck → GoldenFlow → GoldenMatch → GoldenPipe. MCP per package, multi-arch containers, Airflow DAGs, browser workbench.

2K 1 0
benzsevern
infermap

Inference-driven schema mapping engine for Python and TypeScript. 7 built-in scorers, domain dictionaries (healthcare/finance/ecommerce), confidence calibration, cross-language accuracy benchmark (F1 0.84), and full Python↔TypeScript parity.

2K 0 0
cobanov
semaclust

Clustering similar strings using sentence embeddings and agglomerative clustering

1K 5 0
dbousque
batch-jaro-winkler

Fast batch Jaro-Winkler distance implementation in C99.

1K 27 4
Data from PyPI, GitHub, ClickHouse, and BigQuery