PyRank

Record Linkage Python Packages

Python packages with the GitHub topic record-linkage. Sorted by relevance, with stars and monthly downloads.
J535D165
recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

4.6M 1K 153
moj-analytical-services
splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

782K 2K 236
dedupeio
dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

76K 4K 570
maxharlow
csvmatch

🔎 Finds fuzzy matches between CSV files

8K 191 21
benzsevern
goldenmatch

Polyglot entity-resolution + data-quality toolkit. Zero-config auto-config (negative-evidence + Path Y) hits DQbench composite 91.04 (T3 53.8% → 85.5%). Holds 0.96 DBLP-ACM, 0.94 Febrl3, 0.97 NCVR. GoldenCheck → GoldenFlow → GoldenMatch → GoldenPipe. MCP per package, multi-arch containers, Airflow DAGs, browser workbench.

7K 44 6
data61
anonlink

Python implementation of anonymous linkage using cryptographic linkage keys

7K 74 8
cangyuanli
floof

Fuzzy matching made easy

4K 5 0
data61
clkhash

CLK hash: hash PII for entity matching

3K 47 7
fritshermans
deduplipy

End-to-end deduplication solution

2K 82 8
benzsevern
goldenpipe

Polyglot entity-resolution + data-quality toolkit. Zero-config auto-config (negative-evidence + Path Y) hits DQbench composite 91.04 (T3 53.8% → 85.5%). Holds 0.96 DBLP-ACM, 0.94 Febrl3, 0.97 NCVR. GoldenCheck → GoldenFlow → GoldenMatch → GoldenPipe. MCP per package, multi-arch containers, Airflow DAGs, browser workbench.

2K 0 0
benzsevern
goldenflow

Polyglot entity-resolution + data-quality toolkit. Zero-config auto-config (negative-evidence + Path Y) hits DQbench composite 91.04 (T3 53.8% → 85.5%). Holds 0.96 DBLP-ACM, 0.94 Febrl3, 0.97 NCVR. GoldenCheck → GoldenFlow → GoldenMatch → GoldenPipe. MCP per package, multi-arch containers, Airflow DAGs, browser workbench.

2K 1 0
ul-mds
gecko-syndata

Python library for the generation and mutation of realistic personal identification data at scale

2K 6 2
NickCrews
mismo

The SQL/Ibis powered sklearn of record linkage.

1K 23 4
data61
blocklib

A library for blocking in record linkage

1K 21 4
moj-analytical-services
splink-graph

A small set of graph functions to be used from PySpark on top of NetworkX and GraphFrames

1K 10 5
data61
anonlink-client

Client side tool for clkhash and blocklib

1K 6 2
ul-mds
pprl-model

Data models for use with an HTTP-based service for privacy-preserving record linkage using Bloom filters.

1K 1 0
ajl2718
whereabouts

Fast, accurate, open-source geocoding in Python

985 71 11
ihmeuw
easylink

A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.

933 11 0
vintasoftware
entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

908 161 16
ncn-foreigners
blockingpy

Blocking records for record linkage and data deduplication based on ANN algorithms.

849 20 2
usc-isi-i2
rltk

Record Linkage ToolKit

818 112 22
ipums
hlink

Hierarchical record linkage at scale

569 13 2
benzsevern
goldenmatch-duckdb

Polyglot entity-resolution + data-quality toolkit. Zero-config auto-config (negative-evidence + Path Y) hits DQbench composite 91.04 (T3 53.8% → 85.5%). Holds 0.96 DBLP-ACM, 0.94 Febrl3, 0.97 NCVR. GoldenCheck → GoldenFlow → GoldenMatch → GoldenPipe. MCP per package, multi-arch containers, Airflow DAGs, browser workbench.

528 2 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery