minhash-lsh-algorithm
Partition-aware MinHash LSH deduplication for large-scale text data curation on Apache Spark
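
For readers unfamiliar with the underlying technique, here is a minimal single-machine sketch of MinHash signatures with LSH banding in plain Python. It is not this repository's implementation and does not cover the partition-aware or Apache Spark aspects. All function names, the shingle size, the number of hash functions, and the band count are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of character k-grams of a lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed, token):
    # Seeded 64-bit hash; each seed stands in for one random permutation.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=64):
    """One minimum per seeded hash function; similar sets agree in many positions."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_perm))


def lsh_candidate_pairs(signatures, bands=16):
    """Split each signature into bands; documents sharing any band become candidates."""
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy corpus: "a" and "b" differ only in case and spacing.
docs = {
    "a": "The quick brown fox jumps over the lazy dog",
    "b": "the quick  brown fox jumps over the lazy dog",
    "c": "Apache Spark distributes work across partitions",
}
signatures = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
candidates = lsh_candidate_pairs(signatures)
```

Candidate pairs would normally be verified with an exact or estimated Jaccard similarity before one document of each pair is dropped. Using more bands with fewer rows per band lowers the similarity at which two documents are likely to collide.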