semantic-deduplication
Fast Multimodal Semantic Deduplication & Filtering
Scalable data preprocessing and curation toolkit for LLMs