PyRank

Data Selection Python Packages

Python packages with the GitHub topic data-selection. Sorted by relevance, with stars and monthly downloads.
georgianpartners
transformers-domain-adaptation

[DEPRECATED] Adapt Transformer-based language models to new text domains

1K 85 13
p-lambda
data-selection

DSIR large-scale data selection framework for language model training

310 273 19
zhuang-li
scar-tool

[ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models

292 40 4
OpenDCAI
dataflex

A data-centric training system for Large Language Models

231 610 65
4AI
gen-dedup

Generative deduplication

139 3 0
Data from PyPI, GitHub, ClickHouse, and BigQuery