PyRank

Data Filtering Python Packages

Python packages tagged with the GitHub topic data-filtering, sorted by relevance and listed with their stars and monthly downloads.
p-lambda
data-selection

DSIR, a large-scale data selection framework for language model training

308 273 19
zhuang-li
scar-tool

[ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models

283 40 4
w2xim3
sqljson

A powerful tool that allows users to query JSON data using SQL-like syntax. Effortlessly search, filter, and manipulate your JSON data with familiar SQL queries.

229 4 0
chaleaoch
lumi-filter

A powerful and flexible data filtering library with unified interface for multiple data sources including Peewee ORM, Pydantic models, and Python iterables. Flask-friendly.

111 5 0
Data from PyPI, GitHub, ClickHouse, and BigQuery