RLHF Python Packages

llamafactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

863K 73K 9K

transformerlab

The open source research environment for AI researchers to seamlessly train, evaluate, and scale models from local hardware to GPU clusters.

71K 5K 534

image-reward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

45K 2K 90

rubrix

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

12K 5K 491

transformerlab-cli

The open source research environment for AI researchers to seamlessly train, evaluate, and scale models from local hardware to GPU clusters.

6K 5K 534

shadowlm

A fine-tuning SDK — any open model, any harness, any method. 12 training methods behind one argument; pure-stdlib core.

6K 16 3

trinity-rft

Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLMs).

6K 662 72

mlx-lm-lora

Train Large Language Models on MLX.

5K 390 50

py-openjudge

OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards

4K 706 58

kiln-ai

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

4K 5K 375

argilla-server

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

2K 5K 491

llmtuner

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

2K 73K 9K

kiln-server

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

2K 5K 375

log10-io

Unified LLM data management

1K 97 12

textrl

TextRL - reinforcement learning for text generation, built on HuggingFace TRL.

1K 563 61

oat-llm

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

973 664 63

pairjudge

Pairwise LLM judges (A/B/tie): budget-aware multi-turn packing, position-bias correction, pseudo-label distillation. Generalized from the 4th-place (gold) solution to Kaggle LMSYS Chatbot Arena.

856 169 11

lazyllm-llamafactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

768 73K 9K

openpo

Synthetic data for fine-tuning LLMs

589 27 0

glmtuner

Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT

395 4K 462

alignment-handbook

The Alignment Handbook

389 6K 493

knowlyr-datalabel

Serverless annotation framework with LLM pre-labeling, inter-annotator agreement analysis & offline HTML interface. CLI + MCP ready.

361 0 0

oxrl

A lightweight post-training framework for LLMs and VLMs. 51 algorithms, 38 verified models. Scales with DeepSpeed, vLLM, and Ray.

359 19 2

lmxlab

Transformer language models on Apple Silicon with MLX

356 1 0