DPO Python Packages

soup-cli

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

14K 72 21

shadowlm

A fine-tuning SDK — any open model, any harness, any method. 12 training methods behind one argument; pure-stdlib core.

6K 16 3

mlx-lm-lora

Train Large Language Models on MLX.

5K 390 50

oumi

Easily fine-tune, evaluate and deploy Gemma 4, Qwen3.5, Qwen3.6, gpt-oss, DeepSeek-R1, or any open source LLM / VLM!

3K 9K 783

afterimage

Generate conversational, tool-calling, structured-output, and preference datasets — easily and at scale

1K 41 3

forgelm

Config-driven LLM fine-tuning with safety evaluation, EU AI Act compliance, 6 alignment methods, and one-command bundled quickstart templates.

1K 7 1

oat-llm

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

973 664 63

openpo

Synthetic data for fine-tuning LLMs

589 27 0

sillm-mlx

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

530 284 26

oxrl

A lightweight post-training framework for LLMs and VLMs. 51 algorithms, 38 verified models. Scales with DeepSpeed, vLLM, and Ray.

359 19 2

provenir

Reproducible, evaluation-first orchestration for LLM fine-tuning. RLAIF, LLM-as-Judge, DataFlywheel, RAG generation, adapter merging, benchmark eval, REST API.

304 1 0

knowlyr-hub

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

283 3 0

knowlyr-recorder

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

279 3 0

knowlyr-core

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

276 3 0

toolbrain

A framework for agentic tool use training with reinforcement learning

260 193 19

knowlyr-sandbox

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

255 3 0

knowlyr-reward

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

254 3 0

mlora-cli

An Efficient "Factory" to Build Multiple LoRA Adapters

242 381 67

sygra

Graph-oriented synthetic data generation pipeline library

189 86 17

flash-pref

Accelerate LLM preference tuning via prefix sharing with a single line of code

174 52 0

knowlyr-trainer

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

165 3 0