GRPO Python Packages

judgeval

The Continuous-Improvement Stack for Agents. Our environment data and evals power agent improvement and monitoring.

147K 1K 93

ms-swift

Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).

104K 15K 2K

hud-python

RL environments + evals for AI agents. Define once, train anything.

56K 271 61

shadowlm

A fine-tuning SDK — any open model, any harness, any method. 12 training methods behind one argument; pure-stdlib core.

6K 16 3

areno

An easy-to-use, fast toolkit to scale up RL post-training on a single node.

2K 8 0

crystal-metrics

CRYSTAL: Beyond Final Answers: Benchmark for Transparent Multimodal Reasoning Evaluation | arXiv 2603.13099

1K 2 0

forgelm

Config-driven LLM fine-tuning with safety evaluation, EU AI Act compliance, 6 alignment methods, and one-command bundled quickstart templates.

1K 7 1

textpolicy

Reinforcement learning for text generation on MLX (Apple Silicon)

1K 14 3

oat-llm

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

973 664 63

lfm-coder

Using GRPO with RLVR, fine-tune LLMs to enhance coding capabilities

813 0 1

learnlens-rl

Universal evaluation layer for standard RL environments. Measures what an agent learned - not just how much reward it accumulated.

553 5 0

r-torch

An open source implementation of R1

381 31 5

oxrl

A lightweight post-training framework for LLMs and VLMs. 51 algorithms, 38 verified models. Scales with DeepSpeed, vLLM, and Ray.

359 19 2

learnlens

Universal evaluation layer for standard RL environments. Measures what an agent learned - not just how much reward it accumulated.

328 5 0

toolbrain

A framework for training LLM-powered agents to use tools more effectively using Reinforcement Learning

260 192 19

mlx-guided-grpo

Train reasoning models on your Mac. GRPO training framework for Apple Silicon with curriculum learning.

172 1 0

lightrft

Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Framework

152 389 11

open-parl

PARL (Parallel-Agent Reinforcement Learning) - A training paradigm for coordinating multiple agents in parallel workflows

116 49 4

genteki-hdp

SDK for the HUD platform (Genteki fork with circular import fix).

114 271 61