PyRank

GRPO Python Packages

Python packages tagged with the GitHub topic grpo, sorted by relevance and shown with stars and monthly downloads.
JudgmentLabs
judgeval

The Continuous-Improvement Stack for Agents. Our environment data and evals power agent improvement and monitoring.

479K 1K 93
modelscope
ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).

141K 14K 1K
hud-evals
hud-python

OSS RL environment + evals toolkit

66K 254 57
AjayBandiwaddar
learnlens-rl

Universal evaluation layer for standard RL environments. Measures what an agent learned - not just how much reward it accumulated.

3K 3 0
AjayBandiwaddar
learnlens

Universal evaluation layer for standard RL environments. Measures what an agent learned - not just how much reward it accumulated.

1K 3 0
rparkr
lfm-coder

GRPO with RLVR training Liquid AI's LFM 2.5-instruct model to enhance coding capabilities

1K 0 1
sail-sg
oat-llm

Online AlignmenT (OAT) for LLMs.

651 653 63
teilomillet
textpolicy

Reinforcement learning for text generation on MLX (Apple Silicon): GRPO/GSPO, environments, rollout, rewards, LoRA/QLoRA

561 14 3
kyegomez
r-torch

Open Implementation of Deepseek's R1

413 31 5
toolbrain
toolbrain

A framework for agentic tool use training with reinforcement learning

214 167 19
warlockee
oxrl

A lightweight post-training framework for LLMs and VLMs

191 17 2
opendilab
lightrft

LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Framework

148 336 11
adeelahmad
mlx-guided-grpo

Guided Group Relative Policy Optimization (GRPO) training for MLX on Apple Silicon

123 1 0
The-Swarm-Corporation
open-parl

PARL (Parallel-Agent Reinforcement Learning) is a training paradigm that teaches models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.

96 43 4
hud-evals
genteki-hdp

OSS RL environment + evals toolkit

84 254 57
    • Data from PyPI, GitHub, ClickHouse, and BigQuery