reward-shaping
Implementation of Humanoid Standing Up, from the paper "Learning Humanoid Standing-up Control across Diverse Postures" out of Shanghai, in PyTorch
Catch reward traps before training. Named after Goodhart's Law.
Inference-time scaling for LLMs-as-a-judge.
Reinforcement learning environments for learning to solve the Optimal Power Flow problem
A package for shaping RL agents.