Super-Efficient RLHF Training of LLMs with Parameter Reallocation