Han Zhong is qualified to endorse.
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong: Is registered as an author of this paper. Can endorse for cs.AI, cs.CC, cs.CL, cs.DS, cs.GT, cs.LG, math.OC, stat.CO, stat.ML.
Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian and Liwei Wang are not registered as owners of this paper.