
Han Zhong is qualified to endorse.

DPO Meets PPO: Reinforced Token Optimization for RLHF

Han Zhong is registered as an author of this paper.
Can endorse for cs.AI, cs.CC, cs.CL, cs.DS, cs.GT, cs.LG, math.OC, stat.CO, stat.ML.

Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian and Liwei Wang are not registered as owners of this paper.