AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations
Authors:
Adam Dahlgren Lindström,
Leila Methnani,
Lea Krause,
Petter Ericson,
Íñigo Martínez de Rituerto de Troya,
Dimitri Coelho Mollo,
Roel Dobbe
Abstract:
This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback (RLxF) methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and he…
▽ More
This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback (RLxF) methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinnings and practical implementations of RLxF techniques, revealing significant limitations in their approach to capturing the complexities of human ethics and contributing to AI safety. We highlight tensions and contradictions inherent in the goals of RLxF. In addition, we discuss ethically-relevant issues that tend to be neglected in discussions about alignment and RLxF, among which the trade-offs between user-friendliness and deception, flexibility and interpretability, and system safety. We conclude by urging researchers and practitioners alike to critically assess the sociotechnical ramifications of RLxF, advocating for a more nuanced and reflective approach to its application in AI development.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
A Hybrid Recommender System for Patient-Doctor Matchmaking in Primary Care
Authors:
Qiwei Han,
Mengxin Ji,
Inigo Martinez de Rituerto de Troya,
Manas Gaur,
Leid Zejnilovic
Abstract:
We partner with a leading European healthcare provider and design a mechanism to match patients with family doctors in primary care. We define the matchmaking process for several distinct use cases given different levels of available information about patients. Then, we adopt a hybrid recommender system to present each patient a list of family doctor recommendations. In particular, we model patien…
▽ More
We partner with a leading European healthcare provider and design a mechanism to match patients with family doctors in primary care. We define the matchmaking process for several distinct use cases given different levels of available information about patients. Then, we adopt a hybrid recommender system to present each patient a list of family doctor recommendations. In particular, we model patient trust of family doctors using a large-scale dataset of consultation histories, while accounting for the temporal dynamics of their relationships. Our proposed approach shows higher predictive accuracy than both a heuristic baseline and a collaborative filtering approach, and the proposed trust measure further improves model performance.
△ Less
Submitted 18 March, 2019; v1 submitted 9 August, 2018;
originally announced August 2018.