Showing 1–2 of 2 results for author: de Troya, Í M d R

Search v0.5.6 released 2020-02-24

arXiv:2406.18346 [pdf, ps, other]

cs.AI

AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations

Authors: Adam Dahlgren Lindström, Leila Methnani, Lea Krause, Petter Ericson, Íñigo Martínez de Rituerto de Troya, Dimitri Coelho Mollo, Roel Dobbe

Abstract: This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback (RLxF) methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinnings and practical implementations of RLxF techniques, revealing significant limitations in their approach to capturing the complexities of human ethics and contributing to AI safety. We highlight tensions and contradictions inherent in the goals of RLxF. In addition, we discuss ethically-relevant issues that tend to be neglected in discussions about alignment and RLxF, among which the trade-offs between user-friendliness and deception, flexibility and interpretability, and system safety. We conclude by urging researchers and practitioners alike to critically assess the sociotechnical ramifications of RLxF, advocating for a more nuanced and reflective approach to its application in AI development.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 12 pages, 1 table, to be submitted

arXiv:1808.03265 [pdf, other]

cs.IR cs.LG stat.ML

A Hybrid Recommender System for Patient-Doctor Matchmaking in Primary Care

Authors: Qiwei Han, Mengxin Ji, Inigo Martinez de Rituerto de Troya, Manas Gaur, Leid Zejnilovic

Abstract: We partner with a leading European healthcare provider and design a mechanism to match patients with family doctors in primary care. We define the matchmaking process for several distinct use cases given different levels of available information about patients. Then, we adopt a hybrid recommender system to present each patient a list of family doctor recommendations. In particular, we model patient trust of family doctors using a large-scale dataset of consultation histories, while accounting for the temporal dynamics of their relationships. Our proposed approach shows higher predictive accuracy than both a heuristic baseline and a collaborative filtering approach, and the proposed trust measure further improves model performance.

Submitted 18 March, 2019; v1 submitted 9 August, 2018; originally announced August 2018.

Comments: This paper is accepted at DSAA 2018 as a full paper, Proc. of the 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy
