Search | arXiv e-print repository

AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations

Authors: Adam Dahlgren Lindström, Leila Methnani, Lea Krause, Petter Ericson, Íñigo Martínez de Rituerto de Troya, Dimitri Coelho Mollo, Roel Dobbe

Abstract: This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback (RLxF) methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and he… ▽ More This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback (RLxF) methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinnings and practical implementations of RLxF techniques, revealing significant limitations in their approach to capturing the complexities of human ethics and contributing to AI safety. We highlight tensions and contradictions inherent in the goals of RLxF. In addition, we discuss ethically-relevant issues that tend to be neglected in discussions about alignment and RLxF, among which the trade-offs between user-friendliness and deception, flexibility and interpretability, and system safety. We conclude by urging researchers and practitioners alike to critically assess the sociotechnical ramifications of RLxF, advocating for a more nuanced and reflective approach to its application in AI development. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: 12 pages, 1 table, to be submitted

arXiv:2304.11217 [pdf, other]

ACROCPoLis: A Descriptive Framework for Making Sense of Fairness

Authors: Andrea Aler Tubella, Dimitri Coelho Mollo, Adam Dahlgren Lindström, Hannah Devinney, Virginia Dignum, Petter Ericson, Anna Jonsson, Timotheus Kampik, Tom Lenaerts, Julian Alfredo Mendez, Juan Carlos Nieves

Abstract: Fairness is central to the ethical and responsible development and use of AI systems, with a large number of frameworks and formal notions of algorithmic fairness being available. However, many of the fairness solutions proposed revolve around technical considerations and not the needs of and consequences for the most impacted communities. We therefore want to take the focus away from definitions… ▽ More Fairness is central to the ethical and responsible development and use of AI systems, with a large number of frameworks and formal notions of algorithmic fairness being available. However, many of the fairness solutions proposed revolve around technical considerations and not the needs of and consequences for the most impacted communities. We therefore want to take the focus away from definitions and allow for the inclusion of societal and relational aspects to represent how the effects of AI systems impact and are experienced by individuals and social groups. In this paper, we do this by means of proposing the ACROCPoLis framework to represent allocation processes with a modeling emphasis on fairness aspects. The framework provides a shared vocabulary in which the factors relevant to fairness assessments for different situations and procedures are made explicit, as well as their interrelationships. This enables us to compare analogous situations, to highlight the differences in dissimilar situations, and to capture differing interpretations of the same situation by different stakeholders. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Comments: To appear in the proceedings of ACM FAccT 2023

arXiv:2208.05358 [pdf, other]

CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning

Authors: Adam Dahlgren Lindström, Savitha Sam Abraham

Abstract: We introduce CLEVR-Math, a multi-modal math word problems dataset consisting of simple math word problems involving addition/subtraction, represented partly by a textual description and partly by an image illustrating the scenario. The text describes actions performed on the scene that is depicted in the image. Since the question posed may not be about the scene in the image, but about the state o… ▽ More We introduce CLEVR-Math, a multi-modal math word problems dataset consisting of simple math word problems involving addition/subtraction, represented partly by a textual description and partly by an image illustrating the scenario. The text describes actions performed on the scene that is depicted in the image. Since the question posed may not be about the scene in the image, but about the state of the scene before or after the actions are applied, the solver envision or imagine the state changes due to these actions. Solving these word problems requires a combination of language, visual and mathematical reasoning. We apply state-of-the-art neural and neuro-symbolic models for visual question answering on CLEVR-Math and empirically evaluate their performances. Our results show how neither method generalise to chains of operations. We discuss the limitations of the two in addressing the task of multi-modal word problem solving. △ Less

Submitted 10 August, 2022; originally announced August 2022.

Comments: NeSy 2022, 16th International Workshop on Neural-Symbolic Learning and Reasoning, Cumberland Lodge, Windsor, UK

ACM Class: I.2.7; I.2.10; I.2.6; I.4.8; I.1.4

arXiv:2204.02813 [pdf, other]

An Algebraic Approach to Learning and Grounding

Authors: Johanna Björklund, Adam Dahlgren Lindström, Frank Drewes

Abstract: We consider the problem of learning the semantics of composite algebraic expressions from examples. The outcome is a versatile framework for studying learning tasks that can be put into the following abstract form: The input is a partial algebra $\alg$ and a finite set of examples $(\varphi_1, O_1), (\varphi_2, O_2), \ldots$, each consisting of an algebraic term $\varphi_i$ and a set of objects~… ▽ More We consider the problem of learning the semantics of composite algebraic expressions from examples. The outcome is a versatile framework for studying learning tasks that can be put into the following abstract form: The input is a partial algebra $\alg$ and a finite set of examples $(\varphi_1, O_1), (\varphi_2, O_2), \ldots$, each consisting of an algebraic term $\varphi_i$ and a set of objects~$O_i$. The objective is to simultaneously fill in the missing algebraic operations in $\alg$ and ground the variables of every $\varphi_i$ in $O_i$, so that the combined value of the terms is optimised. We demonstrate the applicability of this framework through case studies in grammatical inference, picture-language learning, and the grounding of logic scene descriptions. △ Less

Submitted 4 July, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: Accepted to LearnAut 2022 at ICALP 2022

ACM Class: I.2.4; I.2.7; I.2.10; I.1.3; I.2

arXiv:2102.11115 [pdf, other]

doi 10.18653/v1/2020.coling-main.64

Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case

Authors: Adam Dahlgren Lindström, Suna Bensch, Johanna Björklund, Frank Drewes

Abstract: Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shorta… ▽ More Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shortage of analysis tools. To address this problem, we generalize the notion of probing tasks to the visual-semantic case. To this end, we (i) discuss the formalization of probing tasks for embeddings of image-caption pairs, (ii) define three concrete probing tasks within our general framework, (iii) train classifiers to probe for those properties, and (iv) compare various state-of-the-art embeddings under the lens of the proposed probing tasks. Our experiments reveal an up to 12% increase in accuracy on visual-semantic embeddings compared to the corresponding unimodal embeddings, which suggest that the text and image dimensions represented in the former do complement each other. △ Less

Submitted 22 February, 2021; originally announced February 2021.

Comments: Submitted July 1 2020, COLING 2020 main conference

Showing 1–5 of 5 results for author: Lindström, A D