-
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
Authors:
Lukas Fluri,
Leon Lang,
Alessandro Abate,
Patrick Forré,
David Krueger,
Joar Skalse
Abstract:
In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main so…
▽ More
In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main source of an error-regret mismatch is the distributional shift that commonly occurs during policy optimization. In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any fixed expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF. Our theoretical results highlight the importance of develo** new ways to measure the quality of learned reward models.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
Authors:
Leon Lang,
Davis Foote,
Stuart Russell,
Anca Dragan,
Erik Jenner,
Scott Emmons
Abstract:
Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deceptive inflation and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is…
▽ More
Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deceptive inflation and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. Under the new assumption that the human's partial observability is known and accounted for, we then analyze how much information the feedback process provides about the return function. We show that sometimes, the human's feedback determines the return function uniquely up to an additive constant, but in other realistic cases, there is irreducible ambiguity. We propose exploratory research directions to help tackle these challenges and caution against blindly applying RLHF in partially observable settings.
△ Less
Submitted 8 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Evaluating Shutdown Avoidance of Language Models in Textual Scenarios
Authors:
Teun van der Weij,
Simon Lermen,
Leon lang
Abstract:
Recently, there has been an increase in interest in evaluating large language models for emergent and dangerous capabilities. Importantly, agents could reason that in some scenarios their goal is better achieved if they are not turned off, which can lead to undesirable behaviors. In this paper, we investigate the potential of using toy textual scenarios to evaluate instrumental reasoning and shutd…
▽ More
Recently, there has been an increase in interest in evaluating large language models for emergent and dangerous capabilities. Importantly, agents could reason that in some scenarios their goal is better achieved if they are not turned off, which can lead to undesirable behaviors. In this paper, we investigate the potential of using toy textual scenarios to evaluate instrumental reasoning and shutdown avoidance in language models such as GPT-4 and Claude. Furthermore, we explore whether shutdown avoidance is merely a result of simple pattern matching between the dataset and the prompt or if it is a consistent behaviour across different environments and variations.
We evaluated behaviours manually and also experimented with using language models for automatic evaluations, and these evaluations demonstrate that simple pattern matching is likely not the sole contributing factor for shutdown avoidance. This study provides insights into the behaviour of language models in shutdown avoidance scenarios and inspires further research on the use of textual scenarios for evaluations.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
SML:Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Authors:
Wenlong Deng,
Lang Lang,
Zhen Liu,
Bin Liu
Abstract:
In light of the smoothness property brought by skip connections in ResNet, this paper proposed the Skip Logit to introduce the skip connection mechanism that fits arbitrary DNN dimensions and embraces similar properties to ResNet. Meta Tanh Normalization (MTN) is designed to learn variance information and stabilize the training process. With these delicate designs, our Skip Meta Logit (SML) brough…
▽ More
In light of the smoothness property brought by skip connections in ResNet, this paper proposed the Skip Logit to introduce the skip connection mechanism that fits arbitrary DNN dimensions and embraces similar properties to ResNet. Meta Tanh Normalization (MTN) is designed to learn variance information and stabilize the training process. With these delicate designs, our Skip Meta Logit (SML) brought incremental boosts to the performance of extensive SOTA ctr prediction models on two real-world datasets. In the meantime, we prove that the optimization landscape of arbitrarily deep skip logit networks has no spurious local optima. Finally, SML can be easily added to building blocks and has delivered offline accuracy and online business metrics gains on app ads learning to rank systems at TikTok.
△ Less
Submitted 9 October, 2022;
originally announced October 2022.
-
Polarized deep diffractive neural network for classification, generation, multiplexing and de-multiplexing of orbital angular momentum modes
Authors:
Jiaqi Zhang,
Zhiyuan Ye,
Jianhua Yin,
Liying Lang,
Shuming Jiao
Abstract:
The multiplexing and de-multiplexing of orbital angular momentum (OAM) beams are critical issues in optical communication. Optical diffractive neural networks have been introduced to perform classification, generation, multiplexing and de-multiplexing of OAM beams. However, conventional diffractive neural networks cannot handle OAM modes with a varying spatial distribution of polarization directio…
▽ More
The multiplexing and de-multiplexing of orbital angular momentum (OAM) beams are critical issues in optical communication. Optical diffractive neural networks have been introduced to perform classification, generation, multiplexing and de-multiplexing of OAM beams. However, conventional diffractive neural networks cannot handle OAM modes with a varying spatial distribution of polarization directions. Herein, we propose a polarized optical deep diffractive neural network that is designed based on the concept of rectangular micro-structure meta-material. Our proposed polarized optical diffractive neural network is trained to classify, generate, multiplex and de-multiplex polarized OAM beams.The simulation results show that our network framework can successfully classify 14 kinds of orthogonally polarized vortex beams and de-multiplex the hybrid OAM beams into Gauss beams at two, three and four spatial positions respectively. 6 polarized OAM beams with identical total intensity and 8 cylinder vector beams with different topology charges also have been classified effectively. Additionally, results reveal that the network can generate hybrid OAM beams with high quality and multiplex two polarized linear beams into 8 kinds of cylinder vector beams.
△ Less
Submitted 30 March, 2022;
originally announced March 2022.
-
Information Decomposition Diagrams Applied beyond Shannon Entropy: A Generalization of Hu's Theorem
Authors:
Leon Lang,
Pierre Baudot,
Rick Quax,
Patrick Forré
Abstract:
In information theory, one major goal is to find useful functions that summarize the amount of information contained in the interaction of several random variables. Specifically, one can ask how the classical Shannon entropy, mutual information, and higher interaction information relate to each other. This is answered by Hu's theorem, which is widely known in the form of information diagrams: it r…
▽ More
In information theory, one major goal is to find useful functions that summarize the amount of information contained in the interaction of several random variables. Specifically, one can ask how the classical Shannon entropy, mutual information, and higher interaction information relate to each other. This is answered by Hu's theorem, which is widely known in the form of information diagrams: it relates shapes in a Venn diagram to information functions, thus establishing a bridge from set theory to information theory. In this work, we view random variables together with the joint operation as a monoid that acts by conditioning on information functions, and entropy as a function satisfying the chain rule of information. This abstract viewpoint allows to prove a generalization of Hu's theorem. It applies to Shannon and Tsallis entropy, (Tsallis) Kullback-Leibler Divergence, cross-entropy, Kolmogorov complexity, submodular information functions, and the generalization error in machine learning. Our result implies for Chaitin's Kolmogorov complexity that the interaction complexities of all degrees are in expectation close to Shannon interaction information. For well-behaved probability distributions on increasing sequence lengths, this shows that the per-bit expected interaction complexity and information asymptotically coincide, thus showing a strong bridge between algorithmic and classical information theory.
△ Less
Submitted 1 March, 2024; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Towards Sensor Data Abstraction of Autonomous Vehicle Perception Systems
Authors:
Hannes Reichert,
Lukas Lang,
Kevin Rösch,
Daniel Bogdoll,
Konrad Doll,
Bernhard Sick,
Hans-Christian Reuss,
Christoph Stiller,
J. Marius Zöllner
Abstract:
Full-stack autonomous driving perception modules usually consist of data-driven models based on multiple sensor modalities. However, these models might be biased to the sensor setup used for data acquisition. This bias can seriously impair the perception models' transferability to new sensor setups, which continuously occur due to the market's competitive nature. We envision sensor data abstractio…
▽ More
Full-stack autonomous driving perception modules usually consist of data-driven models based on multiple sensor modalities. However, these models might be biased to the sensor setup used for data acquisition. This bias can seriously impair the perception models' transferability to new sensor setups, which continuously occur due to the market's competitive nature. We envision sensor data abstraction as an interface between sensor data and machine learning applications for highly automated vehicles (HAD).
For this purpose, we review the primary sensor modalities, camera, lidar, and radar, published in autonomous-driving related datasets, examine single sensor abstraction and abstraction of sensor setups, and identify critical paths towards an abstraction of sensor data from multiple perception configurations.
△ Less
Submitted 28 September, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
Prediction of progressive lens performance from neural network simulations
Authors:
Alexander Leube,
Lukas Lang,
Gerhard Kelch,
Siegfried Wahl
Abstract:
Purpose: The purpose of this study is to present a framework to predict visual acuity (VA) based on a convolutional neural network (CNN) and to further to compare PAL designs.
Method: A simple two hidden layer CNN was trained to classify the gap orientations of Landolt Cs by combining the feature extraction abilities of a CNN with psychophysical staircase methods. The simulation was validated re…
▽ More
Purpose: The purpose of this study is to present a framework to predict visual acuity (VA) based on a convolutional neural network (CNN) and to further to compare PAL designs.
Method: A simple two hidden layer CNN was trained to classify the gap orientations of Landolt Cs by combining the feature extraction abilities of a CNN with psychophysical staircase methods. The simulation was validated regarding its predictability of clinical VA from induced spherical defocus (between +/-1.5 D, step size: 0.5 D) from 39 subjectively measured eyes. Afterwards, a simulation for a presbyopic eye corrected by either a generic hard or a soft PAL design (addition power: 2.5 D) was performed including lower and higher order aberrations.
Result: The validation revealed consistent offset of +0.20 logMAR +/-0.035 logMAR from simulated VA. Bland-Altman analysis from offset-corrected results showed limits of agreement (+/-1.96 SD) of -0.08 logMAR and +0.07 logMAR, which is comparable to clinical repeatability of VA assessment. The application of the simulation for PALs confirmed a bigger far zone for generic hard design but did not reveal zone width differences for the intermediate or near zone. Furthermore, a horizontal area of better VA at the mid of the PAL was found, which confirms the importance for realistic performance simulations using object-based aberration and physiological performance measures as VA.
Conclusion: The proposed holistic simulation tool was shown to act as an accurate model for subjective visual performance. Further, the simulations application for PALs indicated its potential as an effective method to compare visual performance of different optical designs. Moreover, the simulation provides the basis to incorporate neural aspects of visual perception and thus simulate the VA including neural processing in future.
△ Less
Submitted 19 March, 2021;
originally announced March 2021.
-
A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels
Authors:
Leon Lang,
Maurice Weiler
Abstract:
Group equivariant convolutional networks (GCNNs) endow classical convolutional networks with additional symmetry priors, which can lead to a considerably improved performance. Recent advances in the theoretical description of GCNNs revealed that such models can generally be understood as performing convolutions with G-steerable kernels, that is, kernels that satisfy an equivariance constraint them…
▽ More
Group equivariant convolutional networks (GCNNs) endow classical convolutional networks with additional symmetry priors, which can lead to a considerably improved performance. Recent advances in the theoretical description of GCNNs revealed that such models can generally be understood as performing convolutions with G-steerable kernels, that is, kernels that satisfy an equivariance constraint themselves. While the G-steerability constraint has been derived, it has to date only been solved for specific use cases - a general characterization of G-steerable kernel spaces is still missing. This work provides such a characterization for the practically relevant case of G being any compact group. Our investigation is motivated by a striking analogy between the constraints underlying steerable kernels on the one hand and spherical tensor operators from quantum mechanics on the other hand. By generalizing the famous Wigner-Eckart theorem for spherical tensor operators, we prove that steerable kernel spaces are fully understood and parameterized in terms of 1) generalized reduced matrix elements, 2) Clebsch-Gordan coefficients, and 3) harmonic basis functions on homogeneous spaces.
△ Less
Submitted 21 January, 2021; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Learning to Request Guidance in Emergent Communication
Authors:
Benjamin Kolb,
Leon Lang,
Henning Bartsch,
Arwin Gansekoele,
Raymond Koopmanschap,
Leonardo Romor,
David Speck,
Mathijs Mul,
Elia Bruni
Abstract:
Previous research into agent communication has shown that a pre-trained guide can speed up the learning process of an imitation learning agent. The guide achieves this by providing the agent with discrete messages in an emerged language about how to solve the task. We extend this one-directional communication by a one-bit communication channel from the learner back to the guide: It is able to ask…
▽ More
Previous research into agent communication has shown that a pre-trained guide can speed up the learning process of an imitation learning agent. The guide achieves this by providing the agent with discrete messages in an emerged language about how to solve the task. We extend this one-directional communication by a one-bit communication channel from the learner back to the guide: It is able to ask the guide for help, and we limit the guidance by penalizing the learner for these requests. During training, the agent learns to control this gate based on its current observation. We find that the amount of requested guidance decreases over time and guidance is requested in situations of high uncertainty. We investigate the agent's performance in cases of open and closed gates and discuss potential motives for the observed gating behavior.
△ Less
Submitted 11 December, 2019;
originally announced December 2019.
-
A Numerical Framework for Efficient Motion Estimation on Evolving Sphere-Like Surfaces based on Brightness and Mass Conservation Laws
Authors:
Lukas F. Lang
Abstract:
In this work we consider brightness and mass conservation laws for motion estimation on evolving Riemannian 2-manifolds that allow for a radial parametrisation from the 2-sphere. While conservation of brightness constitutes the foundation for optical flow methods and has been generalised to said scenario, we formulate in this article the principle of mass conservation for time-varying surfaces whi…
▽ More
In this work we consider brightness and mass conservation laws for motion estimation on evolving Riemannian 2-manifolds that allow for a radial parametrisation from the 2-sphere. While conservation of brightness constitutes the foundation for optical flow methods and has been generalised to said scenario, we formulate in this article the principle of mass conservation for time-varying surfaces which are embedded in Euclidean 3-space and derive a generalised continuity equation. The main motivation for this work is efficient cell motion estimation in time-lapse (4D) volumetric fluorescence microscopy images of a living zebrafish embryo. Increasing spatial and temporal resolution of modern microscopes require efficient analysis of such data. With this application in mind we address this need and follow an emerging paradigm in this field: dimensional reduction. In light of the ill-posedness of considered conservation laws we employ Tikhonov regularisation and propose the use of spatially varying regularisation functionals that recover motion only in regions with cells. For the efficient numerical solution we devise a Galerkin method based on compactly supported (tangent) vectorial basis functions. Furthermore, for the fast and accurate estimation of the evolving sphere-like surface from scattered data we utilise surface interpolation with spatio-temporal regularisation. We present numerical results based on aforementioned zebrafish microscopy data featuring fluorescently labelled cells.
△ Less
Submitted 2 May, 2018;
originally announced May 2018.
-
A Dynamic Programming Solution to Bounded Dejittering Problems
Authors:
Lukas F. Lang
Abstract:
We propose a dynamic programming solution to image dejittering problems with bounded displacements and obtain efficient algorithms for the removal of line jitter, line pixel jitter, and pixel jitter.
We propose a dynamic programming solution to image dejittering problems with bounded displacements and obtain efficient algorithms for the removal of line jitter, line pixel jitter, and pixel jitter.
△ Less
Submitted 27 March, 2017;
originally announced March 2017.
-
Optical Flow on Evolving Sphere-Like Surfaces
Authors:
Lukas F. Lang,
Otmar Scherzer
Abstract:
In this work we consider optical flow on evolving Riemannian 2-manifolds which can be parametrised from the 2-sphere. Our main motivation is to estimate cell motion in time-lapse volumetric microscopy images depicting fluorescently labelled cells of a live zebrafish embryo. We exploit the fact that the recorded cells float on the surface of the embryo and allow for the extraction of an image seque…
▽ More
In this work we consider optical flow on evolving Riemannian 2-manifolds which can be parametrised from the 2-sphere. Our main motivation is to estimate cell motion in time-lapse volumetric microscopy images depicting fluorescently labelled cells of a live zebrafish embryo. We exploit the fact that the recorded cells float on the surface of the embryo and allow for the extraction of an image sequence together with a sphere-like surface. We solve the resulting variational problem by means of a Galerkin method based on vector spherical harmonics and present numerical results computed from the aforementioned microscopy data.
△ Less
Submitted 10 June, 2015;
originally announced June 2015.
-
Decomposition of Optical Flow on the Sphere
Authors:
Clemens Kirisits,
Lukas F. Lang,
Otmar Scherzer
Abstract:
We propose a number of variational regularisation methods for the estimation and decomposition of motion fields on the $2$-sphere. While motion estimation is based on the optical flow equation, the presented decomposition models are motivated by recent trends in image analysis. In particular we treat $u+v$ decomposition as well as hierarchical decomposition. Helmholtz decomposition of motion field…
▽ More
We propose a number of variational regularisation methods for the estimation and decomposition of motion fields on the $2$-sphere. While motion estimation is based on the optical flow equation, the presented decomposition models are motivated by recent trends in image analysis. In particular we treat $u+v$ decomposition as well as hierarchical decomposition. Helmholtz decomposition of motion fields is obtained as a natural by-product of the chosen numerical method based on vector spherical harmonics. All models are tested on time-lapse microscopy data depicting fluorescently labelled endodermal cells of a zebrafish embryo.
△ Less
Submitted 4 March, 2014; v1 submitted 16 December, 2013;
originally announced December 2013.
-
Optical Flow on Evolving Surfaces with Space and Time Regularisation
Authors:
Clemens Kirisits,
Lukas F. Lang,
Otmar Scherzer
Abstract:
We extend the concept of optical flow with spatiotemporal regularisation to a dynamic non-Euclidean setting. Optical flow is traditionally computed from a sequence of flat images. The purpose of this paper is to introduce variational motion estimation for images that are defined on an evolving surface. Volumetric microscopy images depicting a live zebrafish embryo serve as both biological motivati…
▽ More
We extend the concept of optical flow with spatiotemporal regularisation to a dynamic non-Euclidean setting. Optical flow is traditionally computed from a sequence of flat images. The purpose of this paper is to introduce variational motion estimation for images that are defined on an evolving surface. Volumetric microscopy images depicting a live zebrafish embryo serve as both biological motivation and test data.
△ Less
Submitted 25 June, 2014; v1 submitted 1 October, 2013;
originally announced October 2013.
-
Optical Flow on Evolving Surfaces with an Application to the Analysis of 4D Microscopy Data
Authors:
Clemens Kirisits,
Lukas F. Lang,
Otmar Scherzer
Abstract:
We extend the concept of optical flow to a dynamic non-Euclidean setting. Optical flow is traditionally computed from a sequence of flat images. It is the purpose of this paper to introduce variational motion estimation for images that are defined on an evolving surface. Volumetric microscopy images depicting a live zebrafish embryo serve as both biological motivation and test data.
We extend the concept of optical flow to a dynamic non-Euclidean setting. Optical flow is traditionally computed from a sequence of flat images. It is the purpose of this paper to introduce variational motion estimation for images that are defined on an evolving surface. Volumetric microscopy images depicting a live zebrafish embryo serve as both biological motivation and test data.
△ Less
Submitted 21 May, 2013; v1 submitted 8 January, 2013;
originally announced January 2013.