The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

Authors: Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse

Abstract: In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main source of an error-regret mismatch is the distributional shift that commonly occurs during policy optimization. In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any fixed expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF. Our theoretical results highlight the importance of developing new ways to measure the quality of learned reward models.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 58 pages, 1 figure

arXiv:2402.17747 [pdf, other]

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

Authors: Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deceptive inflation and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. Under the new assumption that the human's partial observability is known and accounted for, we then analyze how much information the feedback process provides about the return function. We show that sometimes, the human's feedback determines the return function uniquely up to an additive constant, but in other realistic cases, there is irreducible ambiguity. We propose exploratory research directions to help tackle these challenges and caution against blindly applying RLHF in partially observable settings.

Submitted 8 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2307.00787 [pdf, ps, other]

Evaluating Shutdown Avoidance of Language Models in Textual Scenarios

Authors: Teun van der Weij, Simon Lermen, Leon Lang

Abstract: Recently, there has been an increase in interest in evaluating large language models for emergent and dangerous capabilities. Importantly, agents could reason that in some scenarios their goal is better achieved if they are not turned off, which can lead to undesirable behaviors. In this paper, we investigate the potential of using toy textual scenarios to evaluate instrumental reasoning and shutdown avoidance in language models such as GPT-4 and Claude. Furthermore, we explore whether shutdown avoidance is merely a result of simple pattern matching between the dataset and the prompt or if it is a consistent behaviour across different environments and variations. We evaluated behaviours manually and also experimented with using language models for automatic evaluations, and these evaluations demonstrate that simple pattern matching is likely not the sole contributing factor for shutdown avoidance. This study provides insights into the behaviour of language models in shutdown avoidance scenarios and inspires further research on the use of textual scenarios for evaluations.

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2210.10725 [pdf, other]

SML:Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction

Authors: Wenlong Deng, Lang Lang, Zhen Liu, Bin Liu

Abstract: In light of the smoothness property brought by skip connections in ResNet, this paper proposed the Skip Logit to introduce the skip connection mechanism that fits arbitrary DNN dimensions and embraces similar properties to ResNet. Meta Tanh Normalization (MTN) is designed to learn variance information and stabilize the training process. With these delicate designs, our Skip Meta Logit (SML) brought incremental boosts to the performance of extensive SOTA ctr prediction models on two real-world datasets. In the meantime, we prove that the optimization landscape of arbitrarily deep skip logit networks has no spurious local optima. Finally, SML can be easily added to building blocks and has delivered offline accuracy and online business metrics gains on app ads learning to rank systems at TikTok.

Submitted 9 October, 2022; originally announced October 2022.

arXiv:2203.16087 [pdf]

doi 10.1364/OE.463137

Polarized deep diffractive neural network for classification, generation, multiplexing and de-multiplexing of orbital angular momentum modes

Authors: Jiaqi Zhang, Zhiyuan Ye, Jianhua Yin, Liying Lang, Shuming Jiao

Abstract: The multiplexing and de-multiplexing of orbital angular momentum (OAM) beams are critical issues in optical communication. Optical diffractive neural networks have been introduced to perform classification, generation, multiplexing and de-multiplexing of OAM beams. However, conventional diffractive neural networks cannot handle OAM modes with a varying spatial distribution of polarization directions. Herein, we propose a polarized optical deep diffractive neural network that is designed based on the concept of rectangular micro-structure meta-material. Our proposed polarized optical diffractive neural network is trained to classify, generate, multiplex and de-multiplex polarized OAM beams. The simulation results show that our network framework can successfully classify 14 kinds of orthogonally polarized vortex beams and de-multiplex the hybrid OAM beams into Gauss beams at two, three and four spatial positions respectively. 6 polarized OAM beams with identical total intensity and 8 cylinder vector beams with different topology charges also have been classified effectively. Additionally, results reveal that the network can generate hybrid OAM beams with high quality and multiplex two polarized linear beams into 8 kinds of cylinder vector beams.

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2202.09393 [pdf, other]

Information Decomposition Diagrams Applied beyond Shannon Entropy: A Generalization of Hu's Theorem

Authors: Leon Lang, Pierre Baudot, Rick Quax, Patrick Forré

Abstract: In information theory, one major goal is to find useful functions that summarize the amount of information contained in the interaction of several random variables. Specifically, one can ask how the classical Shannon entropy, mutual information, and higher interaction information relate to each other. This is answered by Hu's theorem, which is widely known in the form of information diagrams: it relates shapes in a Venn diagram to information functions, thus establishing a bridge from set theory to information theory. In this work, we view random variables together with the joint operation as a monoid that acts by conditioning on information functions, and entropy as a function satisfying the chain rule of information. This abstract viewpoint allows to prove a generalization of Hu's theorem. It applies to Shannon and Tsallis entropy, (Tsallis) Kullback-Leibler Divergence, cross-entropy, Kolmogorov complexity, submodular information functions, and the generalization error in machine learning. Our result implies for Chaitin's Kolmogorov complexity that the interaction complexities of all degrees are in expectation close to Shannon interaction information. For well-behaved probability distributions on increasing sequence lengths, this shows that the per-bit expected interaction complexity and information asymptotically coincide, thus showing a strong bridge between algorithmic and classical information theory.

Submitted 1 March, 2024; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: 58 pages, 5 figures

arXiv:2105.06896 [pdf, other]

doi 10.1109/ISC253183.2021.9562912

Towards Sensor Data Abstraction of Autonomous Vehicle Perception Systems

Authors: Hannes Reichert, Lukas Lang, Kevin Rösch, Daniel Bogdoll, Konrad Doll, Bernhard Sick, Hans-Christian Reuss, Christoph Stiller, J. Marius Zöllner

Abstract: Full-stack autonomous driving perception modules usually consist of data-driven models based on multiple sensor modalities. However, these models might be biased to the sensor setup used for data acquisition. This bias can seriously impair the perception models' transferability to new sensor setups, which continuously occur due to the market's competitive nature. We envision sensor data abstraction as an interface between sensor data and machine learning applications for highly automated vehicles (HAD). For this purpose, we review the primary sensor modalities, camera, lidar, and radar, published in autonomous-driving related datasets, examine single sensor abstraction and abstraction of sensor setups, and identify critical paths towards an abstraction of sensor data from multiple perception configurations.

Submitted 28 September, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

Comments: Hannes Reichert, Lukas Lang, Kevin Rösch and Daniel Bogdoll contributed equally. Accepted for publication at ISC2 2021

arXiv:2103.10842 [pdf, ps, other]

Prediction of progressive lens performance from neural network simulations

Authors: Alexander Leube, Lukas Lang, Gerhard Kelch, Siegfried Wahl

Abstract: Purpose: The purpose of this study is to present a framework to predict visual acuity (VA) based on a convolutional neural network (CNN) and to further to compare PAL designs. Method: A simple two hidden layer CNN was trained to classify the gap orientations of Landolt Cs by combining the feature extraction abilities of a CNN with psychophysical staircase methods. The simulation was validated regarding its predictability of clinical VA from induced spherical defocus (between +/-1.5 D, step size: 0.5 D) from 39 subjectively measured eyes. Afterwards, a simulation for a presbyopic eye corrected by either a generic hard or a soft PAL design (addition power: 2.5 D) was performed including lower and higher order aberrations. Result: The validation revealed consistent offset of +0.20 logMAR +/-0.035 logMAR from simulated VA. Bland-Altman analysis from offset-corrected results showed limits of agreement (+/-1.96 SD) of -0.08 logMAR and +0.07 logMAR, which is comparable to clinical repeatability of VA assessment. The application of the simulation for PALs confirmed a bigger far zone for generic hard design but did not reveal zone width differences for the intermediate or near zone. Furthermore, a horizontal area of better VA at the mid of the PAL was found, which confirms the importance for realistic performance simulations using object-based aberration and physiological performance measures as VA. Conclusion: The proposed holistic simulation tool was shown to act as an accurate model for subjective visual performance. Further, the simulations application for PALs indicated its potential as an effective method to compare visual performance of different optical designs. Moreover, the simulation provides the basis to incorporate neural aspects of visual perception and thus simulate the VA including neural processing in future.

Submitted 19 March, 2021; originally announced March 2021.

Comments: 9 pages, 4 figures

arXiv:2010.10952 [pdf, ps, other]

A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels

Authors: Leon Lang, Maurice Weiler

Abstract: Group equivariant convolutional networks (GCNNs) endow classical convolutional networks with additional symmetry priors, which can lead to a considerably improved performance. Recent advances in the theoretical description of GCNNs revealed that such models can generally be understood as performing convolutions with G-steerable kernels, that is, kernels that satisfy an equivariance constraint themselves. While the G-steerability constraint has been derived, it has to date only been solved for specific use cases - a general characterization of G-steerable kernel spaces is still missing. This work provides such a characterization for the practically relevant case of G being any compact group. Our investigation is motivated by a striking analogy between the constraints underlying steerable kernels on the one hand and spherical tensor operators from quantum mechanics on the other hand. By generalizing the famous Wigner-Eckart theorem for spherical tensor operators, we prove that steerable kernel spaces are fully understood and parameterized in terms of 1) generalized reduced matrix elements, 2) Clebsch-Gordan coefficients, and 3) harmonic basis functions on homogeneous spaces.

Submitted 21 January, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: 100 pages

arXiv:1912.05525 [pdf, other]

Learning to Request Guidance in Emergent Communication

Authors: Benjamin Kolb, Leon Lang, Henning Bartsch, Arwin Gansekoele, Raymond Koopmanschap, Leonardo Romor, David Speck, Mathijs Mul, Elia Bruni

Abstract: Previous research into agent communication has shown that a pre-trained guide can speed up the learning process of an imitation learning agent. The guide achieves this by providing the agent with discrete messages in an emerged language about how to solve the task. We extend this one-directional communication by a one-bit communication channel from the learner back to the guide: It is able to ask the guide for help, and we limit the guidance by penalizing the learner for these requests. During training, the agent learns to control this gate based on its current observation. We find that the amount of requested guidance decreases over time and guidance is requested in situations of high uncertainty. We investigate the agent's performance in cases of open and closed gates and discuss potential motives for the observed gating behavior.

Submitted 11 December, 2019; originally announced December 2019.

arXiv:1805.01006 [pdf, other]

A Numerical Framework for Efficient Motion Estimation on Evolving Sphere-Like Surfaces based on Brightness and Mass Conservation Laws

Authors: Lukas F. Lang

Abstract: In this work we consider brightness and mass conservation laws for motion estimation on evolving Riemannian 2-manifolds that allow for a radial parametrisation from the 2-sphere. While conservation of brightness constitutes the foundation for optical flow methods and has been generalised to said scenario, we formulate in this article the principle of mass conservation for time-varying surfaces which are embedded in Euclidean 3-space and derive a generalised continuity equation. The main motivation for this work is efficient cell motion estimation in time-lapse (4D) volumetric fluorescence microscopy images of a living zebrafish embryo. Increasing spatial and temporal resolution of modern microscopes require efficient analysis of such data. With this application in mind we address this need and follow an emerging paradigm in this field: dimensional reduction. In light of the ill-posedness of considered conservation laws we employ Tikhonov regularisation and propose the use of spatially varying regularisation functionals that recover motion only in regions with cells. For the efficient numerical solution we devise a Galerkin method based on compactly supported (tangent) vectorial basis functions. Furthermore, for the fast and accurate estimation of the evolving sphere-like surface from scattered data we utilise surface interpolation with spatio-temporal regularisation. We present numerical results based on aforementioned zebrafish microscopy data featuring fluorescently labelled cells.

Submitted 2 May, 2018; originally announced May 2018.

MSC Class: 35A15; 68U10; 92C55; 33C55; 92C37; 53A05; 65N30; 35L65

arXiv:1703.09161 [pdf, other]

A Dynamic Programming Solution to Bounded Dejittering Problems

Authors: Lukas F. Lang

Abstract: We propose a dynamic programming solution to image dejittering problems with bounded displacements and obtain efficient algorithms for the removal of line jitter, line pixel jitter, and pixel jitter.

Submitted 27 March, 2017; originally announced March 2017.

Comments: The final publication is available at link.springer.com

arXiv:1506.03358 [pdf, other]

Optical Flow on Evolving Sphere-Like Surfaces

Authors: Lukas F. Lang, Otmar Scherzer

Abstract: In this work we consider optical flow on evolving Riemannian 2-manifolds which can be parametrised from the 2-sphere. Our main motivation is to estimate cell motion in time-lapse volumetric microscopy images depicting fluorescently labelled cells of a live zebrafish embryo. We exploit the fact that the recorded cells float on the surface of the embryo and allow for the extraction of an image sequence together with a sphere-like surface. We solve the resulting variational problem by means of a Galerkin method based on vector spherical harmonics and present numerical results computed from the aforementioned microscopy data.

Submitted 10 June, 2015; originally announced June 2015.

arXiv:1312.4354 [pdf, other]

doi 10.1007/s13137-013-0055-8

Decomposition of Optical Flow on the Sphere

Authors: Clemens Kirisits, Lukas F. Lang, Otmar Scherzer

Abstract: We propose a number of variational regularisation methods for the estimation and decomposition of motion fields on the $2$-sphere. While motion estimation is based on the optical flow equation, the presented decomposition models are motivated by recent trends in image analysis. In particular we treat $u+v$ decomposition as well as hierarchical decomposition. Helmholtz decomposition of motion fields is obtained as a natural by-product of the chosen numerical method based on vector spherical harmonics. All models are tested on time-lapse microscopy data depicting fluorescently labelled endodermal cells of a zebrafish embryo.

Submitted 4 March, 2014; v1 submitted 16 December, 2013; originally announced December 2013.

Comments: The final publication is available at link.springer.com

MSC Class: 92C55; 92C37; 92C17; 35A15; 68U10; 33C55

arXiv:1310.0322 [pdf, other]

doi 10.1007/s10851-014-0513-4

Optical Flow on Evolving Surfaces with Space and Time Regularisation

Authors: Clemens Kirisits, Lukas F. Lang, Otmar Scherzer

Abstract: We extend the concept of optical flow with spatiotemporal regularisation to a dynamic non-Euclidean setting. Optical flow is traditionally computed from a sequence of flat images. The purpose of this paper is to introduce variational motion estimation for images that are defined on an evolving surface. Volumetric microscopy images depicting a live zebrafish embryo serve as both biological motivation and test data.

Submitted 25 June, 2014; v1 submitted 1 October, 2013; originally announced October 2013.

Comments: The final publication is available at Springer via http://dx.doi.org/10.1007/s10851-014-0513-4. This is an extended version of arXiv:1301.1576

arXiv:1301.1576 [pdf, other]

doi 10.1007/978-3-642-38267-3_21

Optical Flow on Evolving Surfaces with an Application to the Analysis of 4D Microscopy Data

Authors: Clemens Kirisits, Lukas F. Lang, Otmar Scherzer

Abstract: We extend the concept of optical flow to a dynamic non-Euclidean setting. Optical flow is traditionally computed from a sequence of flat images. It is the purpose of this paper to introduce variational motion estimation for images that are defined on an evolving surface. Volumetric microscopy images depicting a live zebrafish embryo serve as both biological motivation and test data.

Submitted 21 May, 2013; v1 submitted 8 January, 2013; originally announced January 2013.

Comments: The final publication is available at link.springer.com

Showing 1–16 of 16 results for author: Lang, L