-
Quantifying neural network uncertainty under volatility clustering
Authors:
Steven Y. K. Wong,
Jennifer S. K. Chan,
Lamiae Azizi
Abstract:
Time-series with time-varying variance pose a unique challenge to uncertainty quantification (UQ) methods. Time-varying variance, such as volatility clustering as seen in financial time-series, can lead to a large mismatch between predicted uncertainty and forecast error. Building on recent advances in neural network UQ literature, we extend and simplify Deep Evidential Regression and Deep Ensembles into a unified framework to deal with UQ in the presence of volatility clustering. We show that a Scale Mixture Distribution is a simpler alternative to the Normal-Inverse-Gamma prior that provides a favorable complexity-accuracy trade-off. To illustrate the performance of our proposed approach, we apply it to two sets of financial time-series exhibiting volatility clustering: cryptocurrencies and U.S. equities.
Submitted 22 February, 2024;
originally announced February 2024.
-
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Authors:
Alexander Pan,
Jun Shern Chan,
Andy Zou,
Nathaniel Li,
Steven Basart,
Thomas Woodside,
Jonathan Ng,
Hanlin Zhang,
Scott Emmons,
Dan Hendrycks
Abstract:
Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, analogous to how next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4? Towards answering these questions, we introduce MACHIAVELLI, a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios that center on social decision-making. Scenario labeling is automated with LMs, which are more performant than human annotators. We mathematize dozens of harmful behaviors and use our annotations to evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations. We observe some tension between maximizing reward and behaving ethically. To improve this trade-off, we investigate LM-based methods to steer agents towards less harmful behaviors. Our results show that agents can both act competently and morally, so concrete progress can currently be made in machine ethics--designing agents that are Pareto improvements in both safety and capabilities.
Submitted 12 June, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
-
Training Language Models with Language Feedback at Scale
Authors:
Jérémy Scheurer,
Jon Ander Campos,
Tomasz Korbak,
Jun Shern Chan,
Angelica Chen,
Kyunghyun Cho,
Ethan Perez
Abstract:
Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback. ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements. Second, selecting the refinement incorporating the most feedback. Third, finetuning the language model to maximize the likelihood of the chosen refinement given the input. We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from human feedback. We evaluate ILF's effectiveness on a carefully-controlled toy task and a realistic summarization task. Our experiments demonstrate that large language models accurately incorporate feedback and that finetuning with ILF scales well with the dataset size, even outperforming finetuning on human summaries. Learning from both language and comparison feedback outperforms learning from each alone, achieving human-level summarization performance.
Submitted 22 February, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Improving Code Generation by Training with Natural Language Feedback
Authors:
Angelica Chen,
Jérémy Scheurer,
Tomasz Korbak,
Jon Ander Campos,
Jun Shern Chan,
Samuel R. Bowman,
Kyunghyun Cho,
Ethan Perez
Abstract:
The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground truth distribution and demonstrate a proof-of-concept on a neural program synthesis task. We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks.
Submitted 22 February, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
Authors:
Mantas Mazeika,
Eric Tang,
Andy Zou,
Steven Basart,
Jun Shern Chan,
Dawn Song,
David Forsyth,
Jacob Steinhardt,
Dan Hendrycks
Abstract:
In recent years, deep neural networks have demonstrated increasingly strong abilities to recognize objects and activities in videos. However, as video understanding becomes widely used in real-world applications, a key consideration is developing human-centric systems that understand not only the content of the video but also how it would affect the wellbeing and emotional state of viewers. To facilitate research in this setting, we introduce two large-scale datasets with over 60,000 videos manually annotated for emotional response and subjective wellbeing. The Video Cognitive Empathy (VCE) dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states. The Video to Valence (V2V) dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing. In experiments, we show how video models that are primarily trained to recognize actions and find contours of objects can be repurposed to understand human preferences and the emotional content of videos. Although there is room for improvement, predicting wellbeing and emotional response is on the horizon for state-of-the-art models. We hope our datasets can help foster further advances at the intersection of commonsense video understanding and human preference learning.
Submitted 18 October, 2022;
originally announced October 2022.
-
Few-shot Adaptation Works with UnpredicTable Data
Authors:
Jun Shern Chan,
Michael Pieler,
Jonathan Jao,
Jérémy Scheurer,
Ethan Perez
Abstract:
Prior work on language models (LMs) shows that training on a large number of diverse tasks improves few-shot learning (FSL) performance on new tasks. We take this to the extreme, automatically extracting 413,299 tasks from internet tables - orders of magnitude more than the next-largest public datasets. Finetuning on the resulting dataset leads to improved FSL performance on Natural Language Processing (NLP) tasks, but not proportionally to dataset scale. In fact, we find that narrow subsets of our dataset sometimes outperform more diverse datasets. For example, finetuning on software documentation from support.google.com raises FSL performance by a mean of +7.5% on 52 downstream tasks, which beats training on 40 human-curated NLP datasets (+6.7%). Finetuning on various narrow datasets leads to similar broad improvements across test tasks, suggesting that the gains are not from domain adaptation but adapting to FSL in general. We do not observe clear patterns between the datasets that lead to FSL gains, leaving open questions about why certain data helps with FSL.
Submitted 7 August, 2022; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Training Language Models with Language Feedback
Authors:
Jérémy Scheurer,
Jon Ander Campos,
Jun Shern Chan,
Angelica Chen,
Kyunghyun Cho,
Ethan Perez
Abstract:
Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human evaluation. Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm. First, we condition the language model on the initial output and feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback. Third, we finetune a language model to maximize the likelihood of the chosen refinement given the input. In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements, finding that only large language models (175B parameters) do so. Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization ability.
Submitted 17 November, 2022; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Assessment of nacre-like ceramics in replacement to Ni superalloys in aircraft's engines
Authors:
Jie Sheng Chan,
Hortense Le Ferrand
Abstract:
Aviation's fossil fuel emissions contribute to global warming, as do the production and disposal of the materials used in aircraft. The current metallic alloys present in the hot section of engines pose constraints in terms of temperature, pressure and weight that restrain aircraft performance. Also, these alloys are produced using rare, depleting resources and polluting processes. In this paper, we hypothesize the use of bioinspired nacre-like alumina (NLA), a ceramic material that exhibits unusual toughness, and evaluate its potential as a replacement for superalloys in aircraft engines. Comparing the performance of Ni superalloys and NLA in terms of properties, engine performance, and life cycle sustainability, we find NLA a promising alternative, although progress has to be made with regard to its reliability, shaping, repair, and governance of the production process.
Submitted 22 November, 2021;
originally announced November 2021.
-
Maximum leave-one-out likelihood estimation for location parameter of unbounded densities
Authors:
Thanakorn Nitithumbundit,
Jennifer S. K. Chan
Abstract:
Maximum likelihood estimation of a location parameter fails when the density has an unbounded mode. An alternative approach is considered by leaving out a data point to avoid the unbounded density in the full likelihood. This modification gives rise to the leave-one-out likelihood. We propose an ECM algorithm which maximises the leave-one-out likelihood. It was shown that the estimator which maximises the leave-one-out likelihood is consistent and super-efficient. However, other asymptotic properties such as the optimal rate of convergence and asymptotic distribution are still under question. We use simulations to investigate these asymptotic properties of the location estimator using our proposed algorithm.
Submitted 3 February, 2016;
originally announced February 2016.
-
An ECM algorithm for Skewed Multivariate Variance Gamma Distribution in Normal Mean-Variance Representation
Authors:
Thanakorn Nitithumbundit,
Jennifer S. K. Chan
Abstract:
Normal mean-variance mixture distributions are widely applied to simplify a model's implementation and improve their computational efficiency under the Maximum Likelihood (ML) approach. Especially for distributions with normal mean-variance mixtures representation such as the multivariate skewed variance gamma (MSVG) distribution, it utilises the expectation-conditional-maximisation (ECM) algorithm to iteratively obtain the ML estimates. To facilitate application to financial time series, the mean is further extended to include autoregressive terms. Techniques are proposed to deal with the unbounded density for small shape parameter and to speed up the convergence. Simulation studies are conducted to demonstrate the applicability of this model and examine estimation properties. Finally, the MSVG model is applied to analyse the returns of five daily closing price market indices and standard errors for the estimated parameters are computed using Louis's method.
Submitted 16 June, 2015; v1 submitted 6 April, 2015;
originally announced April 2015.
-
Risk Margin Quantile Function Via Parametric and Non-Parametric Bayesian Quantile Regression
Authors:
Alice X. D. Dong,
Jennifer S. K. Chan,
Gareth W. Peters
Abstract:
We develop quantile regression models in order to derive risk margin and to evaluate capital in non-life insurance applications. By utilizing the entire range of conditional quantile functions, especially higher quantile levels, we detail how quantile regression is capable of providing an accurate estimation of risk margin and an overview of implied capital based on the historical volatility of a general insurer's loss portfolio. Two modelling frameworks are considered based around parametric and nonparametric quantile regression models which we develop specifically in this insurance setting.
In the parametric quantile regression framework, several models including the flexible generalized beta distribution family, asymmetric Laplace (AL) distribution and power Pareto distribution are considered under a Bayesian regression framework. The Bayesian posterior quantile regression models in each case are studied via Markov chain Monte Carlo (MCMC) sampling strategies.
In the nonparametric quantile regression framework, that we contrast to the parametric Bayesian models, we adopted an AL distribution as a proxy and together with the parametric AL model, we expressed the solution as a scale mixture of uniform distributions to facilitate implementation. The models are extended to adopt dynamic mean, variance and skewness and applied to analyze two real loss reserve data sets to perform inference and discuss interesting features of quantile regression for risk margin calculations.
Submitted 11 February, 2014;
originally announced February 2014.
-
Computing Quasiconformal Maps on Riemann surfaces using Discrete Curvature Flow
Authors:
W. Zeng,
L. M. Lui,
F. Luo,
J. S. Liu,
T. F. Chan,
S. T. Yau,
X. F. Gu
Abstract:
Surface mapping plays an important role in geometric processing. Surface maps induce both area and angular distortions. If the angular distortion is bounded, the mapping is called a quasi-conformal map. Many surface maps in our physical world are quasi-conformal. The angular distortion of a quasi-conformal map can be represented by Beltrami differentials. According to quasi-conformal Teichmüller theory, there is a 1-1 correspondence between the set of Beltrami differentials and the set of quasi-conformal surface maps. Therefore, every quasi-conformal surface map can be fully determined by the Beltrami differential and can be reconstructed by solving the so-called Beltrami equation.
In this work, we propose an effective method to solve the Beltrami equation on general Riemann surfaces. The solution is a quasi-conformal map associated with the prescribed Beltrami differential. We first formulate a discrete analog of quasi-conformal maps on triangular meshes. Then, we propose an algorithm to compute discrete quasi-conformal maps. The main strategy is to define a discrete auxiliary metric of the source surface, such that the original quasi-conformal map becomes conformal under the newly defined discrete metric. The associated map can then be obtained by using the discrete Yamabe flow method. Numerically, the discrete quasi-conformal map converges to the continuous real solution as the mesh size approaches 0. We tested our algorithm on surfaces scanned from real life with different topologies. Experimental results demonstrate the generality and accuracy of our auxiliary metric method.
Submitted 1 December, 2010; v1 submitted 25 May, 2010;
originally announced May 2010.
-
Inflationary Behaviour in Axial-symmetric Gravitational Collapse
Authors:
J. S. F. Chan,
R. B. Mann
Abstract:
We show that the interior of a charged, spinning black hole formed from a general axially symmetric gravitational collapse is unstable to inflation of both its mass and angular momentum parameters. Although our results are formulated in the context of $(2+1)$-dimensional black holes, we argue that they are applicable to $(3+1)$ dimensions.
Submitted 28 November, 1994; v1 submitted 25 November, 1994;
originally announced November 1994.
-
Inside a Spinning Black String
Authors:
J. S. F. Chan,
R. B. Mann
Abstract:
We show that mass inflation occurs inside a spinning black cosmic string, which is a solution of a low-energy effective string theory in $(3+1)$ dimensions. This confirms Poisson and Israel's conjecture that the inner mass parameter diverges even if spacetime is not spherically symmetric.
Submitted 16 September, 1994;
originally announced September 1994.
-
Gravitation and Cosmology in Generalized (1+1)-dimensional dilaton gravity
Authors:
J. S. F. Chan,
R. B. Mann
Abstract:
The actions of the ``$R=T$'' and string-inspired theories of gravity in (1+1) dimensions are generalized into one single action which is characterized by two functions. We discuss differing interpretations of the matter stress-energy tensor, and show how two such different interpretations can yield two different sets of field equations from this action. The weak-field approximation, post-Newtonian expansion, hydrostatic equilibrium state of star and two-dimensional cosmology are studied separately by using the two sets of field equations. Some properties in the ``$R=T$'' and string-inspired theories are shown to be generic in the theory induced by the generalized action.
Submitted 25 August, 1994;
originally announced August 1994.
-
Interior Structure of a Charged Spinning Black Hole in $(2+1)$-Dimensions
Authors:
J. S. F. Chan,
K. C. K. Chan,
R. B. Mann
Abstract:
The phenomenon of mass inflation is shown to occur for a rotating black hole. We demonstrate this feature in $(2+1)$ dimensions by extending the charged spinning BTZ black hole to Vaidya form. We find that the mass function diverges in a manner quantitatively similar to its static counterparts in $(3+1)$, $(2+1)$ and $(1+1)$ dimensions.
Submitted 28 June, 1994;
originally announced June 1994.
-
Mass inflation in (1+1)-dimensional Dilaton Gravity
Authors:
J. S. F. Chan,
R. B. Mann
Abstract:
We investigate the phenomenon of mass inflation in two-dimensional dilaton theories of gravity. We consider two distinct black hole spacetimes and construct the mass-inflation solution for each. Our analysis is extended to include multi-horizon spacetimes. We find that the mass function diverges in a manner quantitatively similar to its four-dimensional counterpart.
Submitted 14 June, 1994;
originally announced June 1994.