Search | arXiv e-print repository

arXiv:2407.01949 [pdf, other]

Mass-Balance MRV for Carbon Dioxide Removal by Enhanced Rock Weathering: Methods, Simulation, and Inference

Authors: Mark Baum, Henry Liu, Lily Schacht, Jake Schneider, Mary Yap

Abstract: Carbon dioxide will likely need to be removed from the atmosphere to avoid significant future warming and climate change. Technologies are being developed to remove large quantities of carbon from the atmosphere. Enhanced rock weathering (ERW), where fine-grained silicate minerals are spread on soil, is a promising carbon removal method that can also support crop yields and maintain overall soil health. Quantifying the amount of carbon removed by ERW is crucial for understanding the potential of ERW globally and for building trust in commercial operations. However, reliable and scalable quantification in complex media like soil is challenging and there is not yet a consensus on the best method of doing so. Here we discuss mass-balance methods, where stocks of base cations in soil are monitored over time to infer the amount of inorganic carbon brought into solution by weathering reactions. First, we review the fundamental concepts of mass-balance methods and explain different ways of approaching the mass-balance problem. Then we discuss experimental planning and data collection, suggesting some best practices. Next, we present a software package designed to facilitate a range of tasks in ERW like uncertainty analysis, planning field trials, and validating statistical methods. Finally, we briefly review ways of estimating carbon removal using mass balance before discussing some advantages of Bayesian inference in this context and presenting an example Bayesian model. The model is fit to simulated data and recovers the correct answer with a clear representation of uncertainty.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.19105 [pdf]

doi 10.13140/RG.2.2.34990.11847/1

Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives

Authors: Matthew J. Schneider, Rufus Rankin, Prabir Burman, Alexander Aue

Abstract: The M6 Competition assessed the performance of competitors using a ranked probability score and an information ratio (IR). While these metrics do well at picking the winners in the competition, crucial questions remain for investors with longer-term incentives. To address these questions, we compare the competitors' performance to a number of conventional (long-only) and alternative indices using standard industry metrics. We apply factor models to the competitors' returns and show the difficulty for any competitor to demonstrate a statistically significant value-add above industry-standard benchmarks within the short timeframe of the competition. We also uncover that most competitors generated lower risk-adjusted returns and lower maximum drawdowns than randomly selected portfolios, and that most competitors could not generate significant out-performance in raw returns. We further introduce two new strategies by picking the competitors with the best (Superstars) and worst (Superlosers) recent performance and show that it is challenging to identify skill amongst investment managers. Overall, our findings highlight the difference in incentives for competitors over professional investors, where the upside of winning the competition dwarfs the potential downside of not winning to maximize fees over an extended period of time.

Submitted 27 June, 2024; originally announced June 2024.

Comments: Forecasting Competitions, M Competitions, Financial Analysis, Investment Management, Hedge Fund, Portfolio Optimization

arXiv:2406.07585 [pdf, other]

Rate-Preserving Reductions for Blackwell Approachability

Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan

Abstract: Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent, in the sense that any algorithm that solves a specific Blackwell approachability instance can be converted to a sublinear regret algorithm for a specific no-regret learning instance, and vice versa. In this paper, we study a more fine-grained form of such reductions, and ask when this translation between problems preserves not only a sublinear rate of convergence, but also preserves the optimal rate of convergence. That is, in which cases does it suffice to find the optimal regret bound for a no-regret learning instance in order to find the optimal rate of convergence for a corresponding approachability instance? We show that the reduction of Abernethy et al. (2011) does not preserve rates: their reduction may reduce a $d$-dimensional approachability instance $I_1$ with optimal convergence rate $R_1$ to a no-regret learning instance $I_2$ with optimal regret-per-round of $R_2$, with $R_{2}/R_{1}$ arbitrarily large (in particular, it is possible that $R_1 = 0$ and $R_{2} > 0$). On the other hand, we show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization we call improper $φ$-regret minimization (a variant of the $φ$-regret minimization of Gordon et al. (2008) where the transformation functions may map actions outside of the action set). Finally, we characterize when linear transformations suffice to reduce improper $φ$-regret minimization problems to standard classes of regret minimization problems in a rate preserving manner. We prove that some improper $φ$-regret minimization instances cannot be reduced to either subclass of instance in this way, suggesting that approachability can capture some problems that cannot be phrased in the language of online learning.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2401.01857 [pdf, ps, other]

Optimal cross-learning for contextual bandits with unknown context distributions

Authors: Jon Schneider, Julian Zimmert

Abstract: We consider the problem of designing contextual bandit algorithms in the ``cross-learning'' setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round. We specifically consider the setting where losses are chosen adversarially and contexts are sampled i.i.d. from an unknown distribution. In this setting, we resolve an open problem of Balseiro et al. by providing an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\widetilde{O}(\sqrt{TK})$, independent of the number of contexts. As a consequence, we obtain the first nearly tight regret bounds for the problems of learning to bid in first-price auctions (under unknown value distributions) and sleeping bandits with a stochastic action set. At the core of our algorithm is a novel technique for coordinating the execution of a learning algorithm over multiple epochs in such a way to remove correlations between estimation of the unknown distribution and the actions played by the algorithm. This technique may be of independent interest for other learning problems involving estimation of an unknown context distribution.

Submitted 3 January, 2024; originally announced January 2024.

Comments: Appeared at NeurIPS 2023

arXiv:2312.00267 [pdf, other]

Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration

Authors: Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Jeff Schneider, Willie Neiswanger

Abstract: Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that one can often choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and formalize this as an offline contextual dueling bandit problem. We give an upper-confidence-bound style algorithm for this problem and prove a polynomial worst-case regret bound. We then provide empirical confirmation in a synthetic setting that our approach outperforms existing methods. After, we extend the setting and methodology for practical use in RLHF training of large language models. Here, our method is able to reach better performance with fewer samples of human preferences than multiple baselines on three real-world datasets.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2309.06455 [pdf, other]

Multimodal Outcomes in N-of-1 Trials: Combining Unsupervised Learning and Statistical Inference

Authors: Juliana Schneider, Thomas Gärtner, Stefan Konigorski

Abstract: N-of-1 trials are randomized multi-crossover trials in single participants with the purpose of investigating the possible effects of one or more treatments. Research in the field of N-of-1 trials has primarily focused on scalar outcomes. However, with the increasing use of digital technologies, we propose to adapt this design to multimodal outcomes, such as audio, video, or image data or also sensor measurements, that can easily be collected by the trial participants on their personal mobile devices. We present here a fully automated approach for analyzing multimodal N-of-1 trials by combining unsupervised deep learning models with statistical inference. First, we train an autoencoder on all images across all patients to create a lower-dimensional embedding. In the second step, the embeddings are reduced to a single dimension by projecting on the first principal component, again using all images. Finally, we test on an individual level whether treatment and non-treatment periods differ with respect to the component. We apply our proposed approach to a published series of multimodal N-of-1 trials of 5 participants who tested the effect of creams on acne captured through images over 16 days. We compare several parametric and non-parametric statistical tests, and we also compare the results to an expert analysis that rates the pictures directly with respect to their acne severity and applies a t-test on these scores. The results indicate a treatment effect for one individual in the expert analysis. This effect was replicated with the proposed unsupervised pipeline. In summary, our proposed approach enables the use of novel data types in N-of-1 trials while avoiding the need for manual labels. We anticipate that this can be the basis for further explorations of valid and interpretable approaches and their application in clinical multimodal N-of-1 trials.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 11 pages, 4 figures

arXiv:2307.11288 [pdf, other]

Kernelized Offline Contextual Dueling Bandits

Authors: Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, Jeff Schneider, Willie Neiswanger

Abstract: Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that often the agent can choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and introduce the offline contextual dueling bandit setting. We give an upper-confidence-bound style algorithm for this setting and prove a regret bound. We also give empirical confirmation that this method outperforms a similar strategy that uses uniformly sampled contexts.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2305.07685 [pdf, other]

Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results

Authors: Lisa Kühnel, Julian Schneider, Ines Perrar, Tim Adams, Fabian Prasser, Ute Nöthlings, Holger Fröhlich, Juliane Fluck

Abstract: Access to individual-level health data is essential for gaining new insights and advancing science. In particular, modern methods based on artificial intelligence rely on the availability of and access to large datasets. In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data, i.e. data generated through a randomised process that have similar statistical properties as the original data, but do not have a one-to-one correspondence with the original individual-level records. In this study, we use a state-of-the-art synthetic data generation method and perform in-depth quality analyses of the generated data for a specific use case in the field of nutrition. We demonstrate the need for careful analyses of synthetic data that go beyond descriptive statistics and provide valuable insights into how to realise the full potential of synthetic datasets. By extending the methods, but also by thoroughly analysing the effects of sampling from a trained model, we are able to largely reproduce significant real-world analysis results in the chosen use case.

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2302.12728 [pdf, ps, other]

Statistical Principles for Platform Trials

Authors: Xinping Cui, Emily Ouyang, Yi Liu, Jingjing Schneider, Hong Tian, Bushi Wang, Jason C. Hsu

Abstract: While within a clinical study there may be multiple doses and endpoints, across different studies each study will result in either an approval or a lack of approval of the drug compound studied. The term False Approval Rate (FAR) is the term this paper utilizes to represent the proportion of drug compounds that lack efficacy incorrectly approved by regulators. (In the U.S., compounds that have efficacy and are approved are not involved in the FAR consideration, according to our reading of the relevant U.S. Congressional statute). While Tukey's (1953) Error Rate Familywise (ERFw) is meant to be applied within a clinical study, Tukey's (1953) Error Rate per Family (ERpF), defined along-side ERFw, is meant to be applied across studies. We show that controlling Error Rate Familywise (ERFw) within a clinical study at 5% in turn controls Error Rate per Family (ERpF) across studies at 5-per-100, regardless of whether the studies are correlated or not. Further, we show that ongoing regulatory practice, the additive multiplicity adjustment method of controlling ERpF, is controlling False Approval Rate (FAR) exactly (not conservatively) at 5-per-100 (even for Platform trials). In contrast, if a regulatory agency chooses to control the False Discovery Rate (FDR) across studies at 5% instead, then this change in policy from ERpF control to FDR control will result in incorrectly approving drug compounds that lack efficacy at a rate higher than 5-per-100, because in essence it gives the industry additional rewards for successfully developing compounds that have efficacy and are approved. Seems to us the discussion of such a change in policy would be at a level higher than merely statistical, needing harmonisation/harmonization (In the U.S., policy is set by the Congress).

Submitted 17 June, 2024; v1 submitted 24 February, 2023; originally announced February 2023.

arXiv:2212.09510 [pdf, other]

Near-optimal Policy Identification in Active Reinforcement Learning

Authors: Xiang Li, Viraj Mehta, Johannes Kirschner, Ian Char, Willie Neiswanger, Jeff Schneider, Andreas Krause, Ilija Bogunovic

Abstract: Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a \emph{generative model}. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy \emph{uniformly} over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2210.04642 [pdf, other]

Exploration via Planning for Information about the Optimal Trajectory

Authors: Viraj Mehta, Ian Char, Joseph Abbate, Rory Conlin, Mark D. Boyer, Stefano Ermon, Jeff Schneider, Willie Neiswanger

Abstract: Many potential applications of reinforcement learning (RL) are stymied by the large numbers of samples required to learn an effective policy. This is especially true when applying RL to real-world control tasks, e.g. in the sciences or robotics, where executing a policy in the environment is costly. In popular RL algorithms, agents typically explore either by adding stochasticity to a reward-maximizing policy or by attempting to gather maximal information about environment dynamics without taking the given task into account. In this work, we develop a method that allows us to plan for exploration while taking both the task and the current knowledge about the dynamics into account. The key insight to our approach is to plan an action sequence that maximizes the expected information gain about the optimal trajectory for the task at hand. We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines and 200x fewer samples than model free methods on a diverse set of low-to-medium dimensional control tasks in both the open-loop and closed-loop control settings.

Submitted 6 October, 2022; originally announced October 2022.

Comments: Conference paper at Neurips 2022. Code available at https://github.com/fusion-ml/trajectory-information-rl. arXiv admin note: text overlap with arXiv:2112.05244

arXiv:2209.03253 [pdf, other]

doi 10.1016/j.conctc.2024.101282

Analyzing Population-Level Trials as N-of-1 Trials: an Application to Gait

Authors: Lin Zhou, Juliana Schneider, Bert Arnrich, Stefan Konigorski

Abstract: Studying individual causal effects of health interventions is of interest whenever intervention effects are heterogeneous between study participants. Conducting N-of-1 trials, which are single-person randomized controlled trials, is the gold standard for their analysis. In this study, we propose to re-analyze existing population-level studies as N-of-1 trials as an alternative, and we use gait as a use case for illustration. Gait data were collected from 16 young and healthy participants under fatigued and non-fatigued, as well as under single-task (only walking) and dual-task (walking while performing a cognitive task) conditions. We first computed standard population-level ANOVA models to evaluate differences in gait parameters (stride length and stride time) across conditions. Then, we estimated the effect of the interventions on gait parameters on the individual level through Bayesian linear mixed models, viewing each participant as their own trial, and compared the results. The results illustrated that while few overall population-level effects were visible, individual-level analyses showed nuanced differences between participants. Baseline values of the gait parameters varied largely among all participants, and the changes induced by fatigue and cognitive task performance were also highly heterogeneous, with some individuals showing effects in opposite direction. These differences between population-level and individual-level analyses were more pronounced for the fatigue intervention compared to the cognitive task intervention. Following our empirical analysis, we discuss re-analyzing population studies through the lens of N-of-1 trials more generally and highlight important considerations and requirements. Our work encourages future studies to investigate individual effects using population-level data.

Submitted 26 February, 2024; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: Main content: 20 pages, 4 figures. Supplementary materials are included at the end in the same PDF file

arXiv:2205.14519 [pdf, other]

Online Learning with Bounded Recall

Authors: Jon Schneider, Kiran Vodrahalli

Abstract: We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be written as a function of the $M$ previous rewards (and not e.g. any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $Θ(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that unlike the perfect recall setting, any low regret bound bounded-recall algorithm must be aware of the ordering of the past $M$ losses -- any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.

Submitted 31 May, 2024; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: 13 pages, 2 figures, accepted at ICML 2024

arXiv:2202.12785 [pdf]

doi 10.1007/978-3-031-01233-4_8

Confidence Calibration for Object Detection and Segmentation

Authors: Fabian Küppers, Anselm Haselhoff, Jan Kronenberger, Jonas Schneider

Abstract: Calibrated confidence estimates obtained from neural networks are crucial, particularly for safety-critical applications such as autonomous driving or medical image diagnosis. However, although the task of confidence calibration has been investigated on classification problems, thorough investigations on object detection and segmentation problems are still missing. Therefore, we focus on the investigation of confidence calibration for object detection and segmentation models in this chapter. We introduce the concept of multivariate confidence calibration that is an extension of well-known calibration methods to the task of object detection and segmentation. This allows for an extended confidence calibration that is also aware of additional features such as bounding box/pixel position, shape information, etc. Furthermore, we extend the expected calibration error (ECE) to measure miscalibration of object detection and segmentation models. We examine several network architectures on MS COCO as well as on Cityscapes and show that especially object detection as well as instance segmentation models are intrinsically miscalibrated given the introduced definition of calibration. Using our proposed calibration methods, we have been able to improve calibration so that it also has a positive impact on the quality of segmentation masks as well.

Submitted 20 June, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: Book chapter in: Tim Fingerscheidt, Hanno Gottschalk, Sebastian Houben (eds.): "Deep Neural Networks and Data for Automated Driving", pp. 225--250, Springer Nature, Switzerland, 2022

Journal ref: In: Tim Fingerscheidt, Hanno Gottschalk, Sebastian Houben (eds.): "Deep Neural Networks and Data for Automated Driving", pp. 225--250, Springer Nature, Switzerland, 2022

arXiv:2112.05244 [pdf, other]

An Experimental Design Perspective on Model-Based Reinforcement Learning

Authors: Viraj Mehta, Biswajit Paria, Jeff Schneider, Stefano Ermon, Willie Neiswanger

Abstract: In many practical applications of RL, it is expensive to observe state transitions from the environment. For example, in the problem of plasma control for nuclear fusion, computing the next state for a given state-action pair requires querying an expensive transition function which can lead to many hours of computer simulation or dollars of scientific research. Such expensive data collection prohibits application of standard RL algorithms which usually require a large number of observations to learn. In this work, we address the problem of efficiently learning a policy while making a minimal number of state-action queries to the transition function. In particular, we leverage ideas from Bayesian optimal experimental design to guide the selection of state-action queries for efficient learning. We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process. At each iteration, our algorithm maximizes this acquisition function, to choose the most informative state-action pair to be queried, thus yielding a data-efficient RL approach. We experiment with a variety of simulated continuous control problems and show that our approach learns an optimal policy with up to $5$ -- $1,000\times$ less data than model-based RL baselines and $10^3$ -- $10^5\times$ less data than model-free RL baselines. We also provide several ablated comparisons which point to substantial improvements arising from the principled method of obtaining data.

Submitted 15 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: Conference paper at ICLR 2022

arXiv:2109.10757 [pdf, other]

Unsupervised Movement Detection in Indoor Positioning Systems of Production Halls

Authors: Jonathan Flossdorf, Anne Meyer, Dmitri Artjuch, Jaques Schneider, Carsten Jentsch

Abstract: Consider indoor positioning systems (IPS) in production halls where objects equipped with sensors send their current position. Beside its large volume, the analyzation of the resulting raw data is challenging due to the susceptibility towards noise. Reasons are accuracy issues and undesired awakenings of sensors that occur due to the dynamics of logistic processes (e.g. vibrations of passing forklifts). We propose a tailor-made statistical procedure for these challenges and combine visual analytics with movement detection. Contrary to common stay-point algorithms, we do not only distinguish between stops and moves, but also consider undesired awakenings. This leads to a more detailed interpretation scheme offering usages for online (e.g. monitoring of orders) and offline applications (e.g. detection of problematic areas). The approach does not require other information than the raw IPS output and enables an ad-hoc analysis. We underline our findings in an extensive case study with real IPS data of our industry partner.

Submitted 27 September, 2023; v1 submitted 21 August, 2021; originally announced September 2021.

arXiv:2109.10254 [pdf, other]

Uncertainty Toolbox: an Open-Source Library for Assessing, Visualizing, and Improving Uncertainty Quantification

Authors: Youngseog Chung, Ian Char, Han Guo, Jeff Schneider, Willie Neiswanger

Abstract: With increasing deployment of machine learning systems in various real-world tasks, there is a greater need for accurate quantification of predictive uncertainty. While the common goal in uncertainty quantification (UQ) in machine learning is to approximate the true distribution of the target data, many works in UQ tend to be disjoint in the evaluation metrics utilized, and disparate implementations for each metric lead to numerical results that are not directly comparable across different works. To address this, we introduce Uncertainty Toolbox, an open-source python library that helps to assess, visualize, and improve UQ. Uncertainty Toolbox additionally provides pedagogical resources, such as a glossary of key terms and an organized collection of key paper references. We hope that this toolbox is useful for accelerating and uniting research efforts in uncertainty in machine learning.

Submitted 21 September, 2021; originally announced September 2021.

arXiv:2011.09588 [pdf, other]

Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification

Authors: Youngseog Chung, Willie Neiswanger, Ian Char, Jeff Schneider

Abstract: Among the many ways of quantifying uncertainty in a regression setting, specifying the full quantile function is attractive, as quantiles are amenable to interpretation and evaluation. A model that predicts the true conditional quantiles for each input, at all quantile levels, presents a correct and efficient representation of the underlying uncertainty. To achieve this, many current quantile-based methods focus on optimizing the so-called pinball loss. However, this loss restricts the scope of applicable regression models, limits the ability to target many desirable properties (e.g. calibration, sharpness, centered intervals), and may produce poor conditional quantiles. In this work, we develop new quantile methods that address these shortcomings. In particular, we propose methods that can apply to any class of regression model, allow for selecting a trade-off between calibration and sharpness, optimize for calibration of centered intervals, and produce more accurate conditional quantiles. We provide a thorough experimental evaluation of our methods, which includes a high dimensional uncertainty quantification task in nuclear fusion.

Submitted 9 December, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

Comments: Appears in Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2011.01041 [pdf, other]

New definitions (measures) of skewness, mean and dispersion of fuzzy numbers -- by way of a new representation as parameterized curves

Authors: Jan Schneider

Abstract: We give a geometrically motivated measure of skewness, define a mean value triangle number, and dispersion (in that order) of a fuzzy number without reference or seeking analogy to the namesake but parallel concepts in probability theory. These measures come about by way of a new representation of fuzzy numbers as parameterized curves respectively their associated tangent bundle. Importantly skewness and dispersion are given as functions of $α$ (the degree of membership) and such may be given separately and pointwise at each $α$-level, as well as overall. This allows for e.g., when a mathematical model is formulated in fuzzy numbers, to run optimization programs level-wise thereby encapsuling with deliberate accuracy the involved membership functions' characteristics while increasing the computational complexity by only a multiplicative factor compared to the same program formulated in real variables and parameters. As an example the work offers a contribution to the recently very popular fuzzy mean-variance-skewness portfolio optimization.

Submitted 28 October, 2020; originally announced November 2020.

arXiv:2009.05138 [pdf, other]

Learning Product Rankings Robust to Fake Users

Authors: Negin Golrezaei, Vahideh Manshadi, Jon Schneider, Shreyas Sekar

Abstract: In many online platforms, customers' decisions are substantially influenced by product rankings as most customers only examine a few top-ranked products. Concurrently, such platforms also use the same data corresponding to customers' actions to learn how these products must be ranked or ordered. These interactions in the underlying learning process, however, may incentivize sellers to artificially inflate their position by employing fake users, as exemplified by the emergence of click farms. Motivated by such fraudulent behavior, we study the ranking problem of a platform that faces a mixture of real and fake users who are indistinguishable from one another. We first show that existing learning algorithms---that are optimal in the absence of fake users---may converge to highly sub-optimal rankings under manipulation by fake users. To overcome this deficiency, we develop efficient learning algorithms under two informational environments: in the first setting, the platform is aware of the number of fake users, and in the second setting, it is agnostic to the number of fake users. For both these environments, we prove that our algorithms converge to the optimal ranking, while being robust to the aforementioned fraudulent behavior; we also present worst-case performance guarantees for our methods, and show that they significantly outperform existing algorithms.
At a high level, our work employs several novel approaches to guarantee robustness such as: (i) constructing product-ordering graphs that encode the pairwise relationships between products inferred from the customers' actions; and (ii) implementing multiple levels of learning with a judicious amount of bi-directional cross-learning between levels.

Submitted 10 September, 2020; originally announced September 2020.

Comments: 65 pages, 4 figures

arXiv:2008.07331 [pdf, other]

Interactive Visualization for Debugging RL

Authors: Shuby Deshpande, Benjamin Eysenbach, Jeff Schneider

Abstract: Visualization tools for supervised learning allow users to interpret, introspect, and gain an intuition for the successes and failures of their models. While reinforcement learning practitioners ask many of the same questions, existing tools are not applicable to the RL setting as these tools address challenges typically found in the supervised learning regime. In this work, we design and implement an interactive visualization tool for debugging and interpreting RL algorithms. Our system addresses many features missing from previous tools such as (1) tools for supervised learning often are not interactive; (2) while debugging RL policies researchers use state representations that are different from those seen by the agent; (3) a framework designed to make the debugging RL policies more conducive. We provide an example workflow of how this system could be used, along with ideas for future extensions.

Submitted 18 August, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

Comments: Builds on preliminary work presented at ICML 2020 (WHI) arXiv:2007.05577. An interactive demo of the system can be found at https://tinyurl.com/y5gv5t4m

arXiv:2008.03665 [pdf, other]

Using social media to measure demographic responses to natural disaster: Insights from a large-scale Facebook survey following the 2019 Australia Bushfires

Authors: Paige Maas, Zack Almquist, Eugenia Giraudy, JW Schneider

Abstract: In this paper we explore a novel method for collecting survey data following a natural disaster and then combine this data with device-derived mobility information to explore demographic outcomes. Using social media as a survey platform for measuring demographic outcomes, especially those that are challenging or expensive to field for, is increasingly of interest to the demographic community. Recent work by Schneider and Harknett (2019) explores the use of Facebook targeted advertisements to collect data on low-income shift workers in the United States. Other work has addressed immigrant assimilation (Stewart et al, 2019), world fertility (Ribeiro et al, 2020), and world migration stocks (Zagheni et al, 2017). We build on this work by introducing a rapid-response survey of post-disaster demographic and economic outcomes fielded through the Facebook app itself. We use these survey responses to augment app-derived mobility data that comprises Facebook Displacement Maps to assess the validity of and drivers underlying those observed behavioral trends. This survey was deployed following the 2019 Australia bushfires to better understand how these events displaced residents. In doing so we are able to test a number of key hypotheses around displacement and demographics. In particular, we uncover several gender differences in key areas, including in displacement decision-making and timing, and in access to protective equipment such as smoke masks. We conclude with a brief discussion of research and policy implications.

Submitted 9 August, 2020; originally announced August 2020.

arXiv:2008.01468 [pdf, other]

doi 10.5281/zenodo.3970396

On Feature Relevance Uncertainty: A Monte Carlo Dropout Sampling Approach

Authors: Kai Fischer, Jonas Schneider

Abstract: Understanding decisions made by neural networks is key for the deployment of intelligent systems in real world applications. However, the opaque decision making process of these systems is a disadvantage where interpretability is essential. Many feature-based explanation techniques have been introduced over the last few years in the field of machine learning to better understand decisions made by neural networks and have become an important component to verify their reasoning capabilities. However, existing methods do not allow statements to be made about the uncertainty regarding a feature's relevance for the prediction. In this paper, we introduce Monte Carlo Relevance Propagation (MCRP) for feature relevance uncertainty estimation. A simple but powerful method based on Monte Carlo estimation of the feature relevance distribution to compute feature relevance uncertainty scores that allow a deeper understanding of a neural network's perception and reasoning.

Submitted 11 April, 2023; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: 18 pages, 15 figures

ACM Class: I.2.10; I.4

arXiv:2007.05577 [pdf, other]

Vizarel: A System to Help Better Understand RL Agents

Authors: Shuby Deshpande, Jeff Schneider

Abstract: Visualization tools for supervised learning have allowed users to interpret, introspect, and gain intuition for the successes and failures of their models. While reinforcement learning practitioners ask many of the same questions, existing tools are not applicable to the RL setting. In this work, we describe our initial attempt at constructing a prototype of these ideas, through identifying possible features that such a system should encapsulate. Our design is motivated by envisioning the system to be a platform on which to experiment with interpretable reinforcement learning.

Submitted 10 July, 2020; originally announced July 2020.

Comments: Accepted to ICML 2020 Workshop on Human Interpretability in Machine Learning (Spotlight)

arXiv:2006.14718 [pdf, other]

Asynchronous Multi Agent Active Search

Authors: Ramina Ghods, Arundhati Banerjee, Jeff Schneider

Abstract: Active search refers to the problem of efficiently locating targets in an unknown environment by actively making data-collection decisions, and has many applications including detecting gas leaks, radiation sources or human survivors of disasters using aerial and/or ground robots (agents). Existing active search methods are in general only amenable to a single agent, or if they extend to multi agent they require a central control system to coordinate the actions of all agents. However, such control systems are often impractical in robotics applications. In this paper, we propose two distinct active search algorithms called SPATS (Sparse Parallel Asynchronous Thompson Sampling) and LATSI (LAplace Thompson Sampling with Information gain) that allow for multiple agents to independently make data-collection decisions without a central coordinator. Throughout we consider that targets are sparsely located around the environment in keeping with compressive sensing assumptions and its applicability in real world scenarios. Additionally, while most common search algorithms assume that agents can sense the entire environment (e.g. compressive sensing) or sense point-wise (e.g. Bayesian Optimization) at all times, we make a realistic assumption that each agent can only sense a contiguous region of space at a time. We provide simulation results as well as theoretical analysis to demonstrate the efficacy of our proposed algorithms.

Submitted 25 June, 2020; originally announced June 2020.

Comments: Preprint under review

arXiv:2006.12682 [pdf, other]

doi 10.1109/CDC45484.2021.9682807

Neural Dynamical Systems: Balancing Structure and Flexibility in Physical Prediction

Authors: Viraj Mehta, Ian Char, Willie Neiswanger, Youngseog Chung, Andrew Oakleigh Nelson, Mark D Boyer, Egemen Kolemen, Jeff Schneider

Abstract: We introduce Neural Dynamical Systems (NDS), a method of learning dynamical models in various gray-box settings which incorporates prior knowledge in the form of systems of ordinary differential equations. NDS uses neural networks to estimate free parameters of the system, predicts residual terms, and numerically integrates over time to predict future states. A key insight is that many real dynamical systems of interest are hard to model because the dynamics may vary across rollouts. We mitigate this problem by taking a trajectory of prior states as the input to NDS and train it to dynamically estimate system parameters using the preceding trajectory. We find that NDS learns dynamics with higher accuracy and fewer samples than a variety of deep learning methods that do not incorporate the prior knowledge and methods from the system identification literature which do. We demonstrate these advantages first on synthetic dynamical systems and then on real data captured from deuterium shots from a nuclear fusion reactor. Finally, we demonstrate that these benefits can be utilized for control in small-scale experiments.

Submitted 27 April, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

arXiv:2006.06519 [pdf, other]

Reserve Price Optimization for First Price Auctions

Authors: Zhe Feng, Sébastien Lahaie, Jon Schneider, Jinchao Ye

Abstract: The display advertising industry has recently transitioned from second- to first-price auctions as its primary mechanism for ad allocation and pricing. In light of this, publishers need to re-evaluate and optimize their auction parameters, notably reserve prices. In this paper, we propose a gradient-based algorithm to adaptively update and optimize reserve prices based on estimates of bidders' responsiveness to experimental shocks in reserves. Our key innovation is to draw on the inherent structure of the revenue objective in order to reduce the variance of gradient estimates and improve convergence rates in both theory and practice. We show that revenue in a first-price auction can be usefully decomposed into a demand component and a bidding component, and introduce techniques to reduce the variance of each component. We characterize the bias-variance trade-offs of these techniques and validate the performance of our proposed algorithm through experiments on synthetic data and real display ad auctions data from Google ad exchange.

Submitted 28 June, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

arXiv:2006.04944 [pdf, other]

A Machine Learning System for Retaining Patients in HIV Care

Authors: Avishek Kumar, Arthi Ramachandran, Adolfo De Unanue, Christina Sung, Joe Walsh, John Schneider, Jessica Ridgway, Stephanie Masiello Schuette, Jeff Lauritsen, Rayid Ghani

Abstract: Retaining persons living with HIV (PLWH) in medical care is paramount to preventing new transmissions of the virus and allowing PLWH to live normal and healthy lifespans. Maintaining regular appointments with an HIV provider and taking medication daily for a lifetime is exceedingly difficult. 51% of PLWH are non-adherent with their medications and eventually drop out of medical care. Current methods of re-linking individuals to care are reactive (after a patient has dropped-out) and hence not very effective. We describe our system to predict who is most at risk to drop-out-of-care for use by the University of Chicago HIV clinic and the Chicago Department of Public Health. Models were selected based on their predictive performance under resource constraints, stability over time, as well as fairness. Our system is applicable as a point-of-care system in a clinical setting as well as a batch prediction system to support regular interventions at the city level. Our model performs 3x better than the baseline for the clinical model and 2.3x better than baseline for the city-wide model. The code has been released on github and we hope this methodology, particularly our focus on fairness, will be adopted by other clinics and public health agencies in order to curb the HIV epidemic.

Submitted 31 May, 2020; originally announced June 2020.

arXiv:2005.13630 [pdf, other]

Explaining Neural Networks by Decoding Layer Activations

Authors: Johannes Schneider, Michalis Vlachos

Abstract: We present a 'CLAssifier-DECoder' architecture (ClaDec) which facilitates the comprehension of the output of an arbitrary layer in a neural network (NN). It uses a decoder to transform the non-interpretable representation of the given layer to a representation that is more similar to the domain a human is familiar with. In an image recognition problem, one can recognize what information is represented by a layer by contrasting reconstructed images of ClaDec with those of a conventional auto-encoder (AE) serving as reference. We also extend ClaDec to allow the trade-off between human interpretability and fidelity. We evaluate our approach for image classification using Convolutional NNs. We show that reconstructed visualizations using encodings from a classifier capture more relevant information for classification than conventional AEs. Relevant code is available at https://github.com/JohnTailor/ClaDec

Submitted 26 February, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

Journal ref: Intelligent Data Analysis (IDA), 2021

arXiv:2003.04422 [pdf, other]

Correlated Initialization for Correlated Data

Authors: Johannes Schneider

Abstract: Spatial data exhibits the property that nearby points are correlated. This also holds for learnt representations across layers, but not for commonly used weight initialization methods. Our theoretical analysis quantifies the learning behavior of weights of a single spatial filter. It is thus in contrast to a large body of work that discusses statistical properties of weights. It shows that uncorrelated initialization (i) might lead to poor convergence behavior and (ii) training of (some) parameters is likely subject to slow convergence. Empirical analysis shows that these findings for a single spatial filter extend to networks with many spatial filters. The impact of (correlated) initialization depends strongly on learning rates and l2-regularization.

Submitted 1 February, 2023; v1 submitted 9 March, 2020; originally announced March 2020.

arXiv:2001.07641 [pdf, other]

Deceptive AI Explanations: Creation and Detection

Authors: Johannes Schneider, Christian Meske, Michalis Vlachos

Abstract: Artificial intelligence (AI) comes with great opportunities but can also pose significant risks. Automatically generated explanations for decisions can increase transparency and foster trust, especially for systems based on automated predictions by AI models. However, given, e.g., economic incentives to create dishonest AI, to what extent can we trust explanations? To address this issue, our work investigates how AI models (i.e., deep learning, and existing instruments to increase transparency regarding AI decisions) can be used to create and detect deceptive explanations. As an empirical evaluation, we focus on text classification and alter the explanations generated by GradCAM, a well-established explanation technique in neural networks. Then, we evaluate the effect of deceptive explanations on users in an experiment with 200 participants. Our findings confirm that deceptive explanations can indeed fool humans. However, one can deploy machine learning (ML) methods to detect seemingly minor deception attempts with accuracy exceeding 80% given sufficient domain knowledge. Without domain knowledge, one can still infer inconsistencies in the explanations in an unsupervised manner, given basic knowledge of the predictive model under scrutiny.

Submitted 2 December, 2021; v1 submitted 21 January, 2020; originally announced January 2020.

Journal ref: International Conference on Agents and Artificial Intelligence (2022)

arXiv:2001.01793 [pdf, other]

Offline Contextual Bayesian Optimization for Nuclear Fusion

Authors: Youngseog Chung, Ian Char, Willie Neiswanger, Kirthevasan Kandasamy, Andrew Oakleigh Nelson, Mark D Boyer, Egemen Kolemen, Jeff Schneider

Abstract: Nuclear fusion is regarded as the energy of the future since it presents the possibility of unlimited clean energy. One obstacle in utilizing fusion as a feasible energy source is the stability of the reaction. Ideally, one would have a controller for the reactor that makes actions in response to the current state of the plasma in order to prolong the reaction as long as possible. In this work, we make preliminary steps to learning such a controller. Since learning on a real world reactor is infeasible, we tackle this problem by attempting to learn optimal controls offline via a simulator, where the state of the plasma can be explicitly set. In particular, we introduce a theoretically grounded Bayesian optimization algorithm that recommends a state and action pair to evaluate at every iteration and show that this results in more efficient use of the simulator.

Submitted 6 January, 2020; originally announced January 2020.

Comments: 6 pages, 2 figures, Machine Learning and Physical Sciences workshop

arXiv:1912.06680 [pdf, other]

Dota 2 with Large Scale Deep Reinforcement Learning

Authors: OpenAI, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, et al. (2 additional authors not shown)

Abstract: On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.

Submitted 13 December, 2019; originally announced December 2019.

arXiv:1912.03652 [pdf, other]

Human-to-AI Coach: Improving Human Inputs to AI Systems

Authors: Johannes Schneider

Abstract: Humans increasingly interact with artificial intelligence (AI) systems. AI systems are optimized for objectives such as minimum computation or minimum error rate in recognizing and interpreting inputs from humans. In contrast, inputs created by humans are often treated as a given. We investigate how human inputs can be altered to reduce misinterpretation by the AI system and to improve the efficiency of input generation for the human, while the altered inputs remain as similar as possible to the original inputs. These objectives result in trade-offs that are analyzed for a deep learning system classifying handwritten digits. To create examples that serve as demonstrations for humans to improve, we develop a model based on a conditional convolutional autoencoder (CCAE). Our quantitative and qualitative evaluation shows that on many occasions the generated proposals lead to lower error rates, require less effort to create and differ only modestly from the original samples.

Submitted 9 March, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

Journal ref: Symposium on Intelligent Data Analysis 2020, Konstanz

arXiv:1910.07113 [pdf, other]

Solving Rubik's Cube with a Robot Hand

Authors: OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang

Abstract: We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/

Submitted 15 October, 2019; originally announced October 2019.

arXiv:1909.02803 [pdf, other]

Personalization of Deep Learning

Authors: Johannes Schneider, Michail Vlachos

Abstract: We discuss training techniques, objectives and metrics toward personalization of deep learning models. In machine learning, personalization addresses the goal of a trained model to target a particular individual by optimizing one or more performance metrics, while conforming to certain constraints. To personalize, we investigate three methods of "curriculum learning" and two approaches for data grouping, i.e., augmenting the data of an individual by adding similar data identified with an auto-encoder. We show that both "curriculum learning" and "personalized" data augmentation lead to improved performance on data of an individual. Mostly, this comes at the cost of reduced performance on a more general, broader dataset.

Submitted 9 March, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

Journal ref: 3rd International Data Science Conference 2020, Austria

arXiv:1909.02414 [pdf, other]

Riemannian batch normalization for SPD neural networks

Authors: Daniel Brooks, Olivier Schwander, Frederic Barbaresco, Jean-Yves Schneider, Matthieu Cord

Abstract: Covariance matrices have attracted attention for machine learning applications due to their capacity to capture interesting structure in the data. The main challenge is that one needs to take into account the particular geometry of the Riemannian manifold of symmetric positive definite (SPD) matrices they belong to. In the context of deep networks, several architectures for these matrices have recently been proposed. In our article, we introduce a Riemannian batch normalization (batchnorm) algorithm, which generalizes the one used in Euclidean nets. This novel layer makes use of geometric operations on the manifold, notably the Riemannian barycenter, parallel transport and non-linear structured matrix transformations. We derive a new manifold-constrained gradient descent algorithm working in the space of SPD matrices, allowing to learn the batchnorm layer. We validate our proposed approach with experiments in three different contexts on diverse data types: a drone recognition dataset from radar observations, and on emotion and action recognition datasets from video and motion capture data. Experiments show that the Riemannian batchnorm systematically gives better classification performance compared with leading methods and a remarkable robustness to lack of data.

Submitted 12 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: Accepted to NeurIPS 2019

arXiv:1908.01425 [pdf, other]

ChemBO: Bayesian Optimization of Small Organic Molecules with Synthesizable Recommendations

Authors: Ksenia Korovina, Sailun Xu, Kirthevasan Kandasamy, Willie Neiswanger, Barnabas Poczos, Jeff Schneider, Eric P. Xing

Abstract: In applications such as molecule design or drug discovery, it is desirable to have an algorithm which recommends new candidate molecules based on the results of past tests. These molecules first need to be synthesized and then tested for objective properties. We describe ChemBO, a Bayesian optimization framework for generating and optimizing organic molecules for desired molecular properties. While most existing data-driven methods for this problem do not account for sample efficiency or fail to enforce realistic constraints on synthesizability, our approach explores the synthesis graph in a sample-efficient way and produces synthesizable candidates. We implement ChemBO as a Gaussian process model and explore existing molecular kernels for it. Moreover, we propose a novel optimal-transport based distance and kernel that accounts for graphical information explicitly. In our experiments, we demonstrate the efficacy of the proposed approach on several molecular optimization problems.

Submitted 21 October, 2019; v1 submitted 4 August, 2019; originally announced August 2019.

arXiv:1908.00219 [pdf, other]

Deep Kinematic Models for Kinematically Feasible Vehicle Trajectory Predictions

Authors: Henggang Cui, Thi Nguyen, Fang-Chieh Chou, Tsung-Han Lin, Jeff Schneider, David Bradley, Nemanja Djuric

Abstract: Self-driving vehicles (SDVs) hold great potential for improving traffic safety and are poised to positively affect the quality of life of millions of people. To unlock this potential, one of the critical aspects of the autonomous technology is understanding and predicting the future movement of vehicles surrounding the SDV. This work presents a deep-learning-based method for kinematically feasible motion prediction of such traffic actors. Previous work did not explicitly encode vehicle kinematics and instead relied on the models to learn the constraints directly from the data, potentially resulting in kinematically infeasible, suboptimal trajectory predictions. To address this issue we propose a method that seamlessly combines ideas from AI with physically grounded vehicle motion models. In this way we employ the best of both worlds, coupling powerful learning models with strong feasibility guarantees for their outputs. The proposed approach is general, being applicable to any type of learning method. Extensive experiments using deep convnets on real-world data strongly indicate its benefits, outperforming the existing state-of-the-art.

Submitted 24 October, 2020; v1 submitted 1 August, 2019; originally announced August 2019.

Comments: Accepted for publication at IEEE International Conference on Robotics and Automation (ICRA) 2020

arXiv:1905.10661 [pdf, other]

Locality-Promoting Representation Learning

Authors: Johannes Schneider

Abstract: This work investigates fundamental questions related to learning features in convolutional neural networks (CNN). Empirical findings across multiple architectures such as VGG, ResNet, Inception, DenseNet and MobileNet indicate that weights near the center of a filter are larger than weights on the outside. Current regularization schemes violate this principle. Thus, we introduce Locality-promoting Regularization (LOCO-Reg), which yields accuracy gains across multiple architectures and datasets. We also show theoretically that the empirical finding is a consequence of maximizing feature cohesion under the assumption of spatial locality.

Submitted 29 March, 2021; v1 submitted 25 May, 2019; originally announced May 2019.

arXiv:1903.06694 [pdf, other]

Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly

Authors: Kirthevasan Kandasamy, Karun Raju Vysyaraju, Willie Neiswanger, Biswajit Paria, Christopher R. Collins, Jeff Schneider, Barnabas Poczos, Eric P. Xing

Abstract: Bayesian Optimisation (BO) refers to a suite of techniques for global optimisation of expensive black box functions, which use introspective Bayesian models of the function to efficiently search for the optimum. While BO has been applied successfully in many applications, modern optimisation tasks usher in new challenges where conventional methods fail spectacularly. In this work, we present Dragonfly, an open source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real world settings; these include better methods for handling higher dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimising over structured combinatorial spaces, such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimising over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimisation and demonstrate that when the above methods are integrated, they enable significant improvements in the performance of BO. The Dragonfly library is available at dragonfly.github.io.

Submitted 19 April, 2020; v1 submitted 15 March, 2019; originally announced March 2019.

Comments: Journal of Machine Learning Research 2020, Special Issue on Bayesian Optimization

arXiv:1901.11515 [pdf, other]

ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language

Authors: Willie Neiswanger, Kirthevasan Kandasamy, Barnabas Poczos, Jeff Schneider, Eric Xing

Abstract: Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models for this task. Most BO today uses Gaussian processes (GPs), or a few other surrogate models. However, there is a broad set of Bayesian modeling techniques that could be used to capture complex systems and reduce the number of queries in BO. Probabilistic programming languages (PPLs) are modern tools that allow for flexible model definition, prior specification, model composition, and automatic inference. In this paper, we develop ProBO, a BO procedure that uses only standard operations common to most PPLs. This allows a user to drop in a model built with an arbitrary PPL and use it directly in BO. We describe acquisition functions for ProBO, and strategies for efficiently optimizing these functions given complex models or costly inference procedures. Using existing PPLs, we implement new models to aid in a few challenging optimization settings, and demonstrate these on model hyperparameter and architecture search tasks.

Submitted 4 July, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

arXiv:1901.00770 [pdf]

Personalized explanation in machine learning: A conceptualization

Authors: Johannes Schneider, Joshua Handali

Abstract: Explanation in machine learning and related fields such as artificial intelligence aims at making machine learning models and their decisions understandable to humans. Existing work suggests that personalizing explanations might help to improve understandability. In this work, we derive a conceptualization of personalized explanation by defining and structuring the problem based on prior work on machine learning explanation, personalization (in machine learning) and concepts and techniques from other domains such as privacy and knowledge elicitation. We categorize the explainee data used in the process of personalization and describe means to collect this data. We also identify three key explanation properties that are amenable to personalization: complexity, decision information and presentation. Finally, we enhance existing work on explanation by introducing additional desiderata and measures to quantify the quality of personalized explanations.

Submitted 26 April, 2019; v1 submitted 3 January, 2019; originally announced January 2019.

Comments: Accepted at 27th European Conference on Information Systems (ECIS 2019), Stockholm-Uppsala, Sweden, June 2019

arXiv:1811.03577 [pdf, other]

doi 10.3847/1538-3881/aae9f4

Labeling Bias in Galaxy Morphologies

Authors: Guillermo Cabrera-Vives, Christopher J. Miller, Jeff Schneider

Abstract: We present a metric to quantify systematic labeling bias in galaxy morphology data sets stemming from the quality of the labeled data. This labeling bias is independent from labeling errors and requires knowledge about the intrinsic properties of the data with respect to the observed properties. We conduct a relative comparison of label bias for different low redshift galaxy morphology data sets. We show our metric is able to recover previous de-biasing procedures based on redshift as the biasing parameter. By using the image resolution instead, we find biases that have not been addressed. We find that the morphologies based on supervised machine-learning trained over features such as colors, shape, and concentration show significantly less bias than morphologies based on expert or citizen-science classifiers. This result holds even when there is underlying bias present in the training sets used in the supervised machine learning process. We use catalog simulations to validate our bias metric, and show how to bin the multidimensional intrinsic and observed galaxy properties used in the bias quantification. Our approach is designed to work on any other labeled multidimensional data sets and the code is publicly available.

Submitted 8 November, 2018; originally announced November 2018.

arXiv:1809.10732 [pdf, other]

Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks

Authors: Henggang Cui, Vladan Radosavljevic, Fang-Chieh Chou, Tsung-Han Lin, Thi Nguyen, Tzu-Kuo Huang, Jeff Schneider, Nemanja Djuric

Abstract: Autonomous driving presents one of the largest problems that the robotics and artificial intelligence communities are facing at the moment, both in terms of difficulty and potential societal impact. Self-driving vehicles (SDVs) are expected to prevent road accidents and save millions of lives while improving the livelihood and life quality of many more. However, despite large interest and a number of industry players working in the autonomous domain, there still remains more to be done in order to develop a system capable of operating at a level comparable to the best human drivers. One reason for this is the high uncertainty of traffic behavior and the large number of situations that an SDV may encounter on the roads, making it very difficult to create a fully generalizable system. To ensure safe and efficient operations, an autonomous vehicle is required to account for this uncertainty and to anticipate a multitude of possible behaviors of traffic actors in its surroundings. We address this critical problem and present a method to predict multiple possible trajectories of actors while also estimating their probabilities. The method encodes each actor's surrounding context into a raster image, used as input by deep convolutional networks to automatically derive relevant features for the task. Following extensive offline evaluation and comparison to state-of-the-art baselines, the method was successfully tested on SDVs in closed-course tests.

Submitted 1 March, 2019; v1 submitted 18 September, 2018; originally announced September 2018.

Comments: Accepted for publication at IEEE International Conference on Robotics and Automation (ICRA) 2019

arXiv:1809.09582 [pdf, other]

Contextual Bandits with Cross-learning

Authors: Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider

Abstract: In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{i,t}(c)$, the learner also learns the values of $r_{i,t}(c')$ for some other contexts $c'$ in set $\mathcal{O}_i(c)$; i.e., the rewards that would have been achieved by performing that action under different contexts $c'\in \mathcal{O}_i(c)$. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, which has gained a lot of attention lately as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We design and analyze new algorithms for the contextual bandits problem with cross-learning and show that their regret has better dependence on the number of contexts. Under complete cross-learning where the rewards for all contexts are learned when choosing an action, i.e., set $\mathcal{O}_i(c)$ contains all contexts, we show that our algorithms achieve regret $\tilde{O}(\sqrt{KT})$, removing the dependence on $C$. In all other cases, i.e., under partial cross-learning where $|\mathcal{O}_i(c)|< C$ for some context-action pair $(i,c)$, the regret bounds depend on how the sets $\mathcal{O}_i(c)$ impact the degree to which cross-learning between contexts is possible. We simulate our algorithms on real auction data from an ad exchange running first-price auctions and show that they outperform traditional contextual bandit algorithms.

Submitted 15 November, 2021; v1 submitted 25 September, 2018; originally announced September 2018.

Comments: 58 pages, 4 figures

arXiv:1808.05819 [pdf, other]

Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving

Authors: Nemanja Djuric, Vladan Radosavljevic, Henggang Cui, Thi Nguyen, Fang-Chieh Chou, Tsung-Han Lin, Nitin Singh, Jeff Schneider

Abstract: We address one of the crucial aspects necessary for safe and efficient operations of autonomous vehicles, namely predicting future state of traffic actors in the autonomous vehicle's surroundings. We introduce a deep learning-based approach that takes into account a current world state and produces raster images of each actor's vicinity. The rasters are then used as inputs to deep convolutional models to infer future movement of actors while also accounting for and capturing inherent uncertainty of the prediction task. Extensive experiments on real-world data strongly suggest benefits of the proposed approach. Moreover, following completion of the offline tests the system was successfully tested onboard self-driving vehicles.

Submitted 4 March, 2020; v1 submitted 17 August, 2018; originally announced August 2018.

Comments: Accepted for publication at IEEE Winter Conference on Applications of Computer Vision (WACV) 2020

arXiv:1808.00177 [pdf, other]

Learning Dexterous In-Hand Manipulation

Authors: OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba

Abstract: We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object's appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM

Submitted 18 January, 2019; v1 submitted 1 August, 2018; originally announced August 2018.

Comments: Making OpenAI the first author. We wish this paper to be cited as "Learning Dexterous In-Hand Manipulation" by OpenAI et al. We are replicating the approach from the physics community: arXiv:1812.06489

arXiv:1805.09964 [pdf, ps, other]

Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming

Authors: Kirthevasan Kandasamy, Willie Neiswanger, Reed Zhang, Akshay Krishnamurthy, Jeff Schneider, Barnabas Poczos

Abstract: We design a new myopic strategy for a wide class of sequential design of experiment (DOE) problems, where the goal is to collect data in order to fulfil a certain problem-specific goal. Our approach, Myopic Posterior Sampling (MPS), is inspired by the classical posterior (Thompson) sampling algorithm for multi-armed bandits and leverages the flexibility of probabilistic programming and approximate Bayesian inference to address a broad set of problems. Empirically, this general-purpose strategy is competitive with more specialised methods in a wide array of DOE tasks, and more importantly, enables addressing complex DOE goals where no existing method seems applicable. On the theoretical side, we leverage ideas from adaptive submodularity and reinforcement learning to derive conditions under which MPS achieves sublinear regret against natural benchmark policies.

Submitted 24 May, 2018; originally announced May 2018.

arXiv:1802.07191 [pdf, other]

Neural Architecture Search with Bayesian Optimisation and Optimal Transport

Authors: Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, Eric Xing

Abstract: Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function $f$ which is only accessible via point evaluations. It is typically used in settings where $f$ is expensive to evaluate. A common use case for BO in machine learning is model selection, where it is not possible to analytically model the generalisation performance of a statistical model, and we resort to noisy and expensive training and validation procedures to choose the best model. Conventional BO methods have focused on Euclidean and categorical domains, which, in the context of model selection, only permit tuning scalar hyper-parameters of machine learning algorithms. However, with the surge of interest in deep learning, there is an increasing demand to tune neural network architectures. In this work, we develop NASBOT, a Gaussian process based BO framework for neural architecture search. To accomplish this, we develop a distance metric in the space of neural network architectures which can be computed efficiently via an optimal transport program. This distance might be of independent interest to the deep learning community as it may find applications outside of BO. We demonstrate that NASBOT outperforms other alternatives for architecture search in several cross validation based model selection tasks on multi-layer perceptrons and convolutional neural networks.

Submitted 15 March, 2019; v1 submitted 11 February, 2018; originally announced February 2018.

Journal ref: Neural Information Processing Systems (NeurIPS) 2018

Showing 1–50 of 83 results for author: Schneider, J