When is Offline Policy Selection Sample Efficient for Reinforcement Learning?

Authors: Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White

Abstract: Offline reinforcement learning algorithms often require careful hyperparameter tuning. Consequently, before deployment, we need to select amongst a set of candidate policies. As yet, however, there is little understanding about the fundamental limits of this offline policy selection (OPS) problem. In this work we aim to provide clarity on when sample efficient OPS is possible, primarily by connecting OPS to off-policy policy evaluation (OPE) and Bellman error (BE) estimation. We first show a hardness result, that in the worst case, OPS is just as hard as OPE, by proving a reduction of OPE to OPS. As a result, no OPS method can be more sample efficient than OPE in the worst case. We then propose a BE method for OPS, called Identifiable BE Selection (IBES), that has a straightforward method for selecting its own hyperparameters. We highlight that using IBES for OPS generally has more requirements than OPE methods, but if satisfied, can be more sample efficient. We conclude with an empirical study comparing OPE and IBES, and by showing the difficulty of OPS on an offline Atari benchmark dataset.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2304.01315 [pdf, other]

Empirical Design in Reinforcement Learning

Authors: Andrew Patterson, Samuel Neumann, Martha White, Adam White

Abstract: Empirical design in reinforcement learning is no small task. Running good experiments requires attention to detail and at times significant computational resources. While compute resources available per dollar have continued to grow rapidly, so have the scale of typical experiments in reinforcement learning. It is now common to benchmark agents with millions of parameters against dozens of tasks, each using the equivalent of 30 days of experience. The scale of these experiments often conflict with the need for proper statistical evidence, especially when comparing algorithms. Recent studies have highlighted how popular algorithms are sensitive to hyper-parameter settings and implementation details, and that common empirical practice leads to weak statistical evidence (Machado et al., 2018; Henderson et al., 2018). Here we take this one step further. This manuscript represents both a call to action, and a comprehensive resource for how to do good experiments in reinforcement learning. In particular, we cover: the statistical assumptions underlying common performance measures, how to properly characterize performance variation and stability, hypothesis testing, special considerations for comparing multiple agents, baseline and illustrative example construction, and how to deal with hyper-parameters and experimenter bias. Throughout we highlight common mistakes found in the literature and the statistical consequences of those in example experiments.
The objective of this document is to provide answers on how we can use our unprecedented compute to do good science in reinforcement learning, as well as stay alert to potential pitfalls in our empirical design.

Submitted 3 April, 2023; originally announced April 2023.

Comments: In submission to JMLR

arXiv:2205.08464 [pdf, other]

Robust Losses for Learning Value Functions

Authors: Andrew Patterson, Victor Liao, Martha White

Abstract: Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards, rescaling rewards, or clipping errors. While these strategies appear to be related to robust losses -- like the Huber loss -- they are built on semi-gradient update rules which do not minimize a known loss. In this work, we build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem and propose a saddlepoint reformulation for a Huber Bellman error and Absolute Bellman error. We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings. We characterize the solutions of the robust losses, providing insight into the problem settings where the robust losses define notably better solutions than the mean squared Bellman error. Finally, we show that the resulting gradient-based algorithms are more stable, for both prediction and control, with less sensitivity to meta-parameters.

Submitted 17 April, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)

arXiv:2202.02396 [pdf, other]

A Temporal-Difference Approach to Policy Gradient Estimation

Authors: Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood

Abstract: The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient. Most algorithms based on this theorem, in practice, break this assumption, introducing a distribution shift that can cause the convergence to poor solutions. In this paper, we propose a new approach of reconstructing the policy gradient from the start state without requiring a particular sampling strategy. The policy gradient calculation in this form can be simplified in terms of a gradient critic, which can be recursively estimated due to a new Bellman equation of gradients. By using temporal-difference updates of the gradient critic from an off-policy data stream, we develop the first estimator that sidesteps the distribution shift issue in a model-free way. We prove that, under certain realizability conditions, our estimator is unbiased regardless of the sampling strategy. We empirically show that our technique achieves a superior bias-variance trade-off and performance in presence of off-policy samples.

Submitted 7 July, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

arXiv:2104.13844 [pdf, other]

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

Authors: Andrew Patterson, Adam White, Martha White

Abstract: Many reinforcement learning algorithms rely on value estimation, however, the most widely used algorithms -- namely temporal difference algorithms -- can diverge under both off-policy sampling and nonlinear function approximation. Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation. Extending these methods to the nonlinear case has been largely unsuccessful. Recently, several methods have been introduced that approximate a different objective -- the mean-squared Bellman error (MSBE) -- which naturally facilitate nonlinear approximation. In this work, we build on these insights and introduce a new generalized MSPBE that extends the linear MSPBE to the nonlinear setting. We show how this generalized objective unifies previous work and obtain new bounds for the value error of the solutions of the generalized objective. We derive an easy-to-use, but sound, algorithm to minimize the generalized objective, and show that it is more stable across runs, is less sensitive to hyperparameters, and performs favorably across four control domains with neural network function approximation.

Submitted 28 March, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: Accepted for publication in JMLR 2022

arXiv:2009.03864 [pdf, other]

Contraction $\mathcal{L}_1$-Adaptive Control using Gaussian Processes

Authors: Aditya Gahlawat, Arun Lakshmanan, Lin Song, Andrew Patterson, Zhuohuan Wu, Naira Hovakimyan, Evangelos Theodorou

Abstract: We present $\mathcal{CL}_1$-$\mathcal{GP}$, a control framework that enables safe simultaneous learning and control for systems subject to uncertainties. The two main constituents are contraction theory-based $\mathcal{L}_1$ ($\mathcal{CL}_1$) control and Bayesian learning in the form of Gaussian process (GP) regression. The $\mathcal{CL}_1$ controller ensures that control objectives are met while providing safety certificates. Furthermore, $\mathcal{CL}_1$-$\mathcal{GP}$ incorporates any available data into a GP model of uncertainties, which improves performance and enables the motion planner to achieve optimality safely. This way, the safe operation of the system is always guaranteed, even during the learning transients. We provide a few illustrative examples for the safe learning and control of planar quadrotor systems in a variety of environments.

Submitted 30 November, 2021; v1 submitted 8 September, 2020; originally announced September 2020.

Comments: Submitted to Learning for Dynamics and Control (L4DC) Conference, 2021

arXiv:2007.00611 [pdf, other]

Gradient Temporal-Difference Learning with Regularized Corrections

Authors: Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

Abstract: It is still common to use Q-learning and temporal difference (TD) learning -- even though they have divergence issues and sound Gradient TD alternatives exist -- because divergence seems rare and they typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an easy to use and performant TD method, or a more complex algorithm that is more sound but harder to tune and all but unexplored with non-linear function approximation or control. In this paper, we introduce a new method called TD with Regularized Corrections (TDRC), that attempts to balance ease of use, soundness, and performance. It behaves as well as TD, when TD performs well, but is sound in cases where TD diverges. We empirically investigate TDRC across a range of problems, for both prediction and control, and for both linear and non-linear function approximation, and show, potentially for the first time, that gradient TD methods could be a better alternative to TD and Q-learning.

Submitted 17 September, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: Appeared in Proceedings of the 37th International Conference on Machine Learning (ICML2020)

arXiv:2002.01965 [pdf, other]

Learning Probabilistic Intersection Traffic Models for Trajectory Prediction

Authors: Andrew Patterson, Aditya Gahlawat, Naira Hovakimyan

Abstract: Autonomous agents must be able to safely interact with other vehicles to integrate into urban environments. The safety of these agents is dependent on their ability to predict collisions with other vehicles' future trajectories for replanning and collision avoidance. The information needed to predict collisions can be learned from previously observed vehicle trajectories in a specific environment, generating a traffic model. The learned traffic model can then be incorporated as prior knowledge into any trajectory estimation method being used in this environment. This work presents a Gaussian process based probabilistic traffic model that is used to quantify vehicle behaviors in an intersection. The Gaussian process model provides estimates for the average vehicle trajectory, while also capturing the variance between the different paths a vehicle may take in the intersection. The method is demonstrated on a set of time-series position trajectories. These trajectories are reconstructed by removing object recognition errors and missed frames that may occur due to data source processing. To create the intersection traffic model, the reconstructed trajectories are clustered based on their source and destination lanes. For each cluster, a Gaussian process model is created to capture the average behavior and the variance of the cluster. To show the applicability of the Gaussian model, the test trajectories are classified with only partial observations. Performance is quantified by the number of observations required to correctly classify the vehicle trajectory.
Both the intersection traffic modeling computations and the classification procedure are timed. These times are presented as results and demonstrate that the model can be constructed in a reasonable amount of time and the classification procedure can be used for online applications.

Submitted 5 February, 2020; originally announced February 2020.

arXiv:1907.09558 [pdf, other]

System-Level Development of a User-Integrated Semi-Autonomous Lawn Mowing System: Problem Overview, Basic Requirements, and Proposed Architecture

Authors: Albert E. Patterson, Yang Yuan, William R. Norris

Abstract: This concept paper outlines some recent efforts toward the design and development of user-integrated semi-autonomous home-sized lawn mowing systems from a systems engineering perspective. This is an important and emerging field of study within the robotics and systems engineering communities. The work presented includes a review of current progress on this problem, a discussion of the problem from a systems engineering perspective, a general system architecture developed by the authors, and a preliminary set of design requirements. This work is meant to provide a baseline and motivation for the further development and refinement of these systems within the systems engineering and robotics communities and is relevant to both academic and commercial research.

Submitted 12 July, 2019; originally announced July 2019.

Comments: 11 pages, 8 figures, and 32 references

arXiv:1904.02765 [pdf, other]

Intent-Aware Probabilistic Trajectory Estimation for Collision Prediction with Uncertainty Quantification

Authors: Andrew Patterson, Arun Lakshmanan, Naira Hovakimyan

Abstract: Collision prediction in a dynamic and unknown environment relies on knowledge of how the environment is changing. Many collision prediction methods rely on deterministic knowledge of how obstacles are moving in the environment. However, complete deterministic knowledge of the obstacles' motion is often unavailable. This work proposes a Gaussian process based prediction method that replaces the assumption of deterministic knowledge of each obstacle's future behavior with probabilistic knowledge, to allow a larger class of obstacles to be considered. The method solely relies on position and velocity measurements to predict collisions with dynamic obstacles. We show that the uncertainty region for obstacle positions can be expressed in terms of a combination of polynomials generated with Gaussian process regression. To control the growth of uncertainty over arbitrary time horizons, a probabilistic obstacle intention is assumed as a distribution over obstacle positions and velocities, which can be naturally included in the Gaussian process framework. Our approach is demonstrated in two case studies in which (i), an obstacle overtakes the agent and (ii), an obstacle crosses the agent's path perpendicularly. In these simulations we show that the collision can be predicted despite having limited knowledge of the obstacle's behavior.

Submitted 4 April, 2019; originally announced April 2019.

arXiv:1902.05027 [pdf, other]

doi 10.15607/RSS.2019.XV.042

Proximity Queries for Absolutely Continuous Parametric Curves

Authors: Arun Lakshmanan, Andrew Patterson, Venanzio Cichella, Naira Hovakimyan

Abstract: In motion planning problems for autonomous robots, such as self-driving cars, the robot must ensure that its planned path is not in close proximity to obstacles in the environment. However, the problem of evaluating the proximity is generally non-convex and serves as a significant computational bottleneck for motion planning algorithms. In this paper, we present methods for a general class of absolutely continuous parametric curves to compute: (i) the minimum separating distance, (ii) tolerance verification, and (iii) collision detection. Our methods efficiently compute bounds on obstacle proximity by bounding the curve in a convex region. This bound is based on an upper bound on the curve arc length that can be expressed in closed form for a useful class of parametric curves including curves with trigonometric or polynomial bases. We demonstrate the computational efficiency and accuracy of our approach through numerical simulations of several proximity problems.

Submitted 19 June, 2019; v1 submitted 13 February, 2019; originally announced February 2019.

Comments: Proceedings of Robotics: Science and Systems

arXiv:1902.03383 [pdf, ps, other]

Cloud Programming Simplified: A Berkeley View on Serverless Computing

Authors: Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, David A. Patterson

Abstract: Serverless cloud computing handles virtually all the system administration operations needed to make it easier for programmers to use the cloud. It provides an interface that greatly simplifies cloud programming, and represents an evolution that parallels the transition from assembly language to high-level programming languages. This paper gives a quick history of cloud computing, including an accounting of the predictions of the 2009 Berkeley View of Cloud Computing paper, explains the motivation for serverless computing, describes applications that stretch the current limits of serverless, and then lists obstacles and research opportunities required for serverless computing to fulfill its full potential. Just as the 2009 paper identified challenges for the cloud and predicted they would be addressed and that cloud use would accelerate, we predict these issues are solvable and that serverless computing will grow to dominate the future of cloud computing.

Submitted 9 February, 2019; originally announced February 2019.

arXiv:1811.02597 [pdf, other]

Online Off-policy Prediction

Authors: Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

Abstract: This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the predictions, and thus the samples are generated off-policy. The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades. The issue lies with the temporal difference (TD) learning update at the heart of most prediction algorithms: combining bootstrapping, off-policy sampling and function approximation may cause the value estimate to diverge. A breakthrough came with the development of a new objective function that admitted stochastic gradient descent variants of TD. Since then, many sound online off-policy prediction algorithms have been developed, but there has been limited empirical work investigating the relative merits of all the variants. This paper aims to fill these empirical gaps and provide clarity on the key ideas behind each method. We summarize the large body of literature on off-policy learning, focusing on 1- methods that use computation linear in the number of features and are convergent under off-policy sampling, and 2- other methods which have proven useful with non-fixed, nonlinear function approximation.
We provide an empirical study of off-policy prediction methods in two challenging microworlds. We report each method's parameter sensitivity, empirical convergence rate, and final performance, providing new insights that should enable practitioners to successfully extend these new methods to large-scale applications. [Abridged abstract]

Submitted 6 November, 2018; originally announced November 2018.

Comments: 68 pages

arXiv:1807.06763 [pdf, other]

doi 10.1613/jair.1.12105

General Value Function Networks

Authors: Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White

Abstract: State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions and decision-making. At the same time, specifying and training RNNs is notoriously tricky, particularly as the common strategy to approximate gradients back in time, called truncated Back-prop Through Time (BPTT), can be sensitive to the truncation window. Further, domain-expertise--which can usually help constrain the function class and so improve trainability--can be difficult to incorporate into complex recurrent units used within RNNs. In this work, we explore how to use multi-step predictions to constrain the RNN and incorporate prior knowledge. In particular, we revisit the idea of using predictions to construct state and ask: does constraining (parts of) the state to consist of predictions about the future improve RNN trainability? We formulate a novel RNN architecture, called a General Value Function Network (GVFN), where each internal state component corresponds to a prediction about the future represented as a value function. We first provide an objective for optimizing GVFNs, and derive several algorithms to optimize this objective.
We then show that GVFNs are more robust to the truncation level, in many cases only requiring one-step gradient updates.

Submitted 2 February, 2021; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: Published in the Journal of Artificial Intelligence Research

Journal ref: Journal of Artificial Intelligence Research, 70, 497-543 (2021)

arXiv:1806.04624 [pdf, other]

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

Authors: Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White

Abstract: Model-based strategies for control are critical to obtain sample efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning, by simulating one-step experience to update the action-value function. This elegant planning strategy has been mostly explored in the tabular setting. The aim of this paper is to revisit sample-based planning, in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly. We introduce a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods in learning in continuous state problems, and that the performance gap grows when moving to stochastic domains, of increasing size.

Submitted 12 June, 2018; originally announced June 2018.

Comments: IJCAI 2018

arXiv:1801.10588 [pdf, other]

Percolation for D2D Networks on Street Systems

Authors: Elie Cali, Nila Novita Gafur, Christian Hirsch, Benedikt Jahnel, Taoufik En-Najjary, Robert I. A. Patterson

Abstract: We study fundamental characteristics for the connectivity of multi-hop D2D networks. Devices are randomly distributed on street systems and are able to communicate with each other whenever their separation is smaller than some connectivity threshold. We model the street systems as Poisson-Voronoi or Poisson-Delaunay tessellations with varying street lengths. We interpret the existence of adequate D2D connectivity as percolation of the underlying random graph. We derive and compare approximations for the critical device-intensity for percolation, the percolation probability and the graph distance. Our results show that for urban areas, the Poisson Boolean Model gives a very good approximation, while for rural areas, the percolation probability stays far from 1 even far above the percolation threshold.

Submitted 31 January, 2018; originally announced January 2018.

Comments: 6 pages, 7 figures, 1 table

arXiv:1607.02318 [pdf, other]

The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V

Authors: Christopher Celio, Palmer Dabbelt, David A. Patterson, Krste Asanović

Abstract: This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpins the original RISC goals. We begin by comparing the dynamic instruction counts and dynamic instruction bytes fetched for the popular proprietary ARMv7, ARMv8, IA-32, and x86-64 Instruction Set Architectures (ISAs) against the free and open RISC-V RV64G and RV64GC ISAs when running the SPEC CINT2006 benchmark suite. RISC-V was designed as a very small ISA to support a wide range of implementations, and has a less mature compiler toolchain. However, we observe that on SPEC CINT2006 RV64G executes on average 16% more instructions than x86-64, 3% more instructions than IA-32, 9% more instructions than ARMv8, but 4% fewer instructions than ARMv7. CISC x86 implementations break up complex instructions into smaller internal RISC-like micro-ops, and the RV64G instruction count is within 2% of the x86-64 retired micro-op count. RV64GC, the compressed variant of RV64G, is the densest ISA studied, fetching 8% fewer dynamic instruction bytes than x86-64. We observed that much of the increased RISC-V instruction count is due to a small set of common multi-instruction idioms. Exploiting this fact, the RV64G and RV64GC effective instruction count can be reduced by 5.4% on average by leveraging macro-op fusion. Combining the compressed RISC-V ISA extension with macro-op fusion provides both the densest ISA and the fewest dynamic operations retired per program, reducing the motivation to add more instructions to the ISA. This approach retains a single simple ISA suitable for both low-end and high-end implementations, where high-end implementations can boost performance through microarchitectural techniques.

Submitted 8 July, 2016; originally announced July 2016.

Report number: UCB/EECS-2016-130

arXiv:1507.03325 [pdf, other]

Scientific Computing Meets Big Data Technology: An Astronomy Use Case

Authors: Zhao Zhang, Kyle Barbary, Frank Austin Nothaft, Evan Sparks, Oliver Zahn, Michael J. Franklin, David A. Patterson, Saul Perlmutter

Abstract: Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, tools from the HPC software stack are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark -- a modern big data platform -- to parallelize many-task applications. We present Kira, a flexible and distributed astronomy image processing toolkit using Apache Spark. We then use the Kira toolkit to implement a Source Extractor application for astronomy images, called Kira SE. With Kira SE as the use case, we study the programming flexibility, dataflow richness, scheduling capacity and performance of Apache Spark running on the EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon EC2 cloud. Furthermore, we show that by leveraging software originally designed for big data infrastructure, Kira SE achieves competitive performance to the C implementation running on the NERSC Edison supercomputer. Our experience with Kira indicates that emerging Big Data platforms such as Apache Spark are a performant alternative for many-task scientific applications.

Submitted 14 March, 2016; v1 submitted 13 July, 2015; originally announced July 2015.

ACM Class: D.1.3; J.2

arXiv:1206.5265 [pdf]

Consensus ranking under the exponential model

Authors: Marina Meila, Kapil Phadnis, Arthur Patterson, Jeff A. Bilmes

Abstract: We analyze the generalized Mallows model, a popular exponential model over rankings. Estimating the central (or consensus) ranking from data is NP-hard. We obtain the following new results: (1) We show that search methods can estimate both the central ranking pi0 and the model parameters theta exactly. The search is n! in the worst case, but is tractable when the true distribution is concentrated around its mode; (2) We show that the generalized Mallows model is jointly exponential in (pi0; theta), and introduce the conjugate prior for this model class; (3) The sufficient statistics are the pairwise marginal probabilities that item i is preferred to item j. Preliminary experiments confirm the theoretical predictions and compare the new algorithm and existing heuristics.

Submitted 20 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

Report number: UAI-P-2007-PG-285-294

arXiv:1111.7166 [pdf, other]

PIQL: Success-Tolerant Query Processing in the Cloud

Authors: Michael Armbrust, Kristal Curtis, Tim Kraska, Armando Fox, Michael J. Franklin, David A. Patterson

Abstract: Newly-released web applications often succumb to a "Success Disaster," where overloaded database machines and resulting high response times destroy a previously good user experience. Unfortunately, the data independence provided by a traditional relational database system, while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under simple declarative expressions. As a result, developers of these applications are increasingly abandoning relational databases in favor of imperative code written against distributed key/value stores, losing the many benefits of data independence in the process. Instead, we propose PIQL, a declarative language that also provides scale independence by calculating an upper bound on the number of key/value store operations that will be performed for any query. Coupled with a service level objective (SLO) compliance prediction model and PIQL's scalable database architecture, these bounds make it easy for developers to write success-tolerant applications that support an arbitrarily large number of users while still providing acceptable performance. In this paper, we present the PIQL query processing system and evaluate its scale independence on hundreds of machines using two benchmarks, TPC-W and SCADr.

Submitted 30 November, 2011; originally announced November 2011.

Comments: VLDB2012

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 3, pp. 181-192 (2011)

arXiv:0911.0878 [pdf, other]

One-bit stochastic resonance storage device

Authors: S. A. Ibáñez, P. I. Fierens, G. A. Patterson, R. P. J. Perazzo, D. F. Grosz

Abstract: The increasing capacity of modern computers, driven by Moore's Law, is accompanied by smaller noise margins and higher error rates. In this paper we propose a memory device, consisting of a ring of two identical overdamped bistable forward-coupled oscillators, which may serve as a building block in a larger scale solution to this problem. We show that such a system is capable of storing one bit and its performance improves with the addition of noise. The proposed device can be regarded as asynchronous, in the sense that stored information can be retrieved at any time and, after a certain synchronization time, the probability of erroneous retrieval does not depend on the interrogated oscillator. We characterize memory persistence time and show it to be maximized for the same noise range that both minimizes the probability of error and ensures synchronization. We also present experimental results for a hard-wired version of the proposed memory, consisting of a loop of two Schmitt triggers. We show that this device is capable of storing one bit and does so more efficiently in the presence of noise.

Submitted 4 November, 2009; originally announced November 2009.

Comments: 12 pages, 7 figures

arXiv:cs/0408035 [pdf]

Monitoring, Analyzing, and Controlling Internet-scale Systems with ACME

Authors: David Oppenheimer, Vitaliy Vatkovskiy, Hakim Weatherspoon, Jason Lee, David A. Patterson, John Kubiatowicz

Abstract: Analyzing and controlling large distributed services under a wide range of conditions is difficult. Yet these capabilities are essential to a number of important development and operational tasks such as benchmarking, testing, and system management. To facilitate these tasks, we have built the Application Control and Monitoring Environment (ACME), a scalable, flexible infrastructure for monitoring, analyzing, and controlling Internet-scale systems. ACME consists of two parts. ISING, the Internet Sensor In-Network agGregator, queries sensors and aggregates the results as they are routed through an overlay network. ENTRIE, the ENgine for TRiggering Internet Events, uses the data streams supplied by ISING, in combination with a user's XML configuration file, to trigger actuators such as killing processes during a robustness benchmark or paging a system administrator when predefined anomalous conditions are observed. In this paper we describe the design, implementation, and evaluation of ACME and its constituent parts. We find that for a 512-node system running atop an emulated Internet topology, ISING's use of in-network aggregation can reduce end-to-end query-response latency by more than 50% compared to using either direct network connections or the same overlay network without aggregation. We also find that an untuned implementation of ACME can invoke an actuator on one or all nodes in response to a discrete or aggregate event in less than four seconds, and we illustrate ACME's applicability to concrete benchmarking and monitoring scenarios.

Submitted 14 August, 2004; originally announced August 2004.

Report number: UCB//CSD-03-1276

Showing 1–22 of 22 results for author: Patterson, A