Showing 1–22 of 22 results for author: Patterson, A

Searching in archive cs.
  1. arXiv:2312.02355  [pdf, other]

    cs.LG cs.AI

    When is Offline Policy Selection Sample Efficient for Reinforcement Learning?

    Authors: Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White

    Abstract: Offline reinforcement learning algorithms often require careful hyperparameter tuning. Consequently, before deployment, we need to select amongst a set of candidate policies. As yet, however, there is little understanding about the fundamental limits of this offline policy selection (OPS) problem. In this work we aim to provide clarity on when sample efficient OPS is possible, primarily by connect…

    Submitted 4 December, 2023; originally announced December 2023.

  2. arXiv:2304.01315  [pdf, other]

    cs.LG cs.AI

    Empirical Design in Reinforcement Learning

    Authors: Andrew Patterson, Samuel Neumann, Martha White, Adam White

    Abstract: Empirical design in reinforcement learning is no small task. Running good experiments requires attention to detail and at times significant computational resources. While compute resources available per dollar have continued to grow rapidly, so has the scale of typical experiments in reinforcement learning. It is now common to benchmark agents with millions of parameters against dozens of tasks,…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: In submission to JMLR

  3. arXiv:2205.08464  [pdf, other]

    cs.LG

    Robust Losses for Learning Value Functions

    Authors: Andrew Patterson, Victor Liao, Martha White

    Abstract: Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards…

    Submitted 17 April, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)

  4. arXiv:2202.02396  [pdf, other]

    cs.LG cs.AI

    A Temporal-Difference Approach to Policy Gradient Estimation

    Authors: Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood

    Abstract: The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient. Most algorithms based on this theorem, in practice, break this assumption, introducing a distribution shift that can cause convergence to poor solutions. In this paper, we propose a new approach of reconstructing the policy gr…

    Submitted 7 July, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  5. arXiv:2104.13844  [pdf, other]

    cs.LG cs.AI

    A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

    Authors: Andrew Patterson, Adam White, Martha White

    Abstract: Many reinforcement learning algorithms rely on value estimation; however, the most widely used algorithms -- namely temporal difference algorithms -- can diverge under both off-policy sampling and nonlinear function approximation. Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function…

    Submitted 28 March, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: Accepted for publication in JMLR 2022

  6. arXiv:2009.03864  [pdf, other]

    eess.SY cs.LG cs.RO math.OC

    Contraction $\mathcal{L}_1$-Adaptive Control using Gaussian Processes

    Authors: Aditya Gahlawat, Arun Lakshmanan, Lin Song, Andrew Patterson, Zhuohuan Wu, Naira Hovakimyan, Evangelos Theodorou

    Abstract: We present $\mathcal{CL}_1$-$\mathcal{GP}$, a control framework that enables safe simultaneous learning and control for systems subject to uncertainties. The two main constituents are contraction theory-based $\mathcal{L}_1$ ($\mathcal{CL}_1$) control and Bayesian learning in the form of Gaussian process (GP) regression. The $\mathcal{CL}_1$ controller ensures that control objectives are met while…

    Submitted 30 November, 2021; v1 submitted 8 September, 2020; originally announced September 2020.

    Comments: Submitted to Learning for Dynamics and Control (L4DC) Conference, 2021

  7. arXiv:2007.00611  [pdf, other]

    cs.LG cs.AI stat.ML

    Gradient Temporal-Difference Learning with Regularized Corrections

    Authors: Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

    Abstract: It is still common to use Q-learning and temporal difference (TD) learning -- even though they have divergence issues and sound Gradient TD alternatives exist -- because divergence seems rare and they typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an ea…

    Submitted 17 September, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: Appeared in Proceedings of the 37th International Conference on Machine Learning (ICML2020)

  8. arXiv:2002.01965  [pdf, other]

    cs.RO cs.LG

    Learning Probabilistic Intersection Traffic Models for Trajectory Prediction

    Authors: Andrew Patterson, Aditya Gahlawat, Naira Hovakimyan

    Abstract: Autonomous agents must be able to safely interact with other vehicles to integrate into urban environments. The safety of these agents is dependent on their ability to predict collisions with other vehicles' future trajectories for replanning and collision avoidance. The information needed to predict collisions can be learned from previously observed vehicle trajectories in a specific environment,…

    Submitted 5 February, 2020; originally announced February 2020.

  9. arXiv:1907.09558  [pdf, other]

    cs.RO cs.SE

    System-Level Development of a User-Integrated Semi-Autonomous Lawn Mowing System: Problem Overview, Basic Requirements, and Proposed Architecture

    Authors: Albert E. Patterson, Yang Yuan, William R. Norris

    Abstract: This concept paper outlines some recent efforts toward the design and development of user-integrated semi-autonomous home-sized lawn mowing systems from a systems engineering perspective. This is an important and emerging field of study within the robotics and systems engineering communities. The work presented includes a review of current progress on this problem, a discussion of the problem from…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: 11 pages, 8 figures, and 32 references

  10. arXiv:1904.02765  [pdf, other]

    cs.RO cs.LG eess.SY math.OC

    Intent-Aware Probabilistic Trajectory Estimation for Collision Prediction with Uncertainty Quantification

    Authors: Andrew Patterson, Arun Lakshmanan, Naira Hovakimyan

    Abstract: Collision prediction in a dynamic and unknown environment relies on knowledge of how the environment is changing. Many collision prediction methods rely on deterministic knowledge of how obstacles are moving in the environment. However, complete deterministic knowledge of the obstacles' motion is often unavailable. This work proposes a Gaussian process based prediction method that replaces the ass…

    Submitted 4 April, 2019; originally announced April 2019.

  11. arXiv:1902.05027  [pdf, other]

    cs.RO cs.CG cs.GR

    Proximity Queries for Absolutely Continuous Parametric Curves

    Authors: Arun Lakshmanan, Andrew Patterson, Venanzio Cichella, Naira Hovakimyan

    Abstract: In motion planning problems for autonomous robots, such as self-driving cars, the robot must ensure that its planned path is not in close proximity to obstacles in the environment. However, the problem of evaluating the proximity is generally non-convex and serves as a significant computational bottleneck for motion planning algorithms. In this paper, we present methods for a general class of abso…

    Submitted 19 June, 2019; v1 submitted 13 February, 2019; originally announced February 2019.

    Comments: Proceedings of Robotics: Science and Systems

  12. arXiv:1902.03383  [pdf, ps, other]

    cs.OS

    Cloud Programming Simplified: A Berkeley View on Serverless Computing

    Authors: Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, David A. Patterson

    Abstract: Serverless cloud computing handles virtually all the system administration operations needed to make it easier for programmers to use the cloud. It provides an interface that greatly simplifies cloud programming, and represents an evolution that parallels the transition from assembly language to high-level programming languages. This paper gives a quick history of cloud computing, including an acc…

    Submitted 9 February, 2019; originally announced February 2019.

  13. arXiv:1811.02597  [pdf, other]

    cs.LG cs.AI stat.ML

    Online Off-policy Prediction

    Authors: Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

    Abstract: This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the prediction…

    Submitted 6 November, 2018; originally announced November 2018.

    Comments: 68 pages

  14. arXiv:1807.06763  [pdf, other]

    cs.LG cs.AI stat.ML

    General Value Function Networks

    Authors: Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White

    Abstract: State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions…

    Submitted 2 February, 2021; v1 submitted 17 July, 2018; originally announced July 2018.

    Comments: Published in the Journal of Artificial Intelligence Research

    Journal ref: Journal of Artificial Intelligence Research, 70, 497-543 (2021)

  15. arXiv:1806.04624  [pdf, other]

    cs.AI

    Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

    Authors: Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White

    Abstract: Model-based strategies for control are critical to obtain sample efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning, by simulating one-step experience to update the action-value function. This elegant planning strategy has been mostly explored in the tabular setting. The aim of this paper is to revisit sample-based planning, in stochastic and continuou…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: IJCAI 2018

  16. arXiv:1801.10588  [pdf, other]

    cs.NI

    Percolation for D2D Networks on Street Systems

    Authors: Elie Cali, Nila Novita Gafur, Christian Hirsch, Benedikt Jahnel, Taoufik En-Najjary, Robert I. A. Patterson

    Abstract: We study fundamental characteristics for the connectivity of multi-hop D2D networks. Devices are randomly distributed on street systems and are able to communicate with each other whenever their separation is smaller than some connectivity threshold. We model the street systems as Poisson-Voronoi or Poisson-Delaunay tessellations with varying street lengths. We interpret the existence of adequate…

    Submitted 31 January, 2018; originally announced January 2018.

    Comments: 6 pages, 7 figures, 1 table

  17. arXiv:1607.02318  [pdf, other]

    cs.AR

    The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V

    Authors: Christopher Celio, Palmer Dabbelt, David A. Patterson, Krste Asanović

    Abstract: This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpins the original RISC goals. We begin by comparing the dynamic instruction counts and dynamic instruction bytes fetche…

    Submitted 8 July, 2016; originally announced July 2016.

    Report number: UCB/EECS-2016-130

  18. arXiv:1507.03325  [pdf, other]

    cs.DC

    Scientific Computing Meets Big Data Technology: An Astronomy Use Case

    Authors: Zhao Zhang, Kyle Barbary, Frank Austin Nothaft, Evan Sparks, Oliver Zahn, Michael J. Franklin, David A. Patterson, Saul Perlmutter

    Abstract: Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, tools from the HPC software stack are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark -- a modern big data platform -- to parallelize many-task applicati…

    Submitted 14 March, 2016; v1 submitted 13 July, 2015; originally announced July 2015.

    ACM Class: D.1.3; J.2

  19. arXiv:1206.5265  [pdf]

    cs.LG cs.AI stat.ML

    Consensus ranking under the exponential model

    Authors: Marina Meila, Kapil Phadnis, Arthur Patterson, Jeff A. Bilmes

    Abstract: We analyze the generalized Mallows model, a popular exponential model over rankings. Estimating the central (or consensus) ranking from data is NP-hard. We obtain the following new results: (1) We show that search methods can estimate both the central ranking $\pi_0$ and the model parameters $\theta$ exactly. The search is n! in the worst case, but is tractable when the true distribution is concentrated…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-285-294

  20. arXiv:1111.7166  [pdf, other]

    cs.DB

    PIQL: Success-Tolerant Query Processing in the Cloud

    Authors: Michael Armbrust, Kristal Curtis, Tim Kraska, Armando Fox, Michael J. Franklin, David A. Patterson

    Abstract: Newly-released web applications often succumb to a "Success Disaster," where overloaded database machines and resulting high response times destroy a previously good user experience. Unfortunately, the data independence provided by a traditional relational database system, while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under simple declarat…

    Submitted 30 November, 2011; originally announced November 2011.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 3, pp. 181-192 (2011)

  21. arXiv:0911.0878  [pdf, other]

    nlin.AO cond-mat.other cs.OH

    One-bit stochastic resonance storage device

    Authors: S. A. Ibáñez, P. I. Fierens, G. A. Patterson, R. P. J. Perazzo, D. F. Grosz

    Abstract: The increasing capacity of modern computers, driven by Moore's Law, is accompanied by smaller noise margins and higher error rates. In this paper we propose a memory device, consisting of a ring of two identical overdamped bistable forward-coupled oscillators, which may serve as a building block in a larger scale solution to this problem. We show that such a system is capable of storing one bit…

    Submitted 4 November, 2009; originally announced November 2009.

    Comments: 12 pages, 7 figures

  22. arXiv:cs/0408035  [pdf]

    cs.DC cs.NI

    Monitoring, Analyzing, and Controlling Internet-scale Systems with ACME

    Authors: David Oppenheimer, Vitaliy Vatkovskiy, Hakim Weatherspoon, Jason Lee, David A. Patterson, John Kubiatowicz

    Abstract: Analyzing and controlling large distributed services under a wide range of conditions is difficult. Yet these capabilities are essential to a number of important development and operational tasks such as benchmarking, testing, and system management. To facilitate these tasks, we have built the Application Control and Monitoring Environment (ACME), a scalable, flexible infrastructure for monitori…

    Submitted 14 August, 2004; originally announced August 2004.

    Report number: UCB//CSD-03-1276