Showing 1–19 of 19 results for author: Wagenmaker, A

  1. arXiv:2406.10522  [pdf, other]

    cs.LG cs.AI cs.CL

    Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

    Authors: Jifan Zhang, Lalit Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, Siddharth Suresh, Andrew Wagenmaker, Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak

    Abstract: We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning…

    Submitted 15 June, 2024; originally announced June 2024.

  2. arXiv:2406.06856  [pdf, ps, other]

    cs.LG cs.AI

    Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

    Authors: Adhyyan Narang, Andrew Wagenmaker, Lillian Ratliff, Kevin Jamieson

    Abstract: In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an epsilon-optimal policy from a set of policies with high probability. Existing work in bandits has shown that it is possible to identify the best policy by estimating only the difference between the behaviors of individual polici…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 59 pages, 2 Figures

  3. arXiv:2404.12308  [pdf, other]

    cs.RO cs.LG eess.SY

    ASID: Active Exploration for System Identification in Robotic Manipulation

    Authors: Marius Memmel, Andrew Wagenmaker, Chuning Zhu, Patrick Yin, Dieter Fox, Abhishek Gupta

    Abstract: Model-free control strategies such as reinforcement learning have shown the ability to learn control strategies without requiring an accurate model or simulator of the world. While this is appealing due to the lack of modeling requirements, such methods can be sample inefficient, making them impractical in many real-world domains. On the other hand, model-based control techniques leveraging accura…

    Submitted 26 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Project website at https://weirdlabuw.github.io/asid

  4. arXiv:2312.08559  [pdf, other]

    cs.LG cs.CY stat.ML

    Fair Active Learning in Low-Data Regimes

    Authors: Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson

    Abstract: In critical machine learning applications, ensuring fairness is essential to avoid perpetuating social inequities. In this work, we address the challenges of reducing bias and improving accuracy in data-scarce environments, where the cost of collecting labeled data prohibits the use of large, labeled datasets. In such settings, active learning promises to maximize marginal accuracy gains of small…

    Submitted 13 December, 2023; originally announced December 2023.

  5. arXiv:2306.09210  [pdf, other]

    cs.LG cs.RO eess.SY math.OC stat.ML

    Optimal Exploration for Model-Based RL in Nonlinear Systems

    Authors: Andrew Wagenmaker, Guanya Shi, Kevin Jamieson

    Abstract: Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory. A commonly applied approach is to first explore the environment (exploration), learn an accurate model of it (system identification), and then compute an optimal controller with the minimum cost on this estimated system (policy optimization). While existing work has shown…

    Submitted 15 June, 2023; originally announced June 2023.

  6. arXiv:2304.12466  [pdf, other]

    cs.LG stat.ML

    Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

    Authors: Andrew Wagenmaker, Dylan J. Foster

    Abstract: We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance. We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem insta…

    Submitted 24 April, 2023; originally announced April 2023.

  7. arXiv:2211.04974  [pdf, other]

    cs.LG cs.AI stat.ML

    Leveraging Offline Data in Online Reinforcement Learning

    Authors: Andrew Wagenmaker, Aldo Pacchiano

    Abstract: Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an $ε$-optimal policy. In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment…

    Submitted 20 July, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  8. arXiv:2207.02575  [pdf, other]

    cs.LG stat.ML

    Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design

    Authors: Andrew Wagenmaker, Kevin Jamieson

    Abstract: While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this wo…

    Submitted 20 July, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

  9. arXiv:2206.11183  [pdf, other]

    cs.LG stat.ML

    Active Learning with Safety Constraints

    Authors: Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson

    Abstract: Active learning methods have shown great promise in reducing the number of samples necessary for learning. As automated learning systems are adopted into real-time, real-world decision-making pipelines, it is increasingly important that such algorithms are designed with safety in mind. In this work we investigate the complexity of learning the best safe decision in interactive environments. We red…

    Submitted 22 June, 2022; originally announced June 2022.

  10. arXiv:2201.11206  [pdf, other]

    cs.LG stat.ML

    Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

    Authors: Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

    Abstract: Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy for an arbitrary reward function revealed only after exploring. In the tabular setting, it is well known that this is a more difficult problem than reward-aware (PAC) RL -- where the agent has access to the reward fun…

    Submitted 18 June, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

  11. arXiv:2112.03432  [pdf, other]

    cs.LG stat.ML

    First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach

    Authors: Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

    Abstract: Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making. While such bounds exist in many settings, they have proven elusive in reinforcement learning with large state spaces. In this work we address this gap, and show that it is possible…

    Submitted 20 October, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

  12. arXiv:2111.12151  [pdf, other]

    cs.LG stat.ML

    Best Arm Identification with Safety Constraints

    Authors: Zhenlin Wang, Andrew Wagenmaker, Kevin Jamieson

    Abstract: The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option ou…

    Submitted 23 November, 2021; originally announced November 2021.

  13. arXiv:2108.02717  [pdf, other]

    cs.LG stat.ML

    Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

    Authors: Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson

    Abstract: The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $ε$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an $ε$-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show this…

    Submitted 21 June, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

  14. arXiv:2102.05214  [pdf, other]

    cs.LG math.OC stat.ML

    Task-Optimal Exploration in Linear Dynamical Systems

    Authors: Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson

    Abstract: Exploration in unknown environments is a fundamental problem in reinforcement learning and control. In this work, we study task-guided exploration and determine what precisely an agent must learn about their environment in order to complete a particular task. Formally, we study a broad class of decision-making problems in the setting of linear dynamical systems, a class that includes the linear qu…

    Submitted 9 July, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

  15. arXiv:2011.00576  [pdf, other]

    cs.LG stat.ML

    Experimental Design for Regret Minimization in Linear Bandits

    Authors: Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

    Abstract: In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits. While existing literature tends to focus on optimism-based algorithms--which have been shown to be suboptimal in many cases--our approach carefully plans which action to take by balancing the tradeoff between information gain and reward, overcoming the fail…

    Submitted 26 February, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

  16. arXiv:2002.00495  [pdf, other]

    cs.LG eess.SY stat.ML

    Active Learning for Identification of Linear Dynamical Systems

    Authors: Andrew Wagenmaker, Kevin Jamieson

    Abstract: We propose an algorithm to actively estimate the parameters of a linear dynamical system. Given complete control over the system's input, our algorithm adaptively chooses the inputs to accelerate estimation. We show a finite time bound quantifying the estimation rate our algorithm attains and prove matching upper and lower bounds which guarantee its asymptotic optimality, up to constants. In addit…

    Submitted 22 June, 2020; v1 submitted 2 February, 2020; originally announced February 2020.

  17. arXiv:1710.08873  [pdf, other]

    cs.CV stat.ML

    Robust Photometric Stereo via Dictionary Learning

    Authors: Andrew J. Wagenmaker, Brian E. Moore, Raj Rao Nadakuditi

    Abstract: Photometric stereo is a method that seeks to reconstruct the normal vectors of an object from a set of images of the object illuminated under different light sources. While effective in some situations, classical photometric stereo relies on a diffuse surface model that cannot handle objects with complex reflectance patterns, and it is sensitive to non-idealities in the images. In this work, we pr…

    Submitted 7 August, 2018; v1 submitted 24 October, 2017; originally announced October 2017.

    Comments: To appear in IEEE Transactions on Computational Imaging

  18. arXiv:1710.00230  [pdf, other]

    cs.CV stat.ML

    Robust Surface Reconstruction from Gradients via Adaptive Dictionary Regularization

    Authors: Andrew J. Wagenmaker, Brian E. Moore, Raj Rao Nadakuditi

    Abstract: This paper introduces a novel approach to robust surface reconstruction from photometric stereo normal vector maps that is particularly well-suited for reconstructing surfaces from noisy gradients. Specifically, we propose an adaptive dictionary learning based approach that attempts to simultaneously integrate the gradient fields while sparsely representing the spatial patches of the reconstructed…

    Submitted 30 September, 2017; originally announced October 2017.

    Comments: ICIP 2017

  19. arXiv:1710.00002  [pdf, other]

    cs.CV stat.ML

    Robust Photometric Stereo Using Learned Image and Gradient Dictionaries

    Authors: Andrew J. Wagenmaker, Brian E. Moore, Raj Rao Nadakuditi

    Abstract: Photometric stereo is a method for estimating the normal vectors of an object from images of the object under varying lighting conditions. Motivated by several recent works that extend photometric stereo to more general objects and lighting conditions, we study a new robust approach to photometric stereo that utilizes dictionary learning. Specifically, we propose and analyze two approaches to adap…

    Submitted 30 September, 2017; originally announced October 2017.

    Comments: ICIP 2017