Showing 1–50 of 81 results for author: Brunskill, E

  1. arXiv:2407.00870  [pdf, other]

    cs.CL cs.HC

    Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

    Authors: Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang

    Abstract: Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging. Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 34 pages, 24 figures, 11 Tables

  2. arXiv:2405.06061  [pdf, other]

    cs.HC

    Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents

    Authors: Matthew Jörke, Shardul Sapkota, Lyndsea Warkenthien, Niklas Vainio, Paul Schmiedmayer, Emma Brunskill, James Landay

    Abstract: Physical activity has significant benefits to health, yet large portions of the population remain physically inactive. Mobile health applications show promising potential for low-cost, scalable physical activity promotion, but existing approaches are often insufficiently personalized to a user's context and life circumstances. In this work, we explore the potential for large language model (LLM) b…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.04636  [pdf, ps, other]

    cs.LG stat.ML

    Data-driven Error Estimation: Upper Bounding Multiple Errors with No Technical Debt

    Authors: Sanath Kumar Krishnamurthy, Susan Athey, Emma Brunskill

    Abstract: We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and refer to this as the error estimation problem. For a single such tuple, data-driven confidence intervals can often be used to bound the error in our estimate. However, for a class o…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2403.02795  [pdf, other]

    cs.AI cs.CL

    Evaluating and Optimizing Educational Content with Large Language Model Judgments

    Authors: Joy He-Yueya, Noah D. Goodman, Emma Brunskill

    Abstract: Creating effective educational materials generally requires expensive and time-consuming studies of student learning outcomes. To overcome this barrier, one idea is to build computational models of student learning and use them to optimize instructional materials. However, it is difficult to model the cognitive processes of learning dynamics. We propose an alternative approach that uses Language M…

    Submitted 6 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 11 pages

  5. arXiv:2403.01386  [pdf, other]

    stat.ME econ.EM

    Minimax-Regret Sample Selection in Randomized Experiments

    Authors: Yuchen Hu, Henry Zhu, Emma Brunskill, Stefan Wager

    Abstract: Randomized controlled trials are often run in settings with many subpopulations that may have differential benefits from the treatment being evaluated. We consider the problem of sample selection, i.e., whom to enroll in a randomized trial, such as to optimize welfare in a heterogeneous population. We formalize this problem within the minimax-regret framework, and derive optimal sample-selection s…

    Submitted 25 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  6. arXiv:2401.05193  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Experiment Planning with Function Approximation

    Authors: Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

    Abstract: We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies -- producing in advance a set of policies for data coll…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 10 pages main

  7. Improving Student Learning with Hybrid Human-AI Tutoring: A Three-Study Quasi-Experimental Investigation

    Authors: Danielle R. Thomas, Jionghao Lin, Erin Gatz, Ashish Gurung, Shivang Gupta, Kole Norberg, Stephen E. Fancsali, Vincent Aleven, Lee Branstetter, Emma Brunskill, Kenneth R. Koedinger

    Abstract: Artificial intelligence (AI) applications to support human tutoring have potential to significantly improve learning outcomes, but engagement issues persist, especially among students from low-income backgrounds. We introduce an AI-assisted tutoring model that combines human and AI tutoring and hypothesize that this synergy will have positive impacts on learning processes. To investigate this hypo…

    Submitted 21 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 17 pages

  8. arXiv:2312.02438  [pdf, other]

    cs.LG

    Adaptive Instrument Design for Indirect Experiments

    Authors: Yash Chandak, Shiv Shankar, Vasilis Syrgkanis, Emma Brunskill

    Abstract: Indirect experiments provide a valuable framework for estimating treatment effects in situations where conducting randomized control trials (RCTs) is impractical or unethical. Unlike RCTs, indirect experiments estimate treatment effects by leveraging (conditional) instrumental variables, enabling estimation through encouragement and recommendation rather than strict treatment assignment. However,…

    Submitted 4 December, 2023; originally announced December 2023.

  9. arXiv:2311.09483  [pdf, other]

    cs.LG cs.AI

    Adaptive Interventions with User-Defined Goals for Health Behavior Change

    Authors: Aishwarya Mandyam, Matthew Jörke, William Denton, Barbara E. Engelhardt, Emma Brunskill

    Abstract: Promoting healthy lifestyle behaviors remains a major public health concern, particularly due to their crucial role in preventing chronic conditions such as cancer, heart disease, and type 2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable health behavior change promotion. Researchers are increasingly exploring adaptive algorithms that personalize intervention…

    Submitted 23 May, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 5 pages. Full paper to be presented at Conference on Health Inference and Learning (CHIL) 2024, June 27th, 2024, New York City, United States, 11 pages

  10. arXiv:2308.14089  [pdf, other]

    cs.CL cs.AI cs.LG

    MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

    Authors: Scott L. Fleming, Alejandro Lozano, William J. Haberkorn, Jenelle A. Jindal, Eduardo P. Reis, Rahul Thapa, Louis Blankemeier, Julian Z. Genkins, Ethan Steinberg, Ashwin Nayak, Birju S. Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott J. Adams, Oluseyi Fayanju, Shreya J. Shah, Thomas Savage, Ethan Goh, Akshay S. Chaudhari, Nima Aghaeepour, Christopher Sharp, Michael A. Pfeffer, Percy Liang, et al. (5 additional authors not shown)

    Abstract: The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture…

    Submitted 24 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

  11. arXiv:2307.02108  [pdf, other]

    cs.LG stat.ML

    Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

    Authors: Sanath Kumar Krishnamurthy, Ruohan Zhan, Susan Athey, Emma Brunskill

    Abstract: In many applications, e.g. in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment. That is, to minimize simple regret. However, this objective remains understudied. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit setting, where a tuning parameter de…

    Submitted 2 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

  12. arXiv:2306.14892  [pdf, other]

    cs.LG cs.AI

    Supervised Pretraining Can Learn In-Context Reinforcement Learning

    Authors: Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill

    Abstract: Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce…

    Submitted 26 June, 2023; originally announced June 2023.

  13. arXiv:2306.14069  [pdf, other]

    cs.LG

    Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

    Authors: Anirudhan Badrinath, Yannis Flet-Berliac, Allen Nie, Emma Brunskill

    Abstract: Despite the recent advancements in offline reinforcement learning via supervised learning (RvS) and the success of the decision transformer (DT) architecture in various domains, DTs have fallen short in several challenging benchmarks. The root cause of this underperformance lies in their inability to seamlessly connect segments of suboptimal trajectories. To overcome this limitation, we present a…

    Submitted 18 November, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

    Comments: Accepted to the Conference on Neural Information Processing Systems 2023 (NeurIPS 2023)

  14. arXiv:2306.12389  [pdf, other]

    stat.AP

    Automated Reminders Reduce Incarceration for Missed Court Dates: Evidence from a Text Message Experiment

    Authors: Alex Chohlas-Wood, Madison Coots, Joe Nudell, Julian Nyarko, Emma Brunskill, Todd Rogers, Sharad Goel

    Abstract: Millions of Americans must attend mandatory court dates every year. To boost appearance rates, jurisdictions nationwide are increasingly turning to automated reminders, but previous research offers mixed evidence on their effectiveness. In partnership with the Santa Clara County Public Defender Office, we randomly assigned 5,709 public defender clients to either receive automated text message remi…

    Submitted 22 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

  15. arXiv:2304.04933  [pdf, other]

    cs.AI cs.CL

    Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task

    Authors: Sherry Ruan, Allen Nie, William Steenbergen, Jiayu He, JQ Zhang, Meng Guo, Yao Liu, Kyle Dang Nguyen, Catherine Y Wang, Rui Ying, James A Landay, Emma Brunskill

    Abstract: Resource limitations make it hard to provide all students with one of the most effective educational interventions: personalized instruction. Reinforcement learning could be a key tool to reduce the development cost and improve the effectiveness of intelligent tutoring software that aims to provide the right support, at the right time, to a student. Here we illustrate that deep reinforcement learn…

    Submitted 13 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: 23 pages. Under review

  16. arXiv:2302.09451  [pdf, other]

    cs.LG stat.ML

    Estimating Optimal Policy Value in General Linear Contextual Bandits

    Authors: Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill

    Abstract: In many bandit problems, the maximal reward achievable by a policy is often unknown in advance. We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable. We refer to this as $V^*$ estimation. It was recently shown that fast $V^*$ estimation is possible but only in disjoint linear bandits with Gaussian covariates. Whethe…

    Submitted 18 February, 2023; originally announced February 2023.

  17. arXiv:2301.11426  [pdf, other]

    cs.LG

    Model-based Offline Reinforcement Learning with Local Misspecification

    Authors: Kefan Dong, Yannis Flet-Berliac, Allen Nie, Emma Brunskill

    Abstract: We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch and we propose an empirical algorithm for optimal offline policy selection. Theoretically, we prove a novel safe policy improvement theorem by establishing pessimism approximations to the value function. Our key insight is to join…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted by AAAI-23

  18. arXiv:2211.08802  [pdf, other]

    cs.LG cs.AI stat.ML

    Giving Feedback on Interactive Student Programs with Meta-Exploration

    Authors: Evan Zheran Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn

    Abstract: Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science. However, teaching and giving feedback on such software is time-consuming -- standard approaches require instructors to manually grade student-implemented interactive programs. As a result, online platforms that serve millions, like Code.org, are unable to provide any feedback on as…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2022). Selected as Oral

  19. arXiv:2211.02016  [pdf, other]

    cs.LG cs.AI

    Oracle Inequalities for Model Selection in Offline Reinforcement Learning

    Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

    Abstract: In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximat…

    Submitted 3 November, 2022; originally announced November 2022.

  20. arXiv:2210.08642  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data

    Authors: Allen Nie, Yannis Flet-Berliac, Deon R. Jordan, William Steenbergen, Emma Brunskill

    Abstract: Offline reinforcement learning (RL) can be used to improve future performance by leveraging historical data. There exist many different algorithms for offline RL, and it is well recognized that these algorithms, and their hyperparameter settings, can lead to decision policies with substantially differing performance. This prompts the need for pipelines that allow practitioners to systematically pe…

    Submitted 12 January, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: 32 pages. Published at NeurIPS 2022. Presented at RLDM 2022

  21. arXiv:2207.00632  [pdf, other]

    cs.LG

    Offline Policy Optimization with Eligible Actions

    Authors: Yao Liu, Yannis Flet-Berliac, Emma Brunskill

    Abstract: Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation, and such estimators typically do not require assumptions on the properties and representational capabilities of value function or decisio…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted at the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022)

  22. arXiv:2112.15221  [pdf, other]

    cs.AI

    Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

    Authors: Tong Mu, Georgios Theocharous, David Arbour, Emma Brunskill

    Abstract: Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance. To address this, we introduce a practical algorithm for incorporating human insight to speed learning. Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restricti…

    Submitted 30 December, 2021; originally announced December 2021.

    Journal ref: AAAI2022

  23. arXiv:2111.14272  [pdf, other]

    cs.LG cs.AI stat.ME

    Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation

    Authors: Ramtin Keramati, Omer Gottesman, Leo Anthony Celi, Finale Doshi-Velez, Emma Brunskill

    Abstract: Off-policy policy evaluation methods for sequential decision making can be used to help identify if a proposed decision policy is better than a current baseline policy. However, a new decision policy may be better than a baseline policy for some individuals but not others. This has motivated a push towards personalization and accurate per-state estimates of heterogeneous treatment effects (HTEs).…

    Submitted 28 November, 2021; originally announced November 2021.

  24. arXiv:2111.07966  [pdf, other]

    stat.ME stat.ML

    Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects

    Authors: Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, Stefan Wager

    Abstract: There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic as to how the p…

    Submitted 28 November, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

  25. arXiv:2110.14615  [pdf, other]

    cs.AI cs.CY cs.LG

    Play to Grade: Testing Coding Games as Classifying Markov Decision Process

    Authors: Allen Nie, Emma Brunskill, Chris Piech

    Abstract: Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse based games. While pedagogically compelling, there are no contemporary autonomous methods for providing feedback. Notably, interactive programs are impossible to grade by traditional unit tests. In this paper we formalize the challenge of…

    Submitted 14 December, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021, 16 pages, 7 figures

  26. arXiv:2109.08792  [pdf, other]

    cs.LG cs.CY

    Learning to be Fair: A Consequentialist Approach to Equitable Decision-Making

    Authors: Alex Chohlas-Wood, Madison Coots, Henry Zhu, Emma Brunskill, Sharad Goel

    Abstract: In an attempt to make algorithms fair, the machine learning literature has largely focused on equalizing decisions, outcomes, or error rates across race or gender groups. To illustrate, consider a hypothetical government rideshare program that provides transportation assistance to low-income people with upcoming court dates. Following this literature, one might allocate rides to those with the hig…

    Submitted 12 February, 2024; v1 submitted 17 September, 2021; originally announced September 2021.

  27. arXiv:2108.08812  [pdf, ps, other]

    cs.LG

    Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

    Authors: Andrea Zanette, Martin J. Wainwright, Emma Brunskill

    Abstract: Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared to the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action valu…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: Initial submission; appeared as spotlight talk in ICML 2021 Workshop on Theory of RL

  28. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  29. arXiv:2107.09912  [pdf, other]

    cs.LG stat.ML

    Design of Experiments for Stochastic Contextual Linear Bandits

    Authors: Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill

    Abstract: In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be a significant engineering overhead to deploy these algorithms, especially when the dataset is collected in a distributed fashion or when a human in the loop is needed to implement a different policy. Explorin…

    Submitted 22 July, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Initial submission

  30. arXiv:2104.12820  [pdf, other]

    cs.LG

    Universal Off-Policy Evaluation

    Authors: Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

    Abstract: When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return…

    Submitted 2 November, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted at Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

  31. arXiv:2011.09750  [pdf, ps, other]

    cs.LG stat.ML

    Online Model Selection for Reinforcement Learning with Function Approximation

    Authors: Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

    Abstract: Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would…

    Submitted 19 November, 2020; originally announced November 2020.

  32. arXiv:2008.07737  [pdf, ps, other]

    cs.LG stat.ML

    Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

    Authors: Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

    Abstract: There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks. Typically these assumptions are stronger than what is needed to find good solutions in the batch setting. In this work, we show how under a more sta…

    Submitted 21 October, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: Minor update; appears in NeurIPS

  33. arXiv:2007.08202  [pdf, other]

    cs.LG cs.AI stat.ML

    Provably Good Batch Reinforcement Learning Without Great Exploration

    Authors: Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

    Abstract: Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes tasks. Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies w…

    Submitted 22 July, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: 36 pages, 7 figures

  34. arXiv:2007.05896  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

    Authors: Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

    Abstract: Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions. However, learning an accurate Markov Decision Process (MDP) over high-dimensional states (e.g., raw pixels) is extremely challenging because it requires function approximation, which…

    Submitted 11 July, 2020; originally announced July 2020.

  35. arXiv:2004.06230  [pdf, other]

    cs.LG stat.ML

    Power Constrained Bandits

    Authors: Jiayu Yao, Emma Brunskill, Weiwei Pan, Susan Murphy, Finale Doshi-Velez

    Abstract: Contextual bandits often provide simple and effective personalization in decision making problems, making them popular tools to deliver personalized interventions in mobile health as well as other health applications. However, when bandits are deployed in the context of a scientific study -- e.g. a clinical trial to test if a mobile health intervention is effective -- the aim is not only to person…

    Submitted 27 July, 2021; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: Accepted at MLHC 2021

  36. Value Driven Representation for Human-in-the-Loop Reinforcement Learning

    Authors: Ramtin Keramati, Emma Brunskill

    Abstract: Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer that is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on algorithmic foundation of how t…

    Submitted 2 April, 2020; originally announced April 2020.

    Journal ref: UMAP 2019, 27th ACM Conference on User Modeling, Adaptation and Personalization

  37. arXiv:2003.05623  [pdf, other]

    stat.ML cs.LG

    Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

    Authors: Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill

    Abstract: When observed decisions depend only on observed features, off-policy policy evaluation (OPE) methods for sequential decision making problems can estimate the performance of evaluation policies before deploying them. This assumption is frequently violated due to unobserved confounders, unrecorded variables that impact both the decisions and their outcomes. We assess robustness of OPE methods under…

    Submitted 12 March, 2020; originally announced March 2020.

  38. arXiv:2003.00153  [pdf, ps, other]

    cs.LG cs.AI

    Learning Near Optimal Policies with Low Inherent Bellman Error

    Authors: Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

    Abstract: We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show convergence of approximate value iteration. First we relate this condition to other common frameworks and show that it is strictly more general than the low rank (or linear) MDP assumption of prior w…

    Submitted 28 June, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: Bug fixes in appendix; appears in ICML 2020

  39. arXiv:2002.05651  [pdf, other]

    cs.CY cs.LG

    Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning

    Authors: Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau

    Abstract: Accurate reporting of energy and carbon usage is essential for understanding the potential climate impacts of machine learning research. We introduce a framework that makes this easier by providing a simple interface for tracking realtime energy consumption and carbon emissions, as well as generating standardized online appendices. Utilizing this framework, we create a leaderboard for energy effic…

    Submitted 29 November, 2022; v1 submitted 31 January, 2020; originally announced February 2020.

    Comments: Published in JMLR: https://jmlr.org/papers/v21/20-312.html

  40. arXiv:2002.03478  [pdf, other]

    cs.LG stat.ML

    Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

    Authors: Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Anthony Celi, Emma Brunskill, Finale Doshi-Velez

    Abstract: Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method t…

    Submitted 11 August, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: ICML final version

  41. arXiv:1912.06111  [pdf, other]

    cs.LG stat.ML

    Sublinear Optimal Policy Value Estimation in Contextual Bandits

    Authors: Weihao Kong, Gregory Valiant, Emma Brunskill

    Abstract: We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting. We prove that for certain settings it is possible to obtain an accurate estimate of the optimal policy value even with a number of samples that is sublinear in the number that would be required to find a policy that realizes a value close to this optimum. We establis…

    Submitted 13 December, 2019; v1 submitted 12 December, 2019; originally announced December 2019.

    Comments: Extended to the mixture of Gaussians setting

  42. arXiv:1911.07084  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare

    Authors: Scott L. Fleming, Kuhan Jeyapragasan, Tony Duan, Daisy Ding, Saurabh Gombar, Nigam Shah, Emma Brunskill

    Abstract: There is an emerging trend in the reinforcement learning for healthcare literature. In order to prepare longitudinal, irregularly sampled, clinical datasets for reinforcement learning algorithms, many researchers will resample the time series data to short, regular intervals and use last-observation-carried-forward (LOCF) imputation to fill in these gaps. Typically, they will not maintain any expl…

    Submitted 16 November, 2019; originally announced November 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

  43. arXiv:1911.01546  [pdf, other]

    cs.LG cs.AI

    Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

    Authors: Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill

    Abstract: While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal p…

    Submitted 2 April, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Journal ref: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

  44. arXiv:1911.00954  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs

    Authors: Andrea Zanette, Emma Brunskill

    Abstract: In order to make good decisions under uncertainty, an agent must learn from observations. To do so, two of the most common frameworks are Contextual Bandits and Markov Decision Processes (MDPs). In this paper, we study whether there exist algorithms for the more general framework (MDP) which automatically provide the best performance bounds for the specific problem at hand without user intervention…

    Submitted 3 November, 2019; originally announced November 2019.

    Journal ref: International Conference on Machine Learning, 2018

  45. arXiv:1911.00567  [pdf, ps, other]

    cs.LG stat.ML

    Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

    Authors: Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

    Abstract: We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are infeasible and some form of function approximation is mandatory. In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI), a model-free algorithm where…

    Submitted 8 September, 2023; v1 submitted 1 November, 2019; originally announced November 2019.

    Comments: Minor bug fixes

  46. arXiv:1910.06508  [pdf, other]

    cs.LG stat.ML

    Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

    Authors: Yao Liu, Pierre-Luc Bacon, Emma Brunskill

    Abstract: Off-policy policy estimators that use importance sampling (IS) can suffer from high variance in long-horizon domains, and there has been particular excitement over new IS methods that leverage the structure of Markov decision processes. We analyze the variance of the most popular approaches through the viewpoint of conditional Monte Carlo. Surprisingly, we find that in finite horizon MDPs there is…

    Submitted 5 June, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: Accepted by ICML 2020, 21 pages, 1 figure

  47. arXiv:1906.07805  [pdf, other]

    cs.LG cs.AI stat.ML

    Directed Exploration for Reinforcement Learning

    Authors: Zhaohan Daniel Guo, Emma Brunskill

    Abstract: Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general. From small, tabular settings such as gridworlds to large, continuous and sparse reward settings such as robotic object manipulation tasks, exploration through adding an uncertainty bonus to the reward function has been shown to be effective when the uncertainty is able to accurately drive ex…

    Submitted 18 June, 2019; originally announced June 2019.

  48. arXiv:1905.09751  [pdf, other]

    stat.ME stat.ML

    Learning When-to-Treat Policies

    Authors: Xinkun Nie, Emma Brunskill, Stefan Wager

    Abstract: Many applied decision-making problems have a dynamic component: The policymaker needs not only to choose whom to treat, but also when to start which treatment. For example, a medical doctor may choose between postponing treatment (watchful waiting) and prescribing one of several available treatments during the many visits from a patient. We develop an "advantage doubly robust" estimator for learni…

    Submitted 30 April, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

  49. arXiv:1905.05787  [pdf, ps, other]

    cs.LG stat.ML

    Combining Parametric and Nonparametric Models for Off-Policy Evaluation

    Authors: Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, Finale Doshi-Velez

    Abstract: We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at e…

    Submitted 15 May, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Journal ref: PMLR 97:2366-2375, 2019

  50. arXiv:1904.09162  [pdf, other]

    cs.LG cs.MA stat.ML

    PLOTS: Procedure Learning from Observations using Subtask Structure

    Authors: Tong Mu, Karan Goel, Emma Brunskill

    Abstract: In many cases an intelligent agent may want to learn how to mimic a single observed demonstrated trajectory. In this work we consider how to perform such procedural learning from observation, which could help to enable agents to better use the enormous set of video data on observation sequences. Our approach exploits the properties of this setting to incrementally build an open loop action plan th…

    Submitted 17 April, 2019; originally announced April 2019.

    Comments: To appear in the proceedings of AAMAS 2019