
Showing 1–22 of 22 results for author: Perrault, A

  1. arXiv:2407.00087  [pdf, other]

    cs.AI cs.CL cs.LG

    ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback

    Authors: Ju-Seung Byun, Jiyun Chun, Jihyung Kil, Andrew Perrault

    Abstract: Large Multimodal Models (LMMs) excel at comprehending human instructions and demonstrate remarkable results across a broad spectrum of tasks. Reinforcement Learning from Human Feedback (RLHF) and AI Feedback (RLAIF) further refine LLMs by aligning them with specific preferences. These methods primarily use ranking-based feedback for entire generations. With advanced AI models (Teacher), such as GP…

    Submitted 25 June, 2024; originally announced July 2024.

  2. arXiv:2405.17618  [pdf, other]

    cs.LG cs.AI

    Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales

    Authors: Ju-Seung Byun, Andrew Perrault

    Abstract: Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance. Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) can introduce additional difficulty. Differing preferences can complicate the alignment process, and prediction errors in a trained reward model can become more severe as t…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2405.14632  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models

    Authors: Jingyi Chen, Ju-Seung Byun, Micha Elsner, Andrew Perrault

    Abstract: Recent advancements in generative models have sparked significant interest within the machine learning community. In particular, diffusion models have demonstrated remarkable capabilities in synthesizing images and speech. Studies such as those by Lee et al. [19], Black et al. [4], Wang et al. [36], and Fan et al. [8] illustrate that Reinforcement Learning with Human Feedback (RLHF) can enhance dif…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2401.05710  [pdf, other]

    cs.LG

    The Distributional Reward Critic Architecture for Perturbed-Reward Reinforcement Learning

    Authors: Xi Chen, Zhihui Zhu, Andrew Perrault

    Abstract: We study reinforcement learning in the presence of an unknown reward perturbation. Existing methodologies for this problem make strong assumptions including reward smoothness, known perturbations, and/or perturbations that do not modify the optimal policy. We study the case of unknown arbitrary perturbations that discretize and shuffle reward space, but have the property that the true reward belon…

    Submitted 11 January, 2024; originally announced January 2024.

  5. arXiv:2312.09078  [pdf, other]

    cs.LG cs.NE

    Coevolutionary Algorithm for Building Robust Decision Trees under Minimax Regret

    Authors: Adam Żychowski, Andrew Perrault, Jacek Mańdziuk

    Abstract: In recent years, there has been growing interest in developing robust machine learning (ML) models that can withstand adversarial attacks, including one of the most widely adopted, efficient, and interpretable ML algorithms: decision trees (DTs). This paper proposes a novel coevolutionary algorithm (CoEvoRDT) designed to create robust DTs capable of handling noisy high-dimensional data in adversari…

    Submitted 14 December, 2023; originally announced December 2023.

  6. arXiv:2307.08774  [pdf, other]

    cs.AI

    Reflections from the Workshop on AI-Assisted Decision Making for Conservation

    Authors: Lily Xu, Esther Rolf, Sara Beery, Joseph R. Bennett, Tanya Berger-Wolf, Tanya Birch, Elizabeth Bondi-Kelly, Justin Brashares, Melissa Chapman, Anthony Corso, Andrew Davies, Nikhil Garg, Angela Gaylard, Robert Heilmayr, Hannah Kerner, Konstantin Klemmer, Vipin Kumar, Lester Mackey, Claire Monteleoni, Paul Moorcroft, Jonathan Palmer, Andrew Perrault, David Thau, Milind Tambe

    Abstract: In this white paper, we synthesize key points made during presentations and discussions from the AI-Assisted Decision Making for Conservation workshop, hosted by the Center for Research on Computation and Society at Harvard University on October 20–21, 2022. We identify key open research questions in resource allocation, planning, and interventions for biodiversity conservation, highlighting conse…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Co-authored by participants from the October 2022 workshop: https://crcs.seas.harvard.edu/conservation-workshop

  7. arXiv:2305.16830  [pdf, other]

    cs.LG cs.AI

    Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize

    Authors: Sanket Shah, Andrew Perrault, Bryan Wilder, Milind Tambe

    Abstract: Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, "How can the structure of a decision-making task be used to tailor ML models for that specific task?" To this end, recent work has proposed learning task-specific loss functions that capture this underlying structure. However, current approaches ma…

    Submitted 18 February, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: 10 pages, 2 figures

  8. arXiv:2208.13125  [pdf, other]

    cs.LG cs.AI

    Normality-Guided Distributional Reinforcement Learning for Continuous Control

    Authors: Ju-Seung Byun, Andrew Perrault

    Abstract: Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value distribution, not just the mean. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirically quite…

    Submitted 17 January, 2024; v1 submitted 27 August, 2022; originally announced August 2022.

  9. arXiv:2203.16067  [pdf, other]

    cs.LG cs.AI

    Decision-Focused Learning without Differentiable Optimization: Learning Locally Optimized Decision Losses

    Authors: Sanket Shah, Kai Wang, Bryan Wilder, Andrew Perrault, Milind Tambe

    Abstract: Decision-Focused Learning (DFL) is a paradigm for tailoring a predictive model to a downstream optimization task that uses its predictions in order to perform better on that specific task. The main technical challenge associated with DFL is that it requires being able to differentiate through the optimization problem, which is difficult due to discontinuous solutions and other challenges. Past wor…

    Submitted 8 November, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: 16 pages, 5 figures, 3 tables

  10. arXiv:2110.04357  [pdf, other]

    cs.LG cs.AI

    Training Transition Policies via Distribution Matching for Complex Tasks

    Authors: Ju-Seung Byun, Andrew Perrault

    Abstract: Humans decompose novel complex tasks into simpler ones to exploit previously learned skills. Analogously, hierarchical reinforcement learning seeks to leverage lower-level policies for simple tasks to solve complex ones. However, because each lower-level policy induces a different distribution of states, transitioning from one lower-level policy to another may fail due to an unexpected starting st…

    Submitted 11 March, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

  11. arXiv:2106.08413  [pdf, other]

    cs.LG cs.AI cs.MA

    Robust Reinforcement Learning Under Minimax Regret for Green Security

    Authors: Lily Xu, Andrew Perrault, Fei Fang, Haipeng Chen, Milind Tambe

    Abstract: Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minim…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted at the Conference on Uncertainty in Artificial Intelligence (UAI) 2021. 11 pages, 5 figures

  12. arXiv:2106.03279  [pdf, other]

    cs.LG

    Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning

    Authors: Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale Doshi-Velez, Milind Tambe

    Abstract: In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generaliz…

    Submitted 16 July, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

  13. arXiv:2106.03278  [pdf, other]

    cs.GT

    Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games

    Authors: Kai Wang, Lily Xu, Andrew Perrault, Michael K. Reiter, Milind Tambe

    Abstract: A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best response as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but often get trapped in low…

    Submitted 3 December, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

  14. arXiv:2009.06560  [pdf, other]

    cs.LG stat.ML

    Dual-Mandate Patrols: Multi-Armed Bandits for Green Security

    Authors: Lily Xu, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, Milind Tambe

    Abstract: Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation…

    Submitted 26 April, 2024; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Published at AAAI 2021. 9 pages (paper and references), 3 page appendix. 6 figures and 1 table

  15. arXiv:2007.04432  [pdf, other]

    cs.LG cs.AI stat.ML

    Collapsing Bandits and Their Application to Public Health Interventions

    Authors: Aditya Mate, Jackson A. Killian, Haifeng Xu, Andrew Perrault, Milind Tambe

    Abstract: We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the "good" st…

    Submitted 4 July, 2020; originally announced July 2020.

  16. arXiv:2006.12411  [pdf, other]

    cs.CY cs.GT

    Game Theory on the Ground: The Effect of Increased Patrols on Deterring Poachers

    Authors: Lily Xu, Andrew Perrault, Andrew Plumptre, Margaret Driciru, Fred Wanyama, Aggrey Rwetsiba, Milind Tambe

    Abstract: Applications of artificial intelligence for wildlife protection have focused on learning models of poacher behavior based on historical patterns. However, poachers' behaviors are described not only by their historical preferences, but also by their reaction to ranger patrols. Past work applying machine learning and game theory to combat poaching has hypothesized that ranger patrols deter poachers, b…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: 5 pages, 2 figures, 3 tables

  17. arXiv:2006.10815  [pdf, other]

    cs.LG stat.ML

    Automatically Learning Compact Quality-aware Surrogates for Optimization Problems

    Authors: Kai Wang, Bryan Wilder, Andrew Perrault, Milind Tambe

    Abstract: Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality. Unfortunatel…

    Submitted 22 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  18. arXiv:2001.00088  [pdf]

    cs.CY cs.GT cs.LG

    AI for Social Impact: Learning and Planning in the Data-to-Deployment Pipeline

    Authors: Andrew Perrault, Fei Fang, Arunesh Sinha, Milind Tambe

    Abstract: With the maturing of AI and multiagent systems research, we have a tremendous opportunity to direct these advances towards addressing complex societal problems. In pursuit of this goal of AI for Social Impact, we as AI researchers must go beyond improvements in computational methodology; it is important to step out into the field to demonstrate social impact. To this end, we focus on the problems of…

    Submitted 12 June, 2022; v1 submitted 16 December, 2019; originally announced January 2020.

    Comments: AI Magazine, Winter 2020

  19. arXiv:1911.08799  [pdf, other]

    cs.GT cs.AI cs.LG

    Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning

    Authors: Sanket Shah, Arunesh Sinha, Pradeep Varakantham, Andrew Perrault, Milind Tambe

    Abstract: Large-scale screening for potential threats with limited resources and capacity for screening is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has been previously represen…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

  20. arXiv:1903.06113  [pdf, other]

    q-bio.QM cs.SI q-bio.PE

    Who and When to Screen: Multi-Round Active Screening for Recurrent Infectious Diseases Under Uncertainty

    Authors: Han-Ching Ou, Arunesh Sinha, Sze-Chuan Suen, Andrew Perrault, Milind Tambe

    Abstract: Controlling recurrent infectious diseases is a vital yet complicated problem. In this paper, we propose a novel active screening model (ACTS) and algorithms to facilitate active screening for recurrent diseases (no permanent immunity) under infection uncertainty. Our contributions are: (1) A new approach to modeling multi-round network-based screening/contact tracing under uncertainty, which is a…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 11 pages

  21. arXiv:1903.00958  [pdf, other]

    cs.GT cs.LG

    End-to-End Game-Focused Learning of Adversary Behavior in Security Games

    Authors: Andrew Perrault, Bryan Wilder, Eric Ewing, Aditya Mate, Bistra Dilkina, Milind Tambe

    Abstract: Stackelberg security games are a critical tool for maximizing the utility of limited defense resources to protect important targets from an intelligent adversary. Motivated by green security, where the defender may only observe an adversary's response to defense on a limited set of targets, we study the problem of learning a defense that generalizes well to a new set of targets with novel feature…

    Submitted 22 June, 2020; v1 submitted 3 March, 2019; originally announced March 2019.

    Comments: Appeared at AAAI 2020

  22. arXiv:1505.03463  [pdf, other]

    cs.GT cs.AI

    Exploring Strategy-Proofness, Uniqueness, and Pareto Optimality for the Stable Matching Problem with Couples

    Authors: Andrew Perrault, Joanna Drummond, Fahiem Bacchus

    Abstract: The Stable Matching Problem with Couples (SMP-C) is a ubiquitous real-world extension of the stable matching problem (SMP) involving complementarities. Although SMP can be solved in polynomial time, SMP-C is NP-Complete. Hence, it is not clear which, if any, of the theoretical results surrounding the canonical SMP problem apply in this setting. In this paper, we use a recently-developed SAT encodi…

    Submitted 13 May, 2015; originally announced May 2015.