Search | arXiv e-print repository

Transcendence: Generative Models Can Outperform The Experts That Train Them

Authors: Edwin Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, Benjamin L. Edelman, Milind Tambe, Sham M. Kakade, Eran Malach

Abstract: Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outperform the humans on their original objectives. In this work, we study the phenomenon of transcendence: when a generative model achieves capabilities… ▽ More Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outperform the humans on their original objectives. In this work, we study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data. We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset. We theoretically prove that transcendence can be enabled by low-temperature sampling, and rigorously assess this claim experimentally. Finally, we discuss other sources of transcendence, laying the groundwork for future investigation of this phenomenon in a broader setting. △ Less

Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Code, models, and data at https://transcendence.eddie.win

arXiv:2403.17381 [pdf, other]

Application-Driven Innovation in Machine Learning

Authors: David Rolnick, Alan Aspuru-Guzik, Sara Beery, Bistra Dilkina, Priya L. Donti, Marzyeh Ghassemi, Hannah Kerner, Claire Monteleoni, Esther Rolf, Milind Tambe, Adam White

Abstract: As applications of machine learning proliferate, innovative algorithms inspired by specific real-world challenges have become increasingly important. Such work offers the potential for significant impact not merely in domains of application but also in machine learning itself. In this paper, we describe the paradigm of application-driven research in machine learning, contrasting it with the more s… ▽ More As applications of machine learning proliferate, innovative algorithms inspired by specific real-world challenges have become increasingly important. Such work offers the potential for significant impact not merely in domains of application but also in machine learning itself. In this paper, we describe the paradigm of application-driven research in machine learning, contrasting it with the more standard paradigm of methods-driven research. We illustrate the benefits of application-driven machine learning and how this approach can productively synergize with methods-driven work. Despite these benefits, we find that reviewing, hiring, and teaching practices in machine learning often hold back application-driven innovation. We outline how these processes may be improved. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: 12 pages, 3 figures

arXiv:2403.05683 [pdf, other]

Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning

Authors: Sanket Shah, Arun Suggala, Milind Tambe, Aparna Taneja

Abstract: The declining participation of beneficiaries over time is a key concern in public health programs. A popular strategy for improving retention is to have health workers `intervene' on beneficiaries at risk of drop** out. However, the availability and time of these health workers are limited resources. As a result, there has been a line of research on optimizing these limited intervention resource… ▽ More The declining participation of beneficiaries over time is a key concern in public health programs. A popular strategy for improving retention is to have health workers `intervene' on beneficiaries at risk of drop** out. However, the availability and time of these health workers are limited resources. As a result, there has been a line of research on optimizing these limited intervention resources using Restless Multi-Armed Bandits (RMABs). The key technical barrier to using this framework in practice lies in the need to estimate the beneficiaries' RMAB parameters from historical data. Recent research has shown that Decision-Focused Learning (DFL), which focuses on maximizing the beneficiaries' adherence rather than predictive accuracy, improves the performance of intervention targeting using RMABs. Unfortunately, these gains come at a high computational cost because of the need to solve and evaluate the RMAB in each DFL training step. In this paper, we provide a principled way to exploit the structure of RMABs to speed up intervention planning by cleverly decoupling the planning for different beneficiaries. We use real-world data from an Indian NGO, ARMMAN, to show that our approach is up to two orders of magnitude faster than the state-of-the-art approach while also yielding superior model performance. This would enable the NGO to scale up deployments using DFL to potentially millions of mothers, ultimately advancing progress toward UNSDG 3.1. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: 12 pages, 3 figures, 2 tables

arXiv:2402.14807 [pdf, other]

A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health

Authors: Nikhil Behari, Edwin Zhang, Yunfan Zhao, Aparna Taneja, Dheeraj Nagaraj, Milind Tambe

Abstract: Restless multi-armed bandits (RMAB) have demonstrated success in optimizing resource allocation for large beneficiary populations in public health settings. Unfortunately, RMAB models lack flexibility to adapt to evolving public health policy priorities. Concurrently, Large Language Models (LLMs) have emerged as adept automated planners across domains of robotic control and navigation. In this pap… ▽ More Restless multi-armed bandits (RMAB) have demonstrated success in optimizing resource allocation for large beneficiary populations in public health settings. Unfortunately, RMAB models lack flexibility to adapt to evolving public health policy priorities. Concurrently, Large Language Models (LLMs) have emerged as adept automated planners across domains of robotic control and navigation. In this paper, we propose a Decision Language Model (DLM) for RMABs, enabling dynamic fine-tuning of RMAB policies in public health settings using human-language commands. We propose using LLMs as automated planners to (1) interpret human policy preference prompts, (2) propose reward functions as code for a multi-agent RMAB environment, and (3) iterate on the generated reward functions using feedback from grounded RMAB simulations. We illustrate the application of DLM in collaboration with ARMMAN, an India-based non-profit promoting preventative care for pregnant mothers, that currently relies on RMAB policies to optimally allocate health worker calls to low-resource populations. We conduct a technology demonstration in simulation using the Gemini Pro model, showing DLM can dynamically shape policy outcomes using only human prompts as input. △ Less

Submitted 26 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14090 [pdf, other]

Social Environment Design

Authors: Edwin Zhang, Sadie Zhao, Tonghan Wang, Safwan Hossain, Henry Gasztowtt, Stephan Zheng, David C. Parkes, Milind Tambe, Yiling Chen

Abstract: Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The fra… ▽ More Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The framework seeks to capture general economic environments, includes voting on policy objectives, and gives a direction for the systematic analysis of government and economic policy through AI simulation. We highlight key open problems for future research in AI-based policy-making. By solving these challenges, we hope to achieve various social welfare objectives, thereby promoting more ethical and responsible decision making. △ Less

Submitted 17 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: ICML 2024 Position Paper. Website at https://sed.eddie.win

arXiv:2402.11771 [pdf, other]

Evaluating the Effectiveness of Index-Based Treatment Allocation

Authors: Niclas Boehmer, Yash Nair, Sanket Shah, Lucas Janson, Aparna Taneja, Milind Tambe

Abstract: When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies -- that allocate a fixed number of resources to those who need them the most -- by using data from a randomized… ▽ More When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies -- that allocate a fixed number of resources to those who need them the most -- by using data from a randomized control trial. Such policies create dependencies between agents, which render the assumptions behind standard statistical tests invalid and limit the effectiveness of estimators. Addressing these challenges, we translate and extend recent ideas from the statistics literature to present an efficient estimator and methods for computing asymptotically correct confidence intervals. This enables us to effectively draw valid statistical conclusions, a critical gap in previous work. Our extensive experiments validate our methodology in practical settings, while also showcasing its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized control trial to evaluate different ML allocation policies in the context of a mHealth program, drawing previously invisible conclusions. △ Less

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.04933 [pdf, other]

A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

Authors: Biyonka Liang, Lily Xu, Aparna Taneja, Milind Tambe, Lucas Janson

Abstract: Public health programs often provide interventions to encourage beneficiary adherence,and effectively allocating interventions is vital for producing the greatest overall health outcomes. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Lea… ▽ More Public health programs often provide interventions to encourage beneficiary adherence,and effectively allocating interventions is vital for producing the greatest overall health outcomes. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that novelly combines techniques in Bayesian modeling with Thompson sampling to flexibly model the complex RMAB settings present in public health program adherence problems, such as context and non-stationarity. BCoR's key strength is the ability to leverage shared information within and between arms to learn the unknown RMAB transition dynamics quickly in intervention-scarce settings with relatively short time horizons, which is common in public health applications. Empirically, BCoR achieves substantially higher finite-sample performance over a range of experimental settings, including an example based on real-world adherence data that was developed in collaboration with ARMMAN, an NGO in India which runs a large-scale maternal health program, showcasing BCoR practical utility and potential for real-world deployment. △ Less

Submitted 27 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: 26 pages, 18 figures

arXiv:2312.09983 [pdf, other]

Toward Computationally Efficient Inverse Reinforcement Learning via Reward Sha**

Authors: Lauren H. Cooke, Harvey Klyne, Edwin Zhang, Cassidy Laidlaw, Milind Tambe, Finale Doshi-Velez

Abstract: Inverse reinforcement learning (IRL) is computationally challenging, with common approaches requiring the solution of multiple reinforcement learning (RL) sub-problems. This work motivates the use of potential-based reward sha** to reduce the computational burden of each RL sub-problem. This work serves as a proof-of-concept and we hope will inspire future developments towards computationally ef… ▽ More Inverse reinforcement learning (IRL) is computationally challenging, with common approaches requiring the solution of multiple reinforcement learning (RL) sub-problems. This work motivates the use of potential-based reward sha** to reduce the computational burden of each RL sub-problem. This work serves as a proof-of-concept and we hope will inspire future developments towards computationally efficient IRL. △ Less

Submitted 18 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2311.07139 [pdf, other]

Analyzing and Predicting Low-Listenership Trends in a Large-Scale Mobile Health Program: A Preliminary Investigation

Authors: Arshika Lalan, Shresth Verma, Kumar Madhu Sudan, Amrita Mahale, Aparna Hegde, Milind Tambe, Aparna Taneja

Abstract: Mobile health programs are becoming an increasingly popular medium for dissemination of health information among beneficiaries in less privileged communities. Kilkari is one of the world's largest mobile health programs which delivers time sensitive audio-messages to pregnant women and new mothers. We have been collaborating with ARMMAN, a non-profit in India which operates the Kilkari program, to… ▽ More Mobile health programs are becoming an increasingly popular medium for dissemination of health information among beneficiaries in less privileged communities. Kilkari is one of the world's largest mobile health programs which delivers time sensitive audio-messages to pregnant women and new mothers. We have been collaborating with ARMMAN, a non-profit in India which operates the Kilkari program, to identify bottlenecks to improve the efficiency of the program. In particular, we provide an initial analysis of the trajectories of beneficiaries' interaction with the mHealth program and examine elements of the program that can be potentially enhanced to boost its success. We cluster the cohort into different buckets based on listenership so as to analyze listenership patterns for each group that could help boost program success. We also demonstrate preliminary results on using historical data in a time-series prediction to identify beneficiary dropouts and enable NGOs in devising timely interventions to strengthen beneficiary retention. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: Accepted to Data Science for Social Good Workshop, KDD 2023

arXiv:2310.14526 [pdf, other]

Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization

Authors: Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe

Abstract: Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when… ▽ More Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by develo** a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $λ$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems. △ Less

Submitted 29 January, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

arXiv:2308.09726 [pdf, other]

Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health

Authors: Jackson A. Killian, Manish Jain, Yugang Jia, Jonathan Amar, Erich Huang, Milind Tambe

Abstract: Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources. RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health. For such high stakes settings, decisions must both improve outcomes and prevent disp… ▽ More Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources. RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health. For such high stakes settings, decisions must both improve outcomes and prevent disparities between groups (e.g., ensure health equity). We study equitable objectives for RMABs (ERMABs) for the first time. We consider two equity-aligned objectives from the fairness literature, minimax reward and max Nash welfare. We develop efficient algorithms for solving each -- a water filling algorithm for the former, and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes for the latter. Finally, we demonstrate across three simulation domains, including a new digital health model, that our approaches can be multiple times more equitable than the current state of the art without drastic sacrifices to utility. Our findings underscore our work's urgency as RMABs permeate into systems that impact human and wildlife outcomes. Code is available at https://github.com/google-research/socialgood/tree/equitable-rmab △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: 16 pages, 8 figures, 2 tables

arXiv:2307.08774 [pdf, other]

Reflections from the Workshop on AI-Assisted Decision Making for Conservation

Authors: Lily Xu, Esther Rolf, Sara Beery, Joseph R. Bennett, Tanya Berger-Wolf, Tanya Birch, Elizabeth Bondi-Kelly, Justin Brashares, Melissa Chapman, Anthony Corso, Andrew Davies, Nikhil Garg, Angela Gaylard, Robert Heilmayr, Hannah Kerner, Konstantin Klemmer, Vipin Kumar, Lester Mackey, Claire Monteleoni, Paul Moorcroft, Jonathan Palmer, Andrew Perrault, David Thau, Milind Tambe

Abstract: In this white paper, we synthesize key points made during presentations and discussions from the AI-Assisted Decision Making for Conservation workshop, hosted by the Center for Research on Computation and Society at Harvard University on October 20-21, 2022. We identify key open research questions in resource allocation, planning, and interventions for biodiversity conservation, highlighting conse… ▽ More In this white paper, we synthesize key points made during presentations and discussions from the AI-Assisted Decision Making for Conservation workshop, hosted by the Center for Research on Computation and Society at Harvard University on October 20-21, 2022. We identify key open research questions in resource allocation, planning, and interventions for biodiversity conservation, highlighting conservation challenges that not only require AI solutions, but also require novel methodological advances. In addition to providing a summary of the workshop talks and discussions, we hope this document serves as a call-to-action to orient the expansion of algorithmic decision-making approaches to prioritize real-world conservation challenges, through collaborative efforts of ecologists, conservation decision-makers, and AI researchers. △ Less

Submitted 17 July, 2023; originally announced July 2023.

Comments: Co-authored by participants from the October 2022 workshop: https://crcs.seas.harvard.edu/conservation-workshop

arXiv:2305.16830 [pdf, other]

Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize

Authors: Sanket Shah, Andrew Perrault, Bryan Wilder, Milind Tambe

Abstract: Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, "How can the structure of a decision-making task be used to tailor ML models for that specific task?" To this end, recent work has proposed learning task-specific loss functions that capture this underlying structure. However, current approaches ma… ▽ More Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, "How can the structure of a decision-making task be used to tailor ML models for that specific task?" To this end, recent work has proposed learning task-specific loss functions that capture this underlying structure. However, current approaches make restrictive assumptions about the form of these losses and their impact on ML model behavior. These assumptions both lead to approaches with high computational cost, and when they are violated in practice, poor performance. In this paper, we propose solutions to these issues, avoiding the aforementioned assumptions and utilizing the ML model's features to increase the sample efficiency of learning loss functions. We empirically show that our method achieves state-of-the-art results in four domains from the literature, often requiring an order of magnitude fewer samples than comparable methods from past work. Moreover, our approach outperforms the best existing method by nearly 200% when the localness assumption is broken. △ Less

Submitted 18 February, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 10 pages, 2 figures

arXiv:2305.12640 [pdf, other]

Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare

Authors: Panayiotis Danassis, Shresth Verma, Jackson A. Killian, Aparna Taneja, Milind Tambe

Abstract: The success of many healthcare programs depends on participants' adherence. We consider the problem of scheduling interventions in low resource settings (e.g., placing timely support calls from health workers) to increase adherence and/or engagement. Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem. Nevertheless, all past… ▽ More The success of many healthcare programs depends on participants' adherence. We consider the problem of scheduling interventions in low resource settings (e.g., placing timely support calls from health workers) to increase adherence and/or engagement. Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem. Nevertheless, all past RMAB approaches assume that the participants' behaviour follows the Markov property. We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN. Moreover, we extend RMABs to continuous state spaces, a previously understudied area. To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to learn complex patterns and dynamics to predict future states, and (iii) propose the Time-series Arm Ranking Index (TARI) policy, a novel algorithm that selects the RMAB arms that will benefit the most from an intervention, given our future state predictions. We evaluate our approach on both synthetic data, and a secondary analysis on real data from ARMMAN, and demonstrate significant increase in engagement compared to the SOTA, deployed Whittle index solution. This translates to 16.3 hours of additional content listened, 90.8% more engagement drops prevented, and reaching more than twice as many high dropout-risk beneficiaries. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023)

arXiv:2303.00799 [pdf, other]

Fairness for Workers Who Pull the Arms: An Index Based Policy for Allocation of Restless Bandit Tasks

Authors: Arpita Biswas, Jackson A. Killian, Paula Rodriguez Diaz, Susobhan Ghosh, Milind Tambe

Abstract: Motivated by applications such as machine repair, project monitoring, and anti-poaching patrol scheduling, we study intervention planning of stochastic processes under resource constraints. This planning problem has previously been modeled as restless multi-armed bandits (RMAB), where each arm is an intervention-dependent Markov Decision Process. However, the existing literature assumes all interv… ▽ More Motivated by applications such as machine repair, project monitoring, and anti-poaching patrol scheduling, we study intervention planning of stochastic processes under resource constraints. This planning problem has previously been modeled as restless multi-armed bandits (RMAB), where each arm is an intervention-dependent Markov Decision Process. However, the existing literature assumes all intervention resources belong to a single uniform pool, limiting their applicability to real-world settings where interventions are carried out by a set of workers, each with their own costs, budgets, and intervention effects. In this work, we consider a novel RMAB setting, called multi-worker restless bandits (MWRMAB) with heterogeneous workers. The goal is to plan an intervention schedule that maximizes the expected reward while satisfying budget constraints on each worker as well as fairness in terms of the load assigned to each worker. Our contributions are two-fold: (1) we provide a multi-worker extension of the Whittle index to tackle heterogeneous costs and per-worker budget and (2) we develop an index-based scheduling policy to achieve fairness. Further, we evaluate our method on various cost structures and show that our method significantly outperforms other baselines in terms of fairness without sacrificing much in reward accumulated. △ Less

Submitted 1 March, 2023; originally announced March 2023.

Comments: 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023), 10 pages

arXiv:2302.02570 [pdf, other]

Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation

Authors: Aditya Mate, Bryan Wilder, Aparna Taneja, Milind Tambe

Abstract: We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluation of such allocation policies through RCTs proves difficult, notwithstanding the scale of the trial, because the individuals'… ▽ More We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluation of such allocation policies through RCTs proves difficult, notwithstanding the scale of the trial, because the individuals' outcomes are inextricably interlinked through resource constraints controlling the policy decisions. Our key contribution is to present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual trials, whose outcomes can be accurately ascertained, for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means -- we show that it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on synthetic, semi-synthetic as well as real case study data and show improved estimation accuracy across the board. △ Less

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2301.07835 [pdf, other]

Decision-Focused Evaluation: Analyzing Performance of Deployed Restless Multi-Arm Bandits

Authors: Paritosh Verma, Shresth Verma, Aditya Mate, Aparna Taneja, Milind Tambe

Abstract: Restless multi-arm bandits (RMABs) is a popular decision-theoretic framework that has been used to model real-world sequential decision making problems in public health, wildlife conservation, communication systems, and beyond. Deployed RMAB systems typically operate in two stages: the first predicts the unknown parameters defining the RMAB instance, and the second employs an optimization algorith… ▽ More Restless multi-arm bandits (RMABs) is a popular decision-theoretic framework that has been used to model real-world sequential decision making problems in public health, wildlife conservation, communication systems, and beyond. Deployed RMAB systems typically operate in two stages: the first predicts the unknown parameters defining the RMAB instance, and the second employs an optimization algorithm to solve the constructed RMAB instance. In this work we provide and analyze the results from a first-of-its-kind deployment of an RMAB system in public health domain, aimed at improving maternal and child health. Our analysis is focused towards understanding the relationship between prediction accuracy and overall performance of deployed RMAB systems. This is crucial for determining the value of investing in improving predictive accuracy towards improving the final system performance, and is useful for diagnosing, monitoring deployed RMAB systems. Using real-world data from our deployed RMAB system, we demonstrate that an improvement in overall prediction accuracy may even be accompanied by a degradation in the performance of RMAB system -- a broad investment of resources to improve overall prediction accuracy may not yield expected results. Following this, we develop decision-focused evaluation metrics to evaluate the predictive component and show that it is better at explaining (both empirically and theoretically) the overall performance of a deployed RMAB system. △ Less

Submitted 18 January, 2023; originally announced January 2023.

Comments: 11 pages, 3 figures, AI for Social Good Workshop (AAAI'23)

arXiv:2211.06318 [pdf]

Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence

Authors: Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, Astro Teller

Abstract: In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Peter Stone of the University of Texas at Austin. The report, entitled… ▽ More In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Peter Stone of the University of Texas at Austin. The report, entitled "Artificial Intelligence and Life in 2030," examines eight domains of typical urban settings on which AI is likely to have impact over the coming years: transportation, home and service robots, healthcare, education, public safety and security, low-resource communities, employment and workplace, and entertainment. It aims to provide the general public with a scientifically and technologically accurate portrayal of the current state of AI and its potential and to help guide decisions in industry and governments, as well as to inform research and development in the field. The charge for this report was given to the panel by the AI100 Standing Committee, chaired by Barbara Grosz of Harvard University. △ Less

Submitted 31 October, 2022; originally announced November 2022.

Comments: 52 pages, https://ai100.stanford.edu/2016-report

arXiv:2211.00112 [pdf, other]

Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits

Authors: Abheek Ghosh, Dheeraj Nagaraj, Manish Jain, Milind Tambe

Abstract: We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multi-agent systems with applications like multi-channel communication, monitoring and machine maintenance tasks, and healthcare. Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality unde… ▽ More We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multi-agent systems with applications like multi-channel communication, monitoring and machine maintenance tasks, and healthcare. Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality under certain conditions. In this work, we first show that Whittle index policies can fail in simple and practically relevant RMAB settings, even when the RMABs are indexable. We discuss why the optimality guarantees fail and why asymptotic optimality may not translate well to practically relevant planning horizons. We then propose an alternate planning algorithm based on the mean-field method, which can provably and efficiently obtain near-optimal policies with a large number of arms, without the stringent structural assumptions required by the Whittle index policies. This borrows ideas from existing research with some improvements: our approach is hyper-parameter free, and we provide an improved non-asymptotic analysis which has: (a) no requirement for exogenous hyper-parameters and tighter polynomial dependence on known problem parameters; (b) high probability bounds which show that the reward of the policy is reliable; and (c) matching sub-optimality lower bounds for this algorithm with respect to the number of arms, thus demonstrating the tightness of our bounds. Our extensive experimental analysis shows that the mean-field approach matches or outperforms other baselines. △ Less

Submitted 28 February, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

Comments: 21 pages; AAMAS'23 version with appendix

arXiv:2210.00025 [pdf, other]

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

Authors: Siddhartha Banerjee, Sean R. Sinclair, Milind Tambe, Lily Xu, Christina Lee Yu

Abstract: How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to computational and storage issues $\unicode{x2014}$ particularly salient in continuous action spaces. We propose Artificial Replay, a meta-algorithm for incorporating h… ▽ More How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to computational and storage issues $\unicode{x2014}$ particularly salient in continuous action spaces. We propose Artificial Replay, a meta-algorithm for incorporating historical data into any arbitrary base bandit algorithm. Artificial Replay uses only a fraction of the historical data compared to a full warm-start approach, while still achieving identical regret for base algorithms that satisfy independence of irrelevant data (IIData), a novel and broadly applicable property that we introduce. We complement these theoretical results with experiments on $K$-armed and continuous combinatorial bandit algorithms, including a green security domain using real poaching data. We show the practical benefits of Artificial Replay, including for base algorithms that do not satisfy IIData. △ Less

Submitted 26 January, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

Comments: 36 pages (14 pages main paper), 9 figures

arXiv:2205.15372 [pdf, ps, other]

Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Authors: Kai Wang, Lily Xu, Aparna Taneja, Milind Tambe

Abstract: Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm bas… ▽ More Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear $O(H \sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions in $T$ episodes with a constant horizon $H$. Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset. △ Less

Submitted 8 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: Accepted at AAAI 2023. 7 page paper, 2 page references, 9 page appendix. Code available. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

arXiv:2205.05659 [pdf, other]

Ranked Prioritization of Groups in Combinatorial Bandit Allocation

Authors: Lily Xu, Arpita Biswas, Fei Fang, Milind Tambe

Abstract: Preventing poaching through ranger patrols protects endangered wildlife, directly contributing to the UN Sustainable Development Goal 15 of life on land. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees. Wh… ▽ More Preventing poaching through ranger patrols protects endangered wildlife, directly contributing to the UN Sustainable Development Goal 15 of life on land. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees. When some species are more vulnerable, we ought to offer more protection to these animals; unfortunately, existing combinatorial bandit approaches do not offer a way to prioritize important species. To bridge this gap, (1) We propose a novel combinatorial bandit objective that trades off between reward maximization and also accounts for prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm to select combinatorial actions that optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38% improvement in outcomes for endangered species using real-world wildlife conservation data. Along with adapting to other challenges such as preventing illegal logging and overfishing, our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective. △ Less

Submitted 11 May, 2022; originally announced May 2022.

Comments: Accepted at IJCAI 2022, AI for Good track. 7 pages + 2 pages appendix. Code is available at https://github.com/lily-x/rankedCUCB

arXiv:2204.14173 [pdf, other]

doi 10.24963/ijcai.2022/88

Evolutionary Approach to Security Games with Signaling

Authors: Adam Żychowski, Jacek Mańdziuk, Elizabeth Bondi, Aravind Venugopal, Milind Tambe, Balaraman Ravindran

Abstract: Green Security Games have become a popular way to model scenarios involving the protection of natural resources, such as wildlife. Sensors (e.g. drones equipped with cameras) have also begun to play a role in these scenarios by providing real-time information. Incorporating both human and sensor defender resources strategically is the subject of recent work on Security Games with Signaling (SGS).… ▽ More Green Security Games have become a popular way to model scenarios involving the protection of natural resources, such as wildlife. Sensors (e.g. drones equipped with cameras) have also begun to play a role in these scenarios by providing real-time information. Incorporating both human and sensor defender resources strategically is the subject of recent work on Security Games with Signaling (SGS). However, current methods to solve SGS do not scale well in terms of time or memory. We therefore propose a novel approach to SGS, which, for the first time in this domain, employs an Evolutionary Computation paradigm: EASGS. EASGS effectively searches the huge SGS solution space via suitable solution encoding in a chromosome and a specially-designed set of operators. The operators include three types of mutations, each focusing on a particular aspect of the SGS solution, optimized crossover and a local coverage improvement scheme (a memetic aspect of EASGS). We also introduce a new set of benchmark games, based on dense or locally-dense graphs that reflect real-world SGS settings. In the majority of 342 test game instances, EASGS outperforms state-of-the-art methods, including a reinforcement learning method, in terms of time scalability, nearly constant memory utilization, and quality of the returned defender's strategies (expected payoffs). △ Less

Submitted 29 April, 2022; originally announced April 2022.

Journal ref: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, 620-627

arXiv:2204.13663 [pdf, other]

ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria

Authors: Vineet Nair, Kritika Prakash, Michael Wilbur, Aparna Taneja, Corinne Namblard, Oyindamola Adeyemo, Abhishek Dubey, Abiodun Adereni, Milind Tambe, Ayan Mukhopadhyay

Abstract: More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations' sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Niger… ▽ More More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations' sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use-cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach outperforms baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and hopefully, pave the way for other data-driven programs to improve health outcomes in Nigeria. △ Less

Submitted 5 July, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted for publication at International Joint Conference on Artificial Intelligence 2022, AI for Good Track (IJCAI-22)

arXiv:2203.16067 [pdf, other]

Decision-Focused Learning without Differentiable Optimization: Learning Locally Optimized Decision Losses

Authors: Sanket Shah, Kai Wang, Bryan Wilder, Andrew Perrault, Milind Tambe

Abstract: Decision-Focused Learning (DFL) is a paradigm for tailoring a predictive model to a downstream optimization task that uses its predictions in order to perform better on that specific task. The main technical challenge associated with DFL is that it requires being able to differentiate through the optimization problem, which is difficult due to discontinuous solutions and other challenges. Past wor… ▽ More Decision-Focused Learning (DFL) is a paradigm for tailoring a predictive model to a downstream optimization task that uses its predictions in order to perform better on that specific task. The main technical challenge associated with DFL is that it requires being able to differentiate through the optimization problem, which is difficult due to discontinuous solutions and other challenges. Past work has largely gotten around this issue by handcrafting task-specific surrogates to the original optimization problem that provide informative gradients when differentiated through. However, the need to handcraft surrogates for each new task limits the usability of DFL. In addition, there are often no guarantees about the convexity of the resulting surrogates and, as a result, training a predictive model using them can lead to inferior local optima. In this paper, we do away with surrogates altogether and instead learn loss functions that capture task-specific information. To the best of our knowledge, ours is the first approach that entirely replaces the optimization component of decision-focused learning with a loss that is automatically learned. Our approach (a) only requires access to a black-box oracle that can solve the optimization problem and is thus generalizable, and (b) can be convex by construction and so can be easily optimized over. We evaluate our approach on three resource allocation problems from the literature and find that our approach outperforms learning without taking into account task structure in all three domains, and even hand-crafted surrogates from the literature. △ Less

Submitted 8 November, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: 16 pages, 5 figures, 3 tables

arXiv:2202.14010

Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop at AAAI 2022

Authors: James Holt, Edward Raff, Ahmad Ridley, Dennis Ross, Arunesh Sinha, Diane Staheli, William Streilen, Milind Tambe, Yevgeniy Vorobeychik, Allan Wollaber

Abstract: The workshop will focus on the application of AI to problems in cyber security. Cyber systems generate large volumes of data, utilizing this effectively is beyond human capabilities. Additionally, adversaries continue to develop new attacks. Hence, AI methods are required to understand and protect the cyber domain. These challenges are widely studied in enterprise networks, but there are many gaps… ▽ More The workshop will focus on the application of AI to problems in cyber security. Cyber systems generate large volumes of data, utilizing this effectively is beyond human capabilities. Additionally, adversaries continue to develop new attacks. Hence, AI methods are required to understand and protect the cyber domain. These challenges are widely studied in enterprise networks, but there are many gaps in research and practice as well as novel problems in other domains. In general, AI techniques are still not widely adopted in the real world. Reasons include: (1) a lack of certification of AI for security, (2) a lack of formal study of the implications of practical constraints (e.g., power, memory, storage) for AI systems in the cyber domain, (3) known vulnerabilities such as evasion, poisoning attacks, (4) lack of meaningful explanations for security analysts, and (5) lack of analyst trust in AI solutions. There is a need for the research community to develop novel solutions for these practical issues. △ Less

Submitted 1 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

arXiv:2202.00916 [pdf, other]

Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

Authors: Kai Wang, Shresth Verma, Aditya Mate, Sanket Shah, Aparna Taneja, Neha Madhiwalla, Aparna Hegde, Milind Tambe

Abstract: This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final R… ▽ More This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes. △ Less

Submitted 13 August, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

arXiv:2201.12408 [pdf, other]

Networked Restless Multi-Armed Bandits for Mobile Interventions

Authors: Han-Ching Ou, Christoph Siebenbrunner, Jackson Killian, Meredith B Brooks, David Kempe, Yevgeniy Vorobeychik, Milind Tambe

Abstract: Motivated by a broad class of mobile intervention problems, we propose and study restless multi-armed bandits (RMABs) with network effects. In our model, arms are partially recharging and connected through a graph, so that pulling one arm also improves the state of neighboring arms, significantly extending the previously studied setting of fully recharging bandits with no network effects. In mobil… ▽ More Motivated by a broad class of mobile intervention problems, we propose and study restless multi-armed bandits (RMABs) with network effects. In our model, arms are partially recharging and connected through a graph, so that pulling one arm also improves the state of neighboring arms, significantly extending the previously studied setting of fully recharging bandits with no network effects. In mobile interventions, network effects may arise due to regular population movements (such as commuting between home and work). We show that network effects in RMABs induce strong reward coupling that is not accounted for by existing solution methods. We propose a new solution approach for networked RMABs, exploiting concavity properties which arise under natural assumptions on the structure of intervention effects. We provide sufficient conditions for optimality of our approach in idealized settings and demonstrate that it empirically outperforms state-of-the art baselines in three mobile intervention domains using real-world graphs. △ Less

Submitted 28 January, 2022; originally announced January 2022.

arXiv:2109.10637 [pdf, other]

Facilitating human-wildlife cohabitation through conflict prediction

Authors: Susobhan Ghosh, Pradeep Varakantham, Aniket Bhatkhande, Tamanna Ahmad, Anish Andheria, Wenjun Li, Aparna Taneja, Divy Thakkar, Milind Tambe

Abstract: With increasing world population and expanded use of forests as cohabited regions, interactions and conflicts with wildlife are increasing, leading to large-scale loss of lives (animal and human) and livelihoods (economic). While community knowledge is valuable, forest officials and conservation organisations can greatly benefit from predictive analysis of human-wildlife conflict, leading to targe… ▽ More With increasing world population and expanded use of forests as cohabited regions, interactions and conflicts with wildlife are increasing, leading to large-scale loss of lives (animal and human) and livelihoods (economic). While community knowledge is valuable, forest officials and conservation organisations can greatly benefit from predictive analysis of human-wildlife conflict, leading to targeted interventions that can potentially help save lives and livelihoods. However, the problem of prediction is a complex socio-technical problem in the context of limited data in low-resource regions. Identifying the "right" features to make accurate predictions of conflicts at the required spatial granularity using a sparse conflict training dataset} is the key challenge that we address in this paper. Specifically, we do an illustrative case study on human-wildlife conflicts in the Bramhapuri Forest Division in Chandrapur, Maharashtra, India. Most existing work has considered human-wildlife conflicts in protected areas and to the best of our knowledge, this is the first effort at prediction of human-wildlife conflicts in unprotected areas and using those predictions for deploying interventions on the ground. △ Less

Submitted 22 September, 2021; originally announced September 2021.

Comments: 7 pages, 4 figures

arXiv:2109.08075 [pdf, other]

Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health

Authors: Aditya Mate, Lovish Madaan, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, Milind Tambe

Abstract: The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge… ▽ More The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing ~ 30% engagement drops. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real world public health settings. We are transitioning our RMAB system to the NGO for real-world use. △ Less

Submitted 27 October, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

arXiv:2107.03003 [pdf, other]

Harnessing Heterogeneity: Learning from Decomposed Feedback in Bayesian Modeling

Authors: Kai Wang, Bryan Wilder, Sze-chuan Suen, Bistra Dilkina, Milind Tambe

Abstract: There is significant interest in learning and optimizing a complex system composed of multiple sub-components, where these components may be agents or autonomous sensors. Among the rich literature on this topic, agent-based and domain-specific simulations can capture complex dynamics and subgroup interaction, but optimizing over such simulations can be computationally and algorithmically challengi… ▽ More There is significant interest in learning and optimizing a complex system composed of multiple sub-components, where these components may be agents or autonomous sensors. Among the rich literature on this topic, agent-based and domain-specific simulations can capture complex dynamics and subgroup interaction, but optimizing over such simulations can be computationally and algorithmically challenging. Bayesian approaches, such as Gaussian processes (GPs), can be used to learn a computationally tractable approximation to the underlying dynamics but typically neglect the detailed information about subgroups in the complicated system. We attempt to find the best of both worlds by proposing the idea of decomposed feedback, which captures group-based heterogeneity and dynamics. We introduce a novel decomposed GP regression to incorporate the subgroup decomposed feedback. Our modified regression has provably lower variance -- and thus a more accurate posterior -- compared to previous approaches; it also allows us to introduce a decomposed GP-UCB optimization algorithm that leverages subgroup feedback. The Bayesian nature of our method makes the optimization algorithm trackable with a theoretical guarantee on convergence and no-regret property. To demonstrate the wide applicability of this work, we execute our algorithm on two disparate social problems: infectious disease control in a heterogeneous population and allocation of distributed weather sensors. Experimental results show that our new method provides significant improvement compared to the state-of-the-art. △ Less

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2107.01689 [pdf, other]

Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning

Authors: Jackson A. Killian, Lily Xu, Arpita Biswas, Milind Tambe

Abstract: We introduce robustness in \textit{restless multi-armed bandits} (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant \emph{uncertainty}, e.g., via historical data, which can lead to bad outco… ▽ More We introduce robustness in \textit{restless multi-armed bandits} (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant \emph{uncertainty}, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret -- robust policies for RMABs. Our approach uses a double oracle framework (oracles for \textit{agent} and \textit{nature}), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary "$λ$-network" in tandem with policy networks per arm, greatly reducing sample complexity, with guarantees on convergence. DDLPO, of general interest, implements our reward-maximizing agent oracle. We then tackle the challenging regret-maximizing nature oracle, a non-stationary RL challenge, by formulating it as a multi-agent RL problem between a policy optimizer and adversarial nature. This formulation is of general interest -- we solve it for RMABs by creating a multi-agent extension of DDLPO with a shared critic. We show our approaches work well in three experimental domains. △ Less

Submitted 21 June, 2022; v1 submitted 4 July, 2021; originally announced July 2021.

Comments: 16 pages, 4 figures

arXiv:2106.12024 [pdf, other]

doi 10.1145/3447548.3467370

Q-Learning Lagrange Policies for Multi-Action Restless Bandits

Authors: Jackson A. Killian, Arpita Biswas, Sanket Shah, Milind Tambe

Abstract: Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed. However, previous work only study the offline setting where problem dynamics are known. We address this restrictive assumption, designing the first algorithms for learning good policies for Multi-action RMABs online using combinations of Lag… ▽ More Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed. However, previous work only study the offline setting where problem dynamics are known. We address this restrictive assumption, designing the first algorithms for learning good policies for Multi-action RMABs online using combinations of Lagrangian relaxation and Q-learning. Our first approach, MAIQL, extends a method for Q-learning the Whittle index in binary-action RMABs to the multi-action setting. We derive a generalized update rule and convergence proof and establish that, under standard assumptions, MAIQL converges to the asymptotically optimal multi-action RMAB policy as $t\rightarrow{}\infty$. However, MAIQL relies on learning Q-functions and indexes on two timescales which leads to slow convergence and requires problem structure to perform well. Thus, we design a second algorithm, LPQL, which learns the well-performing and more general Lagrange policy for multi-action RMABs by learning to minimize the Lagrange bound through a variant of Q-learning. To ensure fast convergence, we take an approximation strategy that enables learning on a single timescale, then give a guarantee relating the approximation's precision to an upper bound of LPQL's return as $t\rightarrow{}\infty$. Finally, we show that our approaches always outperform baselines across multiple settings, including one derived from real-world medication adherence data. △ Less

Submitted 22 June, 2021; originally announced June 2021.

Comments: 13 pages, 6 figures, to be published in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data

arXiv:2106.08413 [pdf, other]

Robust Reinforcement Learning Under Minimax Regret for Green Security

Authors: Lily Xu, Andrew Perrault, Fei Fang, Haipeng Chen, Milind Tambe

Abstract: Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minim… ▽ More Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature who controls the parameter values of the adversarial behavior and design an algorithm MIRROR to find a robust policy. MIRROR uses two reinforcement learning-based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data. △ Less

Submitted 15 June, 2021; originally announced June 2021.

Comments: Accepted at the Conference on Uncertainty in Artificial Intelligence (UAI) 2021. 11 pages, 5 figures

arXiv:2106.07039 [pdf, other]

Contingency-Aware Influence Maximization: A Reinforcement Learning Approach

Authors: Haipeng Chen, Wei Qiu, Han-Ching Ou, Bo An, Milind Tambe

Abstract: The influence maximization (IM) problem aims at finding a subset of seed nodes in a social network that maximize the spread of influence. In this study, we focus on a sub-class of IM problems, where whether the nodes are willing to be the seeds when being invited is uncertain, called contingency-aware IM. Such contingency aware IM is critical for applications for non-profit organizations in low re… ▽ More The influence maximization (IM) problem aims at finding a subset of seed nodes in a social network that maximize the spread of influence. In this study, we focus on a sub-class of IM problems, where whether the nodes are willing to be the seeds when being invited is uncertain, called contingency-aware IM. Such contingency aware IM is critical for applications for non-profit organizations in low resource communities (e.g., spreading awareness of disease prevention). Despite the initial success, a major practical obstacle in promoting the solutions to more communities is the tremendous runtime of the greedy algorithms and the lack of high performance computing (HPC) for the non-profits in the field -- whenever there is a new social network, the non-profits usually do not have the HPCs to recalculate the solutions. Motivated by this and inspired by the line of works that use reinforcement learning (RL) to address combinatorial optimization on graphs, we formalize the problem as a Markov Decision Process (MDP), and use RL to learn an IM policy over historically seen networks, and generalize to unseen networks with negligible runtime at test phase. To fully exploit the properties of our targeted problem, we propose two technical innovations that improve the existing methods, including state-abstraction and theoretically grounded reward sha**. Empirical results show that our method achieves influence as high as the state-of-the-art methods for contingency-aware IM, while having negligible runtime at test phase. △ Less

Submitted 13 June, 2021; originally announced June 2021.

Comments: 11 pages; accepted for publication at UAI 2021

arXiv:2106.06060 [pdf, other]

AI-driven Prices for Externalities and Sustainability in Production Markets

Authors: Panayiotis Danassis, Aris Filos-Ratsikas, Haipeng Chen, Milind Tambe, Boi Faltings

Abstract: Traditional competitive markets do not account for negative externalities; indirect costs that some participants impose on others, such as the cost of over-appropriating a common-pool resource (which diminishes future stock, and thus harvest, for everyone). Quantifying appropriate interventions to market prices has proven to be quite challenging. We propose a practical approach to computing market… ▽ More Traditional competitive markets do not account for negative externalities; indirect costs that some participants impose on others, such as the cost of over-appropriating a common-pool resource (which diminishes future stock, and thus harvest, for everyone). Quantifying appropriate interventions to market prices has proven to be quite challenging. We propose a practical approach to computing market prices and allocations via a deep reinforcement learning policymaker agent, operating in an environment of other learning agents. Our policymaker allows us to tune the prices with regard to diverse objectives such as sustainability and resource wastefulness, fairness, buyers' and sellers' welfare, etc. As a highlight of our findings, our policymaker is significantly more successful in maintaining resource sustainability, compared to the market equilibrium outcome, in scarce resource environments. △ Less

Submitted 12 January, 2023; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: Accepted to the 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023)

arXiv:2106.04663 [pdf, other]

Solving Structured Hierarchical Games Using Differential Backward Induction

Authors: Zun Li, Feiran Jia, Aditya Mate, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, Yevgeniy Vorobeychik

Abstract: From large-scale organizations to decentralized political systems, hierarchical strategic decision making is commonplace. We introduce a novel class of structured hierarchical games (SHGs) that formally capture such hierarchical strategic interactions. In an SHG, each player is a node in a tree, and strategic choices of players are sequenced from root to leaves, with root moving first, followed by… ▽ More From large-scale organizations to decentralized political systems, hierarchical strategic decision making is commonplace. We introduce a novel class of structured hierarchical games (SHGs) that formally capture such hierarchical strategic interactions. In an SHG, each player is a node in a tree, and strategic choices of players are sequenced from root to leaves, with root moving first, followed by its children, then followed by their children, and so on until the leaves. A player's utility in an SHG depends on its own decision, and on the choices of its parent and all the tree leaves. SHGs thus generalize simultaneous-move games, as well as Stackelberg games with many followers. We leverage the structure of both the sequence of player moves as well as payoff dependence to develop a gradient-based back propagation-style algorithm, which we call Differential Backward Induction (DBI), for approximating equilibria of SHGs. We provide a sufficient condition for convergence of DBI and demonstrate its efficacy in finding approximate equilibrium solutions to several SHG models of hierarchical policy-making problems. △ Less

Submitted 27 June, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: The short version of this paper appears in the proceedings of UAI-22

arXiv:2106.03279 [pdf, other]

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning

Authors: Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale Doshi-Velez, Milind Tambe

Abstract: In the predict-then-optimize framework, the objective is to train a predictive model, map** from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generaliz… ▽ More In the predict-then-optimize framework, the objective is to train a predictive model, map** from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman--based and policy gradient--based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks. △ Less

Submitted 16 July, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

arXiv:2106.03278 [pdf, other]

Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games

Authors: Kai Wang, Lily Xu, Andrew Perrault, Michael K. Reiter, Milind Tambe

Abstract: A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best response as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but often get trapped in low… ▽ More A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best response as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but often get trapped in low-quality solutions when followers' objectives are non-linear or non-quadratic. Moreover, these approaches assume a unique equilibrium or a specific equilibrium concept, e.g., optimistic or pessimistic, which is a limiting assumption in many situations. To overcome these limitations, we propose a stochastic gradient descent--based approach, where the leader's strategy is updated by differentiating through the followers' best responses. We frame the leader's optimization as a learning problem against followers' equilibrium, which allows us to decouple the followers' equilibrium constraints from the leader's problem. This approach also addresses cases with multiple equilibria and arbitrary equilibrium selection procedures by back-propagating through a sampled Nash equilibrium. To this end, this paper introduces a novel concept called equilibrium flow to formally characterize the set of equilibrium selection processes where the gradient with respect to a sampled equilibrium is an unbiased estimate of the true gradient. We evaluate our approach experimentally against existing baselines in three Stackelberg problems with multiple followers and find that in each case, our approach is able to achieve higher utility for the leader. △ Less

Submitted 3 December, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

arXiv:2105.07965 [pdf, other]

Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare

Authors: Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, Milind Tambe

Abstract: In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for sprea… ▽ More In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover, since the transition probabilities are unknown a priori, we propose a Whittle index based Q-Learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from literature and also on the maternal healthcare dataset. △ Less

Submitted 22 July, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: To appear in the 30th International Joint Conference on Artificial Intelligence (IJCAI 2021)

arXiv:2103.09052 [pdf, other]

Selective Intervention Planning using Restless Multi-Armed Bandits to Improve Maternal and Child Health Outcomes

Authors: Siddharth Nishtala, Lovish Madaan, Aditya Mate, Harshavardhan Kamarthi, Anirudh Grama, Divy Thakkar, Dhyanesh Narayanan, Suresh Chaudhary, Neha Madhiwalla, Ramesh Padmanabhan, Aparna Hegde, Pradeep Varakantham, Balaraman Ravindran, Milind Tambe

Abstract: India has a maternal mortality ratio of 113 and child mortality ratio of 2830 per 100,000 live births. Lack of access to preventive care information is a major contributing factor for these deaths, especially in low resource households. We partner with ARMMAN, a non-profit based in India employing a call-based information program to disseminate health-related information to pregnant women and wome… ▽ More India has a maternal mortality ratio of 113 and child mortality ratio of 2830 per 100,000 live births. Lack of access to preventive care information is a major contributing factor for these deaths, especially in low resource households. We partner with ARMMAN, a non-profit based in India employing a call-based information program to disseminate health-related information to pregnant women and women with recent child deliveries. We analyze call records of over 300,000 women registered in the program created by ARMMAN and try to identify women who might not engage with these call programs that are proven to result in positive health outcomes. We built machine learning based models to predict the long term engagement pattern from call logs and beneficiaries' demographic information, and discuss the applicability of this method in the real world through a pilot validation. Through a pilot service quality improvement study, we show that using our model's predictions to make interventions boosts engagement metrics by 61.37%. We then formulate the intervention planning problem as restless multi-armed bandits (RMABs), and present preliminary results using this approach. △ Less

Submitted 18 October, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

Comments: 7 pages. Camera-ready version for AASG 2021 Workshop

arXiv:2103.04730 [pdf, other]

Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems

Authors: Aditya Mate, Arpita Biswas, Christoph Siebenbrunner, Susobhan Ghosh, Milind Tambe

Abstract: We propose Streaming Bandits, a Restless Multi Armed Bandit (RMAB) framework in which heterogeneous arms may arrive and leave the system after staying on for a finite lifetime. Streaming Bandits naturally capture the health intervention planning problem, where health workers must manage the health outcomes of a patient cohort while new patients join and existing patients leave the cohort each day.… ▽ More We propose Streaming Bandits, a Restless Multi Armed Bandit (RMAB) framework in which heterogeneous arms may arrive and leave the system after staying on for a finite lifetime. Streaming Bandits naturally capture the health intervention planning problem, where health workers must manage the health outcomes of a patient cohort while new patients join and existing patients leave the cohort each day. Our contributions are as follows: (1) We derive conditions under which our problem satisfies indexability, a precondition that guarantees the existence and asymptotic optimality of the Whittle Index solution for RMABs. We establish the conditions using a polytime reduction of the Streaming Bandit setup to regular RMABs. (2) We further prove a phenomenon that we call index decay, whereby the Whittle index values are low for short residual lifetimes driving the intuition underpinning our algorithm. (3) We propose a novel and efficient algorithm to compute the index-based solution for Streaming Bandits. Unlike previous methods, our algorithm does not rely on solving the costly finite horizon problem on each arm of the RMAB, thereby lowering the computational complexity compared to existing methods. (4) Finally, we evaluate our approach via simulations run on realworld data sets from a tuberculosis patient monitoring task and an intervention planning task for improving maternal healthcare, in addition to other synthetic domains. Across the board, our algorithm achieves a 2-orders-of-magnitude speed-up over existing methods while maintaining the same solution quality. △ Less

Submitted 15 February, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

arXiv:2102.10646 [pdf, other]

A Game-Theoretic Approach for Hierarchical Epidemic Control

Authors: Feiran Jia, Aditya Mate, Zun Li, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, Michael Wellman, Yevgeniy Vorobeychik

Abstract: We design and analyze a multi-level game-theoretic model of hierarchical policy interventions for epidemic control, such as those in response to the COVID-19 pandemic. Our model captures the potentially mismatched priorities among a hierarchy of policy-makers (e.g., federal, state, and local governments) with respect to two cost components that have opposite dependence on the policy strength -- po… ▽ More We design and analyze a multi-level game-theoretic model of hierarchical policy interventions for epidemic control, such as those in response to the COVID-19 pandemic. Our model captures the potentially mismatched priorities among a hierarchy of policy-makers (e.g., federal, state, and local governments) with respect to two cost components that have opposite dependence on the policy strength -- post-intervention infection rates and the socio-economic cost of policy implementation. Additionally, our model includes a crucial third factor in decisions: a cost of non-compliance with the policy-maker immediately above in the hierarchy, such as non-compliance of counties with state-level policies. We propose two novel algorithms for approximating solutions to such games. The first is based on best response dynamics (BRD), and exploits the tree structure of the game. The second combines quadratic integer programming (QIP), which enables us to collapse the two lowest levels of the game, with best response dynamics. Through extensive experiments, we show that our QIP-based approach significantly outperforms the BRD algorithm both in running time and the quality of equilibrium solutions. Finally, we apply the QIP-based algorithm to experiments based on both synthetic and real-world data under various parameter configurations and analyze the resulting (approximate) equilibria to gain insight into the impact of decentralization on overall welfare (measured as the negative sum of costs) as well as emergent properties like free-riding and fairness in cost distribution among policy-makers. △ Less

Submitted 3 August, 2022; v1 submitted 21 February, 2021; originally announced February 2021.

arXiv:2101.02766 [pdf, other]

Active Screening for Recurrent Diseases: A Reinforcement Learning Approach

Authors: Han-Ching Ou, Haipeng Chen, Shahin Jabbari, Milind Tambe

Abstract: Active screening is a common approach in controlling the spread of recurring infectious diseases such as tuberculosis and influenza. In this approach, health workers periodically select a subset of population for screening. However, given the limited number of health workers, only a small subset of the population can be visited in any given time period. Given the recurrent nature of the disease an… ▽ More Active screening is a common approach in controlling the spread of recurring infectious diseases such as tuberculosis and influenza. In this approach, health workers periodically select a subset of population for screening. However, given the limited number of health workers, only a small subset of the population can be visited in any given time period. Given the recurrent nature of the disease and rapid spreading, the goal is to minimize the number of infections over a long time horizon. Active screening can be formalized as a sequential combinatorial optimization over the network of people and their connections. The main computational challenges in this formalization arise from i) the combinatorial nature of the problem, ii) the need of sequential planning and iii) the uncertainties in the infectiousness states of the population. Previous works on active screening fail to scale to large time horizon while fully considering the future effect of current interventions. In this paper, we propose a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN), with several innovative adaptations that are designed to address the above challenges. First, we use graph convolutional networks (GCNs) to represent the Q-function that exploit the node correlations of the underlying contact network. Second, to avoid solving a combinatorial optimization problem in each time period, we decompose the node set selection as a sub-sequence of decisions, and further design a two-level RL framework that solves the problem in a hierarchical way. Finally, to speed-up the slow convergence of RL which arises from reward sparseness, we incorporate ideas from curriculum learning into our hierarchical RL approach. We evaluate our RL algorithm on several real-world networks. △ Less

Submitted 19 April, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

Comments: The short version of this paper appears in the proceedings of AAMAS-21

arXiv:2012.12839 [pdf, other]

Cohorting to isolate asymptomatic spreaders: An agent-based simulation study on the Mumbai Suburban Railway

Authors: Alok Talekar, Sharad Shriram, Nidhin Vaidhiyan, Gaurav Aggarwal, Jiangzhuo Chen, Srini Venkatramanan, Li**g Wang, Aniruddha Adiga, Adam Sadilek, Ashish Tendulkar, Madhav Marathe, Rajesh Sundaresan, Milind Tambe

Abstract: The Mumbai Suburban Railways, \emph{locals}, are a key transit infrastructure of the city and is crucial for resuming normal economic activity. To reduce disease transmission, policymakers can enforce reduced crowding and mandate wearing of masks. \emph{Cohorting} -- forming groups of travelers that always travel together, is an additional policy to reduce disease transmission on \textit{locals} w… ▽ More The Mumbai Suburban Railways, \emph{locals}, are a key transit infrastructure of the city and is crucial for resuming normal economic activity. To reduce disease transmission, policymakers can enforce reduced crowding and mandate wearing of masks. \emph{Cohorting} -- forming groups of travelers that always travel together, is an additional policy to reduce disease transmission on \textit{locals} without severe restrictions. Cohorting allows us to: ($i$) form traveler bubbles, thereby decreasing the number of distinct interactions over time; ($ii$) potentially quarantine an entire cohort if a single case is detected, making contact tracing more efficient, and ($iii$) target cohorts for testing and early detection of symptomatic as well as asymptomatic cases. Studying impact of cohorts using compartmental models is challenging because of the ensuing representational complexity. Agent-based models provide a natural way to represent cohorts along with the representation of the cohort members with the larger social network. This paper describes a novel multi-scale agent-based model to study the impact of cohorting strategies on COVID-19 dynamics in Mumbai. We achieve this by modeling the Mumbai urban region using a detailed agent-based model comprising of 12.4 million agents. Individual cohorts and their inter-cohort interactions as they travel on locals are modeled using local mean field approximations. The resulting multi-scale model in conjunction with a detailed disease transmission and intervention simulator is used to assess various cohorting strategies. The results provide a quantitative trade-off between cohort size and its impact on disease dynamics and well being. The results show that cohorts can provide significant benefit in terms of reduced transmission without significantly impacting ridership and or economic \& social activity. △ Less

Submitted 24 December, 2020; v1 submitted 23 December, 2020; originally announced December 2020.

Comments: Will be presented at AAMAS 2021. Minor edits to styling (per conf guidelines) and acknowledgement

arXiv:2012.10389 [pdf, other]

doi 10.5555/3463952

Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty

Authors: Aravind Venugopal, Elizabeth Bondi, Harshavardhan Kamarthi, Keval Dholakia, Balaraman Ravindran, Milind Tambe

Abstract: Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests and wildlife. While real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication and real-time, uncertain information, previous game models do not fully address both of these stages simultaneously. Furthermore, adopting exist… ▽ More Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests and wildlife. While real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication and real-time, uncertain information, previous game models do not fully address both of these stages simultaneously. Furthermore, adopting existing solution strategies is difficult since they do not scale well for larger, more complex variants of the game models. We therefore first propose a novel GSG model that combines defender allocation, patrolling, real-time drone notification to human patrollers, and drones sending warning signals to attackers. The model further incorporates uncertainty for real-time decision-making within a team of drones and human patrollers. Second, we present CombSGPO, a novel and scalable algorithm based on reinforcement learning, to compute a defender strategy for this game model. CombSGPO performs policy search over a multi-dimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. Third, we provide a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, showing group formation between defender resources and patrolling formations based on signaling and notifications between resources. Importantly, we find that strategic signaling emerges in the final learnt strategy. Finally, we perform experiments to evaluate these strategies under different levels of uncertainty. △ Less

Submitted 18 December, 2020; originally announced December 2020.

Comments: Accepted at AAMAS 2021

Report number: page 1353-1361

arXiv:2012.03822 [pdf, other]

Efficient Reservoir Management through Deep Reinforcement Learning

Authors: Xinrun Wang, Tarun Nair, Haoyang Li, Yuh Sheng Reuben Wong, Nachiket Kelkar, Srinivas Vaidyanathan, Rajat Nayak, Bo An, Jagdish Krishnaswamy, Milind Tambe

Abstract: Dams impact downstream river dynamics through flow regulation and disruption of upstream-downstream linkages. However, current dam operation is far from satisfactory due to the inability to respond the complicated and uncertain dynamics of the upstream-downstream system and various usages of the reservoir. Even further, the unsatisfactory dam operation can cause floods in downstream areas. Therefo… ▽ More Dams impact downstream river dynamics through flow regulation and disruption of upstream-downstream linkages. However, current dam operation is far from satisfactory due to the inability to respond the complicated and uncertain dynamics of the upstream-downstream system and various usages of the reservoir. Even further, the unsatisfactory dam operation can cause floods in downstream areas. Therefore, we leverage reinforcement learning (RL) methods to compute efficient dam operation guidelines in this work. Specifically, we build offline simulators with real data and different mathematical models for the upstream inflow, i.e., generalized least square (GLS) and dynamic linear model (DLM), then use the simulator to train the state-of-the-art RL algorithms, including DDPG, TD3 and SAC. Experiments show that the simulator with DLM can efficiently model the inflow dynamics in the upstream and the dam operation policies trained by RL algorithms significantly outperform the human-generated policy. △ Less

Submitted 7 December, 2020; originally announced December 2020.

Comments: 5 pages, 4 figures, Workshop paper

arXiv:2011.10666 [pdf, other]

Enhancing Poaching Predictions for Under-Resourced Wildlife Conservation Parks Using Remote Sensing Imagery

Authors: Rachel Guo, Lily Xu, Drew Cronin, Francis Okeke, Andrew Plumptre, Milind Tambe

Abstract: Illegal wildlife poaching is driving the loss of biodiversity. To combat poaching, rangers patrol expansive protected areas for illegal poaching activity. However, rangers often cannot comprehensively search such large parks. Thus, the Protection Assistant for Wildlife Security (PAWS) was introduced as a machine learning approach to help identify the areas with highest poaching risk. As PAWS is de… ▽ More Illegal wildlife poaching is driving the loss of biodiversity. To combat poaching, rangers patrol expansive protected areas for illegal poaching activity. However, rangers often cannot comprehensively search such large parks. Thus, the Protection Assistant for Wildlife Security (PAWS) was introduced as a machine learning approach to help identify the areas with highest poaching risk. As PAWS is deployed to parks around the world, we recognized that many parks have limited resources for data collection and therefore have scarce feature sets. To ensure under-resourced parks have access to meaningful poaching predictions, we introduce the use of publicly available remote sensing data to extract features for parks. By employing this data from Google Earth Engine, we also incorporate previously unavailable dynamic data to enrich predictions with seasonal trends. We automate the entire data-to-deployment pipeline and find that, with only using publicly available data, we recuperate prediction performance comparable to predictions made using features manually computed by park specialists. We conclude that the inclusion of satellite imagery creates a robust system through which parks of any resource level can benefit from poaching risks for years to come. △ Less

Submitted 20 November, 2020; originally announced November 2020.

Comments: Presented at NeurIPS 2020 Workshop on Machine Learning for the Develo** World. 4 pages, 1 page references. 4 figures, 1 table

arXiv:2011.02962 [pdf, other]

Measuring Data Collection Diligence for Community Healthcare

Authors: Ramesha Karunasena, Mohammad Sarparajul Ambiya, Arunesh Sinha, Ruchit Nagar, Saachi Dalal, Divy Thakkar, Dhyanesh Narayanan, Milind Tambe

Abstract: Data analytics has tremendous potential to provide targeted benefit in low-resource communities, however the availability of high-quality public health data is a significant challenge in develo** countries primarily due to non-diligent data collection by community health workers (CHWs). In this work, we define and test a data collection diligence score. This challenging unlabeled data problem is… ▽ More Data analytics has tremendous potential to provide targeted benefit in low-resource communities, however the availability of high-quality public health data is a significant challenge in develo** countries primarily due to non-diligent data collection by community health workers (CHWs). In this work, we define and test a data collection diligence score. This challenging unlabeled data problem is handled by building upon domain expert's guidance to design a useful data representation of the raw data, using which we design a simple and natural score. An important aspect of the score is relative scoring of the CHWs, which implicitly takes into account the context of the local area. The data is also clustered and interpreting these clusters provides a natural explanation of the past behavior of each data collector. We further predict the diligence score for future time steps. Our framework has been validated on the ground using observations by the field monitors of our partner NGO in India. Beyond the successful field test, our work is in the final stages of deployment in the state of Rajasthan, India. △ Less

Submitted 7 April, 2021; v1 submitted 5 November, 2020; originally announced November 2020.

ACM Class: I.2.1

arXiv:2009.09559 [pdf, other]

Clinical trial of an AI-augmented intervention for HIV prevention in youth experiencing homelessness

Authors: Bryan Wilder, Laura Onasch-Vera, Graham Diguiseppi, Robin Petering, Chyna Hill, Amulya Yadav, Eric Rice, Milind Tambe

Abstract: Youth experiencing homelessness (YEH) are subject to substantially greater risk of HIV infection, compounded both by their lack of access to stable housing and the disproportionate representation of youth of marginalized racial, ethnic, and gender identity groups among YEH. A key goal for health equity is to improve adoption of protective behaviors in this population. One promising strategy for in… ▽ More Youth experiencing homelessness (YEH) are subject to substantially greater risk of HIV infection, compounded both by their lack of access to stable housing and the disproportionate representation of youth of marginalized racial, ethnic, and gender identity groups among YEH. A key goal for health equity is to improve adoption of protective behaviors in this population. One promising strategy for intervention is to recruit peer leaders from the population of YEH to promote behaviors such as condom usage and regular HIV testing to their social contacts. This raises a computational question: which youth should be selected as peer leaders to maximize the overall impact of the intervention? We developed an artificial intelligence system to optimize such social network interventions in a community health setting. We conducted a clinical trial enrolling 713 YEH at drop-in centers in a large US city. The clinical trial compared interventions planned with the algorithm to those where the highest-degree nodes in the youths' social network were recruited as peer leaders (the standard method in public health) and to an observation-only control group. Results from the clinical trial show that youth in the AI group experience statistically significant reductions in key risk behaviors for HIV transmission, while those in the other groups do not. This provides, to our knowledge, the first empirical validation of the usage of AI methods to optimize social network interventions for health. We conclude by discussing lessons learned over the course of the project which may inform future attempts to use AI in community-level interventions. △ Less

Submitted 6 November, 2020; v1 submitted 20 September, 2020; originally announced September 2020.

Report number: Accepted at AAAI 2021

Showing 1–50 of 89 results for author: Tambe, M