
Showing 1–7 of 7 results for author: Kalyanakrishnan, S

  1. arXiv:2211.15602  [pdf, ps, other]

    cs.DM cs.CC math.CO

    Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs

    Authors: Ritesh Goenka, Eashan Gupta, Sushil Khyalia, Pratyush Agarwal, Mulinti Shaik Wajid, Shivaram Kalyanakrishnan

    Abstract: Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms; another to all "max-g…

    Submitted 8 October, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: Added new bounds for two state MDPs

    MSC Class: 90C40 (Primary) 68Q25; 05C35; 05C38 (Secondary)

  2. arXiv:2211.06318  [pdf]

    cs.CY cs.AI cs.LG

    Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence

    Authors: Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, Astro Teller

    Abstract: In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Peter Stone of the University of Texas at Austin. The report, entitled…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: 52 pages, https://ai100.stanford.edu/2016-report

  3. arXiv:2102.03718  [pdf, other]

    cs.LG

    An Analysis of Frame-skipping in Reinforcement Learning

    Authors: Shivaram Kalyanakrishnan, Siddharth Aravindan, Vishwajeet Bagdawat, Varun Bhatt, Harshith Goka, Archit Gupta, Kalpesh Krishna, Vihari Piratla

    Abstract: In the practice of sequential decision making, agents are often designed to sense state at regular intervals of $d$ time steps, $d > 1$, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially be…

    Submitted 6 February, 2021; originally announced February 2021.

  4. arXiv:2009.07842  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Lower Bounds for Policy Iteration on Multi-action MDPs

    Authors: Kumar Ashutosh, Sarthak Consul, Bhishma Dedhia, Parthasarathi Khirwadkar, Sahil Shah, Shivaram Kalyanakrishnan

    Abstract: Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an improving set, until an optimal policy is reached. Different variants of PI result from the (switching) rule used for improvement. An important theoretical que…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: 8 pages, 3 diagrams, 2 tables. Paper in IEEE CDC 2020

  5. arXiv:1901.08387  [pdf, ps, other]

    cs.LG cs.AI

    Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

    Authors: Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

    Abstract: In this paper, we propose a constant word (RAM model) algorithm for regret minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB) instances. Most of the existing regret minimisation algorithms need to remember the statistics of all the arms they encounter. This may become a problem for the cases where the number of available words of memory is limited. Designing an efficient…

    Submitted 24 January, 2019; originally announced January 2019.

  6. arXiv:1901.08386  [pdf, ps, other]

    cs.LG stat.ML

    PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

    Authors: Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

    Abstract: We consider the problem of identifying any $k$ out of the best $m$ arms in an $n$-armed stochastic multi-armed bandit. Framed in the PAC setting, this particular problem generalises both the problem of `best subset selection' and that of selecting `one out of the best m' arms [arcsk 2017]. In applications such as crowd-sourcing and drug-designing, identifying a single good solution is often not su…

    Submitted 24 January, 2019; originally announced January 2019.

  7. arXiv:1712.04303  [pdf, other]

    cs.DC

    RLWS: A Reinforcement Learning based GPU Warp Scheduler

    Authors: Jayvant Anantpur, Nagendra Gulur Dwarakanath, Shivaram Kalyanakrishnan, Shalabh Bhatnagar, R. Govindarajan

    Abstract: The Streaming Multiprocessors (SMs) of a Graphics Processing Unit (GPU) execute instructions from a group of consecutive threads, called warps. At each cycle, an SM schedules a warp from a group of active warps and can context switch among the active warps to hide various stalls. Hence the performance of warp scheduler is critical to the performance of GPU. Several heuristic warp scheduling algori…

    Submitted 17 November, 2017; originally announced December 2017.