Showing 1–17 of 17 results for author: Lazic, N

Searching in archive cs.
  1. Robotic Table Tennis: A Case Study into a High Speed Learning System

    Authors: David B. D'Ambrosio, Jonathan Abelian, Saminda Abeyruwan, Michael Ahn, Alex Bewley, Justin Boyd, Krzysztof Choromanski, Omar Cortes, Erwin Coumans, Tianli Ding, Wenbo Gao, Laura Graesser, Atil Iscen, Navdeep Jaitly, Deepali Jain, Juhana Kangaspunta, Satoshi Kataoka, Gus Kouretas, Yuheng Kuang, Nevena Lazic, Corey Lynch, Reza Mahjourian, Sherry Q. Moore, Thinh Nguyen, Ken Oslund, et al. (10 additional authors not shown)

    Abstract: We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real w…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Published and presented at Robotics: Science and Systems (RSS2023)

  2. arXiv:2307.11546  [pdf, other]

    physics.plasm-ph cs.LG

    Towards practical reinforcement learning for tokamak magnetic control

    Authors: Brendan D. Tracey, Andrea Michi, Yuri Chervonyi, Ian Davies, Cosmin Paduraru, Nevena Lazic, Federico Felici, Timo Ewalds, Craig Donner, Cristian Galperti, Jonas Buchli, Michael Neunert, Andrea Huber, Jonathan Evens, Paula Kurylowicz, Daniel J. Mankowitz, Martin Riedmiller, The TCV Team

    Abstract: Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method; achieving higher control accuracy for desired plasma properties, reducing the stea…

    Submitted 5 October, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

  3. arXiv:2301.12579  [pdf, other]

    cs.LG cs.AI

    Sample Efficient Deep Reinforcement Learning via Local Planning

    Authors: Dong Yin, Sridhar Thiagarajan, Nevena Lazic, Nived Rajaraman, Botao Hao, Csaba Szepesvari

    Abstract: The focus of this work is sample-efficient deep reinforcement learning (RL) with a simulator. One useful property of simulators is that it is typically easy to reset the environment to a previously observed state. We propose an algorithmic framework, named uncertainty-first local planning (UFLP), that takes advantage of this property. Concretely, in each data collection iteration, with some probab…

    Submitted 3 July, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: 25 pages, 11 figures

  4. arXiv:2201.06532  [pdf, ps, other]

    cs.LG stat.ML

    A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

    Authors: Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic

    Abstract: We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of their dynamic regret, which is defined as the difference between the expected cumulative reward of an agent choosing the optimal arm in every time step and the cumulative r…

    Submitted 8 March, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

  5. arXiv:2108.05533  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Efficient Local Planning with Linear Function Approximation

    Authors: Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári

    Abstract: We study query and computationally efficient planning algorithms with linear function approximation and a simulator. We assume that the agent only has local access to the simulator, meaning that the agent can only query the simulator at states that have been visited before. This setting is more practical than many prior works on reinforcement learning with a generative model. We propose two algori…

    Submitted 4 February, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Algorithmic Learning Theory 2022

  6. arXiv:2102.12611  [pdf, other]

    cs.LG stat.ML

    Improved Regret Bound and Experience Replay in Regularized Policy Iteration

    Authors: Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

    Abstract: In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy iteration) can be sharpened from $O(T^{3/4})$ to $O(\sqrt{T})$ under nearly identical assumptions, and instantiate the bound with linear function approximation.…

    Submitted 24 February, 2021; originally announced February 2021.

  7. arXiv:2102.06234  [pdf, other]

    cs.LG stat.ML

    Optimization Issues in KL-Constrained Approximate Policy Iteration

    Authors: Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

    Abstract: Many reinforcement learning algorithms can be seen as versions of approximate policy iteration (API). While standard API often performs poorly, it has been shown that learning can be stabilized by regularizing each policy update by the KL-divergence to the previous policy. Popular practical algorithms such as TRPO, MPO, and VMPO replace regularization by a constraint on KL-divergence of consecutiv…

    Submitted 11 February, 2021; originally announced February 2021.

  8. arXiv:2012.05339  [pdf, other]

    cs.LG cs.CV

    Neural Rate Control for Video Encoding using Imitation Learning

    Authors: Hongzi Mao, Chenjie Gu, Miaosen Wang, Angie Chen, Nevena Lazic, Nir Levine, Derek Pang, Rene Claus, Marisabel Hechtman, Ching-Han Chiang, Cheng Chen, Jingning Han

    Abstract: In modern video encoders, rate control is a critical component and has been heavily engineered. It decides how many bits to spend to encode each frame, in order to optimize the rate-distortion trade-off over all video frames. This is a challenging constrained planning problem because of the complex dependency among decisions for different video frames and the bitrate constraint defined at the end…

    Submitted 9 December, 2020; originally announced December 2020.

  9. arXiv:2006.12620  [pdf, other]

    cs.LG cs.AI

    A maximum-entropy approach to off-policy evaluation in average-reward MDPs

    Authors: Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Gorur, Chris Harris, Dale Schuurmans

    Abstract: This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e. where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, wh…

    Submitted 17 June, 2020; originally announced June 2020.

  10. arXiv:2003.14398  [pdf, other]

    cs.LG cs.RO stat.ML

    Robotic Table Tennis with Model-Free Reinforcement Learning

    Authors: Wenbo Gao, Laura Graesser, Krzysztof Choromanski, Xingyou Song, Nevena Lazic, Pannag Sanketi, Vikas Sindhwani, Navdeep Jaitly

    Abstract: We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs and convolving across time learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned…

    Submitted 27 May, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

    Comments: V2: new URL of supplementary video. 8 pages, 4 figures

    ACM Class: I.2.6; I.2.9

  11. arXiv:2002.03069  [pdf, other]

    cs.LG stat.ML

    Adaptive Approximate Policy Iteration

    Authors: Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari

    Abstract: Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited, and existing results are largely focused on episodic or discounted Markov decision processes (MDPs). In this work, we present adaptive approximate policy itera…

    Submitted 11 February, 2021; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: Accepted at AISTATS 2021

  12. arXiv:1908.10479  [pdf, other]

    cs.LG stat.ML

    Exploration-Enhanced POLITEX

    Authors: Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari, Gellert Weisz

    Abstract: We study algorithms for average-cost reinforcement learning problems with value function approximation. Our starting point is the recently proposed POLITEX algorithm, a version of policy iteration where the policy produced in each iteration is near-optimal in hindsight for the sum of all past value function estimates. POLITEX has sublinear regret guarantees in uniformly-mixing MDPs when the value…

    Submitted 27 August, 2019; originally announced August 2019.

  13. arXiv:1811.12927  [pdf, other]

    cs.RO

    Hierarchical Policy Design for Sample-Efficient Learning of Robot Table Tennis Through Self-Play

    Authors: Reza Mahjourian, Risto Miikkulainen, Nevena Lazic, Sergey Levine, Navdeep Jaitly

    Abstract: Training robots with physical bodies requires developing new methods and action representations that allow the learning agents to explore the space of policies efficiently. This work studies sample-efficient learning of complex policies in the context of robot table tennis. It incorporates learning into a hierarchical control framework using a model-free strategy layer (which requires complex reas…

    Submitted 17 February, 2019; v1 submitted 30 November, 2018; originally announced November 2018.

  14. arXiv:1806.07104  [pdf, other]

    cs.LG stat.ML

    Online Linear Quadratic Control

    Authors: Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, Kunal Talwar

    Abstract: We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Cruci…

    Submitted 19 June, 2018; originally announced June 2018.

  15. arXiv:1804.06021  [pdf, other]

    cs.LG math.OC stat.ML

    Model-Free Linear Quadratic Control via Reduction to Expert Prediction

    Authors: Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari

    Abstract: Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL. In this work, we present a new model-free algorithm for controlling linea…

    Submitted 5 October, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

  16. arXiv:1604.05753  [pdf, other]

    cs.LG cs.AI

    Sketching and Neural Networks

    Authors: Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar

    Abstract: High-dimensional sparse data present computational and statistical challenges for supervised learning. We propose compact linear sketches for reducing the dimensionality of the input, followed by a single layer neural network. We show that any sparse polynomial function can be computed, on nearly all sparse binary vectors, by a single layer neural network that takes a compact sketch of the vector…

    Submitted 19 April, 2016; originally announced April 2016.

  17. arXiv:1412.1820  [pdf, other]

    cs.CL

    Context-Dependent Fine-Grained Entity Type Tagging

    Authors: Dan Gillick, Nevena Lazic, Kuzman Ganchev, Jesse Kirchner, David Huynh

    Abstract: Entity type tagging is the task of assigning category labels to each mention of an entity in a document. While standard systems focus on a small set of types, recent work (Ling and Weld, 2012) suggests that using a large fine-grained label set can lead to dramatic improvements in downstream tasks. In the absence of labeled training data, existing fine-grained tagging systems obtain examples automa…

    Submitted 1 August, 2016; v1 submitted 3 December, 2014; originally announced December 2014.