Showing 1–50 of 85 results for author: Silver, D

  1. arXiv:2404.13219  [pdf, other]

    cs.SI cs.CY

    Bubble reachers and uncivil discourse in polarized online public sphere

    Authors: Jordan K Kobellarz, Milos Brocic, Daniel Silver, Thiago H Silva

    Abstract: Early optimism saw possibilities for social media to renew democratic discourse, marked by hopes for individuals from diverse backgrounds to find opportunities to learn from and interact with others different from themselves. This optimism quickly waned as social media seemed to breed ideological homophily marked by "filter bubble" or "echo chambers." A typical response to the sense of fragmentati…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 41 pages, 5 figures

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2402.17905  [pdf, other]

    cs.LG cs.CY cs.SI

    Using Graph Neural Networks to Predict Local Culture

    Authors: Thiago H Silva, Daniel Silver

    Abstract: Urban research has long recognized that neighbourhoods are dynamic and relational. However, lack of data, methodologies, and computer processing power have hampered a formal quantitative examination of neighbourhood relational dynamics. To make progress on this issue, this study proposes a graph neural network (GNN) approach that permits combining and evaluating multiple sources of information abo…

    Submitted 22 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 14 pages, 5 figures

  4. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2309.15259  [pdf, other]

    quant-ph cs.CV eess.IV

    SLIQ: Quantum Image Similarity Networks on Noisy Quantum Computers

    Authors: Daniel Silver, Tirthak Patel, Aditya Ranjan, Harshitta Gandhi, William Cutler, Devesh Tiwari

    Abstract: Exploration into quantum machine learning has grown tremendously in recent years due to the ability of quantum computers to speed up classical programs. However, these efforts have yet to solve unsupervised similarity detection tasks due to the challenge of porting them to run on quantum computers. To overcome this challenge, we propose SLIQ, the first open-sourced work for resource-efficient quan…

    Submitted 26 September, 2023; originally announced September 2023.

    Journal ref: Vol. 37 No. 8: AAAI-2023 Technical Tracks 8

  6. QUILT: Effective Multi-Class Classification on Quantum Computers Using an Ensemble of Diverse Quantum Classifiers

    Authors: Daniel Silver, Tirthak Patel, Devesh Tiwari

    Abstract: Quantum computers can theoretically have significant acceleration over classical computers; but, the near-future era of quantum computing is limited due to small number of qubits that are also error prone. Quilt is a framework for performing multi-class classification task designed to work effectively on current error-prone quantum computers. Quilt is evaluated with real quantum machines as well a…

    Submitted 26 September, 2023; originally announced September 2023.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 2022, 36(8), 8324-8332

  7. arXiv:2308.11096  [pdf, other]

    quant-ph cs.AR cs.CV

    MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers

    Authors: Daniel Silver, Tirthak Patel, William Cutler, Aditya Ranjan, Harshitta Gandhi, Devesh Tiwari

    Abstract: Quantum machine learning and vision have come to the fore recently, with hardware advances enabling rapid advancement in the capabilities of quantum machines. Recently, quantum image generation has been explored with many potential advantages over non-quantum techniques; however, previous techniques have suffered from poor quality and robustness. To address these problems, we introduce, MosaiQ, a…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted to appear at ICCV'23

  8. arXiv:2307.16799  [pdf, other]

    quant-ph cs.AR cs.DC cs.ET

    Toward Privacy in Quantum Program Execution On Untrusted Quantum Cloud Computing Machines for Business-sensitive Quantum Needs

    Authors: Tirthak Patel, Daniel Silver, Aditya Ranjan, Harshitta Gandhi, William Cutler, Devesh Tiwari

    Abstract: Quantum computing is an emerging paradigm that has shown great promise in accelerating large-scale scientific, optimization, and machine-learning workloads. With most quantum computing solutions being offered over the cloud, it has become imperative to protect confidential and proprietary quantum code from being accessed by untrusted and/or adversarial agents. In response to this challenge, we pro…

    Submitted 31 July, 2023; originally announced July 2023.

  9. A Hybrid 3D Eddy Detection Technique Based on Sea Surface Height and Velocity Field

    Authors: Weiping Hua, Karen Bemis, Dujuan Kang, Sedat Ozer, Deborah Silver

    Abstract: Eddy detection is a critical task for ocean scientists to understand and analyze ocean circulation. In this paper, we introduce a hybrid eddy detection approach that combines sea surface height (SSH) and velocity fields with geometric criteria defining eddy behavior. Our approach searches for SSH minima and maxima, which oceanographers expect to find at the center of eddies. Geometric criteria are…

    Submitted 31 October, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: 8 pages, 14 figures. Accepted by EnvirVis 2023. Project Link: https://github.com/VizlabRutgers/Hybrid-Eddy-detection

  10. arXiv:2304.13626  [pdf, other]

    cs.AI

    The Roles of Symbols in Neural-based AI: They are Not What You Think!

    Authors: Daniel L. Silver, Tom M. Mitchell

    Abstract: We propose that symbols are first and foremost external communication tools used between intelligent agents that allow knowledge to be transferred in a more efficient and effective manner than having to experience the world directly. But, they are also used internally within an agent through a form of self-communication to help formulate, describe and justify subsymbolic patterns of neural activit…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: 28 pages

  11. arXiv:2303.10294  [pdf]

    cs.LG

    Forecasting COVID-19 Case Counts Based on 2020 Ontario Data

    Authors: Daniel L. Silver, Rinda Digamarthi

    Abstract: Objective: To develop machine learning models that can predict the number of COVID-19 cases per day given the last 14 days of environmental and mobility data. Approach: COVID-19 data from four counties around Toronto, Ontario, were used. Data were prepared into daily records containing the number of new COVID case counts, patient demographic data, outdoor weather variables, indoor environment fa…

    Submitted 17 March, 2023; originally announced March 2023.

    Report number: Acadia Institute for Data Analytics Technical Report Dec 2021

  12. arXiv:2211.09903  [pdf, other]

    quant-ph cs.ET

    CHARTER: Identifying the Most-Critical Gate Operations in Quantum Circuits via Amplified Gate Reversibility

    Authors: Tirthak Patel, Daniel Silver, Devesh Tiwari

    Abstract: When quantum programs are executed on noisy intermediate-scale quantum (NISQ) computers, they experience hardware noise; consequently, the program outputs are often erroneous. To mitigate the adverse effects of hardware noise, it is necessary to understand the effect of hardware noise on the program output and more fundamentally, understand the impact of hardware noise on specific regions within a…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: This work was published in SC'22

    Journal ref: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 189-204. IEEE Computer Society, 2022

  13. arXiv:2209.10665  [pdf]

    cs.SI

    Changing the Scene: applying four models of social evolution to the scenescape

    Authors: Daniel Silver, Thiago H Silva, Patrick Adler

    Abstract: This paper elaborates a multi-model approach to studying how local scenes change. We refer to this as the "4 D's" of scene change: development, differentiation, defense, and diffusion. Each posits somewhat distinct change processes, and has its own tradition of theory and empirical research, which we briefly review. After summarizing some major trends in scenes and amenities in the US context, for…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Published at Journal Wuhan University

    Journal ref: Journal Wuhan University 2022

  14. arXiv:2206.15378  [pdf, other]

    cs.AI cs.GT cs.MA

    Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

    Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, et al. (9 additional authors not shown)

    Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additiona…

    Submitted 30 June, 2022; originally announced June 2022.

  15. arXiv:2110.12840  [pdf, other]

    cs.LG cs.AI stat.ML

    Self-Consistent Models and Values

    Authors: Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver

    Abstract: Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment. In particular, models enable planning, i.e. using more computation to improve value functions or policies, without requiring additional environment interactions. In this work, we investigate a way of augmenting model-based RL, by additionally encouraging a le…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  16. arXiv:2109.08906  [pdf, other]

    cs.SI cs.CY

    Reaching the bubble may not be enough: news media role in online political polarization

    Authors: Jordan K Kobellarz, Milos Brocic, Alexandre R Graeml, Daniel Silver, Thiago H Silva

    Abstract: Politics in different countries show diverse degrees of polarization, which tends to be stronger on social media, given how easy it became to connect and engage with like-minded individuals on the web. A way of reducing polarization would be by distributing cross-partisan news among individuals with distinct political orientations, i.e., "reaching the bubbles". This study investigates whether th…

    Submitted 7 July, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

  17. arXiv:2109.04504  [pdf, other]

    cs.LG cs.AI stat.ML

    Bootstrapped Meta-Learning

    Authors: Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem. We propose an algorithm that tackles this problem by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance…

    Submitted 16 March, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Published at ICLR 2022. 37 pages, 19 figures, 9 tables

  18. arXiv:2106.13105  [pdf, other]

    cs.AI cs.LG

    The Option Keyboard: Combining Skills in Reinforcement Learning

    Authors: André Barreto, Diana Borsa, Shaobo Hou, Gheorghe Comanici, Eser Aygün, Philippe Hamel, Daniel Toyama, Jonathan Hunt, Shibl Mourad, David Silver, Doina Precup

    Abstract: The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or "cumulants"). Based on this premise, we propose a framework for combining skills using the formalism of options. We show…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Published at NeurIPS 2019

  19. arXiv:2106.10316  [pdf, other]

    cs.AI cs.LG

    Proper Value Equivalence

    Authors: Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh

    Abstract: One of the main challenges in model-based reinforcement learning (RL) is to decide which aspects of the environment should be modeled. The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning. Technically, VE distinguishes models based on a set of policies and a set of functions:…

    Submitted 12 December, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Journal ref: NeurIPS 2021

  20. arXiv:2104.06303  [pdf, other]

    cs.LG

    Learning and Planning in Complex Action Spaces

    Authors: Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, David Silver

    Abstract: Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampl…

    Submitted 13 April, 2021; originally announced April 2021.

  21. arXiv:2104.06294  [pdf, other]

    cs.LG

    Online and Offline Reinforcement Learning by Planning with a Learned Model

    Authors: Julian Schrittwieser, Thomas Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver

    Abstract: Learning efficiently from small amounts of data has long been the focus of model-based reinforcement learning, both for the online case when interacting with the environment and the offline case when learning from a fixed dataset. However, to date no single unified algorithm could demonstrate state-of-the-art results in both settings. In this work, we describe the Reanalyse algorithm which uses mo…

    Submitted 13 April, 2021; originally announced April 2021.

  22. arXiv:2104.06159  [pdf, other]

    cs.LG cs.AI

    Muesli: Combining Improvements in Policy Optimization

    Authors: Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt

    Abstract: We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by ex…

    Submitted 31 March, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

  23. arXiv:2103.02502  [pdf, other]

    cs.HC cs.GR cs.IT

    A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

    Authors: Min Chen, Alfie Abdul-Rahman, Deborah Silver, Mateu Sbert

    Abstract: Many visual representations, such as volume-rendered images and metro maps, feature a noticeable amount of information loss. At a glance, there seem to be numerous opportunities for viewers to misinterpret the data being visualized, hence undermining the benefits of these visual representations. In practice, there is little doubt that these visual representations are useful. The recently-proposed…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Following the SciVis 2020 reviewers' request for more explanation and clarification, the original article, "A Bounded Measure for Estimating the Benefit of Visualization, arxiv:2002.05282", has been split into two articles, on "Theoretical Discourse and Conceptual Evaluation" and "Case Studies and Empirical Evaluation" respectively. This is the second article

    Journal ref: Entropy, 24(2), 282, 2022

  24. arXiv:2102.06741  [pdf, other]

    cs.LG cs.AI

    Discovery of Options via Meta-Learned Subgoals

    Authors: Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster. However, despite prior work on this topic, the problem of discovering options through interaction with an environment remains a challenge. In this paper, we introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments. Our approach is based…

    Submitted 12 February, 2021; originally announced February 2021.

  25. arXiv:2011.03506  [pdf, other]

    cs.LG cs.AI

    The Value Equivalence Principle for Model-Based Reinforcement Learning

    Authors: Christopher Grimm, André Barreto, Satinder Singh, David Silver

    Abstract: Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources o…

    Submitted 6 November, 2020; originally announced November 2020.

    Comments: NeurIPS-2020

  26. arXiv:2007.08794  [pdf, other]

    cs.LG cs.AI

    Discovering Reinforcement Learning Algorithms

    Authors: Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

    Abstract: Reinforcement learning (RL) algorithms update an agent's parameters according to one of several possible rules, discovered manually through years of research. Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments. Although there have been prior attempts at addressing this significant scientific cha…

    Submitted 5 January, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

  27. arXiv:2007.08433  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-Gradient Reinforcement Learning with an Objective Discovered Online

    Authors: Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver

    Abstract: Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an objective, such as Q-learning or policy gradient, that defines its semantics. In this work, we propose an algorithm based on meta-gradient descent that discovers its o…

    Submitted 16 July, 2020; originally announced July 2020.

  28. arXiv:2007.01839  [pdf, other]

    cs.LG cs.AI stat.ML

    Expected Eligibility Traces

    Authors: Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa

    Abstract: The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that c…

    Submitted 8 February, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: AAAI, distinguished paper award

  29. arXiv:2006.02243  [pdf, other]

    cs.LG stat.ML

    The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

    Authors: Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

    Abstract: In value-based reinforcement learning (RL), unlike in supervised learning, the agent faces not a single, stationary, approximation problem, but a sequence of value prediction problems. Each time the policy improves, the nature of the problem changes, shifting both the distribution of states and their values. In this paper we take a novel perspective, arguing that the value prediction problems face…

    Submitted 4 January, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: AAAI-21

  30. arXiv:2006.01035  [pdf, other]

    cs.LG stat.ML

    Data-Driven Prediction of Embryo Implantation Probability Using IVF Time-lapse Imaging

    Authors: David H. Silver, Martin Feder, Yael Gold-Zamir, Avital L. Polsky, Shahar Rosentraub, Efrat Shachor, Adi Weinberger, Pavlo Mazur, Valery D. Zukin, Alex M. Bronstein

    Abstract: The process of fertilizing a human egg outside the body in order to help those suffering from infertility to conceive is known as in vitro fertilization (IVF). Despite being the most effective method of assisted reproductive technology (ART), the average success rate of IVF is a mere 20-40%. One step that is critical to the success of the procedure is selecting which embryo to transfer to the pati…

    Submitted 2 June, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Report number: MIDL/2020/ExtendedAbstract/TujK1uTkTP

  31. arXiv:2004.04278  [pdf, other]

    cs.CV

    Estimating Grape Yield on the Vine from Multiple Images

    Authors: Daniel L. Silver, Jabun Nasa

    Abstract: Estimating grape yield prior to harvest is important to commercial vineyard production as it informs many vineyard and winery decisions. Currently, the process of yield estimation is time consuming and varies in its accuracy from 75-90% depending on the experience of the viticulturist. This paper proposes a multiple task learning (MTL) convolutional neural network (CNN) approach that uses images…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: Paper presented at the ICLR 2020 Workshop on Computer Vision for Agriculture (CV4A), 4 pages, 4 figures

  32. arXiv:2002.12928  [pdf, other]

    stat.ML cs.LG

    A Self-Tuning Actor-Critic Algorithm

    Authors: Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using metagradients to automatically adapt hyperparameters online by meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-…

    Submitted 14 April, 2021; v1 submitted 28 February, 2020; originally announced February 2020.

  33. arXiv:2002.08329  [pdf, other]

    cs.LG stat.ML

    Value-driven Hindsight Modelling

    Authors: Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess

    Abstract: Value estimation is a critical component of the reinforcement learning (RL) paradigm. The question of how to effectively learn value predictors from data is one of the major problems studied by the RL community, and different approaches exploit structure in the problem domain in different ways. Model learning can make use of the rich transition structure present in sequences of observations, but t…

    Submitted 20 October, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: 9 pages + reference + appendix. NeurIPS 2020 version

  34. arXiv:2002.05282  [pdf, other]

    cs.AI cs.GR cs.HC cs.IT

    A Bounded Measure for Estimating the Benefit of Visualization

    Authors: Min Chen, Mateu Sbert, Alfie Abdul-Rahman, Deborah Silver

    Abstract: Information theory can be used to analyze the cost-benefit of visualization processes. However, the current measure of benefit contains an unbounded term that is neither easy to estimate nor intuitive to interpret. In this work, we propose to revise the existing cost-benefit measure by replacing the unbounded term with a bounded one. We examine a number of bounded measures that include the Jenson-…

    Submitted 25 July, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: Comment on version 2: This revised version, which includes a new formal proof, many additions, and a detailed revision report, was submitted to SciVis 2020. Unexpectedly, our revision effort did not have much influence on the SciVis 2020 reviewers who gave an outright rejection with lower scores than EuroVis reviews. We will share these reviews after we have completed our feedback

    Journal ref: Entropy, 24(2), 228, 2022

  35. arXiv:1912.05500  [pdf, other]

    cs.AI cs.LG

    What Can Learned Intrinsic Rewards Capture?

    Authors: Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful…

    Submitted 21 August, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: ICML 2020. The first two authors contributed equally

  36. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

    Authors: Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver

    Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the…

    Submitted 21 February, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

  37. arXiv:1909.04607  [pdf, other]

    cs.AI cs.LG

    Discovery of Useful Questions as Auxiliary Tasks

    Authors: Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions. We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value…

    Submitted 10 September, 2019; originally announced September 2019.

  38. arXiv:1908.03568  [pdf, other]

    cs.LG cs.AI stat.ML

    Behaviour Suite for Reinforcement Learning

    Authors: Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt

    Abstract: This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to stud…

    Submitted 14 February, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

  39. arXiv:1907.02908  [pdf, other]

    cs.LG cs.AI stat.ML

    On Inductive Biases in Deep Reinforcement Learning

    Authors: Matteo Hessel, Hado van Hasselt, Joseph Modayil, David Silver

    Abstract: Many deep reinforcement learning algorithms contain inductive biases that sculpt the agent's objective and its interface to the environment. These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters. In general, there is a trade-off between generality and performance when algorithms use such biases. Stronger biases can lead to faster learning, but weaker…

    Submitted 5 July, 2019; originally announced July 2019.

  40. arXiv:1903.05263  [pdf, other]

    cs.LG stat.ML

    AutoML @ NeurIPS 2018 challenge: Design and Results

    Authors: Hugo Jair Escalante, Wei-Wei Tu, Isabelle Guyon, Daniel L. Silver, Evelyne Viegas, Yuqiang Chen, Wenyuan Dai, Qiang Yang

    Abstract: We organized a competition on Autonomous Lifelong Machine Learning with Drift that was part of the competition program of NeurIPS 2018. This data driven competition asked participants to develop computer programs capable of solving supervised learning problems where the i.i.d. assumption did not hold. Large data sets were arranged in a lifelong learning and evaluation scenario and CodaLab was used…

    Submitted 13 March, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: Preprint submitted to NeurIPS2018 Volume of Springer Series on Challenges in Machine Learning

  41. arXiv:1901.10964  [pdf, other]

    cs.LG cs.AI

    Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

    Authors: André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos

    Abstract: The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SFs & GPI framework in two ways. One of the basic…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: Published at ICML 2018

  42. arXiv:1901.03559  [pdf, other]

    cs.LG cs.AI stat.ML

    An investigation of model-free planning

    Authors: Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

    Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been propos…

    Submitted 20 May, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

  43. arXiv:1901.01761  [pdf, other]

    cs.LG stat.ML

    Credit Assignment Techniques in Stochastic Computation Graphs

    Authors: Théophane Weber, Nicolas Heess, Lars Buesing, David Silver

    Abstract: Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires…

    Submitted 7 January, 2019; originally announced January 2019.

  44. arXiv:1812.07626  [pdf, other]

    cs.LG cs.AI stat.ML

    Universal Successor Features Approximators

    Authors: Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul

    Abstract: The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpol…

    Submitted 18 December, 2018; originally announced December 2018.

  45. arXiv:1812.06855  [pdf, other]

    cs.LG cs.AI stat.ML

    Bayesian Optimization in AlphaGo

    Authors: Yutian Chen, Aja Huang, Ziyu Wang, Ioannis Antonoglou, Julian Schrittwieser, David Silver, Nando de Freitas

    Abstract: During the development of AlphaGo, its many hyper-parameters were tuned with Bayesian optimization multiple times. This automatic tuning process resulted in substantial improvements in playing strength. For example, prior to the match with Lee Sedol, we tuned the latest AlphaGo agent and this improved its win-rate from 50% to 66.5% in self-play games. This tuned version was deployed in the final m…

    Submitted 17 December, 2018; originally announced December 2018.

  46. arXiv:1807.01281  [pdf, other]

    cs.LG cs.AI stat.ML

    Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

    Authors: Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel

    Abstract: Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real-world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. I…

    Submitted 3 July, 2018; originally announced July 2018.

  47. arXiv:1806.06923  [pdf, other]

    cs.LG cs.AI stat.ML

    Implicit Quantile Networks for Distributional Reinforcement Learning

    Authors: Will Dabney, Georg Ostrovski, David Silver, Rémi Munos

    Abstract: In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined re…

    Submitted 14 June, 2018; originally announced June 2018.

    Comments: ICML 2018

  48. arXiv:1805.09801  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-Gradient Reinforcement Learning

    Authors: Zhongwen Xu, Hado van Hasselt, David Silver

    Abstract: The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the…

    Submitted 24 May, 2018; originally announced May 2018.

  49. arXiv:1803.10760  [pdf, other]

    cs.LG stat.ML

    Unsupervised Predictive Memory in a Goal-Directed Agent

    Authors: Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap

    Abstract: Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement l…

    Submitted 28 March, 2018; originally announced March 2018.

  50. arXiv:1803.00933  [pdf, other]

    cs.LG

    Distributed Prioritized Experience Replay

    Authors: Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver

    Abstract: We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shar…

    Submitted 2 March, 2018; originally announced March 2018.

    Comments: Accepted to International Conference on Learning Representations 2018