Showing 1–11 of 11 results for author: Firoiu, V

  1. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  2. arXiv:2209.14375  [pdf, other]

    cs.LG cs.CL

    Improving alignment of dialogue agents via targeted human judgements

    Authors: Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri, Rory Greig, Charlie Chen, Doug Fritz, Jaume Sanchez Elias, Richard Green, Soňa Mokrá, Nicholas Fernando, Boxi Wu, et al. (9 additional authors not shown)

    Abstract: We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into na…

    Submitted 28 September, 2022; originally announced September 2022.

  3. arXiv:2112.10664  [pdf, other]

    cs.AI cs.LO

    Proving Theorems using Incremental Learning and Hindsight Experience Replay

    Authors: Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad

    Abstract: Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains. Machine learning approaches in literature either depend on these traditional provers to bootstrap themselves or fall short on reaching comparable performance. In this paper, we propose a general incremental learnin…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: 16 pages, 2 figures

    ACM Class: I.2.3

  4. arXiv:2103.03798  [pdf, other]

    cs.AI

    Training a First-Order Theorem Prover from Synthetic Data

    Authors: Vlad Firoiu, Eser Aygun, Ankit Anand, Zafarali Ahmed, Xavier Glorot, Laurent Orseau, Lei Zhang, Doina Precup, Shibl Mourad

    Abstract: A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. We use these theorems to train a neurally-guided saturation-…

    Submitted 6 April, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

  5. arXiv:2006.11259  [pdf, other]

    cs.LO cs.LG

    Learning to Prove from Synthetic Theorems

    Authors: Eser Aygün, Zafarali Ahmed, Ankit Anand, Vlad Firoiu, Xavier Glorot, Laurent Orseau, Doina Precup, Shibl Mourad

    Abstract: A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned pr…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: 17 pages, 6 figures, submitted to NeurIPS 2020

    ACM Class: I.2.3

  6. arXiv:1909.12892  [pdf, other]

    cs.LG cs.AI stat.ML

    Automated curricula through setter-solver interactions

    Authors: Sebastien Racaniere, Andrew K. Lampinen, Adam Santoro, David P. Reichert, Vlad Firoiu, Timothy P. Lillicrap

    Abstract: Reinforcement learning algorithms use correlations between policies and rewards to improve agent performance. But in dynamic or sparsely rewarding environments these correlations are often too small, or rewarding events are too infrequent to make learning feasible. Human education instead relies on curricula--the breakdown of tasks into simpler, static challenges with dense rewards--to build up to…

    Submitted 21 January, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

    Journal ref: International Conference on Learning Representations, 2020

  7. arXiv:1810.07286  [pdf, other]

    cs.AI cs.LG

    At Human Speed: Deep Reinforcement Learning with Action Delay

    Authors: Vlad Firoiu, Tina Ju, Josh Tenenbaum

    Abstract: There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of tasks, from video games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning and reinforcement learning, that learn to play from experience with minimal prior knowledge. However, these machines often do not win through intelligence alone -…

    Submitted 16 October, 2018; originally announced October 2018.

  8. arXiv:1802.01561  [pdf, other]

    cs.LG cs.AI

    IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

    Authors: Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu

    Abstract: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also…

    Submitted 28 June, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

  9. arXiv:1702.06230  [pdf, other]

    cs.LG cs.AI

    Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning

    Authors: Vlad Firoiu, William F. Whitney, Joshua B. Tenenbaum

    Abstract: There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of RL tasks, from Atari games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning, that learn to play from experience with minimal knowledge of the specific domain of interest. In this work, we will investigate the performance of these metho…

    Submitted 8 May, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

    MSC Class: I.2.6

  10. arXiv:1506.00308  [pdf, other]

    stat.ML

    Automatic Inference for Inverting Software Simulators via Probabilistic Programming

    Authors: Ardavan Saeedi, Vlad Firoiu, Vikash Mansinghka

    Abstract: Models of complex systems are often formalized as sequential software simulators: computationally intensive programs that iteratively build up probable system configurations given parameters and initial conditions. These simulators enable modelers to capture effects that are difficult to characterize analytically or summarize statistically. However, in many real-world applications, these simulatio…

    Submitted 31 May, 2015; originally announced June 2015.

    Comments: ICML 2014 AutoML Workshop

  11. arXiv:cs/0406019  [pdf, ps, other]

    cs.NI

    Providing Service Guarantees in High-Speed Switching Systems with Feedback Output Queuing

    Authors: Victor Firoiu, Xiaohui Zhang, Emre Gunduzhan, Nicolas Christin

    Abstract: We consider the problem of providing service guarantees in a high-speed packet switch. As basic requirements, the switch should be scalable to high speeds per port, a large number of ports and a large number of traffic flows with independent guarantees. Existing scalable solutions are based on Virtual Output Queuing, which is computationally complex when required to provide service guarantees fo…

    Submitted 11 June, 2004; originally announced June 2004.

    Comments: 30 pages, 9 figures. Shorter preliminary version appeared in Proceedings of Hot Interconnects X, Stanford CA, August 2002

    ACM Class: C.2.1; C.2.6