Showing 1–15 of 15 results for author: Assael, Y

  1. arXiv:2403.13793  [pdf, other]

    cs.LG

    Evaluating Frontier Models for Dangerous Capabilities

    Authors: Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, et al. (2 additional authors not shown)

    Abstract: To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous…

    Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2107.00692  [pdf, other]

    cs.CL

    Interactive decoding of words from visual speech recognition models

    Authors: Brendan Shillingford, Yannis Assael, Misha Denil

    Abstract: This work describes an interactive decoding method to improve the performance of visual speech recognition systems using user input to compensate for the inherent ambiguity of the task. Unlike most phoneme-to-word decoding pipelines, which produce phonemes and feed these through a finite state transducer, our method instead expands words in lockstep, facilitating the insertion of interaction point…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: 8 pages

  4. arXiv:2011.03530  [pdf, other]

    cs.CV cs.SD eess.AS

    Large-scale multilingual audio visual dubbing

    Authors: Yi Yang, Brendan Shillingford, Yannis Assael, Miaosen Wang, Wendi Liu, Yutian Chen, Yu Zhang, Eren Sezener, Luis C. Cobo, Misha Denil, Yusuf Aytar, Nando de Freitas

    Abstract: We describe a system for large-scale audiovisual translation and dubbing, which translates videos from one language to another. The source language's speech content is transcribed to text, translated, and automatically synthesized into target language speech using the original speaker's voice. The visual content is translated by synthesizing lip movements for the speaker to match the translated au…

    Submitted 6 November, 2020; originally announced November 2020.

    Comments: 26 pages, 8 figures

  5. arXiv:1911.04890  [pdf, other]

    eess.AS cs.CL cs.CV cs.LG cs.SD

    Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

    Authors: Takaki Makino, Hank Liao, Yannis Assael, Brendan Shillingford, Basilio Garcia, Otavio Braga, Olivier Siohan

    Abstract: This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. To support the development of such a system, we built a large audio-visual (A/V) dataset of segmented utterances extracted from YouTube public videos, leading to 31k hours of audio-visual training content. The performance of an audio-only, visual-only, and au…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: Will be presented at the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)

  6. arXiv:1910.06262  [pdf, other]

    cs.CL cs.CY

    Restoring ancient text using deep learning: a case study on Greek epigraphy

    Authors: Yannis Assael, Thea Sommerschield, Jonathan Prag

    Abstract: Ancient history relies on disciplines such as epigraphy, the study of ancient inscribed texts, for evidence of the recorded past. However, these texts, "inscriptions", are often damaged over the centuries, and illegible parts of the text must be restored by specialists, known as epigraphists. This work presents Pythia, the first ancient text restoration model that recovers missing characters from…

    Submitted 14 October, 2019; originally announced October 2019.

    Journal ref: Empirical Methods in Natural Language Processing (EMNLP) 2019

  7. arXiv:1907.04927  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Speech bandwidth extension with WaveNet

    Authors: Archit Gupta, Brendan Shillingford, Yannis Assael, Thomas C. Walters

    Abstract: Large-scale mobile communication systems tend to contain legacy transmission channels with narrowband bottlenecks, resulting in characteristic "telephone-quality" audio. While higher quality codecs exist, due to the scale and heterogeneity of the networks, transmitting higher sample rate audio with modern high-quality audio codecs can be difficult in practice. This paper proposes an approach where…

    Submitted 5 July, 2019; originally announced July 2019.

  8. arXiv:1809.10460  [pdf, other]

    cs.LG cs.SD stat.ML

    Sample Efficient Adaptive Text-to-Speech

    Authors: Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas

    Abstract: We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few…

    Submitted 16 January, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: Accepted by ICLR 2019

  9. arXiv:1807.05162  [pdf, other]

    cs.CV cs.LG

    Large-Scale Visual Speech Recognition

    Authors: Brendan Shillingford, Yannis Assael, Matthew W. Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Ben Coppin, Ben Laurie, Andrew Senior, Nando de Freitas

    Abstract: This work presents a scalable solution to open-vocabulary visual speech recognition. To achieve this, we constructed the largest existing visual speech recognition dataset, consisting of pairs of text and video clips of faces speaking (3,886 hours of video). In tandem, we designed and trained an integrated lipreading system, consisting of a video processing pipeline that maps raw video to stable v…

    Submitted 1 October, 2018; v1 submitted 13 July, 2018; originally announced July 2018.

  10. arXiv:1801.09466  [pdf, other]

    cs.AI math.OC

    Using deep Q-learning to understand the tax evasion behavior of risk-averse firms

    Authors: Nikolaos D. Goumagias, Dimitrios Hristu-Varsakelis, Yannis M. Assael

    Abstract: Designing tax policies that are effective in curbing tax evasion and maximize state revenues requires a rigorous understanding of taxpayer behavior. This work explores the problem of determining the strategy a self-interested, risk-averse tax entity is expected to follow, as it "navigates" - in the context of a Markov Decision Process - a government-controlled tax environment that includes random…

    Submitted 29 January, 2018; originally announced January 2018.

    Comments: Preprint - accepted for publication in Expert Systems with Applications

  11. arXiv:1711.02448  [pdf, other]

    q-bio.NC cs.NE stat.ML

    Cortical microcircuits as gated-recurrent neural networks

    Authors: Rui Ponte Costa, Yannis M. Assael, Brendan Shillingford, Nando de Freitas, Tim P. Vogels

    Abstract: Cortical circuits exhibit intricate recurrent architectures that are remarkably similar across different brain areas. Such stereotyped structure suggests the existence of common computational principles. However, such principles have remained largely elusive. Inspired by gated-memory networks, namely long short-term memory networks (LSTMs), we introduce a recurrent neural network in which informat…

    Submitted 3 January, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: To appear in Advances in Neural Information Processing Systems 30 (NIPS 2017). 13 pages, 2 figures (and 1 supp. figure)

  12. arXiv:1611.01599  [pdf, other]

    cs.LG cs.CL cs.CV

    LipNet: End-to-End Sentence-level Lipreading

    Authors: Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas

    Abstract: Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on models trained end-to-end perform only word classification, rather…

    Submitted 16 December, 2016; v1 submitted 5 November, 2016; originally announced November 2016.

  13. arXiv:1610.02707  [pdf, other]

    cs.AI

    Multi-Objective Deep Reinforcement Learning

    Authors: Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson

    Abstract: We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the first…

    Submitted 9 October, 2016; originally announced October 2016.

  14. arXiv:1605.06676  [pdf, other]

    cs.AI cs.LG cs.MA

    Learning to Communicate with Deep Multi-Agent Reinforcement Learning

    Authors: Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

    Abstract: We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communicati…

    Submitted 24 May, 2016; v1 submitted 21 May, 2016; originally announced May 2016.

  15. arXiv:1602.02672  [pdf, other]

    cs.AI cs.LG

    Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

    Authors: Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

    Abstract: We propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks. In these tasks, the agents are not given any pre-designed communication protocol. Therefore, in order to successfully communicate, they must first automatically develop and agree upon their own communication protocol. We present empirical results on two m…

    Submitted 8 February, 2016; originally announced February 2016.