Skip to main content

Showing 1–50 of 55 results for author: Gülçehre, Ç

.
  1. arXiv:2406.16450  [pdf, other

    cs.CL

    Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

    Authors: Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre

    Abstract: State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive. This has sparked a research agenda to reduce these models' parameter count and computational costs without significantly impacting their performance. Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFN), which… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.14155  [pdf, other

    cs.CL

    Aligning Large Language Models with Diverse Political Viewpoints

    Authors: Dominik Stammbach, Philine Widmer, Eunjung Cho, Caglar Gulcehre, Elliott Ash

    Abstract: Large language models such as ChatGPT often exhibit striking political biases. If users query them about political information, they might take a normative stance and reinforce such biases. To overcome this, we align LLMs with diverse political viewpoints from 100,000 comments written by candidates running for national parliament in Switzerland. Such aligned models are able to generate more accura… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.11473  [pdf, other

    cs.CL cs.AI

    Promises, Outlooks and Challenges of Diffusion Language Modeling

    Authors: Justin Deschenaux, Caglar Gulcehre

    Abstract: The modern autoregressive Large Language Models (LLMs) have achieved outstanding performance on NLP benchmarks, and they are deployed in the real world. However, they still suffer from limitations of the autoregressive training paradigm. For example, autoregressive token generation is notably slow and can be prone to \textit{exposure bias}. The diffusion-based language models were proposed as an a… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.06793  [pdf, other

    cs.LG cs.AI

    PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

    Authors: Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sung** Ahn

    Abstract: Despite the recent advancements in offline RL, no unified algorithm could achieve superior performance across a broad range of tasks. Offline \textit{value function learning}, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of solving credit assignment and extrapolation errors that accumulates as the horizon of the task grows.~On the other hand, models that ca… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  5. arXiv:2405.06691  [pdf, other

    cs.CL cs.AI cs.LG cs.NE

    Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering

    Authors: Akhil Arora, Lars Klein, Nearchos Potamitis, Roland Aydin, Caglar Gulcehre, Robert West

    Abstract: Large language models (LLMs) have significantly evolved, moving from simple output generation to complex reasoning and from stand-alone usage to being embedded into broader frameworks. In this paper, we introduce \emph{Fleet of Agents (FoA)}, a novel framework utilizing LLMs as agents to navigate through dynamic tree searches, employing a genetic-type particle filtering approach. FoA spawns a mult… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 11 pages, 1 figure, 4 tables

  6. arXiv:2405.00662  [pdf, other

    cs.LG

    No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

    Authors: Skander Moalla, Andrea Miele, Razvan Pascanu, Caglar Gulcehre

    Abstract: Reinforcement learning (RL) is inherently rife with non-stationarity since the states and rewards the agent observes during training depend on its changing policy. Therefore, networks in deep RL must be capable of adapting to new observations and fitting new targets. However, previous works have observed that networks in off-policy deep value-based methods exhibit a decrease in representation rank… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Code and run histories are available at https://github.com/CLAIRE-Labo/no-representation-no-trust

  7. arXiv:2402.19427  [pdf, other

    cs.LG cs.CL

    Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

    Authors: Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando De Freitas, Caglar Gulcehre

    Abstract: Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 25 pages, 11 figures

  8. arXiv:2401.02644  [pdf, other

    cs.LG cs.AI

    Simple Hierarchical Planning with Diffusion

    Authors: Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sung** Ahn

    Abstract: Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets. However, they often face computational challenges and can falter in generalization, especially in capturing temporal abstractions for long-horizon tasks. To overcome this, we introduce the Hierarchical Diffuser, a simple, fast, yet surprisingly effective planning method combining the advantages… ▽ More

    Submitted 5 January, 2024; originally announced January 2024.

  9. arXiv:2311.09064  [pdf, other

    cs.CV cs.LG

    Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

    Authors: Yeongbin Kim, Gautam Singh, Junyeong Park, Caglar Gulcehre, Sung** Ahn

    Abstract: Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infa… ▽ More

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Published as a conference paper at NeurIPS 2023. The first two authors contributed equally. To download the benchmark, visit https://systematic-visual-imagination.github.io

  10. arXiv:2308.08998  [pdf, other

    cs.CL cs.LG

    Reinforced Self-Training (ReST) for Language Modeling

    Authors: Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas

    Abstract: Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating sampl… ▽ More

    Submitted 21 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 23 pages, 16 figures

  11. arXiv:2308.03526  [pdf, other

    cs.LG cs.AI

    AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

    Authors: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals

    Abstract: StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of it… ▽ More

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 32 pages, 13 figures, previous version published as a NeurIPS 2021 workshop: https://openreview.net/forum?id=Np8Pumfoty

  12. arXiv:2307.11888  [pdf, other

    cs.LG cs.NE

    Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

    Authors: Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith

    Abstract: Deep neural networks based on linear RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches for sequence modeling. Examples of such architectures include state-space models (SSMs) like S4, LRU, and Mamba: recently proposed models that achieve promising performance on text, genetics, and other data that require long-range reasoning. Despite experimental evidence hig… ▽ More

    Submitted 5 June, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: v1: Accepted at HLD 2023 Workshop @ICML; v2: Preprint; v3: ICML version

  13. arXiv:2303.06349  [pdf, other

    cs.LG

    Resurrecting Recurrent Neural Networks for Long Sequences

    Authors: Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

    Abstract: Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important diff… ▽ More

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: 30 pages, 9 figures

  14. arXiv:2207.02099  [pdf, other

    cs.LG

    An Empirical Study of Implicit Regularization in Deep Offline RL

    Authors: Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet

    Abstract: Deep neural networks are the most commonly used function approximators in offline reinforcement learning. Prior works have shown that neural nets trained with TD-learning and gradient descent can exhibit implicit regularization that can be characterized by under-parameterization of these networks. Specifically, the rank of the penultimate feature layer, also called \textit{effective rank}, has bee… ▽ More

    Submitted 7 July, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: 40 pages, 37 figures, 2 tables

  15. arXiv:2106.10251  [pdf, other

    cs.LG cs.AI stat.ML

    Active Offline Policy Selection

    Authors: Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas

    Abstract: This paper addresses the problem of policy selection in domains with abundant logged data, but with a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of polici… ▽ More

    Submitted 6 May, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Presented at NeurIPS 2021

  16. arXiv:2105.10148  [pdf, other

    cs.LG stat.ML

    On Instrumental Variable Regression for Deep Offline Policy Evaluation

    Authors: Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

    Abstract: We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding, the inputs and output noise being correlated. Hence, direct minimization of the Bellman error can result in significantly biased Q-function estimates. We explain why fixing the target Q-network i… ▽ More

    Submitted 23 November, 2022; v1 submitted 21 May, 2021; originally announced May 2021.

    Comments: Accepted by Journal of Machine Learning Research in 11/2022

    Journal ref: Journal of Machine Learning Research 23 (2022) 1-41

  17. arXiv:2103.09575  [pdf, other

    cs.LG

    Regularized Behavior Value Estimation

    Authors: Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas

    Abstract: Offline reinforcement learning restricts the learning process to rely only on logged-data without access to an environment. While this enables real-world applications, it also poses unique challenges. One important challenge is dealing with errors caused by the overestimation of values for state-action pairs not well-covered by the training data. Due to bootstrap**, these errors get amplified du… ▽ More

    Submitted 17 March, 2021; originally announced March 2021.

  18. arXiv:2011.13885  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Offline Learning from Demonstrations and Unlabeled Experience

    Authors: Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed

    Abstract: Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage what we will refer to as unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human… ▽ More

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: Accepted to Offline Reinforcement Learning Workshop at Neural Information Processing Systems (2020)

  19. arXiv:2007.13483  [pdf, other

    cs.LG cs.AI

    Post-Workshop Report on Science meets Engineering in Deep Learning, NeurIPS 2019, Vancouver

    Authors: Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli

    Abstract: Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019. As organizers of the workshop, we created the following report in an attempt to isolate emerging topics and recurring themes that have been presented throughout the event. Deep learning can still be a complex mix of art and engineering despite its tremendous success in recent years.… ▽ More

    Submitted 29 July, 2020; v1 submitted 25 June, 2020; originally announced July 2020.

    Comments: Report of NeurIPS 2019 workshop SEDL

  20. arXiv:2007.09055  [pdf, other

    cs.LG cs.AI stat.ML

    Hyperparameter Selection for Offline Reinforcement Learning

    Authors: Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas

    Abstract: Offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios. However, existing hyperparameter selection methods for offline RL break the offline assumption by evaluating policies corresponding to each hyperparameter setting in the environment. This online execution is often infeasible and hence undermines the main aim of of… ▽ More

    Submitted 17 July, 2020; originally announced July 2020.

  21. arXiv:2006.15134  [pdf, other

    cs.LG cs.AI stat.ML

    Critic Regularized Regression

    Authors: Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

    Abstract: Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learnin… ▽ More

    Submitted 22 September, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: 24 pages; presented at NeurIPS 2020

  22. arXiv:2006.13888  [pdf, other

    cs.LG stat.ML

    RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

    Authors: Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

    Abstract: Offline methods for reinforcement learning have a potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real-world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged… ▽ More

    Submitted 12 February, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: NeurIPS paper. 21 pages including supplementary material, the github link for the datasets: https://github.com/deepmind/deepmind-research/rl_unplugged

  23. arXiv:2006.00979  [pdf, other

    cs.LG cs.AI

    Acme: A Research Framework for Distributed Reinforcement Learning

    Authors: Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang , et al. (14 additional authors not shown)

    Abstract: Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce publishe… ▽ More

    Submitted 20 September, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme

  24. arXiv:1910.09890  [pdf, other

    cs.NE cs.LG

    Improving the Gating Mechanism of Recurrent Neural Networks

    Authors: Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, Razvan Pascanu

    Abstract: Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gra… ▽ More

    Submitted 18 June, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: International Conference on Machine Learning 2020

  25. arXiv:1910.06764  [pdf, other

    cs.LG cs.AI stat.ML

    Stabilizing Transformers for Reinforcement Learning

    Authors: Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell

    Abstract: Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons o… ▽ More

    Submitted 13 October, 2019; originally announced October 2019.

  26. arXiv:1909.01387  [pdf, other

    cs.LG cs.AI

    Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

    Authors: Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

    Abstract: This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state of the art methods (both with and without demonstrations) fai… ▽ More

    Submitted 3 September, 2019; originally announced September 2019.

  27. arXiv:1810.08647  [pdf, other

    cs.LG cs.AI cs.MA stat.ML

    Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

    Authors: Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

    Abstract: We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agen… ▽ More

    Submitted 18 June, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

  28. arXiv:1809.10460  [pdf, other

    cs.LG cs.SD stat.ML

    Sample Efficient Adaptive Text-to-Speech

    Authors: Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas

    Abstract: We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few… ▽ More

    Submitted 16 January, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: Accepted by ICLR 2019

  29. arXiv:1806.01261  [pdf, other

    cs.LG cs.AI stat.ML

    Relational inductive biases, deep learning, and graph networks

    Authors: Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals , et al. (2 additional authors not shown)

    Abstract: Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, rema… ▽ More

    Submitted 17 October, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

  30. arXiv:1805.09786  [pdf, other

    cs.NE

    Hyperbolic Attention Networks

    Authors: Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

    Abstract: We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks… ▽ More

    Submitted 24 May, 2018; originally announced May 2018.

  31. arXiv:1711.10462  [pdf, other

    cs.LG stat.ML

    Plan, Attend, Generate: Planning for Sequence-to-Sequence Models

    Authors: Francis Dutil, Caglar Gulcehre, Adam Trischler, Yoshua Bengio

    Abstract: We investigate the integration of a planning mechanism into sequence-to-sequence models using attention. We develop a model which can plan ahead in the future when it computes its alignments between input and output sequences, constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the recently… ▽ More

    Submitted 28 November, 2017; originally announced November 2017.

    Comments: NIPS 2017

  32. arXiv:1706.05087  [pdf, other

    cs.CL cs.NE

    Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

    Authors: Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

    Abstract: We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation. We develop a model that plans ahead when it computes alignments between the source and target sequences, constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This… ▽ More

    Submitted 23 June, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

    Comments: Accepted to Rep4NLP 2017 Workshop at ACL 2017 Conference

  33. arXiv:1706.02761  [pdf, other

    cs.LG cs.NE stat.ML

    Gated Orthogonal Recurrent Units: On Learning to Forget

    Authors: Li **g, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

    Abstract: We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory. We achieve this by extending unitary RNNs with a gating mechanism. Our model is able to outperform LSTMs, GRUs and Unitary RNNs on several long-term dependency benchmark tasks. We emp… ▽ More

    Submitted 25 October, 2017; v1 submitted 8 June, 2017; originally announced June 2017.

  34. arXiv:1705.02012  [pdf, ps, other

    cs.CL

    Machine Comprehension by Text-to-Text Neural Question Generation

    Authors: Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler

    Abstract: We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After teacher forcing for standard maximum likelihood training, we fine-tune the model using policy gradient techniques to maximize several rewards that measure question quality. Most notab… ▽ More

    Submitted 15 May, 2017; v1 submitted 4 May, 2017; originally announced May 2017.

  35. arXiv:1703.00788  [pdf, other

    cs.LG

    A Robust Adaptive Stochastic Gradient Method for Deep Learning

    Authors: Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

    Abstract: Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of learning rate and the amount of the noise in stochastic estimates of the gradients. In this paper, we propose an adaptive learning rate algorithm, which utilizes stoch… ▽ More

    Submitted 2 March, 2017; originally announced March 2017.

    Comments: IJCNN 2017 Accepted Paper, An extension of our paper, "ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient"

  36. arXiv:1701.08718  [pdf, other

    cs.LG cs.NE stat.ML

    Memory Augmented Neural Networks with Wormhole Connections

    Authors: Caglar Gulcehre, Sarath Chandar, Yoshua Bengio

    Abstract: Recent empirical results on long-term dependency tasks have shown that neural networks augmented with an external memory can learn the long-term dependency tasks more easily and achieve better generalization than vanilla recurrent neural networks (RNN). We suggest that memory augmented neural networks can reduce the effects of vanishing gradients by creating shortcut (or wormhole) connections. Bas… ▽ More

    Submitted 30 January, 2017; originally announced January 2017.

  37. arXiv:1608.04980  [pdf, other

    cs.LG cs.NE

    Mollifying Networks

    Authors: Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio

    Abstract: The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function, e.g. it can involve pathological landscapes such as saddle-surfaces that can be difficult to escape for algorithms based on simple gradient descent. In this paper, we attack the problem of optimization of highly non-convex neural n… ▽ More

    Submitted 17 August, 2016; originally announced August 2016.

  38. arXiv:1607.00036  [pdf, other

    cs.LG cs.NE

    Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes

    Authors: Caglar Gulcehre, Sarath Chandar, Kyunghyun Cho, Yoshua Bengio

    Abstract: We extend neural Turing machine (NTM) model into a dynamic neural Turing machine (D-NTM) by introducing a trainable memory addressing scheme. This addressing scheme maintains for each memory cell two separate vectors, content and address vectors. This allows the D-NTM to learn a wide variety of location-based addressing strategies including both linear and nonlinear ones. We implement the D-NTM wi… ▽ More

    Submitted 17 March, 2017; v1 submitted 30 June, 2016; originally announced July 2016.

    Comments: 13 pages, 3 figures

  39. arXiv:1605.02688  [pdf, other

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano , et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu… ▽ More

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  40. arXiv:1603.09025  [pdf, other

    cs.LG

    Recurrent Batch Normalization

    Authors: Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville

    Abstract: We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate… ▽ More

    Submitted 27 February, 2017; v1 submitted 29 March, 2016; originally announced March 2016.

  41. arXiv:1603.08148  [pdf, other

    cs.CL cs.LG cs.NE

    Pointing the Unknown Words

    Authors: Caglar Gulcehre, Sung** Ahn, Ramesh Nallapati, Bowen Zhou, Yoshua Bengio

    Abstract: The problem of rare and unknown words is an important issue that can potentially influence the performance of many NLP systems, including both the traditional count-based and the deep learning models. We propose a novel way to deal with the rare and unseen words for the neural network models using attention. Our model uses two softmax layers in order to predict the next word in conditional languag… ▽ More

    Submitted 21 August, 2016; v1 submitted 26 March, 2016; originally announced March 2016.

    Comments: ACL 2016 Oral Paper

  42. arXiv:1603.06807  [pdf, other

    cs.CL cs.AI cs.LG cs.NE

    Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus

    Authors: Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sung** Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio

    Abstract: Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to this date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question answer pair corpus produced by applying a novel neural network architecture on the knowledge base… ▽ More

    Submitted 29 May, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

    Comments: 13 pages, 1 figure, 7 tables

    ACM Class: H.3.4; I.5.1; I.2.6; I.2.7

  43. arXiv:1603.00391  [pdf, other

    cs.LG cs.NE stat.ML

    Noisy Activation Functions

    Authors: Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio

    Abstract: Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only). Gating mechanisms that use softly saturating activation functions to emulate the discrete switching of digital logic circuits are good examples of… ▽ More

    Submitted 3 April, 2016; v1 submitted 1 March, 2016; originally announced March 2016.

  44. arXiv:1602.06023  [pdf, other

    cs.CL

    Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

    Authors: Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos santos, Caglar Gulcehre, Bing Xiang

    Abstract: In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-… ▽ More

    Submitted 26 August, 2016; v1 submitted 18 February, 2016; originally announced February 2016.

    Journal ref: The SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2016

  45. arXiv:1511.06295  [pdf, other

    cs.LG

    Policy Distillation

    Authors: Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

    Abstract: Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and… ▽ More

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Submitted to ICLR 2016

  46. arXiv:1503.03535  [pdf, other

    cs.CL

    On Using Monolingual Corpora in Neural Machine Translation

    Authors: Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

    Abstract: Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hi… ▽ More

    Submitted 12 June, 2015; v1 submitted 11 March, 2015; originally announced March 2015.

    Comments: 9 pages, 2 figures

  47. arXiv:1503.01800  [pdf, other

    cs.LG cs.CV

    EmoNets: Multimodal deep learning approaches for emotion recognition in video

    Authors: Samira Ebrahimi Kahou, Xavier Bouthillier, Pascal Lamblin, Caglar Gulcehre, Vincent Michalski, Kishore Konda, Sébastien Jean, Pierre Froumenty, Yann Dauphin, Nicolas Boulanger-Lewandowski, Raul Chandias Ferrari, Mehdi Mirza, David Warde-Farley, Aaron Courville, Pascal Vincent, Roland Memisevic, Christopher Pal, Yoshua Bengio

    Abstract: The task of the emotion recognition in the wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which consider combinations of features from multiple… ▽ More

    Submitted 29 March, 2015; v1 submitted 5 March, 2015; originally announced March 2015.

  48. arXiv:1502.02367  [pdf, other

    cs.NE cs.LG stat.ML

    Gated Feedback Recurrent Neural Networks

    Authors: Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

    Abstract: In this work, we propose a novel recurrent neural network (RNN) architecture. The proposed RNN, gated-feedback RNN (GF-RNN), extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers using a global gating unit for each pair of layers. The recurrent signals exchanged between layers are gated adaptively… ▽ More

    Submitted 17 June, 2015; v1 submitted 9 February, 2015; originally announced February 2015.

    Comments: 9 pages, removed appendix

  49. arXiv:1412.7419  [pdf, other

    cs.LG cs.NE stat.ML

    ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

    Authors: Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

    Abstract: Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the noise in stochastic estimates of the gradients. In this paper, we propose a new adaptive learning rate algorithm, which utilizes curvature information for automat… ▽ More

    Submitted 31 October, 2015; v1 submitted 23 December, 2014; originally announced December 2014.

    Comments: 8 pages, 3 figures, ICLR workshop submission

  50. arXiv:1412.3555  [pdf, other

    cs.NE cs.LG

    Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

    Authors: Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio

    Abstract: In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments re… ▽ More

    Submitted 11 December, 2014; originally announced December 2014.

    Comments: Presented in NIPS 2014 Deep Learning and Representation Learning Workshop