
Showing 1–7 of 7 results for author: Kerg, G

  1. arXiv:2206.05056

    cs.NE cs.AI cs.LG

    On Neural Architecture Inductive Biases for Relational Tasks

    Authors: Giancarlo Kerg, Sarthak Mittal, David Rolnick, Yoshua Bengio, Blake Richards, Guillaume Lajoie

    Abstract: Current deep learning approaches have shown good in-distribution generalization performance, but struggle with out-of-distribution generalization. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as we find in many intelligence tests. Recent work has explored how forcing relational representations to remain distinct from sensory represe…

    Submitted 9 June, 2022; originally announced June 2022.

  2. arXiv:2203.01443

    cs.LG

    Continuous-Time Meta-Learning with Forward Mode Differentiation

    Authors: Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon

    Abstract: Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differenti…

    Submitted 2 March, 2022; originally announced March 2022.

  3. arXiv:2012.14193

    cs.LG stat.ML

    Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization

    Authors: Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras

    Abstract: The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a tendency to steer towards regions of the loss surface with increasing local curvature. We ask whether this tendency is connected to the widely observed phenomen…

    Submitted 11 June, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: The last two authors contributed equally. Accepted to the International Conference on Machine Learning 2021

  4. arXiv:2006.12253

    cs.LG cs.NE q-bio.NC stat.ML

    Advantages of biologically-inspired adaptive neural activation in RNNs during learning

    Authors: Victor Geadah, Giancarlo Kerg, Stefan Horoi, Guy Wolf, Guillaume Lajoie

    Abstract: Dynamic adaptation in single-neuron response plays a fundamental role in neural coding in biological neural networks. Yet, most neural activation functions used in artificial networks are fixed and mostly considered as an inconsequential architecture choice. In this paper, we investigate nonlinear activation function adaptation over the large time scale of learning, and outline its impact on seque…

    Submitted 22 June, 2020; originally announced June 2020.

  5. arXiv:2006.09471

    cs.LG stat.ML

    Untangling tradeoffs between recurrence and self-attention in neural networks

    Authors: Giancarlo Kerg, Bhargav Kanuparthi, Anirudh Goyal, Kyle Goyette, Yoshua Bengio, Guillaume Lajoie

    Abstract: Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks. However, most recent progress hinges on heuristic approaches with limited understanding of attention's role in model optimization and computation, and relies on considerable memory and computational resources that scale poorly. In this work, we present a formal analysis of how self-attenti…

    Submitted 10 December, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

  6. arXiv:1905.12080

    cs.LG cs.AI stat.ML

    Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

    Authors: Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie

    Abstract: A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary. This ensures eigenvalues with unit norm and thus stable dynamics and training. However, this comes at the cost of reduced expressivity due to the limited variety of ort…

    Submitted 28 October, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

  7. arXiv:1810.03023

    stat.ML cs.LG

    h-detach: Modifying the LSTM Gradient Towards Better Optimization

    Authors: Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio

    Abstract: Recurrent neural networks are known for their notorious exploding and vanishing gradient problem (EVGP). This problem becomes more evident in tasks where the information needed to correctly solve them exists over long time scales, because EVGP prevents important gradient components from being back-propagated adequately over a large number of steps. We introduce a simple stochastic algorithm (\texti…

    Submitted 9 January, 2019; v1 submitted 6 October, 2018; originally announced October 2018.

    Comments: First two authors contributed equally. Published in ICLR 2019
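The nnRNN abstract (arXiv:1905.12080) rests on a linear-algebra fact: an orthogonal recurrent matrix has eigenvalues of unit modulus, so the linear recurrence h_t = W h_{t-1} neither explodes nor shrinks the state norm. The following is a minimal NumPy sketch of that fact, not the paper's nnRNN parameterization. The 2x2 matrices at the end are hypothetical examples chosen here to illustrate what "transient dynamics" can mean. The two matrices share the eigenvalue 0.9, but only the non-normal one amplifies the state norm for a while before it decays. Function names are illustrative.

```python
import numpy as np


def random_orthogonal(n: int, seed: int = 0) -> np.ndarray:
    """Draw an n x n orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs by sign(diag(r)); the result is still orthogonal.
    return q * np.sign(np.diag(r))


def norms_over_time(w: np.ndarray, h0: np.ndarray, steps: int) -> list:
    """Euclidean norm of h_t for the linear recurrence h_t = W h_{t-1}, t = 0..steps."""
    h = h0.astype(float)
    norms = [float(np.linalg.norm(h))]
    for _ in range(steps):
        h = w @ h
        norms.append(float(np.linalg.norm(h)))
    return norms


# Orthogonal W: every eigenvalue has modulus 1 and the state norm is preserved.
q = random_orthogonal(8)
h0 = np.ones(8) / np.sqrt(8)  # unit-norm initial state
print(np.allclose(np.abs(np.linalg.eigvals(q)), 1.0))   # True
print(np.allclose(norms_over_time(q, h0, 200), 1.0))     # True

# Hypothetical 2x2 contrast: both matrices have the repeated eigenvalue 0.9,
# but only the second is non-normal (it has upper-triangular coupling).
normal = np.diag([0.9, 0.9])
non_normal = np.array([[0.9, 5.0], [0.0, 0.9]])
e2 = np.array([0.0, 1.0])
print(max(norms_over_time(normal, e2, 200)))             # 1.0 (monotone decay)
transient = norms_over_time(non_normal, e2, 200)
print(max(transient) > 1.0, transient[-1] < 1e-3)        # True True
```

The contrast shows why eigenvalues alone do not determine short-term behavior. Both 2x2 matrices decay asymptotically, yet the non-normal one first grows the norm to roughly 19 before it shrinks.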
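The COMLN abstract (arXiv:2203.01443) describes adaptation as following a gradient vector field, with the task-specific linear classifier obtained as the solution of an ordinary differential equation. The following is a minimal sketch of that idea, assuming a softmax linear classifier W trained with mean cross-entropy on fixed features phi. It runs the gradient flow dW/dt = -grad L(W) from W(0) = 0 and integrates it with forward Euler. The loss, the Euler integrator, the step size, and the toy features are assumptions made here for illustration. The sketch does not meta-learn phi and does not use the forward-mode differentiation named in the paper's title.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grad(w: np.ndarray, phi: np.ndarray, y_onehot: np.ndarray):
    """Mean cross-entropy of the linear classifier W on features phi, and dL/dW."""
    p = softmax(phi @ w)
    loss = -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))
    grad = phi.T @ (p - y_onehot) / phi.shape[0]
    return loss, grad


def gradient_flow(phi, y_onehot, t_final=10.0, dt=0.01):
    """Forward-Euler integration of dW/dt = -grad L(W) from W(0) = 0 up to time t_final."""
    w = np.zeros((phi.shape[1], y_onehot.shape[1]))
    for _ in range(int(round(t_final / dt))):
        _, g = loss_and_grad(w, phi, y_onehot)
        w = w - dt * g  # one Euler step along the negative gradient field
    return w


# Toy 2-class task; phi stands in for hypothetical "meta-learned" features.
rng = np.random.default_rng(0)
phi = np.vstack([rng.normal(-1.0, 0.3, (20, 2)), rng.normal(1.0, 0.3, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
y = np.eye(2)[labels]

loss0, _ = loss_and_grad(np.zeros((2, 2)), phi, y)  # equals log(2) at W(0) = 0
w_T = gradient_flow(phi, y)
loss_T, _ = loss_and_grad(w_T, phi, y)
print(loss0, loss_T)  # the loss decreases along the flow
print(np.mean(np.argmax(phi @ w_T, axis=1) == labels))  # training accuracy
```

Shrinking dt brings the Euler iterates closer to the continuous-time solution W(t_final). The design choice of a fixed integration horizon mirrors the "infinitely small gradient steps" view in the abstract. A practical solver would more likely use an adaptive ODE integrator.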