Showing 1–12 of 12 results for author: Swirszcz, G

  1. arXiv:2407.06183  [pdf, other]

    cs.LG

    Stepping on the Edge: Curvature Aware Learning Rate Tuners

    Authors: Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa

    Abstract: Curvature information -- particularly, the largest eigenvalue of the loss Hessian, known as the sharpness -- often forms the basis for learning rate tuners. However, recent work has shown that the curvature information undergoes complex dynamics during training, going from a phase of increasing sharpness to eventual stabilization. We analyze the closed-loop feedback effect between learning rate tu…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2110.01765  [pdf, other]

    cs.LG cs.AI cs.NE

    Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping

    Authors: James Martens, Andy Ballard, Guillaume Desjardins, Grzegorz Swirszcz, Valentin Dalibard, Jascha Sohl-Dickstein, Samuel S. Schoenholz

    Abstract: Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the "shape" of the network's initialization-time kernel function. We then develop a…

    Submitted 4 October, 2021; originally announced October 2021.

  3. arXiv:2106.08318  [pdf, other]

    cs.CV cs.DC cs.LG eess.IV

    Gradient Forward-Propagation for Large-Scale Temporal Video Modelling

    Authors: Mateusz Malinowski, Dimitrios Vytiniotis, Grzegorz Swirszcz, Viorica Patraucean, Joao Carreira

    Abstract: How can neural networks be trained on large-volume temporal data efficiently? To compute the gradients required to update parameters, backpropagation blocks computations until the forward and backward passes are completed. For temporal signals, this introduces high latency and hinders real-time learning. It also creates a coupling between consecutive layers, which limits model parallelism and incr…

    Submitted 12 July, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2021. arXiv admin note: text overlap with arXiv:2001.06232

  4. arXiv:2001.06232  [pdf, other]

    cs.LG cs.CV stat.ML

    Sideways: Depth-Parallel Training of Video Models

    Authors: Mateusz Malinowski, Grzegorz Swirszcz, Joao Carreira, Viorica Patraucean

    Abstract: We propose Sideways, an approximate backpropagation scheme for training video models. In standard backpropagation, the gradients and activations at every computation step through the model are temporally synchronized. The forward activations need to be stored until the backward pass is executed, preventing inter-layer (depth) parallelization. However, can we leverage smooth, redundant input stream…

    Submitted 30 March, 2020; v1 submitted 17 January, 2020; originally announced January 2020.

    Comments: Accepted at CVPR'20

  5. arXiv:1902.09592  [pdf, other]

    cs.LG stat.ML

    Verification of Non-Linear Specifications for Neural Networks

    Authors: Chongli Qin, Krishnamurthy Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

    Abstract: Prior work on neural network verification has focused on specifications that are linear functions of the output of the network, e.g., invariance of the classifier output under adversarial perturbations of the input. In this paper, we extend verification algorithms to be able to certify richer properties of neural networks. To do this we introduce the class of convex-relaxable specifications, which…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: ICLR conference paper

  6. arXiv:1902.02186  [pdf, other]

    cs.LG cs.AI stat.ML

    Distilling Policy Distillation

    Authors: Wojciech Marian Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant M. Jayakumar, Grzegorz Swirszcz, Max Jaderberg

    Abstract: The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning. This process, referred to as distillation, has been used to great success, for example, by enhancing the optimisation of agents, leading to stronger performance faster, on harder domains [26, 32, 5, 8]. Despite the widespread use and conceptual simplicity of distillation, many different formul…

    Submitted 6 February, 2019; originally announced February 2019.

    Comments: Accepted at AISTATS 2019

  7. arXiv:1811.09300  [pdf, other]

    cs.NE cs.CR cs.LG

    Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles

    Authors: Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

    Abstract: While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can make the models produce extremely inaccurate outputs. This makes these models particularly unsuitable for safety-critical application domains (e.g. self-driving…

    Submitted 22 November, 2018; originally announced November 2018.

    Comments: 12 pages

  8. arXiv:1706.04859  [pdf, other]

    cs.LG

    Sobolev Training for Neural Networks

    Authors: Wojciech Marian Czarnecki, Simon Osindero, Max Jaderberg, Grzegorz Świrszcz, Razvan Pascanu

    Abstract: At the heart of deep learning we aim to use neural networks as function approximators - training them to produce outputs from inputs in emulation of a ground truth function or data creation process. In many cases we only have access to input-output pairs from the ground truth, however it is becoming more common to have access to derivatives of the target output with respect to the input - for exam…

    Submitted 26 July, 2017; v1 submitted 15 June, 2017; originally announced June 2017.

  9. arXiv:1703.00522  [pdf, other]

    cs.LG cs.NE

    Understanding Synthetic Gradients and Decoupled Neural Interfaces

    Authors: Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu

    Abstract: When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking - without waiting for a true error gradient to be backpropagated - resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability of being able to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically…

    Submitted 1 March, 2017; originally announced March 2017.

  10. arXiv:1611.06310  [pdf, other]

    stat.ML cs.LG cs.NE

    Local minima in training of neural networks

    Authors: Grzegorz Swirszcz, Wojciech Marian Czarnecki, Razvan Pascanu

    Abstract: There has been a lot of recent interest in trying to characterize the error surface of deep models. This stems from a long standing question. Given that deep networks are highly nonlinear systems optimized by local gradient methods, why do they not seem to be affected by bad local minima? It is widely believed that training of deep models using gradient methods works so well because the error surf…

    Submitted 17 February, 2017; v1 submitted 19 November, 2016; originally announced November 2016.

  11. Approximation Algorithms for the Joint Replenishment Problem with Deadlines

    Authors: Marcin Bienkowski, Jaroslaw Byrka, Marek Chrobak, Neil Dobbs, Tomasz Nowicki, Maxim Sviridenko, Grzegorz Swirszcz, Neal E. Young

    Abstract: The Joint Replenishment Problem (JRP) is a fundamental optimization problem in supply-chain management, concerned with optimizing the flow of goods from a supplier to retailers. Over time, in response to demands at the retailers, the supplier ships orders, via a warehouse, to the retailers. The objective is to schedule these orders to minimize the sum of ordering costs and retailers' waiting costs…

    Submitted 2 December, 2015; v1 submitted 13 December, 2012; originally announced December 2012.

    MSC Class: 68W25; 90C05 ACM Class: G.1.6

    Journal ref: J. Scheduling 18(6): 545-560 (2015)

  12. arXiv:math/0402343  [pdf, ps, other]

    math.DS math.PR

    Dynamics of exponential linear map in functional space

    Authors: David Gamarnik, Tomasz Nowicki, Grzegorz Swirszcz

    Abstract: We consider the question of existence of a unique invariant probability distribution which satisfies some evolutionary property. The problem arises from the random graph theory but to answer it we treat it as a dynamical system in the functional space, where we look for a global attractor. We consider the following bifurcation problem: Given a probability measure $μ$, which corresponds to the we…

    Submitted 21 February, 2004; originally announced February 2004.

    MSC Class: 37-xx; 60C05