Showing 1–20 of 20 results for author: Steger, A

Searching in archive cs.
  1. arXiv:2312.15001  [pdf, other]

    cs.LG cs.NE

    Discovering modular solutions that generalize compositionally

    Authors: Simon Schug, Seijin Kobayashi, Yassir Akram, Maciej Wołczyk, Alexandra Proca, Johannes von Oswald, Razvan Pascanu, João Sacramento, Angelika Steger

    Abstract: Many complex tasks can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. It therefore seems natural to make models more modular to help capture the compositional nature of many tasks. However, it is unclear under which…

    Submitted 25 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Published as a conference paper at ICLR 2024; Code available at https://github.com/smonsays/modular-hyperteacher

  2. arXiv:2309.01775  [pdf, other]

    cs.LG cs.NE

    Gated recurrent neural networks discover attention

    Authors: Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento

    Abstract: Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear recurrent layers interconnected by feedforward paths with multiplicative gating. Here, we show how RNNs equipped with these two design elements can exactly implement…

    Submitted 7 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

  3. arXiv:2209.07509  [pdf, other]

    cs.LG

    Random initialisations performing above chance and how to find them

    Authors: Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, Angelika Steger

    Abstract: Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al. recently conjectured that despite different initialisations, the solutions found by SGD lie in the same loss valley after t…

    Submitted 7 November, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022, 14th Annual Workshop on Optimization for Machine Learning (OPT2022)

  4. arXiv:2012.02551  [pdf, other]

    cs.DS math.CO

    An O(n) time algorithm for finding Hamilton cycles with high probability

    Authors: Rajko Nenadov, Angelika Steger, Pascal Su

    Abstract: We design a randomized algorithm that finds a Hamilton cycle in $\mathcal{O}(n)$ time with high probability in a random graph $G_{n,p}$ with edge probability $p\ge C \log n / n$. This closes a gap left open in a seminal paper by Angluin and Valiant from 1979.

    Submitted 4 December, 2020; originally announced December 2020.

  5. arXiv:2002.05121  [pdf, ps, other]

    cs.DS cs.DM math.CO

    An Optimal Decentralized $(Δ+ 1)$-Coloring Algorithm

    Authors: Daniel Bertschinger, Johannes Lengler, Anders Martinsson, Robert Meier, Angelika Steger, Miloš Trujić, Emo Welzl

    Abstract: Consider the following simple coloring algorithm for a graph on $n$ vertices. Each vertex chooses a color from $\{1, \dotsc, Δ(G) + 1\}$ uniformly at random. While there exists a conflicted vertex choose one such vertex uniformly at random and recolor it with a randomly chosen color. This algorithm was introduced by Bhartia et al. [MOBIHOC'16] for channel selection in WIFI-networks. We show that t…

    Submitted 3 May, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

  6. arXiv:1910.05268  [pdf, other]

    cs.NE cs.LG

    Improving Gradient Estimation in Evolutionary Strategies With Past Descent Directions

    Authors: Florian Meier, Asier Mujika, Marcelo Matheus Gauy, Angelika Steger

    Abstract: Evolutionary Strategies (ES) are known to be an effective black-box optimization technique for deep neural networks when the true gradients cannot be computed, such as in Reinforcement Learning. We continue a recent line of research that uses surrogate gradients to improve the gradient estimation of ES. We propose a novel method to optimally incorporate surrogate gradient information. Our approach…

    Submitted 11 October, 2019; originally announced October 2019.

  7. arXiv:1910.05245  [pdf, other]

    cs.LG cs.NE stat.ML

    Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses

    Authors: Asier Mujika, Felix Weissenberger, Angelika Steger

    Abstract: Learning long-term dependencies is a key long-standing challenge of recurrent neural networks (RNNs). Hierarchical recurrent neural networks (HRNNs) have been considered a promising approach as long-term dependencies are resolved through shortcuts up and down the hierarchy. Yet, the memory requirements of Truncated Backpropagation Through Time (TBPTT) still prevent training them on very long seque…

    Submitted 11 October, 2019; originally announced October 2019.

  8. arXiv:1902.03993  [pdf, other]

    cs.LG stat.ML

    Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning

    Authors: Frederik Benzing, Marcelo Matheus Gauy, Asier Mujika, Anders Martinsson, Angelika Steger

    Abstract: One of the central goals of Recurrent Neural Networks (RNNs) is to learn long-term dependencies in sequential data. Nevertheless, the most popular training method, Truncated Backpropagation through Time (TBPTT), categorically forbids learning dependencies beyond the truncation horizon. In contrast, the online training algorithm Real Time Recurrent Learning (RTRL) provides untruncated gradients, wi…

    Submitted 17 May, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Comments: ICML 2019 camera ready version; new version includes additional plots in the appendix

  9. arXiv:1808.05566  [pdf, ps, other]

    cs.NE cs.DS

    The linear hidden subset problem for the (1+1) EA with scheduled and adaptive mutation rates

    Authors: Hafsteinn Einarsson, Marcelo Matheus Gauy, Johannes Lengler, Florian Meier, Asier Mujika, Angelika Steger, Felix Weissenberger

    Abstract: We study unbiased $(1+1)$ evolutionary algorithms on linear functions with an unknown number $n$ of bits with non-zero weight. Static algorithms achieve an optimal runtime of $O(n (\ln n)^{2+ε})$, however, it remained unclear whether more dynamic parameter policies could yield better runtime guarantees. We consider two setups: one where the mutation rate follows a fixed schedule, and one where it…

    Submitted 16 August, 2018; originally announced August 2018.

  10. arXiv:1808.01137  [pdf, ps, other]

    math.PR cs.NE

    When Does Hillclimbing Fail on Monotone Functions: An entropy compression argument

    Authors: Johannes Lengler, Anders Martinsson, Angelika Steger

    Abstract: Hillclimbing is an essential part of any optimization algorithm. An important benchmark for hillclimbing algorithms on pseudo-Boolean functions $f: \{0,1\}^n \to \mathbb{R}$ are (strictly) monotone functions, on which a surprising number of hillclimbers fail to be efficient. For example, the $(1+1)$-Evolutionary Algorithm is a standard hillclimber which flips each bit independently with probability…

    Submitted 3 August, 2018; originally announced August 2018.

    Comments: 14 pages, no figures

    MSC Class: 68W40; 68W20; 60J10

  11. arXiv:1805.10842  [pdf, other]

    cs.LG stat.ML

    Approximating Real-Time Recurrent Learning with Random Kronecker Factors

    Authors: Asier Mujika, Florian Meier, Angelika Steger

    Abstract: Despite all the impressive advances of recurrent neural networks, sequential data is still in need of better modelling. Truncated backpropagation through time (TBPTT), the learning algorithm most widely used in practice, suffers from the truncation bias, which drastically limits its ability to learn long-term dependencies. The Real-Time Recurrent Learning algorithm (RTRL) addresses this issue, but…

    Submitted 5 December, 2018; v1 submitted 28 May, 2018; originally announced May 2018.

  12. arXiv:1801.07193  [pdf, ps, other]

    cs.DM

    Even flying cops should think ahead

    Authors: Anders Martinsson, Florian Meier, Patrick Schnider, Angelika Steger

    Abstract: We study the entanglement game, which is a version of cops and robbers, on sparse graphs. While the minimum degree of a graph G is a lower bound for the number of cops needed to catch a robber in G, we show that the required number of cops can be much larger, even for graphs with small maximum degree. In particular, we show that there are 3-regular graphs where a linear number of cops are needed.

    Submitted 22 January, 2018; originally announced January 2018.

  13. arXiv:1705.08639  [pdf, ps, other]

    cs.NE

    Fast-Slow Recurrent Neural Networks

    Authors: Asier Mujika, Florian Meier, Angelika Steger

    Abstract: Processing sequential data of variable length is a major challenge in a wide range of applications, such as speech recognition, language modeling, generative image modeling and machine translation. Here, we address this challenge by proposing a novel recurrent neural network (RNN) architecture, the Fast-Slow RNN (FS-RNN). The FS-RNN incorporates the strengths of both multiscale RNNs and deep trans…

    Submitted 9 June, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

    Comments: Corrected minor typos in Figure 1 and Zoneout citation

  14. arXiv:1610.01753  [pdf, ps, other]

    cs.DM cs.DS

    A general lower bound for collaborative tree exploration

    Authors: Yann Disser, Frank Mousset, Andreas Noever, Nemanja Škorić, Angelika Steger

    Abstract: We consider collaborative graph exploration with a set of $k$ agents. All agents start at a common vertex of an initially unknown graph and need to collectively visit all other vertices. We assume agents are deterministic, vertices are distinguishable, moves are simultaneous, and we allow agents to communicate globally. For this setting, we give the first non-trivial lower bounds that bridge the g…

    Submitted 6 October, 2016; originally announced October 2016.

  15. arXiv:1608.06451  [pdf, other]

    cs.CV

    Failure Detection for Facial Landmark Detectors

    Authors: Andreas Steger, Radu Timofte, Luc Van Gool

    Abstract: Most face applications depend heavily on the accuracy of the face and facial landmarks detectors employed. Prediction of attributes such as gender, age, and identity usually completely fail when the faces are badly aligned due to inaccurate facial landmark detection. Despite the impressive recent advances in face and facial landmark detection, little study is on the recovery from and detection of…

    Submitted 23 August, 2016; originally announced August 2016.

  16. arXiv:1608.03226  [pdf, ps, other]

    math.CO cs.NE math.PR

    Drift Analysis and Evolutionary Algorithms Revisited

    Authors: Johannes Lengler, Angelika Steger

    Abstract: One of the easiest randomized greedy optimization algorithms is the following evolutionary algorithm which aims at maximizing a boolean function $f:\{0,1\}^n \to {\mathbb R}$. The algorithm starts with a random search point $ξ\in \{0,1\}^n$, and in each round it flips each bit of $ξ$ with probability $c/n$ independently at random, where $c>0$ is a fixed constant. The thus created offspring $ξ'$ re…

    Submitted 15 November, 2017; v1 submitted 10 August, 2016; originally announced August 2016.

    Comments: minor changes to improve readability

    MSC Class: 60G40; 60J10; 68W20; 68W40

    ACM Class: G.3

  17. arXiv:1607.05212  [pdf, other]

    cs.DC

    Polynomial Lower Bound for Distributed Graph Coloring in a Weak LOCAL Model

    Authors: Dan Hefetz, Fabian Kuhn, Yannic Maus, Angelika Steger

    Abstract: We show an $Ω\big(Δ^{\frac{1}{3}-\frac{η}{3}}\big)$ lower bound on the runtime of any deterministic distributed $\mathcal{O}\big(Δ^{1+η}\big)$-graph coloring algorithm in a weak variant of the LOCAL model. In particular, given a network graph $G=(V,E)$, in the weak LOCAL model nodes communicate in synchronous rounds and they can use unbounded local computation. We assume that the nodes…

    Submitted 14 September, 2016; v1 submitted 18 July, 2016; originally announced July 2016.

  18. arXiv:1605.03043  [pdf, ps, other]

    cs.DM math.CO

    Unique reconstruction threshold for random jigsaw puzzles

    Authors: Rajko Nenadov, Pascal Pfister, Angelika Steger

    Abstract: A random jigsaw puzzle is constructed by arranging $n^2$ square pieces into an $n \times n$ grid and assigning to each edge of a piece one of $q$ available colours uniformly at random, with the restriction that touching edges receive the same colour. We show that if $q = o(n)$ then with high probability such a puzzle does not have a unique solution, while if $q \ge n^{1 + \varepsilon}$ for any con…

    Submitted 11 May, 2016; v1 submitted 10 May, 2016; originally announced May 2016.

  19. arXiv:1104.1309  [pdf, other]

    cs.DM

    Explosive Percolation in Erdös-Rényi-Like Random Graph Processes

    Authors: Konstantinos Panagiotou, Reto Spöhel, Angelika Steger, Henning Thomas

    Abstract: The evolution of the largest component has been studied intensely in a variety of random graph processes, starting in 1960 with the Erdös-Rényi process. It is well known that this process undergoes a phase transition at n/2 edges when, asymptotically almost surely, a linear-sized component appears. Moreover, this phase transition is continuous, i.e., in the limit the function f(c) denoting the fra…

    Submitted 7 April, 2011; originally announced April 2011.

  20. arXiv:1006.1231  [pdf, ps, other]

    cs.DS

    On the Insertion Time of Cuckoo Hashing

    Authors: Nikolaos Fountoulakis, Konstantinos Panagiotou, Angelika Steger

    Abstract: Cuckoo hashing is an efficient technique for creating large hash tables with high space utilization and guaranteed constant access times. There, each item can be placed in a location given by any one out of k different hash functions. In this paper we investigate further the random walk heuristic for inserting in an online fashion new items into the hash table. Provided that k > 2 and that the num…

    Submitted 10 October, 2013; v1 submitted 7 June, 2010; originally announced June 2010.

    Comments: 27 pages, final version accepted by the SIAM Journal on Computing

    ACM Class: E.2; G.2.2