Showing 1–38 of 38 results for author: Brandstetter, J

  1. arXiv:2406.04303  [pdf, other]

    cs.CV cs.AI cs.LG

    Vision-LSTM: xLSTM as Generic Vision Backbone

    Authors: Benedikt Alkin, Maximilian Beck, Korbinian Pöppel, Sepp Hochreiter, Johannes Brandstetter

    Abstract: Transformers are widely used as generic backbones in computer vision, despite being initially introduced for natural language processing. Recently, the Long Short-Term Memory (LSTM) has been extended to a scalable and performant architecture - the xLSTM - which overcomes long-standing LSTM limitations via exponential gating and parallelizable matrix memory structure. In this report, we introduce Vision-…

    Submitted 2 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2405.13063  [pdf, other]

    physics.ao-ph cs.LG

    Aurora: A Foundation Model of the Atmosphere

    Authors: Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris

    Abstract: Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-sc…

    Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  3. arXiv:2405.04517  [pdf, other]

    cs.LG cs.AI stat.ML

    xLSTM: Extended Long Short-Term Memory

    Authors: Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

    Abstract: In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2405.01355  [pdf, other]

    physics.plasm-ph

    Neural-Parareal: Dynamically Training Neural Operators as Coarse Solvers for Time-Parallelisation of Fusion MHD Simulations

    Authors: S. J. P. Pamela, N. Carey, J. Brandstetter, R. Akers, L. Zanisi, J. Buchanan, V. Gopakumar, M. Hoelzl, G. Huijsmans, K. Pentland, T. James, G. Antonucci, the JOREK Team

    Abstract: The fusion research facility ITER is currently being assembled to demonstrate that fusion can be used for industrial energy production, while several other programmes across the world are also moving forward, such as EU-DEMO, CFETR, SPARC and STEP. The high engineering complexity of a tokamak makes it an extremely challenging device to optimise, and test-based optimisation would be too slow and to…

    Submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2404.07194  [pdf, other]

    cs.LG cs.AI q-bio.BM

    VN-EGNN: E(3)-Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification

    Authors: Florian Sestak, Lisa Schneckenreiter, Johannes Brandstetter, Sepp Hochreiter, Andreas Mayr, Günter Klambauer

    Abstract: Being able to identify regions within or around proteins, to which ligands can potentially bind, is an essential step to develop new drugs. Binding site identification methods can now profit from the availability of large amounts of 3D structures in protein structure databases or from AlphaFold predictions. Current binding site identification methods heavily rely on graph neural networks (GNNs), u…

    Submitted 10 April, 2024; originally announced April 2024.

  6. arXiv:2403.04750  [pdf, other]

    physics.flu-dyn cs.LG

    JAX-SPH: A Differentiable Smoothed Particle Hydrodynamics Framework

    Authors: Artur P. Toshev, Harish Ramachandran, Jonas A. Erbesdobler, Gianluca Galletti, Johannes Brandstetter, Nikolaus A. Adams

    Abstract: Particle-based fluid simulations have emerged as a powerful tool for solving the Navier-Stokes equations, especially in cases that include intricate physics and free surfaces. The recent addition of machine learning methods to the toolbox for solving such problems is pushing the boundary of the quality vs. speed tradeoff of such numerical simulations. In this work, we lead the way to Lagrangian fl…

    Submitted 7 July, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted at the ICLR 2024 Workshop on AI4Differential Equations In Science

  7. arXiv:2403.04747  [pdf, other]

    cs.LG cs.AI stat.ML

    GNN-VPA: A Variance-Preserving Aggregation Strategy for Graph Neural Networks

    Authors: Lisa Schneckenreiter, Richard Freinschlag, Florian Sestak, Johannes Brandstetter, Günter Klambauer, Andreas Mayr

    Abstract: Graph neural networks (GNNs), and especially message-passing neural networks, excel in various domains such as physics, drug discovery, and molecular modeling. The expressivity of GNNs with respect to their ability to discriminate non-isomorphic graphs critically depends on the functions employed for message aggregation and graph-level readout. By applying signal propagation theory, we propose a v…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted at ICLR 2024 (Tiny Papers Track)

  8. arXiv:2402.14730  [pdf, other]

    cs.LG cs.AI

    Clifford-Steerable Convolutional Neural Networks

    Authors: Maksim Zhdanov, David Ruhe, Maurice Weiler, Ana Lucic, Johannes Brandstetter, Patrick Forré

    Abstract: We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of $\mathrm{E}(p, q)$-equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces $\mathbb{R}^{p,q}$. They cover, for instance, $\mathrm{E}(3)$-equivariance on $\mathbb{R}^3$ and Poincaré-equivariance on Minkowski spacetime $\mathbb{R}^{1,3}$. Our approach is based on an implicit parametrizatio…

    Submitted 6 July, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: accepted to ICML 2024

  9. arXiv:2402.14009  [pdf, other]

    cs.LG cs.CV

    Geometry-Informed Neural Networks

    Authors: Arturs Berzins, Andreas Radler, Sebastian Sanokowski, Sepp Hochreiter, Johannes Brandstetter

    Abstract: Geometry is a ubiquitous language of computer graphics, design, and engineering. However, the lack of large shape datasets limits the application of state-of-the-art supervised learning methods and motivates the exploration of alternative learning strategies. To this end, we introduce geometry-informed neural networks (GINNs) to train shape generative models without any data. GINNs combine…

    Submitted 27 May, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  10. arXiv:2402.12365  [pdf, other]

    cs.LG cs.AI physics.flu-dyn

    Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators

    Authors: Benedikt Alkin, Andreas Fürst, Simon Schmid, Lukas Gruber, Markus Holzleitner, Johannes Brandstetter

    Abstract: Neural operators, serving as physics surrogate models, have recently gained increased interest. With ever increasing problem complexity, the natural question arises: what is an efficient way to scale neural operators to larger and more complex simulations - most importantly by taking into account different types of simulation datasets. This is of special interest since, akin to their numerical cou…

    Submitted 30 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  11. arXiv:2402.10093  [pdf, other]

    cs.CV cs.AI cs.LG

    MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations

    Authors: Benedikt Alkin, Lukas Miklautz, Sepp Hochreiter, Johannes Brandstetter

    Abstract: We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. MIM-Refiner is motivated by the insight that strong representations within MIM models generally reside in intermediate layers. Accordingly, MIM-Refiner leverages multiple contrastive heads that are connected to different intermediate layers. In each head, a modified nearest neighbor objective…

    Submitted 3 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  12. arXiv:2402.08561  [pdf, other]

    physics.plasm-ph

    Data efficiency and long term prediction capabilities for neural operator surrogate models of core and edge plasma codes

    Authors: N. Carey, L. Zanisi, S. Pamela, V. Gopakumar, J. Omotani, J. Buchanan, J. Brandstetter

    Abstract: Simulation-based plasma scenario development, optimization and control are crucial elements towards the successful deployment of next-generation experimental tokamaks and Fusion power plants. Current simulation codes require extremely intensive use of HPC resources that make them unsuitable for iterative or real time applications. Neural network based surrogate models of expensive simulators have…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: IAEA-FEC 2023 manuscript

  13. arXiv:2402.06275  [pdf, other]

    physics.flu-dyn cs.LG

    Neural SPH: Improved Neural Modeling of Lagrangian Fluid Dynamics

    Authors: Artur P. Toshev, Jonas A. Erbesdobler, Nikolaus A. Adams, Johannes Brandstetter

    Abstract: Smoothed particle hydrodynamics (SPH) is omnipresent in modern engineering and scientific disciplines. SPH is a class of Lagrangian schemes that discretize fluid dynamics via finite material points that are tracked through the evolving velocity field. Due to the particle-like nature of the simulation, graph neural networks (GNNs) have emerged as appealing and successful surrogates. However, the pr…

    Submitted 7 July, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: Accepted at the 41st International Conference on Machine Learning (ICML 2024). Project website: https://arturtoshev.github.io/neural-sph-blog/

  14. arXiv:2311.04293  [pdf, other]

    cs.LG

    Lie Point Symmetry and Physics Informed Networks

    Authors: Tara Akhound-Sadegh, Laurence Perreault-Levasseur, Johannes Brandstetter, Max Welling, Siamak Ravanbakhsh

    Abstract: Symmetries have been leveraged to improve the generalization of neural networks through different mechanisms from data augmentation to equivariant architectures. However, despite their potential, their integration into neural solvers for partial differential equations (PDEs) remains largely unexplored. We explore the integration of PDE symmetries, known as Lie point symmetries, in a major family o…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  15. arXiv:2308.05732  [pdf, other]

    cs.LG cs.AI

    PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers

    Authors: Phillip Lippe, Bastiaan S. Veeling, Paris Perdikaris, Richard E. Turner, Johannes Brandstetter

    Abstract: Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is…

    Submitted 21 October, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: Project website: https://phlippe.github.io/PDERefiner/

  16. arXiv:2305.15603  [pdf, other]

    cs.LG physics.flu-dyn

    Learning Lagrangian Fluid Mechanics with E($3$)-Equivariant Graph Neural Networks

    Authors: Artur P. Toshev, Gianluca Galletti, Johannes Brandstetter, Stefan Adami, Nikolaus A. Adams

    Abstract: We contribute to the vastly growing field of machine learning for engineering systems by demonstrating that equivariant graph neural networks have the potential to learn more accurate dynamic-interaction models than their non-equivariant counterparts. We benchmark two well-studied fluid-flow systems, namely 3D decaying Taylor-Green vortex and 3D reverse Poiseuille flow, and evaluate the models bas…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: GSI'23 6th International Conference on Geometric Science of Information; 10 pages; oral. arXiv admin note: substantial text overlap with arXiv:2304.00150

  17. arXiv:2305.11141  [pdf, other]

    cs.LG cs.AI

    Clifford Group Equivariant Neural Networks

    Authors: David Ruhe, Johannes Brandstetter, Patrick Forré

    Abstract: We introduce Clifford Group Equivariant Neural Networks: a novel approach for constructing $\mathrm{O}(n)$- and $\mathrm{E}(n)$-equivariant models. We identify and study the Clifford group, a subgroup inside the Clifford algebra tailored to achieve several favorable properties. Primarily, the group's action forms an orthogonal automorphism that extends beyond the typical vector space to…

    Submitted 22 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Published at NeurIPS 2023 (Oral)

  18. arXiv:2304.00150  [pdf, other]

    cs.LG physics.flu-dyn

    E($3$) Equivariant Graph Neural Networks for Particle-Based Fluid Mechanics

    Authors: Artur P. Toshev, Gianluca Galletti, Johannes Brandstetter, Stefan Adami, Nikolaus A. Adams

    Abstract: We contribute to the vastly growing field of machine learning for engineering systems by demonstrating that equivariant graph neural networks have the potential to learn more accurate dynamic-interaction models than their non-equivariant counterparts. We benchmark two well-studied fluid flow systems, namely the 3D decaying Taylor-Green vortex and the 3D reverse Poiseuille flow, and compare equivar…

    Submitted 31 March, 2023; originally announced April 2023.

    Comments: ICLR 2023 Workshop on Physics for Machine Learning

  19. arXiv:2302.08811  [pdf, other]

    cs.LG

    G-Signatures: Global Graph Propagation With Randomized Signatures

    Authors: Bernhard Schäfl, Lukas Gruber, Johannes Brandstetter, Sepp Hochreiter

    Abstract: Graph neural networks (GNNs) have evolved into one of the most popular deep learning architectures. However, GNNs suffer from over-smoothing node information and, therefore, struggle to solve tasks where global graph properties are relevant. We introduce G-Signatures, a novel graph learning method that enables global graph propagation via randomized signatures. G-Signatures use a new graph convers…

    Submitted 30 August, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: 7 pages (+ appendix); 4 figures

  20. arXiv:2302.06594  [pdf, other]

    cs.LG cs.AI cs.CV

    Geometric Clifford Algebra Networks

    Authors: David Ruhe, Jayesh K. Gupta, Steven de Keninck, Max Welling, Johannes Brandstetter

    Abstract: We propose Geometric Clifford Algebra Networks (GCANs) for modeling dynamical systems. GCANs are based on symmetry group transformations using geometric (Clifford) algebras. We first review the quintessence of modern (plane-based) geometric algebra, which builds on isometries encoded as elements of the $\mathrm{Pin}(p,q,r)$ group. We then propose the concept of group action layers, which linearly…

    Submitted 29 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  21. arXiv:2301.10343  [pdf, other]

    cs.LG cs.AI

    ClimaX: A foundation model for weather and climate

    Authors: Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K. Gupta, Aditya Grover

    Abstract: Most state-of-the-art approaches for weather and climate modeling are based on physics-informed numerical models of the atmosphere. These approaches aim to model the non-linear dynamics and complex interactions between multiple variables, which are challenging to approximate. Additionally, many such numerical models are computationally intensive, especially when modeling the atmospheric phenomenon…

    Submitted 18 December, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: International Conference on Machine Learning 2023

  22. arXiv:2209.15616  [pdf, other]

    cs.LG cs.CV

    Towards Multi-spatiotemporal-scale Generalized PDE Modeling

    Authors: Jayesh K. Gupta, Johannes Brandstetter

    Abstract: Partial differential equations (PDEs) are central to describing complex physical system simulations. Their expensive solution techniques have led to an increased interest in deep neural network based surrogates. However, the practical utility of training such surrogates is contingent on their ability to model complex multi-scale spatio-temporal phenomena. Various neural network architectures have…

    Submitted 15 November, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

  23. arXiv:2209.04934  [pdf, other]

    cs.LG cs.CV physics.flu-dyn

    Clifford Neural Layers for PDE Modeling

    Authors: Johannes Brandstetter, Rianne van den Berg, Max Welling, Jayesh K. Gupta

    Abstract: Partial differential equations (PDEs) see widespread use in sciences and engineering to describe simulation of physical processes as scalar and vector fields interacting and coevolving over time. Due to the computationally expensive nature of their standard solution methods, neural PDE surrogates have become an active research topic to accelerate these simulations. However, current methods do not…

    Submitted 2 March, 2023; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: Accepted at ICLR-2023

  24. arXiv:2206.03483  [pdf, other]

    cs.LG

    Few-Shot Learning by Dimensionality Reduction in Gradient Space

    Authors: Martin Gauch, Maximilian Beck, Thomas Adler, Dmytro Kotsur, Stefan Fiel, Hamid Eghbal-zadeh, Johannes Brandstetter, Johannes Kofler, Markus Holzleitner, Werner Zellinger, Daniel Klotz, Sepp Hochreiter, Sebastian Lehner

    Abstract: We introduce SubGD, a novel few-shot learning method which is based on the recent finding that stochastic gradient descent updates tend to live in a low-dimensional parameter subspace. In experimental and theoretical analyses, we show that models confined to a suitable predefined subspace generalize well for few-shot learning. A suitable subspace fulfills three criteria across the given tasks: it…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted at Conference on Lifelong Learning Agents (CoLLAs) 2022. Code: https://github.com/ml-jku/subgd Blog post: https://ml-jku.github.io/subgd

    Journal ref: Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:1043-1064 (2022)

  25. arXiv:2202.07643  [pdf, other]

    cs.LG cs.CV

    Lie Point Symmetry Data Augmentation for Neural PDE Solvers

    Authors: Johannes Brandstetter, Max Welling, Daniel E. Worrall

    Abstract: Neural networks are increasingly being used to solve partial differential equations (PDEs), replacing slower numerical solvers. However, a critical issue is that neural PDE solvers require high-quality ground truth data, which usually must come from the very solvers they are designed to replace. Thus, we are presented with a proverbial chicken-and-egg problem. In this paper, we present a method, w…

    Submitted 29 May, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: Published at ICML 2022, Github: https://github.com/brandstetter-johannes/LPSDA

  26. arXiv:2202.03376  [pdf, other]

    cs.LG cs.CV math.NA

    Message Passing Neural PDE Solvers

    Authors: Johannes Brandstetter, Daniel Worrall, Max Welling

    Abstract: The numerical solution of partial differential equations (PDEs) is difficult, having led to a century of research so far. Recently, there have been pushes to build neural-numerical hybrid solvers, which piggy-back on the modern trend towards fully end-to-end learned systems. Most works so far can only generalize over a subset of the properties that a generic solver would face, including: resolu…

    Submitted 20 March, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Published at ICLR 2022 (Spotlight paper), Github: https://github.com/brandstetter-johannes/MP-Neural-PDE-Solvers

  27. arXiv:2110.02905  [pdf, other]

    cs.LG cs.AI stat.ML

    Geometric and Physical Quantities Improve E(3) Equivariant Message Passing

    Authors: Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, Max Welling

    Abstract: Including covariant information, such as position, force, velocity or spin is important in many tasks in computational physics and chemistry. We introduce Steerable E(3) Equivariant Graph Neural Networks (SEGNNs) that generalise equivariant graph networks, such that node and edge attributes are not restricted to invariant scalars, but can contain covariant information, such as vectors or tensors.…

    Submitted 26 March, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: Published at ICLR 2022 (Spotlight paper), Github: https://github.com/RobDHess/Steerable-E3-GNN

  28. arXiv:2106.11299  [pdf, other]

    cs.LG cs.AI stat.ML

    Boundary Graph Neural Networks for 3D Simulations

    Authors: Andreas Mayr, Sebastian Lehner, Arno Mayrhofer, Christoph Kloss, Sepp Hochreiter, Johannes Brandstetter

    Abstract: The abundance of data has given machine learning considerable momentum in natural sciences and engineering, though modeling of physical processes is often difficult. A particularly tough problem is the efficient representation of geometric boundaries. Triangularized geometric boundaries are well understood and ubiquitous in engineering applications. However, it is notoriously difficult to integrat…

    Submitted 20 April, 2023; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: accepted for presentation at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

  29. arXiv:2105.01636  [pdf, other]

    cs.LG stat.ML

    Learning 3D Granular Flow Simulations

    Authors: Andreas Mayr, Sebastian Lehner, Arno Mayrhofer, Christoph Kloss, Sepp Hochreiter, Johannes Brandstetter

    Abstract: Recently, the application of machine learning models has gained momentum in natural sciences and engineering, which is a natural fit due to the abundance of data in these fields. However, the modeling of physical processes from simulation data without first principle solutions remains difficult. Here, we present a Graph Neural Networks approach towards accurate modeling of complex 3D granular flow…

    Submitted 4 May, 2021; originally announced May 2021.

  30. arXiv:2012.01399  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER

    Authors: Markus Holzleitner, Lukas Gruber, José Arjona-Medina, Johannes Brandstetter, Sepp Hochreiter

    Abstract: We prove under commonly used assumptions the convergence of actor-critic reinforcement learning algorithms, which simultaneously learn a policy function, the actor, and a value function, the critic. Both functions can be deep neural networks of arbitrary complexity. Our framework allows showing convergence of the well known Proximal Policy Optimization (PPO) and of the recently introduced RUDDER.…

    Submitted 2 December, 2020; originally announced December 2020.

    Comments: 20 pages

  31. arXiv:2010.06498  [pdf, other]

    cs.LG

    Cross-Domain Few-Shot Learning by Representation Fusion

    Authors: Thomas Adler, Johannes Brandstetter, Michael Widrich, Andreas Mayr, David Kreil, Michael Kopp, Günter Klambauer, Sepp Hochreiter

    Abstract: In order to quickly adapt to new data, few-shot learning aims at learning from few examples, often by using already acquired knowledge. The new data often differs from the previously seen data due to a domain shift, that is, a change of the input-target distribution. While several methods perform well on small domain shifts like new target classes with similar inputs, larger domain shifts are stil…

    Submitted 16 February, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

  32. arXiv:2009.14108  [pdf, other]

    cs.LG cs.AI stat.ML

    Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

    Authors: Vihang P. Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose A. Arjona-Medina, Sepp Hochreiter

    Abstract: Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only few episodes with high rewards are available as demonstrations since current explorati…

    Submitted 28 June, 2022; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: Github: https://github.com/ml-jku/align-rudder, YouTube: https://youtu.be/HO-_8ZUl-UY

  33. arXiv:2008.02217  [pdf, other]

    cs.NE cs.CL cs.LG stat.ML

    Hopfield Networks is All You Need

    Authors: Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Thomas Adler, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

    Abstract: We introduce a modern Hopfield network with continuous states and a corresponding update rule. The new Hopfield network can store exponentially (with the dimension of the associative space) many patterns, retrieves the pattern with one update, and has exponentially small retrieval errors. It has three types of energy minima (fixed points of the update): (1) global fixed point averaging over all pa…

    Submitted 28 April, 2021; v1 submitted 16 July, 2020; originally announced August 2020.

    Comments: 10 pages (+ appendix); 12 figures; Blog: https://ml-jku.github.io/hopfield-layers/; GitHub: https://github.com/ml-jku/hopfield-layers

  34. arXiv:2007.13505  [pdf, other]

    cs.LG q-bio.BM stat.ML

    Modern Hopfield Networks and Attention for Immune Repertoire Classification

    Authors: Michael Widrich, Bernhard Schäfl, Hubert Ramsauer, Milena Pavlović, Lukas Gruber, Markus Holzleitner, Johannes Brandstetter, Geir Kjetil Sandve, Victor Greiff, Sepp Hochreiter, Günter Klambauer

    Abstract: A central mechanism in machine learning is to identify, store, and recognize patterns. How to learn, access, and retrieve such patterns is crucial in Hopfield networks and the more recent transformer architectures. We show that the attention mechanism of transformer architectures is actually the update rule of modern Hopfield networks that can store exponentially many patterns. We exploit this hig…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: 10 pages (+appendix); Source code and datasets: https://github.com/ml-jku/DeepRC

    Journal ref: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

  35. arXiv:1911.03941  [pdf, other]

    cs.LG physics.ao-ph stat.ML

    Using LSTMs for climate change assessment studies on droughts and floods

    Authors: Frederik Kratzert, Daniel Klotz, Johannes Brandstetter, Pieter-Jan Hoedt, Grey Nearing, Sepp Hochreiter

    Abstract: Climate change affects occurrences of floods and droughts worldwide. However, predicting climate impacts over individual watersheds is difficult, primarily because accurate hydrological forecasts require models that are calibrated to past data. In this work we present a large-scale LSTM-based modeling approach that, by training on large data sets, learns a diversity of hydrological behaviors.…

    Submitted 28 November, 2019; v1 submitted 10 November, 2019; originally announced November 2019.

  36. arXiv:1910.13804  [pdf, other]

    cs.LG quant-ph stat.ML

    Quantum Optical Experiments Modeled by Long Short-Term Memory

    Authors: Thomas Adler, Manuel Erhard, Mario Krenn, Johannes Brandstetter, Johannes Kofler, Sepp Hochreiter

    Abstract: We demonstrate how machine learning is able to model experiments in quantum physics. Quantum entanglement is a cornerstone for upcoming quantum technologies such as quantum computation and quantum cryptography. Of particular interest are complex quantum states with more than two particles and a large number of entangled quantum levels. Given such a multiparticle high-dimensional quantum state, it…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: 9 pages

    Journal ref: Photonics 8(12), 535 (2021)

  37. arXiv:1806.07857  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    RUDDER: Return Decomposition for Delayed Rewards

    Authors: Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter

    Abstract: We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected immediate reward plus the expected future rewards. The latter are related to bias problems in temporal difference (TD) learning and to high variance problems in Monte Carlo (MC) learning. Both problems are even more severe when re…

    Submitted 10 September, 2019; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: 9 Pages plus appendix. For videos https://goo.gl/EQerZV

  38. arXiv:1801.07926  [pdf, other]

    hep-ex hep-ph

    Higgs boson results on couplings to fermions, CP parameters and perspectives for HL-LHC (ATLAS AND CMS)

    Authors: Johannes Brandstetter

    Abstract: This report summarizes latest ATLAS and CMS results on Higgs boson couplings to fermions. Presented topics include decays into final states of pairs of tau leptons and pairs of bottom quarks as well as results on the ttH production mode. Results are complemented by tests of the CP invariance and searches for lepton flavor violating decays. Finally, prospects of future Higgs boson analyses within t…

    Submitted 24 January, 2018; originally announced January 2018.

    Comments: Talk presented at the International Workshop on Future Linear Colliders (LCWS2017), Strasbourg, France, 23-27 October 2017. C17-10-23.2