
Showing 1–50 of 217 results for author: Hofmann, T

  1. arXiv:2407.02294 [pdf, ps, other]

    math.NT math.GR math.KT math.RA

    Determination of the stably free cancellation property for orders

    Authors: Werner Bley, Tommy Hofmann, Henri Johnston

    Abstract: Let $K$ be a number field, let $A$ be a finite-dimensional semisimple $K$-algebra, and let $Λ$ be an $\mathcal{O}_{K}$-order in $A$. We give practical algorithms that determine whether or not $Λ$ has the stably free cancellation property (SFC). As an application, we determine all finite groups $G$ of order at most $383$ such that the integral group ring $\mathbb{Z}[G]$ has SFC.

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 40 pages, comments welcome!

    MSC Class: 16H10; 16H20; 16Z05; 20C05; 20C10; 11R33; 11Y40

  2. arXiv:2406.16300 [pdf, other]

    cs.LG

    Landscaping Linear Mode Connectivity

    Authors: Sidak Pal Singh, Linara Adilova, Michael Kamp, Asja Fischer, Bernhard Schölkopf, Thomas Hofmann

    Abstract: The presence of linear paths in parameter space between two different network solutions in certain cases, i.e., linear mode connectivity (LMC), has garnered interest from both theoretical and practical fronts. There has been significant research that either practically designs algorithms catered for connecting networks by adjusting for the permutation symmetries as well as some others that more th…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: ICML 2024 HiLD workshop paper

  3. arXiv:2406.11606 [pdf, other]

    physics.class-ph

    Non-Weyl Behavior Induced by Superradiance: A Microwave Graph Study

    Authors: Junjie Lu, Tobias Hofmann, Hans-Jürgen Stöckmann, Ulrich Kuhl

    Abstract: We study experimentally the manifestation of non-Weyl graph behavior in open systems using microwave networks. For this a coupling variation to the network is necessary, which was out of reach till now. The coupling to the environment is changed by indirectly varying the boundary condition at the coupling vertex from Dirichlet to Neumann using a dangling bond with variable length attached to the coup…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures

  4. arXiv:2406.10256 [pdf, other]

    cs.CL cs.AI cs.LG

    Explicit Word Density Estimation for Language Modelling

    Authors: Jovan Andonov, Octavian Ganea, Paulina Grnarova, Gary Bécigneul, Thomas Hofmann

    Abstract: Language Modelling has been a central part of Natural Language Processing for a very long time and in the past few years LSTM-based language models have been the go-to method for commercial language modeling. Recently, it has been shown that when looking at language modelling from a matrix factorization point of view, the final Softmax layer limits the expressiveness of the model, by putting an up…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Master's thesis

  5. arXiv:2406.04327 [pdf, other]

    cs.LG

    Causal Estimation of Memorisation Profiles

    Authors: Pietro Lesci, Clara Meister, Thomas Hofmann, Andreas Vlachos, Tiago Pimentel

    Abstract: Understanding memorisation in language models has practical and societal implications, e.g., studying models' training dynamics or preventing copyright infringements. Prior work defines memorisation as the causal effect of training with an instance on the model's ability to predict that instance. This definition relies on a counterfactual: the ability to observe what would have happened had the mo…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Published at the ACL 2024 Conference (main)

  6. arXiv:2406.01448 [pdf, other]

    quant-ph cond-mat.stat-mech

    Theory of Eigenstate Thermalisation

    Authors: Tobias Helbig, Tobias Hofmann, Ronny Thomale, Martin Greiter

    Abstract: If we prepare an isolated, interacting quantum system in an eigenstate and perturb a local observable at an initial time, its expectation value will relax towards a thermal expectation value, even though the time evolution of the system is deterministic. The eigenstate thermalization hypothesis (ETH) of Deutsch and Srednicki suggests that this is possible because each eigenstate of the full quantu…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 27 pages, 19 figures

  7. arXiv:2405.19279 [pdf, other]

    cs.LG

    Understanding and Minimising Outlier Features in Neural Network Training

    Authors: Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann

    Abstract: Outlier Features (OF) are neurons whose activation magnitudes significantly exceed the average over a neural network's (NN) width. They are well known to emerge during standard transformer training and have the undesirable effect of hindering quantisation in afflicted models. Despite their practical importance, little is known about why OFs emerge during training, nor how one can minimise them.…

    Submitted 29 May, 2024; originally announced May 2024.

  8. arXiv:2404.13766 [pdf, other]

    cs.CV

    Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control

    Authors: Maria Mihaela Trusca, Wolf Nuyts, Jonathan Thomm, Robert Honig, Thomas Hofmann, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Current diffusion models create photorealistic images given a text prompt as input but struggle to correctly bind attributes mentioned in the text to the right objects in the image. This is evidenced by our novel image-graph alignment model called EPViT (Edge Prediction Vision Transformer) for the evaluation of image-text alignment. To alleviate the above problem, we propose focused cross-attentio…

    Submitted 21 April, 2024; originally announced April 2024.

  9. arXiv:2404.09694 [pdf]

    physics.app-ph

    MAM-STM: A software for autonomous control of single moieties towards specific surface positions

    Authors: Bernhard Ramsauer, Johannes J. Cartus, Oliver T. Hofmann

    Abstract: In this publication we introduce MAM-STM, a software to autonomously manipulate arbitrary moieties towards specific positions on a metal surface utilizing the tip of a scanning tunneling microscope (STM). Finding the optimal manipulation parameters for a specific moiety is challenging and time consuming, even for human experts. MAM-STM combines autonomous data acquisition with a sophisticated Q-le…

    Submitted 15 April, 2024; originally announced April 2024.

  10. arXiv:2404.07982 [pdf, other]

    cs.CL cs.LG

    Language Imbalance Can Boost Cross-lingual Generalisation

    Authors: Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel, Imanol Schlag

    Abstract: Multilinguality is crucial for extending recent advancements in language modelling to diverse linguistic communities. To maintain high performance while representing multiple languages, multilingual models ideally align representations, allowing what is learned in one language to generalise to others. Prior research has emphasised the importance of parallel data and shared vocabulary elements as k…

    Submitted 13 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  11. arXiv:2404.06858 [pdf, other]

    math.NT

    Number Theory in OSCAR

    Authors: Claus Fieker, Tommy Hofmann

    Abstract: We give a brief introduction to computational algebraic number theory in OSCAR. Our main focus is on number fields, rings of integers and their invariants. After recalling some classical results and their constructive counterparts, we showcase the functionality in two examples related to the investigation of the Cohen-Lenstra heuristic for quadratic fields and the Galois module structure of rings…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Submitted as chapter for the upcoming book on the computer algebra system OSCAR

    MSC Class: 11Y40; 11-04; 11R29; 11R33; 11R37; 11N45

  12. arXiv:2404.06508 [pdf, other]

    cs.CL cs.LG

    On the Effect of (Near) Duplicate Subwords in Language Modelling

    Authors: Anton Schäfer, Thomas Hofmann, Imanol Schlag, Tiago Pimentel

    Abstract: Tokenisation is a core part of language models (LMs). It involves splitting a character sequence into subwords which are assigned arbitrary indices before being served to the LM. While typically lossless, however, this process may lead to less sample efficient LM training: as it removes character-level information, it could make it harder for LMs to generalise across similar subwords, such as now…

    Submitted 2 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  13. arXiv:2404.02499 [pdf, other]

    cs.AI cs.LG

    Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains

    Authors: Till Hofmann, Hector Geffner

    Abstract: General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning ge…

    Submitted 13 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: presented at IJCAI'24

  14. arXiv:2403.07379 [pdf, other]

    cs.LG cs.CL stat.ML

    Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy

    Authors: Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

    Abstract: We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich directional structure of optimization trajectories, represented by their pointwise parameters. Towards this end, we introduce some natural notions of the complexity of optimization trajectories, both qualitative and quantitative, which hallmark the directional nature of optimization in neural networks:…

    Submitted 24 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Preprint, 57 pages

  15. arXiv:2402.17457 [pdf, other]

    cs.LG

    Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

    Authors: Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto

    Abstract: Recently, there has been growing evidence that if the width and depth of a neural network are scaled toward the so-called rich feature learning limit ($μ$P and its depth extension), then some hyperparameters - such as the learning rate - exhibit transfer from small to very large models, thus reducing the cost of hyperparameter tuning. From an optimization perspective, this phenomenon is puzzling,…

    Submitted 27 February, 2024; originally announced February 2024.

  16. arXiv:2402.14433 [pdf, other]

    cs.CL cs.AI

    A Language Model's Guide Through Latent Space

    Authors: Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time. While the focus of previous work has largely been on truthfulness, in this paper we extend this framework to a richer set of concepts such as appropriateness, humor, creativity and qual…

    Submitted 22 February, 2024; originally announced February 2024.

    ACM Class: I.2

  17. arXiv:2402.07839 [pdf, other]

    cs.CV cs.LG

    Towards Meta-Pruning via Optimal Transport

    Authors: Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

    Abstract: Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts. This paper introduces a novel approach named Intra-Fusion, challenging this prevailing pruning paradigm. Unlike existing methods that focus on designing meaningful neuron importanc…

    Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted as a Spotlight (top 5% of submissions) at the International Conference on Learning Representations (ICLR) 2024

  18. arXiv:2402.03187 [pdf, other]

    cs.LG

    How Good is a Single Basin?

    Authors: Kai Lion, Lorenzo Noci, Thomas Hofmann, Gregor Bachmann

    Abstract: The multi-modal nature of neural loss landscapes is often considered to be the main driver behind the empirical success of deep ensembles. In this work, we probe this belief by constructing various "connected" ensembles which are restricted to lie in the same basin. Through our experiments, we demonstrate that increased connectivity indeed negatively impacts performance. However, when incorporatin…

    Submitted 5 February, 2024; originally announced February 2024.

  19. arXiv:2402.03164 [pdf, ps, other]

    cs.AI cs.LO

    Decidable Reasoning About Time in Finite-Domain Situation Calculus Theories

    Authors: Till Hofmann, Stefan Schupp, Gerhard Lakemeyer

    Abstract: Representing time is crucial for cyber-physical systems and has been studied extensively in the Situation Calculus. The most commonly used approach represents time by adding a real-valued fluent $\mathit{time}(a)$ that attaches a time point to each action and consequently to each situation. We show that in this approach, checking whether there is a reachable situation that satisfies a given formul…

    Submitted 5 February, 2024; originally announced February 2024.

  20. arXiv:2401.16024 [pdf, other]

    cs.LG cs.AI

    Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architectures

    Authors: Michael Hersche, Francesco di Stefano, Thomas Hofmann, Abu Sebastian, Abbas Rahimi

    Abstract: Abstract reasoning is a cornerstone of human intelligence, and replicating it with artificial intelligence (AI) presents an ongoing challenge. This study focuses on efficiently solving Raven's progressive matrices (RPM), a visual test for assessing abstract reasoning abilities, by using distributed computation and operators provided by vector-symbolic architectures (VSA). Instead of hard-coding th…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted in NeurIPS 2023 Workshop on MATH-AI

  21. arXiv:2401.09276 [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall cond-mat.soft physics.app-ph physics.chem-ph

    Wafer-scale fabrication of mesoporous silicon functionalized with electrically conductive polymers

    Authors: Manfred May, Mathis Boderius, Natalia Gostkowska-Lekner, Mark Busch, Klaus Habicht, Tommy Hofmann, Patrick Huber

    Abstract: The fabrication of hybrid materials consisting of nanoporous hosts with conductive polymers is a challenging task, since the extreme spatial confinement often conflicts with the stringent physico-chemical requirements for polymerization of organic constituents. Here, several low-threshold and scalable synthesis routes for such hybrids are presented. First, the electrochemical synthesis of composit…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 11 pages, 7 figures

  22. Towards Bridging the Gap between High-Level Reasoning and Execution on Robots

    Authors: Till Hofmann

    Abstract: When reasoning about actions, e.g., by means of task planning or agent programming with Golog, the robot's actions are typically modeled on an abstract level, where complex actions such as picking up an object are treated as atomic primitives with deterministic effects and preconditions that only depend on the current state. However, when executing such an action on a robot it can no longer be see…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: PhD Thesis

  23. arXiv:2312.09832 [pdf, other]

    cs.LG

    Disentangling Linear Mode-Connectivity

    Authors: Gul Sena Altintas, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

    Abstract: Linear mode-connectivity (LMC) (or lack thereof) is one of the intriguing characteristics of neural network loss landscapes. While empirically well established, it unfortunately still lacks a proper theoretical understanding. Even worse, although empirical data points abound, a systematic study of when networks exhibit LMC is largely missing in the literature. In this work we aim to close this…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 9 pages, 5 figures

  24. arXiv:2312.09256 [pdf, other]

    cs.CV

    LIME: Localized Image Editing via Attention Regularization in Diffusion Models

    Authors: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

    Abstract: Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A significant challenge within this domain is localized editing, where specific areas of an image are modified without affecting the rest of the content. This paper in…

    Submitted 14 December, 2023; originally announced December 2023.

  25. arXiv:2312.01538 [pdf, other]

    cs.LG cs.NE

    Recurrent Distance Filtering for Graph Representation Learning

    Authors: Yuhui Ding, Antonio Orvieto, Bobby He, Thomas Hofmann

    Abstract: Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing the information from distant nodes effectively. Conversely, graph transformers allow each node to attend to all other nodes directly, but lack graph inductive bias and have to rely on ad-hoc positional encoding. In this paper, we propose a new architecture to reconcile these challenges. Our a…

    Submitted 5 June, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  26. arXiv:2311.06224 [pdf, other]

    cs.CV cs.AI cs.LG

    Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

    Authors: Elior Benarous, Sotiris Anagnostidis, Luca Biggio, Thomas Hofmann

    Abstract: Recent advancements in deep learning have been primarily driven by the use of large models trained on increasingly vast datasets. While neural scaling laws have emerged to predict network performance given a specific level of computational resources, the growing demand for expansive datasets raises concerns. To address this, a new research direction has emerged, focusing on the creation of synthet…

    Submitted 10 November, 2023; originally announced November 2023.

  27. arXiv:2311.03233 [pdf, other]

    cs.LG cs.CV

    Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

    Authors: Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

    Abstract: In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: investing more computational resources (optimally) leads to better performance, and even predictably so; neural scaling laws have been derived that accurately forecast the performance of a network for a desired level of comp…

    Submitted 23 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  28. arXiv:2311.01906 [pdf, other]

    cs.LG

    Simplifying Transformer Blocks

    Authors: Bobby He, Thomas Hofmann

    Abstract: A simple design recipe for deep Transformers is to compose identical building blocks. But standard transformer blocks are far from simple, interweaving attention and MLP sub-blocks with skip connections & normalisation layers in precise arrangements. This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed, or render models untrainable.…

    Submitted 31 May, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: ICLR 2024

  29. arXiv:2310.05719 [pdf, other]

    cs.LG stat.ML

    Transformer Fusion with Optimal Transport

    Authors: Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

    Abstract: Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities. Past attempts have been restricted to the case of fully-connected, convolutional, and residual networks. This paper presents a systematic approach for fusing two or more transformer-based networks exploiting Optimal Transport to (soft-)align the various architectural components.…

    Submitted 22 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Appears at International Conference on Learning Representations (ICLR), 2024. M. Imfeld, J. Graldi, and M. Giordano are the first authors and contributed equally to this work

  30. arXiv:2310.01243 [pdf]

    cond-mat.mtrl-sci

    Kinetic trapping of charge-transfer molecules at metal interfaces

    Authors: Anna Werkovits, Simon B Hollweger, Max Niederreiter, Thomas Risse, Johannes J. Cartus, Martin Sterrer, Sebastian Matera, Oliver T. Hofmann

    Abstract: Despite the common expectation that conjugated organic molecules on metals tend to adsorb in a flat-lying wetting layer, several recent studies have found strong indications for coverage-dependent transitions to upright-standing phases, which exhibit notably different physical properties. In this work, we argue that from an energetic perspective, thermodynamically stable upright-standing phases ma…

    Submitted 3 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  31. arXiv:2310.01165 [pdf, other]

    cs.LG cs.AI

    Towards guarantees for parameter isolation in continual learning

    Authors: Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

    Abstract: Deep learning has proved to be a successful paradigm for solving many challenges in machine learning. However, deep neural networks fail when trained sequentially on multiple tasks, a shortcoming known as catastrophic forgetting in the continual learning literature. Despite a recent flourish of learning algorithms successfully addressing this problem, we find that provable guarantees against catas…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 10 pages, 3 figures

  32. arXiv:2309.11197 [pdf, other]

    cs.LG cs.CL

    The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

    Authors: Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag

    Abstract: The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the m…

    Submitted 20 September, 2023; originally announced September 2023.

  33. arXiv:2308.11423 [pdf]

    cond-mat.mtrl-sci

    Adsorption configurations of Co-phthalocyanine on In$_2$O$_3$(111)

    Authors: Margareta Wagner, Fabio Calcinelli, Andreas Jeindl, Michael Schmid, Oliver T. Hofmann, Ulrike Diebold

    Abstract: Indium oxide offers optical transparency paired with electric conductivity, a combination required in many optoelectronic applications. The most-stable In$_2$O$_3$(111) surface has a large unit cell (1.43 nm lattice constant). It contains a mixture of both bulk-like and undercoordinated O and In atoms and provides an ideal playground to explore the interaction of surfaces with organic molecules of simil…

    Submitted 22 August, 2023; originally announced August 2023.

    Journal ref: Surface Science 722, 2022, 122065

  34. arXiv:2306.17759 [pdf, other]

    stat.ML cs.LG

    The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

    Authors: Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

    Abstract: In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a…

    Submitted 9 December, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

  35. arXiv:2306.15434 [pdf, other]

    cond-mat.other cond-mat.mes-hall physics.app-ph

    Realizing efficient topological temporal pumping in electrical circuits

    Authors: Alexander Stegmaier, Hauke Brand, Stefan Imhof, Alexander Fritzsche, Tobias Helbig, Tobias Hofmann, Igor Boettcher, Martin Greiter, Ching Hua Lee, Gaurav Bahl, Alexander Szameit, Tobias Kießling, Ronny Thomale, Lavi K. Upreti

    Abstract: Quantized adiabatic transport can occur when a system is slowly modulated over time. In most realizations however, the efficiency of such transport is reduced by unwanted dissipation, back-scattering, and non-adiabatic effects. In this work, we realize a topological adiabatic pump in an electrical circuit network that supports remarkably stable and long-lasting pumping of a voltage signal. We furt…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: main (5 pages, 3 figures) plus supplement (8 pages, 4 figures)

  36. arXiv:2306.13575 [pdf, other]

    cs.LG

    Scaling MLPs: A Tale of Inductive Bias

    Authors: Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

    Abstract: In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are important for multiple reasons. (1) Given the recent narrative "less inductive bias is better", popularized due to transformers eclipsing convolutional models, it is natural to explore the limits of…

    Submitted 3 October, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

  37. arXiv:2306.02329 [pdf, other]

    cs.CV

    Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

    Authors: Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore. However, it still remains understudied whether 2D distilled knowledge can provide useful representations for downstream 3D vision-language tasks such as 3D question answering. In this paper, we propo…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: The first two authors contributed equally. arXiv admin note: text overlap with arXiv:2304.06061

  38. arXiv:2305.15805 [pdf, other]

    cs.CL cs.LG

    Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

    Authors: Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann

    Abstract: Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the…

    Submitted 31 May, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  39. arXiv:2305.09088 [pdf, other]

    cs.LG stat.ML

    The Hessian perspective into the Nature of Convolutional Neural Networks

    Authors: Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf

    Abstract: While Convolutional Neural Networks (CNNs) have long been investigated and applied, as well as theorized, we aim to provide a slightly different perspective into their nature -- through the perspective of their Hessian maps. The reason is that the loss Hessian captures the pairwise interaction of parameters and therefore forms a natural ground to probe how the architectural aspects of CNN get mani…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: ICML 2023 conference proceedings

  40. arXiv:2304.12427 [pdf, other]

    cond-mat.mtrl-sci

    The impact of static distortion waves on superlubricity

    Authors: Lukas Hörmann, Johannes J. Cartus, Oliver T. Hofmann

    Abstract: Friction is a major source of energy loss in mechanical devices. This energy loss may be minimized by creating interfaces with extremely reduced friction, i.e. superlubricity. Conventional wisdom holds that incommensurate interface structures facilitate superlubricity. Accurately describing friction necessitates precise modeling of the interface structure. This, in turn, requires the use of accura…

    Submitted 22 December, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  41. arXiv:2304.06061 [pdf, other]

    cs.CV

    CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

    Authors: Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: CVPRW 2023. Code will be made publicly available: https://github.com/AlexDelitzas/3D-VQA

  42. arXiv:2303.09483 [pdf, other]

    cs.LG cs.CV

    Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

    Authors: Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann

    Abstract: In contrast to the natural capabilities of humans to learn new tasks in a sequential fashion, neural networks are known to suffer from catastrophic forgetting, where the model's performances on old tasks drop dramatically after being optimized for a new task. Since then, the continual learning (CL) community has proposed several solutions aiming to equip the neural network with the ability to lear…

    Submitted 31 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  43. arXiv:2302.12091 [pdf, other]

    cs.LG

    Random Teachers are Good Teachers

    Authors: Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

    Abstract: In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation. To isolate its effect, we describe a simple experiment where we consider teachers at random initialization instead of trained teachers. Surprisingly, when distilling a student into such a random teacher, we observe that the resulting model and its representations already poss…

    Submitted 19 June, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR, volume 202 (2023), pages 30022-30041

  44. arXiv:2211.16966  [pdf, other]

    math.CO

    Properties of uniformly $3$-connected graphs

    Authors: Frank Göring, Tobias Hofmann

    Abstract: A graph on at least ${k+1}$ vertices is uniformly $k$-connected if each pair of its vertices is connected by $k$ and not more than $k$ independent paths. We reinvestigate a recent constructive characterization of uniformly $3$-connected graphs and obtain a more detailed result that relates the number of vertices to the operations involved in constructing a respective uniformly $3$-connected graph.…

    Submitted 5 June, 2024; v1 submitted 30 November, 2022; originally announced November 2022.

    MSC Class: 05C40; 05C75; 05C07; 05D99

  45. arXiv:2211.12346  [pdf, other]

    astro-ph.CO cs.LG

    Cosmology from Galaxy Redshift Surveys with PointNet

    Authors: Sotiris Anagnostidis, Arne Thomsen, Tomasz Kacprzak, Tilman Tröster, Luca Biggio, Alexandre Refregier, Thomas Hofmann

    Abstract: In recent years, deep learning approaches have achieved state-of-the-art results in the analysis of point cloud data. In cosmology, galaxy redshift surveys resemble such a permutation invariant collection of positions in space. These surveys have so far mostly been analysed with two-point statistics, such as power spectra and correlation functions. The usage of these summary statistics is best jus…

    Submitted 22 November, 2022; originally announced November 2022.

  46. arXiv:2210.14019  [pdf, other]

    cs.LG

    The Curious Case of Benign Memorization

    Authors: Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

    Abstract: Despite the empirical advances of deep learning across a variety of learning tasks, our theoretical understanding of its success is still very restricted. One of the key challenges is the overparametrized nature of modern models, enabling complete overfitting of the data even if the labels are randomized, i.e. networks can completely memorize all given patterns. While such a memorization…

    Submitted 23 February, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

  47. arXiv:2210.12084  [pdf, other]

    cs.CL cs.AI cs.LG

    Decoding a Neural Retriever's Latent Space for Query Suggestion

    Authors: Leonard Adolphs, Michelle Chen Huebscher, Christian Buck, Sertan Girgin, Olivier Bachem, Massimiliano Ciaramita, Thomas Hofmann

    Abstract: Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a "query decoder" that, given a…

    Submitted 21 October, 2022; originally announced October 2022.

  48. arXiv:2210.00828  [pdf, other]

    cs.CV

    Mastering Spatial Graph Prediction of Road Networks

    Authors: Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann

    Abstract: Accurately predicting road networks from satellite images requires a global understanding of the network topology. We propose to capture such high-level information by introducing a graph-based framework that simulates the addition of sequences of graph edges using a reinforcement learning (RL) approach. In particular, given a partially generated graph associated with a satellite image, an RL agen…

    Submitted 3 October, 2022; originally announced October 2022.

  49. arXiv:2207.12763  [pdf, ps, other]

    cs.AI

    Using Abstraction for Interpretable Robot Programs in Stochastic Domains

    Authors: Till Hofmann, Vaishak Belle

    Abstract: A robot's actions are inherently stochastic, as its sensors are noisy and its actions do not always have the intended effects. For this reason, the agent language Golog has been extended to models with degrees of belief and stochastic actions. While this allows more precise robot models, the resulting programs are much harder to comprehend, because they need to deal with the noise, e.g., by loopin…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: Presented at the KR'22 Workshop on Explainable Logic-Based Knowledge Representation (XLoKR). arXiv admin note: substantial text overlap with arXiv:2204.03536

  50. arXiv:2207.12319  [pdf, other]

    cs.CV cs.AI

    OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters

    Authors: Piera Riccio, Bill Psomas, Francesco Galati, Francisco Escolano, Thomas Hofmann, Nuria Oliver

    Abstract: Augmented Reality or AR filters on selfies have become very popular on social media platforms for a variety of applications, including marketing, entertainment and aesthetics. Given the wide adoption of AR face filters and the importance of faces in our social structures and relations, there is increased interest by the scientific community to analyze the impact of such filters from a psychologica…

    Submitted 27 September, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    ACM Class: I.4.9