Showing 1–16 of 16 results for author: Anagnostidis, S

  1. arXiv:2402.14433  [pdf, other]

    cs.CL cs.AI

    A Language Model's Guide Through Latent Space

    Authors: Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time. While the focus of previous work has largely been on truthfulness, in this paper we extend this framework to a richer set of concepts such as appropriateness, humor, creativity and qual…

    Submitted 22 February, 2024; originally announced February 2024.

    ACM Class: I.2

  2. arXiv:2402.07839  [pdf, other]

    cs.CV cs.LG

    Towards Meta-Pruning via Optimal Transport

    Authors: Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

    Abstract: Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts. This paper introduces a novel approach named Intra-Fusion, challenging this prevailing pruning paradigm. Unlike existing methods that focus on designing meaningful neuron importanc…

    Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted as a Spotlight (top 5% of submissions) at the International Conference on Learning Representations (ICLR) 2024

  3. arXiv:2311.06224  [pdf, other]

    cs.CV cs.AI cs.LG

    Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

    Authors: Elior Benarous, Sotiris Anagnostidis, Luca Biggio, Thomas Hofmann

    Abstract: Recent advancements in deep learning have been primarily driven by the use of large models trained on increasingly vast datasets. While neural scaling laws have emerged to predict network performance given a specific level of computational resources, the growing demand for expansive datasets raises concerns. To address this, a new research direction has emerged, focusing on the creation of synthet…

    Submitted 10 November, 2023; originally announced November 2023.

  4. arXiv:2311.03233  [pdf, other]

    cs.LG cs.CV

    Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

    Authors: Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

    Abstract: In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: investing more computational resources (optimally) leads to better performance, and even predictably so; neural scaling laws have been derived that accurately forecast the performance of a network for a desired level of comp…

    Submitted 23 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  5. arXiv:2310.05719  [pdf, other]

    cs.LG stat.ML

    Transformer Fusion with Optimal Transport

    Authors: Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

    Abstract: Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities. Past attempts have been restricted to the case of fully-connected, convolutional, and residual networks. This paper presents a systematic approach for fusing two or more transformer-based networks exploiting Optimal Transport to (soft-)align the various architectural components.…

    Submitted 22 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Appears at International Conference on Learning Representations (ICLR), 2024. M. Imfeld, J. Graldi, and M. Giordano are the first authors and contributed equally to this work

  6. arXiv:2306.13575  [pdf, other]

    cs.LG

    Scaling MLPs: A Tale of Inductive Bias

    Authors: Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

    Abstract: In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are important for multiple reasons. (1) Given the recent narrative "less inductive bias is better", popularized due to transformers eclipsing convolutional models, it is natural to explore the limits of…

    Submitted 3 October, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

  7. arXiv:2306.02329  [pdf, other]

    cs.CV

    Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

    Authors: Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore. However, it still remains understudied whether 2D distilled knowledge can provide useful representations for downstream 3D vision-language tasks such as 3D question answering. In this paper, we propo…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: The first two authors contributed equally. arXiv admin note: text overlap with arXiv:2304.06061

  8. arXiv:2305.15805  [pdf, other]

    cs.CL cs.LG

    Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

    Authors: Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann

    Abstract: Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most of LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the…

    Submitted 31 May, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  9. arXiv:2304.07327  [pdf, other]

    cs.CL cs.AI

    OpenAssistant Conversations -- Democratizing Large Language Model Alignment

    Authors: Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi-Rui Tam, Keith Stevens, Abdullah Barhoum, Nguyen Minh Duc, Oliver Stanley, Richárd Nagyfi, Shahul ES, Sameer Suri, David Glushkov, Arnav Dantuluri, Andrew Maguire, Christoph Schuhmann, Huu Nguyen, Alexander Mattick

    Abstract: Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption as demonstrated by ChatGPT. Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the required skill and domain knowledge to effectively harness the capabilities of LLMs, increasing their acce…

    Submitted 31 October, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: Published in NeurIPS 2023 Datasets and Benchmarks

    Report number: V-02 ACM Class: I.2

  10. arXiv:2304.06061  [pdf, other]

    cs.CV

    CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

    Authors: Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: CVPRW 2023. Code will be made publicly available: https://github.com/AlexDelitzas/3D-VQA

  11. arXiv:2302.12091  [pdf, other]

    cs.LG

    Random Teachers are Good Teachers

    Authors: Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

    Abstract: In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation. To isolate its effect, we describe a simple experiment where we consider teachers at random initialization instead of trained teachers. Surprisingly, when distilling a student into such a random teacher, we observe that the resulting model and its representations already poss…

    Submitted 19 June, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR, volume 202 (2023), pages 30022-30041

  12. arXiv:2211.12346  [pdf, other]

    astro-ph.CO cs.LG

    Cosmology from Galaxy Redshift Surveys with PointNet

    Authors: Sotiris Anagnostidis, Arne Thomsen, Tomasz Kacprzak, Tilman Tröster, Luca Biggio, Alexandre Refregier, Thomas Hofmann

    Abstract: In recent years, deep learning approaches have achieved state-of-the-art results in the analysis of point cloud data. In cosmology, galaxy redshift surveys resemble such a permutation invariant collection of positions in space. These surveys have so far mostly been analysed with two-point statistics, such as power spectra and correlation functions. The usage of these summary statistics is best jus…

    Submitted 22 November, 2022; originally announced November 2022.

  13. arXiv:2210.14019  [pdf, other]

    cs.LG

    The Curious Case of Benign Memorization

    Authors: Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

    Abstract: Despite the empirical advances of deep learning across a variety of learning tasks, our theoretical understanding of its success is still very restricted. One of the key challenges is the overparametrized nature of modern models, enabling complete overfitting of the data even if the labels are randomized, i.e. networks can completely memorize all given patterns. While such a memorization…

    Submitted 23 February, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

  14. arXiv:2210.00828  [pdf, other]

    cs.CV

    Mastering Spatial Graph Prediction of Road Networks

    Authors: Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann

    Abstract: Accurately predicting road networks from satellite images requires a global understanding of the network topology. We propose to capture such high-level information by introducing a graph-based framework that simulates the addition of sequences of graph edges using a reinforcement learning (RL) approach. In particular, given a partially generated graph associated with a satellite image, an RL agen…

    Submitted 3 October, 2022; originally announced October 2022.

  15. arXiv:2206.03126  [pdf, other]

    cs.LG

    Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

    Authors: Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

    Abstract: Transformers have achieved remarkable success in several domains, ranging from natural language processing to computer vision. Nevertheless, it has been recently shown that stacking self-attention layers - the distinctive architectural component of Transformers - can result in rank collapse of the tokens' representations at initialization. The question of if and how rank collapse affects training…

    Submitted 7 June, 2022; originally announced June 2022.

  16. arXiv:2102.11386  [pdf, other]

    math.OC cs.LG

    Direct-Search for a Class of Stochastic Min-Max Problems

    Authors: Sotiris Anagnostidis, Aurelien Lucchi, Youssef Diouane

    Abstract: Recent applications in machine learning have renewed the interest of the community in min-max optimization problems. While gradient-based optimization methods are widely used to solve such problems, there are however many scenarios where these techniques are not well-suited, or even not applicable when the gradient is not accessible. We investigate the use of direct-search methods that belong to a…

    Submitted 14 April, 2021; v1 submitted 22 February, 2021; originally announced February 2021.