Showing 1–11 of 11 results for author: Deiseroth, B

  1. arXiv:2406.19223  [pdf, other]

    cs.CL cs.AI cs.LG

    T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings

    Authors: Björn Deiseroth, Manuel Brack, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

    Abstract: Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses. Major limitations include computational overhead, ineffective vocabulary use, and unnecessarily large embedding and head layers. Additionally, their performance is biased towards a reference corpus, leading to reduced effectiveness for underr…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2403.17844  [pdf, other]

    cs.LG

    Mechanistic Design and Scaling of Hybrid Architectures

    Authors: Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

    Abstract: The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by grounding it in an end-to-end mechanistic architecture design (MAD) pipeline, encompassing small-scale capability unit tests predictive of scaling law…

    Submitted 26 March, 2024; originally announced March 2024.

  3. arXiv:2311.01544  [pdf, other]

    cs.CL cs.LG

    Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization

    Authors: Björn Deiseroth, Max Meuer, Nikolas Gritsch, Constantin Eichenberg, Patrick Schramowski, Matthias Aßenmacher, Kristian Kersting

    Abstract: Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach to assessing compressed LLMs, addressing the limitations of traditional perplexity or accuracy…

    Submitted 3 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  4. arXiv:2305.15296  [pdf, other]

    cs.CV cs.AI cs.LG

    MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

    Authors: Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

    Abstract: The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion that all…

    Submitted 20 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Proceedings of Advances in Neural Information Processing Systems: Annual Conference on Neural Information Processing Systems (NeurIPS)

  5. arXiv:2301.08110  [pdf, other]

    cs.LG cs.AI

    AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation

    Authors: Björn Deiseroth, Mayukh Deb, Samuel Weinbach, Manuel Brack, Patrick Schramowski, Kristian Kersting

    Abstract: Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities. Current methods for explaining their predictions are resource-intensive. Most crucially, they require prohibitively large amounts of extra memory, since they rely on backpropagation which allocates almost twice as much GPU memory as the forward pass…

    Submitted 5 November, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

  6. arXiv:2212.02936  [pdf, other]

    cs.CV

    M-VADER: A Model for Diffusion with Multimodal Context

    Authors: Samuel Weinbach, Marco Bellagente, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Björn Deiseroth, Koen Oostermeijer, Hannah Teufel, Andres Felipe Cruz-Salinas

    Abstract: We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. Previously, a number of successful DM image generation algorithms have been introduced that make it possible to s…

    Submitted 7 December, 2022; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: 22 pages, 14 figures, 2 tables, fixed figure 3

  7. arXiv:2211.07733  [pdf, other]

    cs.CL

    Speaking Multiple Languages Affects the Moral Bias of Language Models

    Authors: Katharina Hämmerl, Björn Deiseroth, Patrick Schramowski, Jindřich Libovický, Constantin A. Rothkopf, Alexander Fraser, Kristian Kersting

    Abstract: Pre-trained multilingual language models (PMLMs) are commonly used when dealing with data from multiple languages and cross-lingual transfer. However, PMLMs are trained on varying amounts of data for each language. In practice this means their performance is often much better on English than many other languages. We explore to what extent this also applies to moral norms. Do the models capture mor…

    Submitted 1 June, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: To appear in ACL Findings 2023

  8. arXiv:2211.05105  [pdf, other]

    cs.CV cs.AI cs.LG

    Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models

    Authors: Patrick Schramowski, Manuel Brack, Björn Deiseroth, Kristian Kersting

    Abstract: Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the internet, they also suffer, as we demonstrate, from degenerated and biased human behavior. In turn, they may even…

    Submitted 26 April, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Proceedings of the 22nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  9. arXiv:2208.13518  [pdf, other]

    cs.AI cs.CL cs.CV cs.LO cs.SC

    LogicRank: Logic Induced Reranking for Generative Text-to-Image Systems

    Authors: Björn Deiseroth, Patrick Schramowski, Hikaru Shindo, Devendra Singh Dhami, Kristian Kersting

    Abstract: Text-to-image models have recently achieved remarkable success with seemingly accurate samples in photo-realistic quality. However as state-of-the-art language models still struggle evaluating precise statements consistently, so do language model based image generation processes. In this work we showcase problems of state-of-the-art text-to-image models like DALL-E with generating accurate samples…

    Submitted 29 August, 2022; originally announced August 2022.

  10. arXiv:2208.08241  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.HC

    ILLUME: Rationalizing Vision-Language Models through Human Interactions

    Authors: Manuel Brack, Patrick Schramowski, Björn Deiseroth, Kristian Kersting

    Abstract: Bootstrapping from pre-trained language models has been proven to be an efficient approach for building vision-language models (VLM) for tasks such as image captioning or visual question answering. However, outputs of these models rarely align with user's rationales for specific answers. In order to improve this alignment and reinforce commonsense reasons, we propose a tuning paradigm based on hum…

    Submitted 31 May, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Proceedings of the 40th International Conference on Machine Learning (ICML), 2023

  11. arXiv:2203.09904  [pdf, ps, other]

    cs.CL

    Do Multilingual Language Models Capture Differing Moral Norms?

    Authors: Katharina Hämmerl, Björn Deiseroth, Patrick Schramowski, Jindřich Libovický, Alexander Fraser, Kristian Kersting

    Abstract: Massively multilingual sentence representations are trained on large corpora of uncurated data, with a very imbalanced proportion of languages included in the training. This may cause the models to grasp cultural values including moral judgments from the high-resource languages and impose them on the low-resource languages. The lack of data in certain languages can also lead to developing random a…

    Submitted 18 March, 2022; originally announced March 2022.