Skip to main content

Showing 1–50 of 351 results for author: Schölkopf, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00529  [pdf, other

    cs.LG cs.SD eess.AS math.ST stat.ML

    Detecting and Identifying Selection Structure in Sequential Data

    Authors: Yujia Zheng, Zeyu Tang, Yiwen Qiu, Bernhard Schölkopf, Kun Zhang

    Abstract: We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences. Since this selection process often distorts statistical analysis, previous work primarily views it as a bias to be corrected and proposes various methods to mitigate its effect. However, while controlling this bias is crucial, selection also offers an opportun… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: ICML 2024

  2. arXiv:2406.19307  [pdf, other

    cs.CL

    The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning

    Authors: Shaobo Cui, Zhi**g **, Bernhard Schölkopf, Boi Faltings

    Abstract: Understanding commonsense causality is a unique mark of intelligence for humans. It helps people understand the principles of the real world better and benefits the decision-making process related to causation. For instance, commonsense causality is crucial in judging whether a defendant's action causes the plaintiff's loss in determining legal liability. Despite its significance, a systematic exp… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 42 pages

  3. arXiv:2406.19049  [pdf, other

    cs.LG cs.AI stat.ML

    Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation

    Authors: Amartya Sanyal, Yaxi Hu, Yaodong Yu, Yian Ma, Yixin Wang, Bernhard Schölkopf

    Abstract: "Accuracy-on-the-line" is a widely observed phenomenon in machine learning, where a model's accuracy on in-distribution (ID) and out-of-distribution (OOD) data is positively correlated across different hyperparameters and data configurations. But when does this useful relationship break down? In this work, we explore its robustness. The key observation is that noisy data and the presence of nuisan… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.18450  [pdf, other

    cs.LG cs.AI

    Preference Elicitation for Offline Reinforcement Learning

    Authors: Alizée Pace, Bernhard Schölkopf, Gunnar Rätsch, Giorgia Ramponi

    Abstract: Applying reinforcement learning (RL) to real-world problems is often made challenging by the inability to interact with the environment and the difficulty of designing reward functions. Offline RL addresses the first challenge by considering access to an offline dataset of environment interactions labeled by the reward function. In contrast, Preference-based RL does not assume access to the reward… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.16300  [pdf, other

    cs.LG

    Landsca** Linear Mode Connectivity

    Authors: Sidak Pal Singh, Linara Adilova, Michael Kamp, Asja Fischer, Bernhard Schölkopf, Thomas Hofmann

    Abstract: The presence of linear paths in parameter space between two different network solutions in certain cases, i.e., linear mode connectivity (LMC), has garnered interest from both theoretical and practical fronts. There has been significant research that either practically designs algorithms catered for connecting networks by adjusting for the permutation symmetries as well as some others that more th… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: ICML 2024 HiLD workshop paper

  6. arXiv:2406.14302  [pdf, ps, other

    stat.ML cs.AI cs.LG

    Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

    Authors: Patrik Reizinger, Siyuan Guo, Ferenc Huszár, Bernhard Schölkopf, Wieland Brendel

    Abstract: Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed)… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  7. arXiv:2406.11601  [pdf, other

    cs.LG stat.ML

    Standardizing Structural Causal Models

    Authors: Weronika Ormaniec, Scott Sussex, Lars Lorch, Bernhard Schölkopf, Andreas Krause

    Abstract: Synthetic datasets generated by structural causal models (SCMs) are commonly used for benchmarking causal structure learning algorithms. However, the variances and pairwise correlations in SCM data tend to increase along the causal ordering. Several popular algorithms exploit these artifacts, possibly leading to conclusions that do not generalize to real-world settings. Existing metrics like… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.04344  [pdf, other

    cs.LG cs.CL cs.CV

    Verbalized Machine Learning: Revisiting Machine Learning with Language Models

    Authors: Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu

    Abstract: Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Technical Report v1 (92 pages, 15 figures)

  9. arXiv:2406.02329  [pdf, other

    cs.CL cs.LG

    On Affine Homotopy between Language Encoders

    Authors: Robin SM Chan, Reda Boumasmoud, Anej Svete, Yuxin Ren, Qipeng Guo, Zhi**g **, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Mennatallah El-Assady, Ryan Cotterell

    Abstract: Pre-trained language encoders -- functions that represent text as vectors -- are an integral component of many NLP tasks. We tackle a natural question in language encoder analysis: What does it mean for two encoders to be similar? We contend that a faithful measure of similarity needs to be \emph{intrinsic}, that is, task-independent, yet still be informative of \emph{extrinsic} similarity -- the… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages

  10. arXiv:2405.20318  [pdf, other

    cs.CL cs.AI cs.LG stat.ML

    CausalQuest: Collecting Natural Causal Questions for AI Agents

    Authors: Roberto Ceraolo, Dmitrii Kharlapenko, Amélie Reymond, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schölkopf, Zhi**g **

    Abstract: Humans have an innate drive to seek out causality. Whether fuelled by curiosity or specific goals, we constantly question why things happen, how they are interconnected, and many other related phenomena. To develop AI agents capable of addressing this natural human quest for causality, we urgently need a comprehensive dataset of natural causal questions. Unfortunately, existing datasets either con… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.18836  [pdf, other

    stat.ME cs.LG

    Do Finetti: On Causal Effects for Exchangeable Data

    Authors: Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf

    Abstract: We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  12. arXiv:2405.15485  [pdf, other

    cs.AI cs.CL cs.LG

    Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs

    Authors: Siyuan Guo, Aniket Didolkar, Nan Rosemary Ke, Anirudh Goyal, Ferenc Huszár, Bernhard Schölkopf

    Abstract: We are beginning to see progress in language model assisted scientific discovery. Motivated by the use of LLMs as a general scientific assistant, this paper assesses the domain knowledge of LLMs through its understanding of different mathematical skills required to solve problems. In particular, we look at not just what the pre-trained model already knows, but how it learned to learn from informat… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  13. arXiv:2405.14808  [pdf, other

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Implicit Personalization in Language Models: A Systematic Study

    Authors: Zhi**g **, Nils Heil, Jiarui Liu, Shehzaad Dhuliawala, Yahang Qi, Bernhard Schölkopf, Rada Mihalcea, Mrinmaya Sachan

    Abstract: Implicit Personalization (IP) is a phenomenon of language models inferring a user's background from the implicit cues in the input prompts and tailoring the response based on this inference. While previous work has touched upon various instances of this problem, there lacks a unified framework to study this behavior. This work systematically studies IP through a rigorous mathematical formulation,… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  14. arXiv:2405.11633  [pdf, other

    cs.LG stat.ML

    Geometry-Aware Instrumental Variable Regression

    Authors: Heiner Kremer, Bernhard Schölkopf

    Abstract: Instrumental variable (IV) regression can be approached through its formulation in terms of conditional moment restrictions (CMR). Building on variants of the generalized method of moments, most CMR estimators are implicitly based on approximating the population data distribution via reweightings of the empirical sample. While for large sample sizes, in the independent identically distributed (IID… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  15. arXiv:2405.01502  [pdf, other

    cs.CL cs.AI cs.LG

    Analyzing the Role of Semantic Representations in the Era of Large Language Models

    Authors: Zhi**g **, Yuen Chen, Fernando Gonzalez, Jiarui Liu, Jiayi Zhang, Julian Michael, Bernhard Schölkopf, Mona Diab

    Abstract: Traditionally, natural language processing (NLP) models often use a rich set of features created by linguistic expertise, such as semantic representations. However, in the era of large language models (LLMs), more and more tasks are turned into generic, end-to-end sequence generation problems. In this paper, we investigate the question: what is the role of semantic representations in the era of LL… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: NAACL 2024

  16. arXiv:2404.16698  [pdf, other

    cs.CL

    Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents

    Authors: Giorgio Piatti, Zhi**g **, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea

    Abstract: As AI systems pervade human life, ensuring that large language models (LLMs) make safe decisions is a significant challenge. This paper introduces the Governance of the Commons Simulation (GovSim), a generative simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. Using GovSim, we investigate the dynamics of sustainable resource sharing in a society… ▽ More

    Submitted 31 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Revised version

  17. arXiv:2404.15109  [pdf, other

    cs.LG

    Compete and Compose: Learning Independent Mechanisms for Modular World Models

    Authors: Anson Lei, Frederik Nolte, Bernhard Schölkopf, Ingmar Posner

    Abstract: We present COmpetitive Mechanisms for Efficient Transfer (COMET), a modular world model which leverages reusable, independent mechanisms across different environments. COMET is trained on multiple environments with varying dynamics via a two-step process: competition and composition. This enables the model to recognise and learn transferable mechanisms. Specifically, in the competition phase, COME… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  18. arXiv:2404.11055  [pdf, other

    cs.CL

    On the Causal Nature of Sentiment Analysis

    Authors: Zhiheng Lyu, Zhi**g **, Fernando Gonzalez, Rada Mihalcea, Bernhard Schoelkopf, Mrinmaya Sachan

    Abstract: Sentiment analysis (SA) aims to identify the sentiment expressed in a text, such as a product review. Given a review and the sentiment associated with it, this paper formulates SA as a combination of two tasks: (1) a causal discovery task that distinguishes whether a review "primes" the sentiment (Causal Hypothesis C1), or the sentiment "primes" the review (Causal Hypothesis C2); and (2) the tradi… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: An enhanced version of our previous exploration in arXiv:2305.01764

  19. arXiv:2403.19352  [pdf, other

    cs.CL

    A diverse Multilingual News Headlines Dataset from around the World

    Authors: Felix Leeb, Bernhard Schölkopf

    Abstract: Babel Briefings is a novel dataset featuring 4.7 million news headlines from August 2020 to November 2021, across 30 languages and 54 locations worldwide with English translations of all articles included. Designed for natural language processing and media studies, it serves as a high-quality dataset for training or evaluating language models as well as offering a simple, accessible collection of… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Published in NAACL 2024 Proceedings (Short Paper track)

  20. arXiv:2403.14443  [pdf, other

    cs.AI cs.CL cs.GT cs.LG cs.MA cs.SI

    Language Models Can Reduce Asymmetry in Information Markets

    Authors: Nasim Rahaman, Martin Weiss, Manuel Wüthrich, Yoshua Bengio, Li Erran Li, Chris Pal, Bernhard Schölkopf

    Abstract: This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The c… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  21. arXiv:2403.13041  [pdf, other

    cs.CR cs.AI cs.LG stat.ML

    Provable Privacy with Non-Private Pre-Processing

    Authors: Yaxi Hu, Amartya Sanyal, Bernhard Schölkopf

    Abstract: When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarante… ▽ More

    Submitted 21 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  22. arXiv:2403.07379  [pdf, other

    cs.LG cs.CL stat.ML

    Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy

    Authors: Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

    Abstract: We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich directional structure of optimization trajectories, represented by their pointwise parameters. Towards this end, we introduce some natural notions of the complexity of optimization trajectories, both qualitative and quantitative, which hallmark the directional nature of optimization in neural networks:… ▽ More

    Submitted 24 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Preprint, 57 pages

  23. arXiv:2403.03639  [pdf, other

    cs.RO

    Efficient Search and Learning for Agile Locomotion on Step** Stones

    Authors: Adithya Kumar Chinnakkonda Ravi, Victor Dhédin, Armand Jordana, Huaijiang Zhu, Avadesh Meduri, Ludovic Righetti, Bernhard Schölkopf, Majid Khadiv

    Abstract: Legged robots have become capable of performing highly dynamic maneuvers in the past few years. However, agile locomotion in highly constrained environments such as step** stones is still a challenge. In this paper, we propose a combination of model-based control, search, and learning to design efficient control policies for agile locomotion on step** stones. In our framework, we use nonlinear… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  24. arXiv:2402.12874  [pdf, other

    cs.LG

    Skill or Luck? Return Decomposition via Advantage Functions

    Authors: Hsiao-Ru Pan, Bernhard Schölkopf

    Abstract: Learning from off-policy data is essential for sample-efficient reinforcement learning. In the present work, we build on the insight that the advantage function can be understood as the causal effect of an action on the return, and show that this allows us to decompose the return of a trajectory into parts caused by the agent's actions (skill) and parts outside of the agent's control (luck). Furth… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  25. arXiv:2402.11655  [pdf, other

    cs.CL

    Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals

    Authors: Francesco Ortu, Zhi**g **, Diego Doimo, Mrinmaya Sachan, Alberto Cazzaniga, Bernhard Schölkopf

    Abstract: Interpretability research aims to bridge the gap between empirical success and our scientific understanding of the inner workings of large language models (LLMs). However, most existing research focuses on analyzing a single mechanism, such as how models copy or recall factual knowledge. In this work, we propose a formulation of competition of mechanisms, which focuses on the interplay of multiple… ▽ More

    Submitted 6 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  26. arXiv:2402.09236  [pdf, other

    cs.LG cs.AI math.ST stat.ML

    Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

    Authors: Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

    Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 36 pages

  27. arXiv:2402.06665  [pdf, other

    cs.AI cs.CL cs.LG cs.RO

    The Essential Role of Causality in Foundation World Models for Embodied AI

    Authors: Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Marc Rigter, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Schölkopf, Cheng Zhang

    Abstract: Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for E… ▽ More

    Submitted 29 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  28. arXiv:2402.05785  [pdf, other

    cs.LG cs.AI cs.CL

    Limits of Transformer Language Models on Learning to Compose Algorithms

    Authors: Jonathan Thomm, Aleksandar Terzic, Giacomo Camposampiero, Michael Hersche, Bernhard Schölkopf, Abbas Rahimi

    Abstract: We analyze the capabilities of Transformer language models in learning compositional discrete tasks. To this end, we evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks demanding to learn a composition of several discrete sub-tasks. On both training LLaMA models from scratch and prompting on GPT-4 and Gemini, we measure how well these models can reuse primitives observable… ▽ More

    Submitted 25 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  29. arXiv:2402.01988  [pdf, other

    cs.ET physics.optics

    Low-power scalable multilayer optoelectronic neural networks enabled with incoherent light

    Authors: Alexander Song, Sai Nikhilesh Murty Kottapalli, Rahul Goyal, Bernhard Schölkopf, Peer Fischer

    Abstract: Optical approaches have made great strides towards the goal of high-speed, energy-efficient computing necessary for modern deep learning and AI applications. Read-in and read-out of data, however, limit the overall performance of existing approaches. This study introduces a multilayer optoelectronic computing framework that alternates between optical and optoelectronic layers to implement matrix-v… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  30. arXiv:2402.01399  [pdf, other

    cs.LG cs.AI stat.ML

    A Probabilistic Model behind Self-Supervised Learning

    Authors: Alice Bizeul, Bernhard Schölkopf, Carl Allen

    Abstract: In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels. A common task is to classify augmentations or different modalities of the data, which share semantic content (e.g. an object in an image) but differ in style (e.g. the object's location). Many approaches to self-supervised learning have been proposed, e.g. SimCLR, CLIP, and VicREG, which… ▽ More

    Submitted 4 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  31. arXiv:2401.18070  [pdf, other

    cs.CL cs.AI cs.LG

    Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?

    Authors: Andreas Opedal, Alessandro Stolfo, Haruki Shirakami, Ying Jiao, Ryan Cotterell, Bernhard Schölkopf, Abulhair Saparov, Mrinmaya Sachan

    Abstract: There is increasing interest in employing large language models (LLMs) as cognitive models. For such purposes, it is central to understand which properties of human cognition are well-modeled by LLMs, and which are not. In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. Surveying the learning science literature, we posit that the… ▽ More

    Submitted 17 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted at ICML 2024

  32. arXiv:2401.06604  [pdf, other

    cs.LG

    Identifying Policy Gradient Subspaces

    Authors: Jan Schneider, Pierre Schumacher, Simon Guist, Le Chen, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler

    Abstract: Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised learning can be accelerated by leveraging the fact that gradients lie in a low-dimensional and slowly-changing subspace. In this paper, we conduct a thorough evaluat… ▽ More

    Submitted 18 March, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Published as conference paper at ICLR 2024

    ACM Class: I.2.6

  33. arXiv:2401.06035  [pdf, other

    cs.CV cs.LG

    RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks

    Authors: Partha Ghosh, Soubhik Sanyal, Cordelia Schmid, Bernhard Schölkopf

    Abstract: We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies. To capture these dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation and employs a singular latent code to model an entire video sequence. Ind… ▽ More

    Submitted 11 January, 2024; originally announced January 2024.

  34. arXiv:2312.13438  [pdf, ps, other

    stat.ML cs.LG

    Independent Mechanism Analysis and the Manifold Hypothesis

    Authors: Shubhangi Ghosh, Luigi Gresele, Julius von Kügelgen, Michel Besserve, Bernhard Schölkopf

    Abstract: Independent Mechanism Analysis (IMA) seeks to address non-identifiability in nonlinear Independent Component Analysis (ICA) by assuming that the Jacobian of the mixing function has orthogonal columns. As typical in ICA, previous work focused on the case with an equal number of latent components and observed mixtures. Here, we extend IMA to settings with a larger number of mixtures that reside on a… ▽ More

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 6 pages, Accepted at Neurips Causal Representation Learning 2023

  35. arXiv:2312.08295  [pdf, other

    astro-ph.IM astro-ph.EP cs.LG

    Inferring Atmospheric Properties of Exoplanets with Flow Matching and Neural Importance Sampling

    Authors: Timothy D. Gebhard, Jonas Wildberger, Maximilian Dax, Daniel Angerhausen, Sascha P. Quanz, Bernhard Schölkopf

    Abstract: Atmospheric retrievals (AR) characterize exoplanets by estimating atmospheric parameters from observed light spectra, typically by framing the task as a Bayesian inference problem. However, traditional approaches such as nested sampling are computationally expensive, thus sparking an interest in solutions based on machine learning (ML). In this ongoing work, we first explore flow matching posterio… ▽ More

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted at the "AI to Accelerate Science and Engineering (AI2ASE)" workshop at AAAI 2024

  36. arXiv:2312.04350  [pdf, other

    cs.CL cs.AI cs.LG

    CLadder: Assessing Causal Reasoning in Language Models

    Authors: Zhi**g **, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schölkopf

    Abstract: The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordan… ▽ More

    Submitted 17 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023; updated with CLadder dataset v1.5

  37. arXiv:2312.00093  [pdf, other

    cs.CV cs.GR cs.LG

    GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

    Authors: Gege Gao, Weiyang Liu, Anpei Chen, Andreas Geiger, Bernhard Schölkopf

    Abstract: As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized… ▽ More

    Submitted 10 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: CVPR 2024 (18 pages, 11 figures, https://graphdreamer.github.io/)

  38. arXiv:2311.18639  [pdf, other

    stat.ML cs.LG

    Targeted Reduction of Causal Models

    Authors: Armin Kekić, Bernhard Schölkopf, Michel Besserve

    Abstract: Why does a phenomenon occur? Addressing this question is central to most scientific inquiries and often relies on simulations of scientific models. As models become more intricate, deciphering the causes behind phenomena in high-dimensional spaces of interconnected variables becomes increasingly challenging. Causal Representation Learning (CRL) offers a promising avenue to uncover interpretable ca… ▽ More

    Submitted 3 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  39. arXiv:2311.08815  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations

    Authors: Cian Eastwood, Julius von Kügelgen, Linus Ericsson, Diane Bouchacourt, Pascal Vincent, Bernhard Schölkopf, Mark Ibrahim

    Abstract: Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To address this, we introduce a more principled approach that seeks to disentangle style f… ▽ More

    Submitted 15 November, 2023; originally announced November 2023.

  40. arXiv:2311.08605  [pdf, other

    cs.CL cs.AI cs.CY cs.SI

    Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis

    Authors: David F. Jenny, Yann Billeter, Mrinmaya Sachan, Bernhard Schölkopf, Zhi**g **

    Abstract: The rapid advancement of Large Language Models (LLMs) has sparked intense debate regarding the prevalence of bias in these models and its mitigation. Yet, as exemplified by both results on debiasing methods in the literature and reports of alignment-related defects from the wider community, bias remains a poorly understood topic despite its practical relevance. To enhance the understanding of the… ▽ More

    Submitted 12 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  41. arXiv:2311.06243  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

    Authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf

    Abstract: Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly larg… ▽ More

    Submitted 28 April, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: ICLR 2024 (v2: 34 pages, 19 figures)

  42. arXiv:2311.02790  [pdf, other

    cs.CL cs.AI cs.CY cs.IR cs.LG

    CausalCite: A Causal Formulation of Paper Citations

    Authors: Ishan Kumar, Zhi**g **, Ehsan Mokhtarian, Siyuan Guo, Yuen Chen, Mrinmaya Sachan, Bernhard Schölkopf

    Abstract: Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a… ▽ More

    Submitted 27 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: ACL 2024 Findings

  43. arXiv:2310.17405  [pdf, other

    cs.LG

    Causal Modeling with Stationary Diffusions

    Authors: Lars Lorch, Andreas Krause, Bernhard Schölkopf

    Abstract: We develop a novel approach towards causal inference. Rather than structural equations over a causal graph, we learn stochastic differential equations (SDEs) whose stationary densities model a system's behavior under interventions. These stationary diffusion models do not require the formalism of causal graphs, let alone the common assumption of acyclicity. We show that in several cases, they gene… ▽ More

    Submitted 16 March, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: AISTATS 2024

  44. arXiv:2310.15168  [pdf, other

    cs.CV cs.GR cs.LG

    Ghost on the Shell: An Expressive Representation of General 3D Shapes

    Authors: Zhen Liu, Yao Feng, Yuliang Xiu, Weiyang Liu, Liam Paull, Michael J. Black, Bernhard Schölkopf

    Abstract: The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D s… ▽ More

    Submitted 24 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Oral (v3: 30 pages, 19 figures, Project Page: https://gshell3d.github.io/)

  45. arXiv:2310.09449  [pdf, other

    cs.CV cs.LG

    Pairwise Similarity Learning is SimPLE

    Authors: Yandong Wen, Weiyang Liu, Yao Feng, Bhiksha Raj, Rita Singh, Adrian Weller, Michael J. Black, Bernhard Schölkopf

    Abstract: In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). PSL subsumes a wide range of important applications, such as open-set face recognition, speaker verification, image retrieval and person re-identification. The goal of PSL is to learn a pairwise similarity function assigning a higher similarity score to positive pairs (i.e., a pair of samples w… ▽ More

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Published in ICCV 2023 (Project page: https://simple.is.tue.mpg.de/)

  46. arXiv:2310.08864  [pdf, other

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, A**kya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method… ▽ More

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  47. arXiv:2310.07665  [pdf, other

    cs.AI cs.LG stat.ML

    Deep Backtracking Counterfactuals for Causally Compliant Explanations

    Authors: Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach

    Abstract: Counterfactuals answer questions of what would have been observed under altered circumstances and can therefore offer valuable insights. Whereas the classical interventional interpretation of counterfactuals has been studied extensively, backtracking constitutes a less studied alternative where all causal laws are kept intact. In the present work, we introduce a practical method called deep backtr… ▽ More

    Submitted 9 February, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  48. arXiv:2310.01425  [pdf, ps, other

    cs.CL cs.AI cs.LG

    Borges and AI

    Authors: Léon Bottou, Bernhard Schölkopf

    Abstract: Many believe that Large Language Models (LLMs) open the era of Artificial Intelligence (AI). Some see opportunities while others see dangers. Yet both proponents and opponents grasp AI through the imagery popularised by science fiction. Will the machine become sentient and rebel against its creators? Will we experience a paperclip apocalypse? Before answering such questions, we should first ask wh… ▽ More

    Submitted 4 October, 2023; v1 submitted 27 September, 2023; originally announced October 2023.

  49. arXiv:2309.06921  [pdf, other

    cs.LG

    Investigating the Impact of Action Representations in Policy Gradient Algorithms

    Authors: Jan Schneider, Pierre Schumacher, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler

    Abstract: Reinforcement learning~(RL) is a versatile framework for learning to solve complex real-world tasks. However, influences on the learning performance of RL algorithms are often poorly understood in practice. We discuss different analysis techniques and assess their effectiveness for investigating the impact of action representations in RL. Our experiments demonstrate that the action representation… ▽ More

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Published at the Workshop on effective Representations, Abstractions, and Priors for Robot Learning (RAP4Robots) at ICRA 2023

  50. arXiv:2309.03075  [pdf, other

    astro-ph.IM astro-ph.EP cs.LG

    Parameterizing pressure-temperature profiles of exoplanet atmospheres with neural networks

    Authors: Timothy D. Gebhard, Daniel Angerhausen, Björn S. Konrad, Eleonora Alei, Sascha P. Quanz, Bernhard Schölkopf

    Abstract: Atmospheric retrievals (AR) of exoplanets typically rely on a combination of a Bayesian inference technique and a forward simulator to estimate atmospheric properties from an observed spectrum. A key component in simulating spectra is the pressure-temperature (PT) profile, which describes the thermal structure of the atmosphere. Current AR pipelines commonly use ad hoc fitting functions here that… ▽ More

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted for publication in Astronomy & Astrophysics

    Journal ref: A&A 681, A3 (2024)