Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

Authors: Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 36 pages

arXiv:2306.02235

Learning Linear Causal Representations from Interventions under General Nonlinear Mixing

Authors: Simon Buchholz, Goutham Rajendran, Elan Rosenfeld, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

Abstract: We study the problem of learning causal representations from unknown, latent interventions in a general setting, where the latent distribution is Gaussian but the mixing function is completely general. We prove strong identifiability results given unknown single-node interventions, i.e., without having access to the intervention targets. This generalizes prior works which have focused on weaker classes, such as linear maps or paired counterfactual data. This is also the first instance of causal identifiability from non-paired interventions for deep neural network embeddings. Our proof relies on carefully uncovering the high-dimensional geometric structure present in the data distribution after a non-linear density transformation, which we capture by analyzing quadratic forms of precision matrices of the latent distributions. Finally, we propose a contrastive algorithm to identify the latent variables in practice and evaluate its performance on various tasks.

Submitted 18 December, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

Comments: Accepted as Oral paper at NeurIPS 2023

arXiv:2305.17225

Causal Component Analysis

Authors: Liang Wendong, Armin Kekić, Julius von Kügelgen, Simon Buchholz, Michel Besserve, Luigi Gresele, Bernhard Schölkopf

Abstract: Independent Component Analysis (ICA) aims to recover independent latent variables from observed mixtures thereof. Causal Representation Learning (CRL) aims instead to infer causally related (thus often statistically dependent) latent variables, together with the unknown graph encoding their causal relationships. We introduce an intermediate problem termed Causal Component Analysis (CauCA). CauCA can be viewed as a generalization of ICA, modelling the causal dependence among the latent components, and as a special case of CRL. In contrast to CRL, it presupposes knowledge of the causal graph, focusing solely on learning the unmixing function and the causal mechanisms. Any impossibility results regarding the recovery of the ground truth in CauCA also apply for CRL, while possibility results may serve as a stepping stone for extensions to CRL. We characterize CauCA identifiability from multiple datasets generated through different types of interventions on the latent causal variables. As a corollary, this interventional perspective also leads to new identifiability results for nonlinear ICA -- a special case of CauCA with an empty graph -- requiring strictly fewer datasets than previous results. We introduce a likelihood-based approach using normalizing flows to estimate both the unmixing function and the causal mechanisms, and demonstrate its effectiveness through extensive synthetic experiments in the CauCA and ICA setting.

Submitted 17 January, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023 final camera-ready version

arXiv:2305.17161

Flow Matching for Scalable Simulation-Based Inference

Authors: Maximilian Dax, Jonas Wildberger, Simon Buchholz, Stephen R. Green, Jakob H. Macke, Bernhard Schölkopf

Abstract: Neural posterior estimation methods based on discrete normalizing flows have become established tools for simulation-based inference (SBI), but scaling them to high-dimensional problems can be challenging. Building on recent advances in generative modeling, we here present flow matching posterior estimation (FMPE), a technique for SBI using continuous normalizing flows. Like diffusion models, and in contrast to discrete flows, flow matching allows for unconstrained architectures, providing enhanced flexibility for complex data modalities. Flow matching, therefore, enables exact density evaluation, fast training, and seamless scalability to large architectures--making it ideal for SBI. We show that FMPE achieves competitive performance on an established SBI benchmark, and then demonstrate its improved scalability on a challenging scientific problem: for gravitational-wave inference, FMPE outperforms methods based on comparable discrete flows, reducing training time by 30% with substantially improved accuracy. Our work underscores the potential of FMPE to enhance performance in challenging inference scenarios, thereby paving the way for more advanced applications to scientific problems.

Submitted 27 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023. Code available at https://github.com/dingo-gw/flow-matching-posterior-estimation

arXiv:2305.17139

A Measure-Theoretic Axiomatisation of Causality

Authors: Junhyung Park, Simon Buchholz, Bernhard Schölkopf, Krikamol Muandet

Abstract: Causality is a central concept in a wide range of research areas, yet there is still no universally agreed axiomatisation of causality. We view causality both as an extension of probability theory and as a study of \textit{what happens when one intervenes on a system}, and argue in favour of taking Kolmogorov's measure-theoretic axiomatisation of probability as the starting point towards an axiomatisation of causality. To that end, we propose the notion of a \textit{causal space}, consisting of a probability space along with a collection of transition probability kernels, called \textit{causal kernels}, that encode the causal information of the space. Our proposed framework is not only rigorously grounded in measure theory, but it also sheds light on long-standing limitations of existing frameworks including, for example, cycles, latent variables and stochastic processes.

Submitted 6 June, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2208.06406

Function Classes for Identifiable Nonlinear Independent Component Analysis

Authors: Simon Buchholz, Michel Besserve, Bernhard Schölkopf

Abstract: Unsupervised learning of latent variable models (LVMs) is widely used to represent data in machine learning. When such models reflect the ground truth factors and the mechanisms mapping them to observations, there is reason to expect that they allow generalization in downstream tasks. It is however well known that such identifiability guarantees are typically not achievable without putting constraints on the model class. This is notably the case for nonlinear Independent Component Analysis, in which the LVM maps statistically independent variables to observations via a deterministic nonlinear function. Several families of spurious solutions that fit the data perfectly but do not correspond to the ground truth factors can be constructed in generic settings. However, recent work suggests that constraining the function class of such models may promote identifiability. Specifically, function classes with constraints on their partial derivatives, gathered in the Jacobian matrix, have been proposed, such as orthogonal coordinate transformations (OCT), which impose orthogonality of the Jacobian columns. In the present work, we prove that a subclass of these transformations, conformal maps, is identifiable and provide novel theoretical results suggesting that OCTs have properties that prevent families of spurious solutions from spoiling identifiability in a generic setting.

Submitted 12 August, 2022; originally announced August 2022.

Comments: 43 pages

Journal ref: NeurIPS 2022

arXiv:2206.08843

AutoML Two-Sample Test

Authors: Jonas M. Kübler, Vincent Stimper, Simon Buchholz, Krikamol Muandet, Bernhard Schölkopf

Abstract: Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery as well as to detect distribution shifts. This led to the development of many sophisticated test procedures going beyond the standard supervised learning frameworks, whose usage can require specialized knowledge about two-sample testing. We use a simple test that takes the mean discrepancy of a witness function as the test statistic and prove that minimizing a squared loss leads to a witness with optimal testing power. This allows us to leverage recent advancements in AutoML. Without any user input about the problems at hand, and using the same method for all our experiments, our AutoML two-sample test achieves competitive performance on a diverse distribution shift benchmark as well as on challenging two-sample testing problems. We provide an implementation of the AutoML two-sample test in the Python package autotst.

Submitted 15 January, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022

arXiv:2012.10244

Time Aggregation Techniques Applied to a Capacity Expansion Model for Real-Life Sector Coupled Energy Systems

Authors: Mette Gamst, Stefanie Buchholz, David Pisinger

Abstract: Simulating energy systems is vital for energy planning to understand the effects of fluctuating renewable energy sources and integration of multiple energy sectors. Capacity expansion is a powerful tool for energy analysts and consists of simulating energy systems with the option of investing in new energy sources. In this paper, we apply clustering-based aggregation techniques from the literature to very different real-life sector coupled energy systems. We systematically compare the aggregation techniques with respect to solution quality and simulation time. Furthermore, we propose two new clustering approaches with promising results. We show that the aggregation techniques result in consistent solution time savings between 75% and 90%. Also, the quality of the aggregated solutions is generally very good. To the best of our knowledge, we are the first to analyze and conclude that a weighted representation of clusters is beneficial. Furthermore, to the best of our knowledge, we are the first to recommend a clustering technique with good performance across very different energy systems: the k-means with Euclidean distance measure, clustering days and with weighted selection, where the median, maximum and minimum elements from clusters are selected. A deeper analysis of the results reveals that the aggregation techniques excel when the investment decisions correlate well with the overall behavior of the energy system. We propose future research directions to remedy when this is not the case.

Submitted 17 December, 2020; originally announced December 2020.

ACM Class: G.4; G.2.3

arXiv:2010.13158

A "DIY" data acquisition system for acoustic field measurements under harsh conditions

Authors: Steffen Büchholz, Mathias Lemke, Julius Reiss, Jörn Sesterhenn

Abstract: Monitoring active volcanos is an ongoing and important task helping to understand and predict volcanic eruptions. In recent years, analysing the acoustic properties of eruptions became more relevant. We present an inexpensive, lightweight, portable, easy to use and modular acoustic data acquisition system for field measurements that can record data with up to 100 kHz. The system is based on a Raspberry Pi 3 B running a custom-built bare-metal operating system. It connects to an external analog-to-digital converter with the microphone sensor. A GPS receiver allows the logging of the position and in addition the recording of a very accurate time signal synchronously to the acoustic data. With that, it is possible for multiple modules to effectively work as a single microphone array. The whole system can be built with low cost and demands only minimal technical infrastructure. We demonstrate a possible use of such a microphone array by deploying 20 modules on the active volcano \textit{Stromboli} in the Aeolian Islands by Sicily, Italy. We use the collected acoustic data to identify the sound source position for all recorded eruptions.

Submitted 25 October, 2020; originally announced October 2020.

Comments: 9 figures at the end

arXiv:cs/0009008

Introduction to the CoNLL-2000 Shared Task: Chunking

Authors: Erik F. Tjong Kim Sang, Sabine Buchholz

Abstract: We describe the CoNLL-2000 shared task: dividing text into syntactically related non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.

Submitted 18 September, 2000; originally announced September 2000.

Comments: 6 pages

ACM Class: I.2.7

Journal ref: Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal

arXiv:cs/9906005

Memory-Based Shallow Parsing

Authors: Walter Daelemans, Sabine Buchholz, Jorn Veenstra

Abstract: We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive results, the F-value for the Wall Street Journal (WSJ) treebank is: 93.8% for NP chunking, 94.7% for VP chunking, 77.1% for subject detection and 79.0% for object detection.

Submitted 2 June, 1999; originally announced June 1999.

Comments: 8 pages, to appear in: Proceedings of the EACL'99 workshop on Computational Natural Language Learning (CoNLL-99), Bergen, Norway, June 1999

Report number: ILK-9907

ACM Class: I.6.2; I.7.1

arXiv:cs/9906004

Cascaded Grammatical Relation Assignment

Authors: Sabine Buchholz, Jorn Veenstra, Walter Daelemans

Abstract: In this paper we discuss cascaded Memory-Based grammatical relations assignment. In the first stages of the cascade, we find chunks of several types (NP,VP,ADJP,ADVP,PP) and label them with their adverbial function (e.g. local, temporal). In the last stage, we assign grammatical relations to pairs of chunks. We studied the effect of adding several levels to this cascaded classifier and we found that even the less performing chunkers enhanced the performance of the relation finder.

Submitted 2 June, 1999; originally announced June 1999.

Comments: 8 pages, to appear in: proceedings of EMNLP/VLC-99, University of Maryland, USA, June 21-22, 1999

Report number: ILK-9908

ACM Class: I.6.2; I.7.1

Showing 1–12 of 12 results for author: Buchholz, S