-
Inference Optimization of Foundation Models on AI Accelerators
Authors:
Youngsuk Park,
Kailash Budhathoki,
Liangfu Chen,
Jonas Kübler,
Jiaji Huang,
Matthäus Kleindessner,
Jun Huan,
Volkan Cevher,
Yida Wang,
George Karypis
Abstract:
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and the research community have witnessed a large number of new applications based on those foundation models. Such applications include question answering, customer service, image and video generation, and code completion, among others. However, as the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is higher than ever. To this end, our tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we dive deep into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.
Submitted 12 July, 2024;
originally announced July 2024.
-
Evaluating the Fairness of Discriminative Foundation Models in Computer Vision
Authors:
Junaid Ali,
Matthaeus Kleindessner,
Florian Wenzel,
Kailash Budhathoki,
Volkan Cevher,
Chris Russell
Abstract:
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pre-training (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning. We categorize desired behaviors along three axes: (i) whether the task concerns humans; (ii) how subjective the task is (i.e., how likely it is that people from a diverse range of backgrounds would agree on a labeling); and (iii) the intended purpose of the task and whether fairness is better served by impartiality (i.e., making decisions independent of the protected attributes) or representation (i.e., making decisions to maximize diversity). Finally, we provide quantitative fairness evaluations for both binary-valued and multi-valued protected attributes over ten diverse datasets. We find that fair PCA, a post-processing method for fair representations, works very well for debiasing in most of the aforementioned tasks while incurring only minor loss of performance. However, different debiasing approaches vary in their effectiveness depending on the task. Hence, one should choose the debiasing approach depending on the specific use case.
Submitted 18 October, 2023;
originally announced October 2023.
-
Meaningful Causal Aggregation and Paradoxical Confounding
Authors:
Yuchen Zhu,
Kailash Budhathoki,
Jonas Kuebler,
Dominik Janzing
Abstract:
In aggregated variables, the impact of interventions is typically ill-defined because different micro-realizations of the same macro-intervention can result in different changes of downstream macro-variables. We show that this ill-definedness of causality on aggregated variables can turn unconfounded causal relations into confounded ones and vice versa, depending on the respective micro-realization. We argue that it is practically infeasible to use aggregated causal systems only when we are free from this ill-definedness. Instead, we need to accept that macro causal relations are typically defined only with reference to the micro states. On the positive side, we show that cause-effect relations can be aggregated when the macro interventions are such that the distribution of micro states is the same as in the observational distribution; we term these natural macro interventions. We also discuss generalizations of this observation.
Submitted 22 February, 2024; v1 submitted 23 April, 2023;
originally announced April 2023.
-
Explaining the root causes of unit-level changes
Authors:
Kailash Budhathoki,
George Michailidis,
Dominik Janzing
Abstract:
Existing methods of explainable AI and interpretable ML cannot explain the change in the value of an output variable for a statistical unit in terms of the change in the input values and the change in the "mechanism" (the function transforming input to output). We propose two methods based on counterfactuals for explaining unit-level changes at various input granularities using the concept of Shapley values from game theory. These methods satisfy two key axioms desirable for any unit-level change attribution method. Through simulations, we study the reliability and the scalability of the proposed methods. We get sensible results from a case study on identifying the drivers of the change in earnings for individuals in the US.
Submitted 26 June, 2022;
originally announced June 2022.
-
DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models
Authors:
Patrick Blöbaum,
Peter Götz,
Kailash Budhathoki,
Atalanti A. Mastakouri,
Dominik Janzing
Abstract:
We present DoWhy-GCM, an extension of the DoWhy Python library, which leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation, DoWhy-GCM addresses diverse causal queries, such as identifying the root causes of outliers and distributional changes, attributing causal influences to the data generating process of each node, or diagnosing causal structures. With DoWhy-GCM, users typically specify cause-effect relations via a causal graph, fit causal mechanisms, and pose causal queries, all with just a few lines of code. The general documentation is available at https://www.pywhy.org/dowhy and the DoWhy-GCM specific code at https://github.com/py-why/dowhy/tree/main/dowhy/gcm.
Submitted 6 June, 2024; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Why did the distribution change?
Authors:
Kailash Budhathoki,
Dominik Janzing,
Patrick Bloebaum,
Hoiyi Ng
Abstract:
We describe a formal approach based on graphical causal models to identify the "root causes" of the change in the probability distribution of variables. After factorizing the joint distribution into conditional distributions of each variable, given its parents (the "causal mechanisms"), we attribute the change to changes of these causal mechanisms. This attribution analysis accounts for the fact that mechanisms often change independently and sometimes only some of them change. Through simulations, we study the performance of our distribution change attribution method. We then present a real-world case study identifying the drivers of the difference in the income distribution between men and women.
Submitted 23 May, 2021; v1 submitted 26 February, 2021;
originally announced February 2021.
-
Discovering Reliable Causal Rules
Authors:
Kailash Budhathoki,
Mario Boley,
Jilles Vreeken
Abstract:
We study the problem of deriving policies, or rules, that when enacted on a complex system, cause a desired outcome. Absent the ability to perform controlled experiments, such rules have to be inferred from past observations of the system's behaviour. This is a challenging problem for two reasons: First, observational effects are often unrepresentative of the underlying causal effect because they are skewed by the presence of confounding factors. Second, naive empirical estimations of a rule's effect have a high variance, and, hence, their maximisation can lead to random results.
To address these issues, we first measure the causal effect of a rule from observational data, adjusting for the effect of potential confounders. Importantly, we provide a graphical criterion under which causal rule discovery is possible. Moreover, to discover reliable causal rules from a sample, we propose a conservative and consistent estimator of the causal effect, and derive an efficient and exact algorithm that maximises the estimator. On synthetic data, the proposed estimator converges faster to the ground truth than the naive estimator and recovers relevant causal rules even at small sample sizes. Extensive experiments on a variety of real-world datasets show that the proposed algorithm is efficient and discovers meaningful rules.
Submitted 8 September, 2020; v1 submitted 6 September, 2020;
originally announced September 2020.
-
Quantifying intrinsic causal contributions via structure preserving interventions
Authors:
Dominik Janzing,
Patrick Blöbaum,
Atalanti A. Mastakouri,
Philipp M. Faller,
Lenon Minorics,
Kailash Budhathoki
Abstract:
We propose a notion of causal influence that describes the 'intrinsic' part of the contribution of a node to a target node in a DAG. By recursively writing each node as a function of the upstream noise terms, we separate the intrinsic information added by each node from the one obtained from its ancestors. To interpret the intrinsic information as a *causal* contribution, we consider 'structure-preserving interventions' that randomize each node in a way that mimics the usual dependence on the parents and does not perturb the observed joint distribution. To get a measure that is invariant with respect to relabelling nodes, we use Shapley-based symmetrization and show that it reduces in the linear case to simple ANOVA after resolving the target node into noise variables. We describe our contribution analysis for variance and entropy, but contributions for other target metrics can be defined analogously. The code is available in the package gcm of the open source library DoWhy.
Submitted 8 March, 2024; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Memristive Model of Excitable Cells
Authors:
Maheshwar Sah,
Ram Kaji Budhathoki
Abstract:
This paper presents an in-depth analysis of the excitable membranes of a biological system. We rigorously prove from the Chay neuron model that the state-dependent voltage-sensitive potassium ion channel and calcium-sensitive potassium ion channel in excitable cells are in fact generic memristors, and that the state-independent mixed sodium and calcium ion channel is a non-memristive (nonlinear resistor) element from the perspective of electrical circuit theory. The mechanisms that give rise to periodic oscillation, aperiodic (chaotic) oscillation, spikes and bursting in excitable cells are also analyzed via the small-signal model, pole-zero diagram, local-activity principle, edge of chaos and Hopf-bifurcation theorem. It is also shown that the presence of complex-conjugate zeros with positive real part (equivalent to the eigenvalues) of the admittance function between the two bifurcation points leads to the generation of complicated electrical signals in the excitable membrane.
Submitted 8 May, 2020;
originally announced May 2020.
-
Causal structure based root cause analysis of outliers
Authors:
Dominik Janzing,
Kailash Budhathoki,
Lenon Minorics,
Patrick Blöbaum
Abstract:
We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable is unexpected *given the value of its parents* in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory.
Submitted 5 December, 2019;
originally announced December 2019.
-
Comparative Analysis of Switching Dynamics in Different Memristor Models
Authors:
Santosh Parajuli,
Ram Kaji Budhathoki
Abstract:
The memristor, or memory resistor, is an emerging technology for computational memory. A number of different memristor models are available, based on physical experiments. To use a memristor as a computational memory element, one should know how the internal state modulates in time when driven by current or voltage. In this paper, we examine three widely used models and compare how the internal state in these models changes with respect to input current or voltage. In the Strukov model, the internal state changes linearly with the input current. However, the linearity of internal state modulation in the Yang model can be controlled. On the other hand, the Pickett model shows nonlinear variation in internal state with the input current.
Submitted 13 June, 2019;
originally announced June 2019.
-
Nonvolatile Memory Cell Based on Memristor Emulator
Authors:
Santosh Parajuli,
Ram Kaji Budhathoki,
Hyongsuk Kim
Abstract:
The memristor, one of the fundamental circuit elements, has promising applications in non-volatile memory and storage technology as it can theoretically achieve infinite states. Information can be stored independently in these states and retrieved whenever required. In this paper, we propose a non-volatile memory cell based on a memristor emulator. The circuit is able to perform read and write operations. In this memristor-based memory cell, a unipolar pulse is used for writing and a bipolar pulse is used for reading. Unlike earlier designs, the circuit does not need external read/write enable switches to switch between read and write operations; the switching is achieved by the zero-average bipolar read pulse given after the completion of the write cycle. In our proposed memristor-based memory cell, a single bit can be read and any voltage from 0 to 5 volts can be written. Mathematical analysis and simulation results of the memristor-emulator-based read/write circuit are presented to confirm its operation.
Submitted 13 May, 2019;
originally announced May 2019.
-
Causal Inference by Stochastic Complexity
Authors:
Kailash Budhathoki,
Jilles Vreeken
Abstract:
The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.
Submitted 22 February, 2017;
originally announced February 2017.