Search | arXiv e-print repository

Inference Optimization of Foundation Models on AI Accelerators

Authors: Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis

Abstract: Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and the research community have witnessed a large number of new applications based on those foundation models. Such applications include question answering, customer service, image and video generation, and code completion, among others. However, as the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is ever higher. To this end, our tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we dive deep into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Tutorial published at KDD 2024. Camera-ready version

arXiv:2310.11867 [pdf, other]

doi 10.1145/3600211.3604720

Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

Authors: Junaid Ali, Matthaeus Kleindessner, Florian Wenzel, Kailash Budhathoki, Volkan Cevher, Chris Russell

Abstract: We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning. We categorize desired behaviors along three axes: (i) if the task concerns humans; (ii) how subjective the task is (i.e., how likely it is that people from a diverse range of backgrounds would agree on a labeling); and (iii) the intended purpose of the task and if fairness is better served by impartiality (i.e., making decisions independent of the protected attributes) or representation (i.e., making decisions to maximize diversity). Finally, we provide quantitative fairness evaluations for both binary-valued and multi-valued protected attributes over ten diverse datasets. We find that fair PCA, a post-processing method for fair representations, works very well for debiasing in most of the aforementioned tasks while incurring only minor loss of performance. However, different debiasing approaches vary in their effectiveness depending on the task. Hence, one should choose the debiasing approach depending on the specific use case.

Submitted 18 October, 2023; originally announced October 2023.

Comments: Accepted at AIES'23

arXiv:2304.11625 [pdf, ps, other]

Meaningful Causal Aggregation and Paradoxical Confounding

Authors: Yuchen Zhu, Kailash Budhathoki, Jonas Kuebler, Dominik Janzing

Abstract: In aggregated variables the impact of interventions is typically ill-defined because different micro-realizations of the same macro-intervention can result in different changes of downstream macro-variables. We show that this ill-definedness of causality on aggregated variables can turn unconfounded causal relations into confounded ones and vice versa, depending on the respective micro-realization. We argue that it is practically infeasible to only use aggregated causal systems when we are free from this ill-definedness. Instead, we need to accept that macro causal relations are typically defined only with reference to the micro states. On the positive side, we show that cause-effect relations can be aggregated when the macro interventions are such that the distribution of micro states is the same as in the observational distribution; we term this natural macro interventions. We also discuss generalizations of this observation.

Submitted 22 February, 2024; v1 submitted 23 April, 2023; originally announced April 2023.

Comments: CLeaR 2024

arXiv:2206.12986 [pdf, other]

Explaining the root causes of unit-level changes

Authors: Kailash Budhathoki, George Michailidis, Dominik Janzing

Abstract: Existing methods of explainable AI and interpretable ML cannot explain change in the values of an output variable for a statistical unit in terms of the change in the input values and the change in the "mechanism" (the function transforming input to output). We propose two methods based on counterfactuals for explaining unit-level changes at various input granularities using the concept of Shapley values from game theory. These methods satisfy two key axioms desirable for any unit-level change attribution method. Through simulations, we study the reliability and the scalability of the proposed methods. We get sensible results from a case study on identifying the drivers of the change in the earnings for individuals in the US.

Submitted 26 June, 2022; originally announced June 2022.

Comments: Under review

arXiv:2206.06821 [pdf, other]

DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models

Authors: Patrick Blöbaum, Peter Götz, Kailash Budhathoki, Atalanti A. Mastakouri, Dominik Janzing

Abstract: We present DoWhy-GCM, an extension of the DoWhy Python library, which leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation, DoWhy-GCM addresses diverse causal queries, such as identifying the root causes of outliers and distributional changes, attributing causal influences to the data generating process of each node, or diagnosis of causal structures. With DoWhy-GCM, users typically specify cause-effect relations via a causal graph, fit causal mechanisms, and pose causal queries -- all with just a few lines of code. The general documentation is available at https://www.pywhy.org/dowhy and the DoWhy-GCM specific code at https://github.com/py-why/dowhy/tree/main/dowhy/gcm.

Submitted 6 June, 2024; v1 submitted 14 June, 2022; originally announced June 2022.

Journal ref: Journal of Machine Learning Research 25(147), 2024

arXiv:2102.13384 [pdf, other]

Why did the distribution change?

Authors: Kailash Budhathoki, Dominik Janzing, Patrick Bloebaum, Hoiyi Ng

Abstract: We describe a formal approach based on graphical causal models to identify the "root causes" of the change in the probability distribution of variables. After factorizing the joint distribution into conditional distributions of each variable, given its parents (the "causal mechanisms"), we attribute the change to changes of these causal mechanisms. This attribution analysis accounts for the fact that mechanisms often change independently and sometimes only some of them change. Through simulations, we study the performance of our distribution change attribution method. We then present a real-world case study identifying the drivers of the difference in the income distribution between men and women.

Submitted 23 May, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

Comments: Proceedings of the Twenty Fourth International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

arXiv:2009.02728 [pdf, other]

Discovering Reliable Causal Rules

Authors: Kailash Budhathoki, Mario Boley, Jilles Vreeken

Abstract: We study the problem of deriving policies, or rules, that when enacted on a complex system, cause a desired outcome. Absent the ability to perform controlled experiments, such rules have to be inferred from past observations of the system's behaviour. This is a challenging problem for two reasons: First, observational effects are often unrepresentative of the underlying causal effect because they are skewed by the presence of confounding factors. Second, naive empirical estimations of a rule's effect have a high variance, and, hence, their maximisation can lead to random results. To address these issues, first we measure the causal effect of a rule from observational data, adjusting for the effect of potential confounders. Importantly, we provide a graphical criterion under which causal rule discovery is possible. Moreover, to discover reliable causal rules from a sample, we propose a conservative and consistent estimator of the causal effect, and derive an efficient and exact algorithm that maximises the estimator. On synthetic data, the proposed estimator converges faster to the ground truth than the naive estimator and recovers relevant causal rules even at small sample sizes. Extensive experiments on a variety of real-world datasets show that the proposed algorithm is efficient and discovers meaningful rules.

Submitted 8 September, 2020; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: Poster presented in NeurIPS 2018 Workshop on Causal Learning

arXiv:2007.00714 [pdf, other]

Quantifying intrinsic causal contributions via structure preserving interventions

Authors: Dominik Janzing, Patrick Blöbaum, Atalanti A. Mastakouri, Philipp M. Faller, Lenon Minorics, Kailash Budhathoki

Abstract: We propose a notion of causal influence that describes the 'intrinsic' part of the contribution of a node on a target node in a DAG. By recursively writing each node as a function of the upstream noise terms, we separate the intrinsic information added by each node from the one obtained from its ancestors. To interpret the intrinsic information as a *causal* contribution, we consider 'structure-preserving interventions' that randomize each node in a way that mimics the usual dependence on the parents and does not perturb the observed joint distribution. To get a measure that is invariant with respect to relabelling nodes we use Shapley-based symmetrization and show that it reduces in the linear case to simple ANOVA after resolving the target node into noise variables. We describe our contribution analysis for variance and entropy, but contributions for other target metrics can be defined analogously. The code is available in the package gcm of the open source library DoWhy.

Submitted 8 March, 2024; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: to appear at AISTATS 2024

Journal ref: AISTATS 2024, https://proceedings.mlr.press/v238/janzing24a.html

arXiv:2005.05789 [pdf]

Memristive Model of Excitable Cells

Authors: Maheshwar Sah, Ram Kaji Budhathoki

Abstract: This paper presents an in-depth analysis of the excitable membranes of a biological system. We rigorously prove from the Chay neuron model that the state-dependent voltage-sensitive potassium ion channel and calcium-sensitive potassium ion channel in excitable cells are in fact generic memristors, and that the state-independent mixed sodium and calcium ion channel is a non-memristive (nonlinear resistor) element from the perspective of electrical circuit theory. The mechanisms that give rise to periodic oscillation, aperiodic (chaotic) oscillation, spikes and bursting in excitable cells are also analyzed via the small-signal model, pole-zero diagram, local-activity principle, edge of chaos and Hopf-bifurcation theorem. It is also shown that the presence of complex-conjugate zeros with positive real part (equivalent to the eigenvalues) of the admittance function inside the two bifurcation points leads to the generation of complicated electrical signals in the excitable membrane.

Submitted 8 May, 2020; originally announced May 2020.

arXiv:1912.02724 [pdf, other]

Causal structure based root cause analysis of outliers

Authors: Dominik Janzing, Kailash Budhathoki, Lenon Minorics, Patrick Blöbaum

Abstract: We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable is unexpected *given the value of its parents* in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory.

Submitted 5 December, 2019; originally announced December 2019.

Comments: 11 pages, 9 Figures
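The abstract above attributes a target's outlier score to its ancestors via Shapley values. As a rough illustration of that game-theoretic ingredient only, the sketch below computes exact Shapley values for a small set function. The function `toy_score` and the variable names `X1`, `X2`, `X3` are hypothetical stand-ins and not the paper's outlier score; the paper's actual value function is defined in terms of conditional outlier scores in the DAG.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a set function `value(frozenset) -> float`.

    Enumerates all coalitions, so it is only practical for a handful of players.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Probability that exactly this coalition of size k precedes p
            # in a uniformly random ordering of the players.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_score(coalition):
    """Hypothetical score: additive parts plus an interaction between X1 and X2."""
    base = {"X1": 2.0, "X2": 0.5, "X3": 0.0}
    total = sum(base[v] for v in coalition)
    if "X1" in coalition and "X2" in coalition:
        total += 1.0  # interaction term, shared equally by X1 and X2
    return total


contributions = shapley_values(["X1", "X2", "X3"], toy_score)
print(contributions)
```

The contributions sum to the score of the full set minus the score of the empty set (the efficiency property), which is what makes Shapley values attractive for splitting a single score among several candidate causes.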

arXiv:1906.05643 [pdf, other]

Comparative Analysis of Switching Dynamics in Different Memristor Models

Authors: Santosh Parajuli, Ram Kaji Budhathoki

Abstract: The memristor, or memory resistor, is an emerging technology for computational memory. A number of different memristor models are available, based on physical experiments. To use a memristor as a computational memory element, one should know how the internal state modulates in time when driven by current or voltage. In this paper, we examine three widely used models and compare how the internal state in these models changes with respect to input current or voltage. In the Strukov model, the internal state changes linearly with the input current. However, the linearity of internal state modulation in the Yang model can be controlled. On the other hand, the Pickett model shows nonlinear variation in internal state with the input current.

Submitted 13 June, 2019; originally announced June 2019.
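The abstract above notes that in the Strukov model the internal state changes linearly with the input current. A minimal sketch of that behaviour, assuming the commonly cited linear ion-drift form dw/dt = mu_v * R_on / D * i(t) with memristance M(w) = R_on * (w/D) + R_off * (1 - w/D), is given below. The parameter values, the forward-Euler integration, and the hard clipping at the device boundaries are illustrative assumptions and are not taken from the paper.

```python
def simulate_strukov(current, dt, steps, w0=0.5e-8, D=1e-8,
                     r_on=100.0, r_off=16e3, mu_v=1e-14):
    """Forward-Euler integration of the linear ion-drift memristor model.

    current : constant drive current in amperes (assumed constant for simplicity)
    dt      : time step in seconds
    w0, D   : initial doped-region width and device thickness in metres (assumed values)
    r_on, r_off : limiting resistances in ohms (assumed values)
    mu_v    : dopant mobility in m^2 / (V s) (assumed value)
    Returns a list of (state width, memristance) pairs, one per step.
    """
    w = w0
    trace = []
    for _ in range(steps):
        memristance = r_on * (w / D) + r_off * (1.0 - w / D)
        trace.append((w, memristance))
        # Linear drift: the state velocity is proportional to the current.
        w += mu_v * r_on / D * current * dt
        # Keep the state inside the physical device boundaries.
        w = min(max(w, 0.0), D)
    return trace


trace = simulate_strukov(current=1e-3, dt=1e-3, steps=10)
for w, m in trace:
    print(f"w = {w:.3e} m, M = {m:.1f} ohm")
```

With a constant current the state advances by the same amount at every step, and doubling the current doubles that step. Models with window functions or nonlinear drift would break this proportionality.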

arXiv:1905.04864 [pdf, other]

Nonvolatile Memory Cell Based on Memristor Emulator

Authors: Santosh Parajuli, Ram Kaji Budhathoki, Hyongsuk Kim

Abstract: The memristor, one of the fundamental circuit elements, has promising applications in non-volatile memory and storage technology as it can theoretically achieve infinite states. Information can be stored independently in these states and retrieved whenever required. In this paper, we propose a non-volatile memory cell based on a memristor emulator. The circuit is able to perform read and write operations. In this memristor-based memory cell, a unipolar pulse is used for writing and a bipolar pulse is used for reading. Unlike earlier designs, the circuit does not need external read/write enable switches to switch between read and write operations; the switching is achieved by the zero-average bipolar read pulse given after the completion of the write cycle. In our proposed memristor-based memory cell, a single bit can be read and any voltage from 0 to 5 volts can be written. Mathematical analysis and simulation results of the memristor-emulator-based read/write circuit are presented to confirm its operation.

Submitted 13 May, 2019; originally announced May 2019.

arXiv:1702.06776 [pdf, other]

Causal Inference by Stochastic Complexity

Authors: Kailash Budhathoki, Jilles Vreeken

Abstract: The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.

Submitted 22 February, 2017; originally announced February 2017.
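The abstract above describes scoring each causal direction by the stochastic complexity of discrete data under the multinomial model class. The sketch below is one way such a comparison could look, not the authors' implementation. It assumes the stochastic complexity of a sample is its maximum-likelihood code length plus the log of the multinomial normalizing sum, that the normalizing sum can be computed with the known linear-time recurrence C(n, k+2) = C(n, k+1) + (n/k) C(n, k), that the conditional complexity is summed over the groups induced by the conditioning variable, and that domain sizes are taken from the observed distinct values. All function names are hypothetical.

```python
from collections import Counter
from math import comb, log2


def multinomial_regret(n, k):
    """Normalizing sum C(n, k) of the multinomial NML distribution.

    Uses C(n, 1) = 1, a direct sum for C(n, 2), and the recurrence
    C(n, j + 2) = C(n, j + 1) + (n / j) * C(n, j). Intended for small n only.
    """
    if n == 0 or k == 1:
        return 1.0
    c_prev = 1.0
    c_curr = sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
                 for h in range(n + 1))
    for j in range(1, k - 1):
        c_prev, c_curr = c_curr, c_curr + n / j * c_prev
    return c_curr


def stochastic_complexity(values, domain_size):
    """Code length in bits: maximum-likelihood cost plus log of the normalizing sum."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    neg_log_ml = -sum(c * log2(c / n) for c in counts.values())
    return neg_log_ml + log2(multinomial_regret(n, domain_size))


def conditional_complexity(targets, conditions, target_domain_size):
    """Sum of the complexities of `targets` within each value of `conditions`."""
    groups = {}
    for t, c in zip(targets, conditions):
        groups.setdefault(c, []).append(t)
    return sum(stochastic_complexity(g, target_domain_size) for g in groups.values())


def infer_direction(xs, ys):
    """Return 'X->Y', 'Y->X', or 'undecided' by comparing total code lengths."""
    kx, ky = len(set(xs)), len(set(ys))  # assumption: domain = observed values
    x_to_y = stochastic_complexity(xs, kx) + conditional_complexity(ys, xs, ky)
    y_to_x = stochastic_complexity(ys, ky) + conditional_complexity(xs, ys, kx)
    if x_to_y < y_to_x:
        return "X->Y"
    if y_to_x < x_to_y:
        return "Y->X"
    return "undecided"


xs = [0, 0, 1, 1, 2, 2, 3, 3]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(infer_direction(xs, ys))
```

The direction with the shorter total description is preferred, and ties are reported as undecided rather than resolved arbitrarily.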

Showing 1–13 of 13 results for author: Budhathoki, K