Stochastic Concept Bottleneck Models

Moritz Vandenhirtz*, Sonia Laguna*, Ričards Marcinkevičs, Julia E. Vogt
Department of Computer Science
ETH Zurich
Switzerland
*Equal contribution. Correspondence to [email protected]

Abstract

Concept Bottleneck Models (CBMs) have emerged as a promising interpretable method whose final prediction is based on intermediate, human-understandable concepts rather than the raw input. Through time-consuming manual interventions, a user can correct wrongly predicted concept values to enhance the model’s downstream performance. We propose Stochastic Concept Bottleneck Models (SCBMs), a novel approach that models concept dependencies. In SCBMs, a single-concept intervention affects all correlated concepts, thereby improving intervention effectiveness. Unlike previous approaches that model the concept relations via an autoregressive structure, we introduce an explicit, distributional parameterization that allows SCBMs to retain the CBMs’ efficient training and inference procedure. Additionally, we leverage the parameterization to derive an effective intervention strategy based on the confidence region. We show empirically on synthetic tabular and natural image datasets that our approach improves intervention effectiveness significantly. Notably, we showcase the versatility and usability of SCBMs by examining a setting with CLIP-inferred concepts, alleviating the need for manual concept annotations.

1 Introduction

In today’s world, machine learning plays a crucial role in making important decisions, from healthcare to finance and law. However, as these algorithms become more complex, understanding how they arrive at their decisions becomes increasingly challenging. This lack of interpretability is a significant concern, especially in situations where trustworthiness, transparency, and accountability are paramount (Lipton, 2016; Doshi-Velez & Kim, 2017). Recent studies have focused on Concept Bottleneck Models (CBMs) (Koh et al., 2020; Havasi et al., 2022; Shin et al., 2023), a class of models that predict human-understandable concepts upon which the final target prediction is based. CBMs offer interpretability since a user can inspect the predicted concept values to understand how the model arrives at its final target prediction. Moreover, if users disagree with a concept prediction, they can intervene by adjusting it to the correct value, which in turn affects the target prediction.

For example, consider the yellow warbler in Figure 1 (a), where a user might notice that the binary concept ‘primary color: yellow’ is mispredicted. Upon this realization, they can intervene on the CBM by setting its value to $1$, which increases the predicted probability of the class yellow warbler. This form of interaction allows even users without machine learning expertise to engage with the model and increase its predictive performance.

However, if the user indicates that the primary color is yellow, shouldn’t the predicted likelihood of a yellow belly increase as well? Such an adaptation would further increase the predicted likelihood of the correct class, as yellow warblers are characterized by their fully yellow body. Vanilla CBMs do not exhibit this behavior, as they do not use the intervened-on concepts to update their remaining concept predictions; they thus make suboptimal use of the additional knowledge provided by the user. To address this, we propose to extend concept prediction by modeling the dependencies between concepts, as depicted in Figure 1 (a,c).

Figure 1: Overview of the proposed method for the CUB dataset. (a) A user intervenes on the concept ‘primary color: yellow’. Unlike CBMs, our method then uses this information to adjust the predicted probabilities of correlated concepts, thereby affecting the target prediction. (b) Schematic overview of the intervention procedure. A user’s intervention ${\bm{c}}^{\prime}_{\mathcal{S}}$ is used to infer the logits $\bm{\eta}_{\setminus\mathcal{S}}$ of the remaining concepts. (c) Visualization of the learned global dependency structure as a correlation matrix for the 112 concepts of CUB (Wah et al., 2011). Concept characterizations are shown on the left.

The proposed approach captures concept dependencies by modeling the concept logits with a learnable non-diagonal normal distribution, which enables efficient, scalable computation of the effect of interventions on other concepts. By integrating concept correlations, we reduce the time and effort of laboriously intervening on many correlated variables and increase the efficacy of interventions on the downstream prediction. Thanks to the explicit distributional assumptions, the model is trained end-to-end, retaining the training and inference speed of classic CBMs as well as the benefits of training the concept and target predictors jointly. Moreover, we show that our method excels when user interventions are queried based on predicted concept uncertainty (Shin et al., 2023), further highlighting its practical utility, as such policies spare users from manually sifting through the concepts to identify necessary interventions. Lastly, based on the distributional concept parameterization, we propose a novel approach for computing dependency-aware interventions through the likelihood-based confidence region.

Contributions

This work contributes to the line of research on concept bottleneck models in several ways. (i) We propose to capture and model concept dependencies with a multivariate normal distribution. (ii) We derive a novel intervention strategy based on the confidence region of the normal distribution that incorporates concept correlations. Using the learned concept dependencies during the intervention procedure makes interventions more effective. (iii) We provide a thorough empirical assessment of the proposed method on synthetic tabular and natural image data. Additionally, we combine our method with concept discovery, alleviating the need for annotations by using CLIP-inferred concepts. In particular, we show that the proposed method (a) discovers meaningful, interpretable patterns in the form of concept dependencies, (b) allows for fast, scalable inference, and (c) outperforms related work with respect to intervention effectiveness thanks to the proposed concept modeling and intervention strategy.

2 Background & Related Work

Concept bottleneck models (Koh et al., 2020; Lampert et al., 2009; N. Kumar et al., 2009) are typically trained on data points $\left({\bm{x}},{\bm{c}},y\right)$, comprising the covariates ${\bm{x}}\in\mathcal{X}$, target $y\in\mathcal{Y}$, and $C$ annotated binary concepts ${\bm{c}}\in\mathcal{C}$. Consider a neural network $f_{\bm{\theta}}$ parameterized by $\bm{\theta}$ and a slice $\left\langle g_{\bm{\psi}},h_{\bm{\phi}}\right\rangle$ (Leino et al., 2018) such that $\hat{y}:=f_{\bm{\theta}}\left({\bm{x}}\right)=g_{\bm{\psi}}\left(h_{\bm{\phi}}\left({\bm{x}}\right)\right)$. CBMs enforce a concept bottleneck $\bm{\hat{c}}:=h_{\bm{\phi}}({\bm{x}})$ such that the model’s final output depends on the covariates ${\bm{x}}$ solely through the predicted concepts $\bm{\hat{c}}$.
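To make the slice concrete, the following is a minimal NumPy sketch of a CBM forward pass, assuming a single linear layer as a stand-in for the concept predictor $h_{\bm{\phi}}$ and a linear softmax head for $g_{\bm{\psi}}$; all sizes and weights are hypothetical and randomly initialized, and for simplicity the bottleneck holds concept probabilities. The point is structural: $g_{\bm{\psi}}$ only ever sees the concepts, so overwriting a concept (an intervention) changes the prediction without touching ${\bm{x}}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D input features, C concepts, K classes.
D, C, K = 8, 5, 3

# Stand-in for h_phi: a single linear layer mapping x to concept logits.
W_h, b_h = rng.normal(size=(C, D)), np.zeros(C)
# Stand-in for g_psi: a linear head mapping concepts to class logits.
W_g, b_g = rng.normal(size=(K, C)), np.zeros(K)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def h_phi(x):
    """Concept predictor: x -> predicted concept probabilities c_hat."""
    return sigmoid(x @ W_h.T + b_h)


def g_psi(c):
    """Target predictor: sees only the concepts, never x."""
    return softmax(c @ W_g.T + b_g)


def f_theta(x):
    """The full model f_theta = g_psi(h_phi(x))."""
    return g_psi(h_phi(x))


x = rng.normal(size=(4, D))
c_hat = h_phi(x)
y_hat = f_theta(x)
print(c_hat.shape, y_hat.shape)  # (4, 5) (4, 3)

# An intervention overwrites a predicted concept (here concept 0) with the
# user-provided value and re-runs only the target predictor.
c_int = c_hat.copy()
c_int[:, 0] = 1.0
y_int = g_psi(c_int)
```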

While Koh et al. (2020) propose the soft CBM, where the concept logits parameterize the bottleneck, Havasi et al. (2022) argue that such a representation leads to leakage, i.e., additional, unwanted information in the concept representation is used to predict the target (Margeloiu et al., 2021; Mahinpei et al., 2021). They therefore parameterize the bottleneck with binarized concept predictions and call it the hard CBM. Havasi et al. (2022) further equip the hard CBM with an autoregressive structure of the form $c_{i}\mid{\bm{x}},{\bm{c}}_{<i}$, which is intended to learn the concept dependencies. This implicit autoregressive modeling of concept dependencies is therefore the work most closely related to ours. Complementary to our work, Heidemann et al. (2023) analyze how a CBM’s performance is affected by concept correlations. Unlike approaches that restrict the bottleneck to prevent leakage, Concept Embedding Models (CEMs) (Espinosa Zarlenga et al., 2022) represent each concept with a predicted embedding vector from which the concept probabilities can be inferred, treating the problem akin to a multi-task setting. E. Kim et al. (2023) model the embedding with a normal distribution but assume a diagonal covariance matrix, which prevents them from capturing concept dependencies. Recent works have explored how a CBM-like structure can be enforced even without a concept-annotated training set. Yuksekgonul et al. (2023) transform a pre-trained model into a CBM via a concept bank built from concept activation vectors (B. Kim et al., 2018) and multimodal models, while Oikarinen et al. (2023) query GPT-3 (Brown et al., 2020) for the concept set $\mathcal{C}$ and assign concept activation values to each data point ${\bm{x}}$ using CLIP (Radford et al., 2021) similarities. Marcinkevičs et al. (2024) instead relax the requirement of a concept-labeled training set to a smaller concept-labeled validation set by fine-tuning a pre-trained model.

Intervenability (Marcinkevičs et al., 2024) is a crucial element of CBMs, as it allows the user to correct wrongly predicted concepts $\bm{\hat{c}}$ to ${\bm{c}}^{\prime}$, which in turn affects the model’s target prediction $\hat{y}^{\prime}$. If multiple concepts are intervened on, the order of interventions matters. To this end, Sheth et al. (2022) and Shin et al. (2023) explore multiple policies that determine the order of concepts. Chauhan et al. (2023) propose to combine predefined policies with learnable weighting parameters, while Espinosa Zarlenga et al. (2024) learn the policy itself. Steinmann et al. (2023) argue that instance-specific interventions are costly and store previous interventions in a memory to automatically reapply them to similar data points. Lastly, Collins et al. (2023) explore the advantages of accounting for human uncertainty rather than treating humans as oracles.

Our work models concept dependencies by parameterizing the bottleneck with a distribution. In a similar vein, Variational Autoencoders (Kingma & Welling, 2014) parameterize the bottleneck with a normal distribution to model and generate new data. Stochastic Segmentation Networks (Monteiro et al., 2020) parameterize the logits of a segmentation map with a non-diagonal normal distribution to capture the spatial correlations of pixels and model aleatoric uncertainty. Modeling uncertainty with a distribution is also explored by Bayesian Neural Networks (Neal, 1995), which learn a probability distribution over the weights of a neural network.

3 Methods

We propose Stochastic Concept Bottleneck Models (SCBM)¹, a novel concept-based method that relaxes the implicit CBM assumption of independent concepts. SCBM captures the concept dependencies by learning their multivariate distribution. As a result, interventions become more effective and scalable, as a single intervention can influence multiple correlated concepts. A schematic overview of the proposed method is depicted in Figure 1 (b).

¹The code is available in an anonymized repository here: https://anonymous.4open.science/r/scbm-A1AA/.

3.1 Model Formulation

To capture the concept dependencies, we model the concept logits $\bm{\eta}$ with a learned multivariate normal distribution. Modeling logits with a normal distribution has proven effective in the context of segmentation (Monteiro et al., 2020). While Monteiro et al. (2020) use it to capture the spatial dependencies of pixels, we instead model the relations between concepts, where the properties of the normal distribution will prove useful. A neural network is trained to predict the parameters of the distribution $\bm{\eta}\mid{\bm{x}}\sim\mathcal{N}\left(\bm{\mu}({\bm{x}}),\bm{\Sigma}({\bm{x}})\right)$, where $\bm{\mu}({\bm{x}})\in\mathbb{R}^{C}$ and $\bm{\Sigma}({\bm{x}})\in\mathbb{R}^{C\times C}$. Thus, the traditional assumption of independent concepts $c_{i}\perp\!\!\!\perp c_{j}\mid{\bm{x}},\ \forall i\neq j$ is relaxed to $c_{i}\perp\!\!\!\perp c_{j}\mid\bm{\eta},\ \forall i\neq j$, where the assumed normal distribution induces linear dependencies between the concept logits. The inductive bias of linearity is useful in practice, as it is more robust to overfitting and computationally more scalable with respect to $C$ than its nonlinear alternative (Havasi et al., 2022), as we will show in Section 5.
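To illustrate how a non-diagonal covariance over logits induces dependencies between binary concepts, the following NumPy sketch samples $\bm{\eta}\mid{\bm{x}}$ for a single hypothetical input with hand-picked $\bm{\mu}({\bm{x}})$ and $\bm{\Sigma}({\bm{x}})$ (in SCBMs these would be predicted by the network) and then draws each $c_i$ independently given $\eta_i$. Although the concepts are conditionally independent given $\bm{\eta}$, the two concepts with correlated logits become correlated, while the third concept stays (empirically) uncorrelated with both.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predicted parameters for a single input x with C = 3 concepts:
# logits 0 and 1 have correlation 0.9, logit 2 is independent of both.
mu = np.array([0.5, -0.2, 1.0])
Sigma = np.array([[4.0, 3.6, 0.0],
                  [3.6, 4.0, 0.0],
                  [0.0, 0.0, 4.0]])

# eta | x ~ N(mu(x), Sigma(x))
eta = rng.multivariate_normal(mu, Sigma, size=20000)

# c_i | eta_i ~ Bernoulli(sigmoid(eta_i)), drawn independently given eta
p = 1.0 / (1.0 + np.exp(-eta))
c = (rng.uniform(size=p.shape) < p).astype(float)

# Empirical correlation between the binary concepts
corr = np.corrcoef(c, rowvar=False)
print(np.round(corr, 2))
```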

To learn the distribution, we minimize the negative log-likelihood

-\log p({\bm{c}}\mid{\bm{x}})=-\log\int p({\bm{c}}\mid\bm{\eta})\,p_{\bm{\phi}}(\bm{\eta}\mid{\bm{x}})\,d\bm{\eta},

(1)

where $\bm{\phi}$ are the parameters of a neural network that predicts the distribution $\bm{\eta}\mid{\bm{x}}\sim\mathcal{N}\left(\bm{\mu}({\bm{x}}),\bm{\Sigma}({\bm{x}})\right)$. This integral is intractable due to the sigmoid applied in $p({\bm{c}}\mid\bm{\eta})$. Thus, it is approximated with $M$ Monte Carlo samples

-\log\int p({\bm{c}}\mid\bm{\eta})\,p_{\bm{\phi}}(\bm{\eta}\mid{\bm{x}})\,d\bm{\eta}\approx-\log\frac{1}{M}\sum_{m=1}^{M}p({\bm{c}}\mid\bm{\eta}^{(m)}),\quad\bm{\eta}^{(m)}\mid{\bm{x}}\sim\mathcal{N}\left(\bm{\mu}({\bm{x}}),\bm{\Sigma}({\bm{x}})\right).

(2)

To learn $\bm{\phi}$, we exploit the normal parameterization and employ the reparameterization trick $\bm{\eta}^{(m)}\mid{\bm{x}}=\bm{\mu}({\bm{x}})+\mathbf{L}({\bm{x}})\bm{\epsilon}^{(m)},\ \mathbf{L}({\bm{x}})\mathbf{L}({\bm{x}})^{\top}=\bm{\Sigma}({\bm{x}}),\ \bm{\epsilon}^{(m)}\sim\mathcal{N}\left(\bm{0},\bm{I}\right)$, such that gradients can be computed with respect to the parameters. Lastly, we incorporate the new relaxed conditional independence assumption

\log p({\bm{c}}\mid\bm{\eta})=\log\prod_{i=1}^{C}p(c_{i}\mid\eta_{i})=\sum_{i=1}^{C}\log p(c_{i}\mid\eta_{i}),

(3)

where $p(c_{i}\mid\eta_{i})$ describes a Bernoulli distribution parameterized by the sigmoid-transformed logits $\sigma(\eta_{i})$ . Combining the above considerations results in the following reformulation of the negative log-likelihood:

\begin{split}-\log p({\bm{c}}\mid{\bm{x}})\approx&-\log\frac{1}{M}\sum_{m=1}^{M}p({\bm{c}}\mid\bm{\eta}^{(m)})\\ =&\log M-\log\sum_{m=1}^{M}\exp\sum_{i=1}^{C}\log p(c_{i}\mid\eta_{i}^{(m)})\\ =&\log M-\log\sum_{m=1}^{M}\exp\sum_{i=1}^{C}\left[-\mathrm{BCE}(c_{i},\sigma(\eta_{i}^{(m)}))\right],\end{split}

(4)

where BCE stands for binary cross-entropy. The additive constant $\log M$ does not affect optimization and is dropped in the final loss, and the logsumexp trick is used for numerical stability.
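The Monte Carlo estimator of Equations 2-4, together with the reparameterization trick, can be sketched in NumPy as follows. The helper names `sample_logits` and `concept_nll` are hypothetical, and the example values of $\bm{\mu}$ and $\bm{\Sigma}$ are made up; in SCBMs they come from the network. Per-concept log-likelihoods use a numerically stable log-sigmoid, and the outer $-\log\frac{1}{M}\sum_m$ is evaluated via logsumexp. NumPy has no automatic differentiation, so this shows only the forward computation; in an autodiff framework, gradients would flow into $\bm{\mu}$ and $\mathbf{L}$ because $\bm{\epsilon}^{(m)}$ does not depend on them.

```python
import numpy as np

rng = np.random.default_rng(0)


def log_sigmoid(z):
    # Stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def sample_logits(mu, L, M, rng):
    """Reparameterization trick: eta^(m) = mu + L eps^(m), eps^(m) ~ N(0, I)."""
    eps = rng.standard_normal(size=(M, mu.shape[0]))
    return mu + eps @ L.T  # row m equals (mu + L eps^(m))^T


def concept_nll(c, eta):
    """Monte Carlo estimate of -log p(c | x), Eq. (4), including the log M constant.

    c:   (C,) binary concept labels
    eta: (M, C) logit samples
    """
    M = eta.shape[0]
    # sum_i -BCE(c_i, sigmoid(eta_i^(m))) = sum_i log p(c_i | eta_i^(m)), shape (M,)
    log_lik = (c * log_sigmoid(eta) + (1.0 - c) * log_sigmoid(-eta)).sum(axis=1)
    a = log_lik.max()  # logsumexp trick
    return np.log(M) - (a + np.log(np.exp(log_lik - a).sum()))


mu = np.array([1.0, -0.5, 0.3])
Sigma = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
L = np.linalg.cholesky(Sigma)  # lower triangular, L @ L.T == Sigma
eta = sample_logits(mu, L, M=1000, rng=rng)
c = np.array([1.0, 0.0, 1.0])
print(concept_nll(c, eta))
```

Computing the sum inside the exponent in log space avoids underflow: with many concepts, the product $\prod_i p(c_i\mid\eta_i^{(m)})$ can be far below the smallest representable float even when its logarithm is perfectly well behaved.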

The distribution-based modeling procedure allows for efficient sampling, thus enabling SCBMs to train the concept and target predictors jointly, sequentially, or independently. In contrast, the autoregressive alternative (Havasi et al., 2022) requires independent training due to its computational complexity. We adopt a joint training scheme to obtain the benefits of end-to-end learning, where the concept and target predictors can adjust to each other. To prevent leakage, we follow Havasi et al. (2022) and train the model with hard $\{0,1\}$ concept values as the bottleneck rather than the logits used in the original CBM (Koh et al., 2020). To this end, we employ the straight-through Gumbel-Softmax trick (Jang et al., 2017; Maddison et al., 2017), which approximates Bernoulli samples while being differentiable. The target predictor $g_{\bm{\psi}}$ is then learned by minimizing the negative log-likelihood

\begin{split}-\log p(y\mid{\bm{x}})=&-\log\sum_{{\bm{c}}\in\mathcal{C}}p_{\bm{\psi}}(y\mid{\bm{c}})\,p({\bm{c}}\mid{\bm{x}})\\ \approx&-\log\frac{1}{M}\sum_{m=1}^{M}p_{\bm{\psi}}(y\mid{\bm{c}}^{(m)}),\qquad{\bm{c}}^{(m)}\sim p({\bm{c}}\mid{\bm{x}}).\end{split}

(5)
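The hard concept samples ${\bm{c}}^{(m)}$ in Equation 5 are drawn with the straight-through Gumbel-Softmax trick. A minimal forward-pass sketch for the binary case is shown below, assuming a temperature $\tau$ whose value here is arbitrary: Logistic noise (the difference of two Gumbel variables) is added to the logit, a tempered sigmoid gives the relaxed sample, and thresholding it yields an exact Bernoulli$(\sigma(\eta))$ sample. The straight-through part (hard value in the forward pass, gradient of the relaxed value in the backward pass) needs an autodiff framework and is only described in the docstring; the function name is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relaxed_bernoulli_st(eta, tau=1.0, rng=None):
    """Forward pass of a straight-through binary Gumbel-Softmax sample.

    Returns (hard, soft). With autodiff one would use
    hard + soft - stop_gradient(soft): its value is `hard`, its gradient is that of `soft`.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-12, 1.0 - 1e-12, size=np.shape(eta))
    noise = np.log(u) - np.log1p(-u)      # Logistic(0, 1) = difference of two Gumbel(0, 1)
    soft = sigmoid((eta + noise) / tau)   # relaxed sample in [0, 1]
    hard = (soft > 0.5).astype(float)     # P(hard = 1) = sigmoid(eta) for any tau > 0
    return hard, soft


eta = np.array([2.0, -1.0, 0.0])
hard, soft = relaxed_bernoulli_st(np.tile(eta, (100000, 1)), tau=0.5,
                                  rng=np.random.default_rng(0))
print(hard.mean(axis=0), sigmoid(eta))  # empirical frequencies vs. sigmoid(eta)
```

Lowering $\tau$ makes the relaxed sample closer to the hard one (less gradient bias) at the cost of higher-variance gradients; the hard forward value is unaffected by $\tau$.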

Lastly, following Occam’s razor and to prevent overfitting, we regularize the learned dependencies. Taking inspiration from the Graphical Lasso (Friedman et al., 2008), we penalize the off-diagonal elements of the precision matrix $\bm{\Sigma}^{-1}$.

By combining concept, target, and precision loss with weighting factors $\lambda_{1}$ and $\lambda_{2}$ , we arrive at the final loss function

-\log\sum_{m=1}^{M}\exp\sum_{i=1}^{C}-\mathrm{BCE}\left(c_{i},\sigma(\eta_{i}^{(m)})\right)+\lambda_{1}\mathrm{CE}\left(y,\frac{1}{M}\sum_{m=1}^{M}g_{\bm{\psi}}({\bm{c}}^{(m)})\right)+\lambda_{2}\sum_{i\neq j}\left|\left(\bm{\Sigma}({\bm{x}})^{-1}\right)_{i,j}\right|.

(6)
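For a single training example, the three terms of Equation 6 can be combined as in the NumPy sketch below. It assumes a linear softmax target head $g_{\bm{\psi}}$, plain Bernoulli sampling instead of the straight-through estimator (NumPy computes no gradients), and an absolute value in the precision penalty, i.e., an L1 penalty on the off-diagonal entries of $\bm{\Sigma}^{-1}$ in the spirit of the Graphical Lasso. The function name `scbm_loss` and all parameter values are hypothetical.

```python
import numpy as np


def log_sigmoid(z):
    return -np.logaddexp(0.0, -z)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def scbm_loss(c, y, mu, L, W_g, b_g, M=64, lam1=1.0, lam2=1.0, rng=None):
    """Single-example sketch of Eq. (6) with a linear target head g_psi."""
    rng = np.random.default_rng() if rng is None else rng
    eta = mu + rng.standard_normal(size=(M, mu.shape[0])) @ L.T  # eta^(m) = mu + L eps^(m)

    # Concept term: -log sum_m exp sum_i -BCE(c_i, sigmoid(eta_i^(m)))
    log_lik = (c * log_sigmoid(eta) + (1.0 - c) * log_sigmoid(-eta)).sum(axis=1)
    a = log_lik.max()
    concept_term = -(a + np.log(np.exp(log_lik - a).sum()))

    # Target term: CE(y, (1/M) sum_m g_psi(c^(m))), hard c^(m) ~ Bernoulli(sigmoid(eta^(m)))
    c_samples = (rng.uniform(size=eta.shape) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
    probs = softmax(c_samples @ W_g.T + b_g).mean(axis=0)
    target_term = -np.log(probs[y])

    # Precision term: L1 norm of the off-diagonal entries of Sigma^{-1} = (L L^T)^{-1}
    precision = np.linalg.inv(L @ L.T)
    precision_term = np.abs(precision - np.diag(np.diag(precision))).sum()

    return concept_term + lam1 * target_term + lam2 * precision_term


rng = np.random.default_rng(0)
C, K = 3, 4
W_g, b_g = rng.normal(size=(K, C)), np.zeros(K)
mu = np.array([0.5, -1.0, 2.0])
L = np.linalg.cholesky(np.array([[1.0, 0.5, 0.0],
                                 [0.5, 1.0, 0.3],
                                 [0.0, 0.3, 1.0]]))
c, y = np.array([1.0, 0.0, 1.0]), 2
print(scbm_loss(c, y, mu, L, W_g, b_g, rng=np.random.default_rng(1)))
```

With a diagonal $\bm{\Sigma}$ the penalty vanishes, so the regularizer only pushes the model away from dependencies that the concept and target terms do not support.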

3.2 Covariance Learning

The amortized covariance matrix $\bm{\Sigma}({\bm{x}})$ provides the flexibility to tailor the predicted concept dependencies to each data point, making it adaptable to many data-generating mechanisms. For example, on the commonly used CUB dataset (Wah et al., 2011; Koh et al., 2020), it can learn the class-wise concept structure present in the data. The explicit dependency representation given by the learned covariance matrix provides insights into the learned correlations among the concepts, which is important for understanding and interpreting the model’s behavior.

However, an amortized covariance matrix comes at the price of not being able to visualize and interpret a unified concept structure on a dataset level. Depending on the need of the application, such a global structure might be preferable. Thus, we propose a variation of SCBM, where the covariance matrix is not amortized ( $\bm{\Sigma}({\bm{x}})$ ), but learned globally ( $\bm{\Sigma}$ ). An example of the global concept structure learned on CUB is shown in Figure 1 (c). This variation has the inductive bias of assuming a constant covariance matrix, whose utility depends on the underlying data-generating mechanism. We recommend using the more flexible, amortized version by default and only utilizing a global covariance if the strong assumption of fixed dependencies is reasonable. We will explore this empirically in more detail in Section 5.

3.3 Interventions

A distinguishing property of CBM-like methods is the user’s capacity to correct wrongly predicted concepts, which in turn affects the target prediction (Marcinkevičs et al., 2024). For a large concept set, this intervention procedure can become laborious, as a user has to inspect and manually intervene on each concept separately. SCBMs are designed to alleviate this burden by utilizing the learned concept dependencies, such that a single intervention affects all related concepts as modeled by the multivariate normal distribution.

The parameterization as a multivariate normal distribution allows for a fast, scalable intervention procedure. Given the set $\mathcal{S}\subset\{1,\ldots,C\}$ of intervened-on concepts, the effect on the remaining concepts ${\bm{c}}_{\setminus\mathcal{S}}$ is computed via their logits $\bm{\eta}_{\setminus\mathcal{S}}$ by conditioning on the intervention logits $\bm{\eta}_{\mathcal{S}}^{\prime}$, using the standard conditioning formula of the multivariate normal distribution

\begin{split}\bm{\eta}_{\setminus\mathcal{S}}\mid{\bm{x}},\bm{\eta}^{\prime}_{\mathcal{S}}&\sim\mathcal{N}\left(\bm{\bar{\mu}}({\bm{x}}),\bm{\overline{\Sigma}}({\bm{x}})\right),\\ \bm{\bar{\mu}}&=\bm{\mu}_{\setminus\mathcal{S}}+\bm{\Sigma}_{\setminus\mathcal{S},\mathcal{S}}\bm{\Sigma}_{\mathcal{S},\mathcal{S}}^{-1}(\bm{\eta}^{\prime}_{\mathcal{S}}-\bm{\mu}_{\mathcal{S}}),\\ \bm{\overline{\Sigma}}&=\bm{\Sigma}_{\setminus\mathcal{S},\setminus\mathcal{S}}-\bm{\Sigma}_{\setminus\mathcal{S},\mathcal{S}}\bm{\Sigma}_{\mathcal{S},\mathcal{S}}^{-1}\bm{\Sigma}_{\mathcal{S},\setminus\mathcal{S}}.\end{split}

(7)
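Equation 7 is the standard conditioning formula for a partitioned multivariate normal distribution. A minimal NumPy sketch (the function name `condition_on_interventions` is hypothetical) is shown below; it solves a linear system with $\bm{\Sigma}_{\mathcal{S},\mathcal{S}}$ instead of explicitly inverting it, which is numerically preferable and cheap because $|\mathcal{S}|$ is typically small.

```python
import numpy as np


def condition_on_interventions(mu, Sigma, S, eta_S_prime):
    """Mean and covariance of the non-intervened logits given eta_S = eta_S_prime (Eq. 7)."""
    S = np.asarray(S)
    R = np.setdiff1d(np.arange(mu.shape[0]), S)  # indices of the remaining concepts
    Sigma_RS = Sigma[np.ix_(R, S)]
    Sigma_SS = Sigma[np.ix_(S, S)]
    Sigma_RR = Sigma[np.ix_(R, R)]
    # K = Sigma_RS Sigma_SS^{-1}, obtained by solving Sigma_SS K^T = Sigma_SR
    K = np.linalg.solve(Sigma_SS, Sigma_RS.T).T
    mu_bar = mu[R] + K @ (eta_S_prime - mu[S])
    Sigma_bar = Sigma_RR - K @ Sigma_RS.T
    return R, mu_bar, Sigma_bar


mu = np.zeros(3)
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
R, mu_bar, Sigma_bar = condition_on_interventions(mu, Sigma, S=[0], eta_S_prime=np.array([2.0]))
print(R, mu_bar, Sigma_bar)
```

With unit variances and a correlation of 0.8 between the first two logits, setting $\eta_0^{\prime}=2$ moves the conditional mean of $\eta_1$ to $0.8\cdot 2=1.6$ and shrinks its variance to $1-0.8^2=0.36$, while the uncorrelated third logit keeps mean 0 and variance 1.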

For a standard CBM (Koh et al., 2020), $\eta_{i}^{\prime}$ is set to the 5th (if $c_{i}=0$) or 95th (if $c_{i}=1$) percentile of the training distribution. Although this strategy is also effective for SCBMs (see Appendix C.3), it has limitations that lead to suboptimal intervention performance when interventions affect other concepts. For example, if the initially predicted $\mu_{i}$ is more extreme than the selected training percentile, the interventional shift $\eta^{\prime}_{i}-\mu_{i}$ points in the wrong direction, which in turn shifts $\bm{\eta}_{\setminus\mathcal{S}}$ incorrectly. We therefore pose the desideratum that an appropriate intervention strategy should determine $\eta_{i}^{\prime}$ such that $\eta_{i}^{\prime}-\mu_{i}\geq 0$ if $c_{i}=1$, and $\eta_{i}^{\prime}-\mu_{i}\leq 0$ if $c_{i}=0$. Additionally, $\eta_{i}^{\prime}-\mu_{i}$ should not be “too large”, so that the intervention does not completely disregard the predicted $\bm{\mu}_{\setminus\mathcal{S}}$.

Here, an additional benefit of the explicit distributional representation becomes apparent: the likelihood-based confidence region (the multivariate generalization of a confidence interval) provides a natural way of capturing the region of possible $\bm{\eta}_{\mathcal{S}}^{\prime}$ that fulfil our desiderata. Note that the confidence region takes concept dependencies into account when describing the region of possible $\bm{\eta}_{\mathcal{S}}^{\prime}$. To determine a specific point within this region, we search for the values $\bm{\eta}_{\mathcal{S}}^{\prime}$ that maximize the log-likelihood of the known, intervened-on concepts ${\bm{c}}_{\mathcal{S}}$, implicitly focusing on concepts that the model predicts poorly.

\begin{split}\bm{\eta}_{\mathcal{S}}^{\prime}=\operatorname*{arg\,max}_{\bm{\eta}_{\mathcal{S}}}\;&\log p({\bm{c}}_{\mathcal{S}}\mid\bm{\eta}_{\mathcal{S}})\\ \operatorname*{s.\!t.}\;&-2\left(\log p(\bm{\eta}_{\mathcal{S}}\mid\bm{\mu}_{\mathcal{S}},\bm{\Sigma}_{\mathcal{S},\mathcal{S}})-\log p(\bm{\mu}_{\mathcal{S}}\mid\bm{\mu}_{\mathcal{S}},\bm{\Sigma}_{\mathcal{S},\mathcal{S}})\right)\leq\chi^{2}_{d,1-\alpha}\\ &\eta_{i}^{\prime}-\mu_{i}\geq 0\text{ if }c_{i}=1,\quad\forall i\in\mathcal{S}\\ &\eta_{i}^{\prime}-\mu_{i}\leq 0\text{ if }c_{i}=0,\quad\forall i\in\mathcal{S},\end{split}

(8)

where $d=|\mathcal{S}|$. The first inequality describes the confidence region. It is based on the logarithm of the likelihood ratio, which, after multiplication by $-2$, asymptotically follows a $\chi^{2}$ distribution (Silvey, 1975); for the normal distribution, this statistic equals the squared Mahalanobis distance $(\bm{\eta}_{\mathcal{S}}-\bm{\mu}_{\mathcal{S}})^{\top}\bm{\Sigma}_{\mathcal{S},\mathcal{S}}^{-1}(\bm{\eta}_{\mathcal{S}}-\bm{\mu}_{\mathcal{S}})$, which follows a $\chi^{2}_{d}$ distribution exactly. The last two inequalities restrict the region to the desired direction. Note that $\bm{\eta}_{\mathcal{S}}^{\prime}$ is computed to determine the conditional effect of the interventions on $\bm{\eta}_{\setminus\mathcal{S}}$ using Equation 7. When predicting $\hat{y}^{\prime}$ under interventions, the logits $\bm{\eta}_{\setminus\mathcal{S}}$ are then used for sampling the binary concept values ${\bm{c}}_{\setminus\mathcal{S}}$, while the intervened-on concepts ${\bm{c}}^{\prime}_{\mathcal{S}}$ are directly set to their known binary values.
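For a single intervened concept ($d=1$), Equation 8 has a closed-form solution, which makes the difference to the percentile strategy easy to see. The confidence region is the interval $\mu_i\pm\sqrt{\chi^2_{1,1-\alpha}\,\Sigma_{i,i}}$, with $\chi^2_{1,1-\alpha}=z_{1-\alpha/2}^2$ for the standard normal quantile $z$. Since $\log p(c_i=1\mid\eta_i)=\log\sigma(\eta_i)$ is increasing in $\eta_i$ (and $\log p(c_i=0\mid\eta_i)$ is decreasing), the maximizer is the interval endpoint on the side of the true label, which automatically satisfies both direction constraints. The NumPy sketch below, with hypothetical function names and made-up numbers, compares this with the percentile strategy on an instance whose predicted logit already exceeds the 95th training percentile, and then applies Equation 7. It does not cover $d>1$, for which the constrained problem has to be solved numerically, e.g., with SLSQP.

```python
import numpy as np
from statistics import NormalDist


def percentile_intervention(c_i, p05, p95):
    """CBM-style strategy: fixed training percentiles, ignoring the predicted mu_i."""
    return p95 if c_i == 1 else p05


def confidence_region_intervention(mu_i, var_i, c_i, alpha=0.01):
    """Eq. (8) for a single intervened concept (d = 1).

    The region is mu_i +/- sqrt(chi2_{1,1-alpha} * var_i), with chi2_{1,1-alpha} = z_{1-alpha/2}^2.
    log sigmoid(eta) increases in eta and log sigmoid(-eta) decreases, so the
    maximiser is the endpoint on the side of the true label.
    """
    radius = NormalDist().inv_cdf(1.0 - alpha / 2.0) * np.sqrt(var_i)
    return mu_i + radius if c_i == 1 else mu_i - radius


def condition_one(mu, Sigma, i, eta_i_prime):
    """Eq. (7) with S = {i}: mean and covariance of the remaining logits."""
    rest = np.array([j for j in range(len(mu)) if j != i])
    k = Sigma[rest, i] / Sigma[i, i]  # Sigma_{rest,i} / Sigma_{i,i}
    mu_bar = mu[rest] + k * (eta_i_prime - mu[i])
    Sigma_bar = Sigma[np.ix_(rest, rest)] - np.outer(k, Sigma[i, rest])
    return rest, mu_bar, Sigma_bar


# Hypothetical instance: concept 0 is already predicted with a logit above the
# 95th training percentile and is positively correlated with concept 1.
mu = np.array([5.0, 0.0, 0.0])
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
p05, p95 = -3.0, 3.0

eta_pct = percentile_intervention(1, p05, p95)
eta_cr = confidence_region_intervention(mu[0], Sigma[0, 0], 1)  # 99% region

_, mu_bar_pct, _ = condition_one(mu, Sigma, 0, eta_pct)
_, mu_bar_cr, _ = condition_one(mu, Sigma, 0, eta_cr)
print(mu_bar_pct)  # concept 1 is pushed down although c_0 = 1 and the correlation is 0.8
print(mu_bar_cr)   # concept 1 is pushed up, as desired
```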

4 Experimental Setup

Datasets and Evaluation

We perform experiments on a variety of datasets to showcase the validity of our method. Inspired by Marcinkevičs et al. (2024), we introduce a synthetic tabular dataset whose data-generating mechanism contains fixed concept dependencies that we can control. In particular, the concept logits $\bm{\eta}$ are sampled from a distribution with a randomly initialized positive definite covariance matrix and are used to generate ${\bm{x}}$. Binary concept values ${\bm{c}}$ are inferred from $\bm{\eta}$ and are used to generate the target $y$. We refer to Appendix A.1 for a more detailed description.
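The exact generator is described in Appendix A.1. Purely to illustrate the structure of such a mechanism ($\bm{\eta}\rightarrow{\bm{x}}$ and $\bm{\eta}\rightarrow{\bm{c}}\rightarrow y$), the following NumPy sketch uses assumed choices that may differ from the appendix: a multivariate normal over the logits, a random $\tanh$ projection with additive noise for ${\bm{x}}$, Bernoulli$(\sigma(\bm{\eta}))$ concepts, and the argmax of a random linear function of the concepts for $y$. The function `make_synthetic` and its defaults are hypothetical.

```python
import numpy as np


def make_synthetic(n=1000, C=10, D=50, K=5, seed=0):
    """Hypothetical generator with the structure eta -> x and eta -> c -> y."""
    rng = np.random.default_rng(seed)
    # Randomly initialised positive definite covariance: A A^T / C plus a small ridge.
    A = rng.normal(size=(C, C))
    Sigma = A @ A.T / C + 0.1 * np.eye(C)
    eta = rng.multivariate_normal(np.zeros(C), Sigma, size=n)  # concept logits
    # Covariates: assumed nonlinear random projection of the logits plus noise.
    W_x = rng.normal(size=(C, D))
    x = np.tanh(eta @ W_x) + 0.1 * rng.normal(size=(n, D))
    # Binary concepts: assumed Bernoulli(sigmoid(eta)).
    c = (rng.uniform(size=(n, C)) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
    # Target: assumed argmax of a random linear function of the concepts.
    W_y = rng.normal(size=(C, K))
    y = np.argmax(c @ W_y, axis=1)
    return x, c, y, Sigma


x, c, y, Sigma = make_synthetic()
print(x.shape, c.shape, y.shape)  # (1000, 50) (1000, 10) (1000,)
```

Because the same $\bm{\Sigma}$ is used for every sample, such a dataset has a fixed dependency structure, which is the setting in which a global covariance matrix is a well-specified assumption.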

As a natural image classification benchmark, we use the Caltech-UCSD Birds-200-2011 (CUB) dataset (Wah et al., 2011), comprising bird photographs from 200 distinct classes. It includes 112 concepts, such as wing color and beak shape, which, following the revision in the original CBM work (Koh et al., 2020), are shared across all instances of the same class. Additionally, we explore another natural image classification task on CIFAR-10 (Krizhevsky et al., 2009) with 10 classes. To remove the need for concept annotations, the concepts are acquired synthetically, in a similar fashion to the concept discovery literature. We adopt the 143 concepts generated via GPT-3 (Brown et al., 2020) in prior work (Oikarinen et al., 2023). To obtain binary concept values, we use the CLIP model (Radford et al., 2021) to compute the similarity between each image and the text embedding of a given concept, and compare it to the similarity with its negative counterpart, i.e., not the concept. Appendix A.2 contains further details about the natural image datasets.
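The CLIP-based labeling step can be sketched as follows, assuming that image and text embeddings have already been computed by a CLIP model; the toy vectors below merely stand in for them, and the function names are hypothetical. A concept is labeled present if the image embedding is more similar (in cosine similarity) to the concept’s text embedding than to that of its negative counterpart; comparing the two similarities directly is equivalent to thresholding a two-way softmax over them at 0.5.

```python
import numpy as np


def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def label_concepts(image_emb, concept_emb, negated_emb):
    """Binary concept labels from CLIP-style embeddings.

    image_emb:   (N, E) image embeddings
    concept_emb: (C, E) text embeddings of each concept
    negated_emb: (C, E) text embeddings of each concept's negative counterpart
    Returns an (N, C) 0/1 matrix: 1 iff the image is closer to the concept than to its negation.
    """
    img = l2_normalize(image_emb)
    sim_pos = img @ l2_normalize(concept_emb).T  # cosine similarities, (N, C)
    sim_neg = img @ l2_normalize(negated_emb).T
    return (sim_pos > sim_neg).astype(int)


# Toy embeddings standing in for CLIP outputs (E = 4, C = 3 concepts, N = 2 images).
concept_emb = np.eye(3, 4)
negated_emb = -np.eye(3, 4)
image_emb = np.array([[1.0, 0.0, 0.2, 0.0],
                      [0.0, -1.0, 0.0, 0.5]])
print(label_concepts(image_emb, concept_emb, negated_emb))
```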

To compare methods, we evaluate the model performance based on concept and target accuracy. We compute test performance before and after intervening on an increasing number of concepts. The order of concepts in the intervention is determined by an uncertainty-based policy (Shin et al., 2023) that selects the concept whose predicted probability is closest to $0.5$. We also show results for a random policy in Appendix C.1. Additionally, we evaluate the calibration of the predicted concept uncertainties used by the uncertainty-based policy with the Brier score (Brier, 1950) and the Expected Calibration Error (ECE) (Naeini et al., 2015; A. Kumar et al., 2019).
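The uncertainty-based policy and the two calibration metrics can be sketched in NumPy as follows. The ECE shown here is one common variant for binary predictions (equal-width bins over the predicted probability of $c=1$, comparing the mean prediction with the empirical frequency in each bin); the exact binning and variant used in the experiments may differ, and the function names are hypothetical.

```python
import numpy as np


def uncertainty_policy_order(probs):
    """Intervention order: concepts sorted by |p - 0.5|, most uncertain first."""
    return np.argsort(np.abs(probs - 0.5), kind="stable")


def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and binary labels."""
    return np.mean((probs - labels) ** 2)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE with equal-width bins on the predicted probability of c = 1."""
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)  # p = 1.0 goes to last bin
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # weight by the fraction of predictions falling into bin b
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece


probs = np.array([0.9, 0.45, 0.2, 0.6])
labels = np.array([1, 0, 0, 1])
print(uncertainty_policy_order(probs))  # [1 3 2 0]: concept 1 (p = 0.45) is queried first
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```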

Baselines

We evaluate the performance of our method in comparison with state-of-the-art models. Namely, we consider the vanilla concept bottleneck model (CBM) by Koh et al. (2020) in its hard version (Havasi et al., 2022), trained jointly using the straight-through Gumbel-Softmax trick (Jang et al., 2017; Maddison et al., 2017), as a natural baseline for our binary modeling of concepts. Additionally, we explore the concept embedding model (CEM) by Espinosa Zarlenga et al. (2022), which learns two embeddings per concept, $\bm{\hat{c}}_{i}^{+}$ and $\bm{\hat{c}}_{i}^{-}$. These representations are used to predict the final concept probability with a learnable scoring function $\hat{p}_{i}=s(\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-})=\sigma(\mathbf{W}_{s}[\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}]^{T}+\mathbf{b}_{s})$ and are then combined into a final concept embedding $\bm{\hat{c}}_{i}=\hat{p}_{i}\bm{\hat{c}}_{i}^{+}+(1-\hat{p}_{i})\bm{\hat{c}}_{i}^{-}$ that is passed to the target predictor. Interventions are modeled by altering the concept probabilities $\hat{p}_{i}$. Finally, we evaluate the autoregressive CBM proposed by Havasi et al. (2022), where concept dependencies are learned with an autoregressive structure. Here, each concept $c_{i}$ is predicted with a separate MLP that takes as input a shared latent representation of the input $f_{\bm{\theta}}({\bm{x}})$ and all previous concepts $c_{1},\ldots,c_{i-1}$. To obtain a good initialization of the autoregressive structure, it is pretrained for 50 epochs. As Monte Carlo sampling from the autoregressive structure is time-consuming, the target predictor $g_{\bm{\psi}}$ is trained independently using the ground-truth concepts as input. At intervention time, a normalized importance sampling algorithm is used to estimate the concept distribution.
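A minimal NumPy sketch of the CEM mixing step for a single concept is shown below, assuming $\mathbf{W}_s$ is a single row (written `w_s`) so that the score is a scalar; embeddings and weights are random placeholders, and the function name is hypothetical. An intervention simply overwrites $\hat{p}_i$ with the user-provided value, so the mixed embedding collapses to $\bm{\hat{c}}_i^{+}$ or $\bm{\hat{c}}_i^{-}$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cem_concept(c_pos, c_neg, w_s, b_s, p_intervened=None):
    """Mix the two embeddings of one concept, as in CEM.

    c_pos, c_neg: (m,) embeddings for the concept being active / inactive
    w_s, b_s:     scoring-function parameters, shape (2m,) and scalar
    p_intervened: optional user-provided probability (0 or 1) that replaces p_hat
    """
    p_hat = sigmoid(w_s @ np.concatenate([c_pos, c_neg]) + b_s)
    if p_intervened is not None:
        p_hat = float(p_intervened)  # intervention: overwrite the concept probability
    return p_hat, p_hat * c_pos + (1.0 - p_hat) * c_neg


rng = np.random.default_rng(0)
m = 4
c_pos, c_neg = rng.normal(size=m), rng.normal(size=m)
w_s, b_s = rng.normal(size=2 * m), 0.0
p_hat, c_hat = cem_concept(c_pos, c_neg, w_s, b_s)
_, c_hat_on = cem_concept(c_pos, c_neg, w_s, b_s, p_intervened=1.0)
```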

Implementation Details

The model architectures consist of a backbone for concept prediction followed by a linear layer as the head for interpretable target prediction. Precise details and training configurations are given in Appendix B. To ensure the positive definiteness of the concept covariance matrix $\bm{\Sigma}$, we parameterize it via its Cholesky decomposition $\bm{\Sigma}={\bm{L}}{\bm{L}}^{\top}$. Thus, we only predict the lower-triangular Cholesky factor ${\bm{L}}$. We evaluate two variants of SCBMs: with a global ($\bm{\Sigma}$) or an amortized ($\bm{\Sigma}({\bm{x}})$) covariance matrix. For the amortized version, we set the weighting terms $\lambda_{1}$ and $\lambda_{2}$ of Equation 6 to 1. For the global version, we initialize the covariance matrix with the estimated empirical covariance matrix and set $\lambda_{2}=0$, as we did not observe large differences when varying $\lambda_{2}$. In Appendix C.2, we provide an ablation study demonstrating that SCBMs are not very sensitive to the choice of $\lambda_{2}$. At intervention time, we solve the optimization problem based on the $99\%$ confidence region with the SLSQP algorithm (Kraft, 1988). In Appendix C.4, we provide an ablation with different confidence levels.
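One way to predict ${\bm{L}}$ is to map an unconstrained vector of length $C(C+1)/2$ onto the lower triangle and force the diagonal to be positive, which guarantees that ${\bm{L}}{\bm{L}}^{\top}$ is positive definite. The sketch below assumes a softplus transform plus a small constant on the diagonal; this choice, the helper names, and the toy data are assumptions, not taken from Appendix B. For the amortized variant the vector would be a network output for ${\bm{x}}$; for the global variant it is a free parameter, which the inverse map can initialize from the Cholesky factor of the empirical covariance matrix.

```python
import numpy as np


def softplus(z):
    return np.logaddexp(0.0, z)


def vector_to_cholesky(v, C, eps=1e-6):
    """Unconstrained vector of length C(C+1)/2 -> lower-triangular L with positive diagonal."""
    L = np.zeros((C, C))
    L[np.tril_indices(C)] = v
    d = np.arange(C)
    L[d, d] = softplus(L[d, d]) + eps  # assumed transform keeping the diagonal > 0
    return L


def cholesky_to_vector(L, eps=1e-6):
    """Inverse map, e.g. to initialise a global L from an empirical covariance."""
    L = L.copy()
    d = np.arange(L.shape[0])
    L[d, d] = np.log(np.expm1(L[d, d] - eps))  # inverse softplus
    return L[np.tril_indices(L.shape[0])]


rng = np.random.default_rng(0)
C = 4

# Amortized variant: v would be the output of a network head for input x (here random).
v = rng.normal(size=C * (C + 1) // 2)
L_x = vector_to_cholesky(v, C)
Sigma_x = L_x @ L_x.T

# Global variant: v is a free parameter, initialised from the empirical covariance.
data = rng.multivariate_normal(np.zeros(C), np.eye(C) + 0.5, size=2000)
Sigma_emp = np.cov(data, rowvar=False)
L_global = vector_to_cholesky(cholesky_to_vector(np.linalg.cholesky(Sigma_emp)), C)
Sigma_global = L_global @ L_global.T
```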

Table 1: Test-set concept and target accuracy (%) prior to interventions. Results are reported as averages and standard deviations of model performance across ten seeds. For each dataset and metric, the best-performing method is bolded and the runner-up is underlined.

Dataset	Method	Concept Accuracy (%)	Target Accuracy (%)
Synthetic	Hard CBM	61.42 ± 0.07	58.38 ± 0.39
Synthetic	CEM	61.42 ± 0.12	58.01 ± 0.49
Synthetic	Autoregressive CBM	62.17 ± 0.11	59.60 ± 0.62
Synthetic	Global SCBM	61.57 ± 0.05	58.39 ± 0.53
Synthetic	Amortized SCBM	62.41 ± 0.20	58.96 ± 0.38
CUB	Hard CBM	94.97 ± 0.07	67.72 ± 0.57
CUB	CEM	95.12 ± 0.07	69.60 ± 0.30
CUB	Autoregressive CBM	95.33 ± 0.07	69.24 ± 0.44
CUB	Global SCBM	94.99 ± 0.09	68.19 ± 0.63
CUB	Amortized SCBM	95.22 ± 0.09	69.87 ± 0.56
CIFAR-10	Hard CBM	85.51 ± 0.04	69.73 ± 0.29
CIFAR-10	CEM	85.12 ± 0.14	72.24 ± 0.33
CIFAR-10	Autoregressive CBM	85.31 ± 0.06	68.88 ± 0.47
CIFAR-10	Global SCBM	85.86 ± 0.04	70.74 ± 0.29
CIFAR-10	Amortized SCBM	86.00 ± 0.03	71.66 ± 0.25

5 Results

Test performance

In Table 1, we report the concept and target accuracy prior to interventions. Overall, SCBMs perform on par with the baseline methods, with no method consistently outperforming or underperforming the others across datasets. This shows that the additional overhead of learning the concept dependencies does not negatively affect predictive performance. We note that the amortized covariance variant consistently surpasses the globally learned matrix, which we attribute to its ability to adjust the predicted concept dependency structure and uncertainty at the instance level. On the other hand, the global variant offers a unified view of the concept correlations, an example of which is presented in Figure 1 (c).

Table 2: Relative time per epoch on the CUB dataset for training (on the training set) and inference (on the test set).

Method	Training	Inference
Hard CBM	5x	1x
CEM	5x	1x
Autoregressive CBM	5x	14x
Global SCBM	5x	1x
Amortized SCBM	5x	1x

Notably, on CIFAR-10, even though CEM has the lowest concept accuracy of all methods, it achieves the best target accuracy. This might suggest the presence of leakage in CEM’s embeddings: on CIFAR-10, the concept set alone is not sufficient to predict the target, so learning additional information might be useful. In Table 2, we report the time required for training and inference. It is evident that the autoregressive CBM of Havasi et al. (2022) suffers from a slow sampling process due to its autoregressive structure, while SCBMs retain the efficiency of CBMs.

Interventions

In this paragraph, we analyze the intervention performance of SCBMs and the baseline models, focusing on their effectiveness in modeling concept dependencies and improving target accuracy. Figure 2 shows the intervention curves across ten seeds, where performance is measured in terms of concept and target accuracy. The order of concepts to intervene on is determined by an uncertainty-based policy that makes use of the predicted probabilities. In Appendix C.1, we present the intervention performance when concepts are selected randomly. The intervention curves in the first row show that SCBMs are superior in modeling the concept dependencies, as evidenced by their significantly steeper intervention curves compared to the baseline methods. Furthermore, the second row of Figure 2 indicates that the strong concept modeling translates into a significant improvement in downstream performance, partly thanks to the intervention strategy introduced in Section 3.3. In particular, SCBMs outperform their counterparts in the most practical scenario of only a small number of interventions. Comparing the SCBM variants, the natural image datasets show an overall better intervention performance with the amortized covariance matrix, following the trend of Table 1, as it can capture the instance-wise correlation structure of the data. Only on the synthetic dataset, where the data-generating covariance matrix is fixed, does the global SCBM slightly outperform the amortized one. Thus, we advocate using the global variant only if the underlying assumption of a fixed covariance is reasonable. Lastly, the success of SCBMs on CIFAR-10 with CLIP-based concepts shows that our proposed method can work without human-annotated concepts.

The autoregressive CBM, which also captures concept dependencies, has, as expected, better intervention performance than the hard vanilla CBM, which does not take correlations into account. However, its concept accuracy under intervention stays below that of SCBMs, indicating that its autoregressive structure does not fully capture the dependencies. This is reflected in the target accuracy, where the autoregressive CBM matches or outperforms SCBMs only as the set of intervened concepts approaches the full set. We attribute its better performance under full intervention to the independent training procedure of autoregressive CBMs, which comes at the cost of lower test performance on CIFAR-10. Arguably, such a high number of instance-level interventions is not sensible in a realistic use case, and if it were, SCBMs could also be trained independently. Finally, CEM shows reduced intervention performance, as its expressive concept embeddings, which are prone to information leakage, seem to adapt suboptimally to the injected concept information.

Dataset	Method	Brier ↓	ECE ↓
Synthetic	Hard CBM	28.79 ± 0.09	22.38 ± 0.15
Synthetic	CEM	29.32 ± 0.08	23.55 ± 0.09
Synthetic	Autoregressive CBM	24.84 ± 0.32	13.54 ± 0.49
Synthetic	Global SCBM	27.73 ± 0.09	20.10 ± 0.14
Synthetic	Amortized SCBM	25.58 ± 0.20	15.57 ± 0.55
CUB	Hard CBM	3.93 ± 0.05	2.44 ± 0.06
CUB	CEM	4.04 ± 0.05	3.25 ± 0.07
CUB	Autoregressive CBM	3.75 ± 0.05	2.73 ± 0.05
CUB	Global SCBM	3.87 ± 0.06	2.33 ± 0.09
CUB	Amortized SCBM	3.64 ± 0.07	1.85 ± 0.08
CIFAR-10	Hard CBM	10.42 ± 0.05	4.93 ± 0.17
CIFAR-10	CEM	11.06 ± 0.16	7.11 ± 0.39
CIFAR-10	Autoregressive CBM	10.70 ± 0.05	6.07 ± 0.10
CIFAR-10	Global SCBM	9.95 ± 0.02	2.88 ± 0.11
CIFAR-10	Amortized SCBM	9.84 ± 0.02	2.22 ± 0.12
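The table above reports the Brier score (Brier, 1950) and the expected calibration error (ECE) of the predicted concept probabilities. As a reference for how such metrics are typically computed for binary concepts, here is a minimal NumPy sketch. The ECE variant (10 equal-width bins over the positive-class probability) is an assumption, since several variants exist (e.g., Naeini et al., 2015; Kumar et al., 2019); the sketch returns values in [0, 1], whereas the table's numbers may be scaled or aggregated differently.

```python
import numpy as np


def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and binary labels."""
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    return float(np.mean((probs - labels) ** 2))


def expected_calibration_error(probs, labels, n_bins=10):
    """Bin-weighted |mean predicted probability - empirical frequency| (assumed variant)."""
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    # Assign each prediction to one of n_bins equal-width bins on [0, 1]
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)


# A constant prediction of 0.7 is perfectly calibrated if 70% of labels are 1
probs = [0.7] * 10
labels = [1] * 7 + [0] * 3
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

Both metrics are lower-is-better: the Brier score penalizes every squared error, while ECE only measures the mismatch between confidence and observed frequency within each bin.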

Modeling the concept distribution

A cornerstone of SCBMs is the explicit, distributional parameterization of concepts. It helps in understanding the correlations in the data and allows for visualization, as in the example in Figure 1 (c). The explicit probabilistic modeling also yields better concept uncertainty estimates than the baseline CBMs, as shown in Table 3, where lower values indicate better estimates. This proves useful for interventions: the uncertainty estimates can guide the choice of which concept to intervene on, improving the target prediction more effectively and reducing the need for manual inspection by the user. In Figure 3, we compare intervening at random with intervening based on the predicted uncertainty and observe a large gap between the two policies, indicating that the estimated probabilities are useful. Nevertheless, as shown in Appendix C.1, intervening at random remains successful and supports the observations made in the previous paragraph.
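An uncertainty-based policy can be sketched as ranking concepts by the entropy of their predicted Bernoulli probabilities and intervening on the most uncertain remaining concept first. This NumPy sketch is illustrative and assumes entropy as the uncertainty score; it is not necessarily the exact scoring of Shin et al. (2023) used in the experiments, and `next_concepts` is a hypothetical helper.

```python
import numpy as np


def next_concepts(probs, intervened=frozenset()):
    """Order not-yet-intervened concepts from most to least uncertain (entropy)."""
    # Clip to avoid log(0) for fully confident predictions
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1 - 1e-12)
    entropy = -(p * np.log(p) + (1 - p) * np.log1p(-p))
    order = np.argsort(-entropy, kind="stable")
    return [int(i) for i in order if int(i) not in intervened]


probs = [0.9, 0.5, 0.2, 0.55]
print(next_concepts(probs))  # most uncertain concept (p = 0.5) comes first
print(next_concepts(probs, intervened={1}))
```

Because Bernoulli entropy is monotone in |p − 0.5|, ranking by closeness to 0.5 gives the same order. A random policy would instead draw the order from a permutation, e.g., `np.random.default_rng().permutation(len(probs))`.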

6 Conclusion

In this paper, we introduced SCBMs, a new concept-based method that models concept dependencies with a multivariate normal distribution. We proposed a novel, effective intervention strategy that takes concept correlations into account and is based on the confidence region inferred from the distributional parameterization. We showed that our modeling approach retains the training and inference speed of CBMs and is thus able to harness the benefits of end-to-end concept and target training. Moreover, the explicit parameterization gives the user a clearer understanding of the learned concept dependencies, providing deeper insight into how predictions and interventions are made. Empirically, we demonstrated that by modeling concept dependencies, SCBMs substantially improve intervention effectiveness, in both concept and target accuracy, compared to related work. Our method excels when iteratively intervening on the most uncertain concept predictions, sparing users from manually searching through the concept set to identify necessary interventions. Our results further indicate that learning the concept correlations does not decrease performance before interventions and, in many cases, even improves it over the baselines. Finally, the versatility of SCBMs is highlighted by their superior performance on CIFAR-10, where concept values are CLIP-based rather than human-annotated.

Limitations & Future Work

This work opens multiple new research avenues. A natural extension is to go beyond binary concepts, for example to continuous domains, with corresponding adaptations to how the concept distribution is modeled. Additionally, addressing the quadratic memory complexity of the covariance matrix is essential for scaling to larger concept sets. Current interventions focus on editing concept values; however, our approach also allows editing the learned dependency structure by adjusting entries of the predicted covariance matrix, which could be explored. Lastly, to model additional information and reduce leakage, Koh et al. (2020) and Havasi et al. (2022) propose adopting a side channel. Whether incorporating such a side channel into the covariance structure is complementary to SCBMs remains to be explored.
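To make the quadratic growth concrete: a symmetric C × C covariance matrix (or a lower-triangular factor of it) has C(C + 1)/2 free entries, and an amortized variant has to predict them for every instance. The snippet tabulates this count for the concept-set sizes used in this paper (100, 112, and 143) and for a hypothetical set of 1,000 concepts; it only counts entries and makes no claim about the exact parameterization used.

```python
def covariance_entries(num_concepts: int) -> int:
    """Number of free entries of a symmetric C x C matrix, C(C + 1) / 2."""
    return num_concepts * (num_concepts + 1) // 2


for name, c in [("Synthetic", 100), ("CUB", 112), ("CIFAR-10", 143), ("hypothetical", 1000)]:
    print(f"{name}: C={c}, entries={covariance_entries(c)}")
```

Growing the concept set tenfold, from 100 to 1,000 concepts, increases the count roughly a hundredfold (5,050 to 500,500 entries).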

References

Ansel, J., Yang, E., He, H., Gimelshein, N., Jain, A., Voznesensky, M., et al. (2024). PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (pp. 929–947).
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1), 1–3. https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Chauhan, K., Tiwari, R., Freyberg, J., Shenoy, P., & Dvijotham, K. (2023). Interactive concept bottleneck models. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, pp. 5948–5955).
Collins, K. M., Barker, M., Zarlenga, M. E., Raman, N., Bhatt, U., Jamnik, M., ... Dvijotham, K. (2023). Human Uncertainty in Concept-Based AI Systems. In F. Rossi, S. Das, J. Davis, K. Firth-Butterfield, & A. John (Eds.), Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2023, Montréal, QC, Canada, August 8-10, 2023 (pp. 869–889). ACM.
Doshi-Velez, F., & Kim, B. (2017, March). Towards A Rigorous Science of Interpretable Machine Learning (arXiv:1702.08608). arXiv. https://doi.org/10.48550/arXiv.1702.08608
Espinosa Zarlenga, M., Barbiero, P., Ciravegna, G., Marra, G., Giannini, F., Diligenti, M., et al. (2022). Concept embedding models: Beyond the accuracy-explainability trade-off. In Advances in Neural Information Processing Systems (Vol. 35, pp. 21400–21413).
Espinosa Zarlenga, M., Collins, K., Dvijotham, K., Weller, A., Shams, Z., & Jamnik, M. (2024). Learning to Receive Help: Intervention-Aware Concept Embedding Models. Advances in Neural Information Processing Systems, 36.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.
Havasi, M., Parbhoo, S., & Doshi-Velez, F. (2022). Addressing Leakage in Concept Bottleneck Models. In A. H. Oh, A. Agarwal, D. Belgrave, & K. Cho (Eds.), Advances in Neural Information Processing Systems. https://openreview.net/forum?id=tglniD_fn9
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
Heidemann, L., Monnet, M., & Roscher, K. (2023). Concept correlation and its effects on concept-based models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 4780–4788).
Jang, E., Gu, S., & Poole, B. (2017). Categorical Reparameterization with Gumbel-Softmax. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=rkE3y85ee
Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., & Sayres, R. (2018). Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In J. Dy & A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning (Vol. 80, pp. 2668–2677). PMLR. https://proceedings.mlr.press/v80/kim18d.html
Kim, E., Jung, D., Park, S., Kim, S., & Yoon, S. (2023). Probabilistic Concept Bottleneck Models. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 16521–16540). PMLR. https://proceedings.mlr.press/v202/kim23g.html
Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. https://arxiv.org/abs/1412.6980
Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. In Y. Bengio & Y. LeCun (Eds.), 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings. https://arxiv.org/abs/1312.6114
Koh, P. W., Nguyen, T., Tang, Y. S., Mussmann, S., Pierson, E., Kim, B., & Liang, P. (2020). Concept Bottleneck Models. In H. Daumé III & A. Singh (Eds.), Proceedings of the 37th International Conference on Machine Learning (Vol. 119, pp. 5338–5348). Virtual: PMLR. https://proceedings.mlr.press/v119/koh20a.html
Kraft, D. (1988). A software package for sequential quadratic programming. Forschungsbericht, Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt.
Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images.
Kumar, A., Liang, P. S., & Ma, T. (2019). Verified uncertainty calibration. Advances in Neural Information Processing Systems, 32.
Kumar, N., Berg, A. C., Belhumeur, P. N., & Nayar, S. K. (2009). Attribute and simile classifiers for face verification. In 2009 IEEE 12th International Conference on Computer Vision (pp. 365–372). Kyoto, Japan: IEEE. https://doi.org/10.1109/ICCV.2009.5459250
Lampert, C. H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE. https://doi.org/10.1109/CVPR.2009.5206594
Leino, K., Sen, S., Datta, A., Fredrikson, M., & Li, L. (2018). Influence-Directed Explanations for Deep Convolutional Networks. In 2018 IEEE International Test Conference (ITC). IEEE. https://doi.org/10.1109/test.2018.8624792
Lipton, Z. C. (2016, June). The Mythos of Model Interpretability. Communications of the ACM, 61(10), 35–43. https://doi.org/10.48550/arxiv.1606.03490
Maddison, C. J., Mnih, A., & Teh, Y. W. (2017). The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=S1jE5L5gl
Mahinpei, A., Clark, J., Lage, I., Doshi-Velez, F., & Pan, W. (2021). Promises and Pitfalls of Black-Box Concept Learning Models. arXiv:2106.13314. https://doi.org/10.48550/arXiv.2106.13314
Marcinkevičs, R., Laguna, S., Vandenhirtz, M., & Vogt, J. E. (2024). Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? arXiv preprint arXiv:2401.13544.
Marcinkevičs, R., Reis Wolfertstetter, P., Klimiene, U., Chin-Cheong, K., Paschke, A., Zerres, J., ... Vogt, J. E. (2024). Interpretable and intervenable ultrasonography-based machine learning models for pediatric appendicitis. Medical Image Analysis, 91, 103042. https://www.sciencedirect.com/science/article/pii/S136184152300302X
Margeloiu, A., Ashman, M., Bhatt, U., Chen, Y., Jamnik, M., & Weller, A. (2021). Do Concept Bottleneck Models Learn as Intended? arXiv:2105.04289. https://doi.org/10.48550/arXiv.2105.04289
Monteiro, M., Le Folgoc, L., Coelho de Castro, D., Pawlowski, N., Marques, B., Kamnitsas, K., ... Glocker, B. (2020). Stochastic segmentation networks: Modelling spatially correlated aleatoric uncertainty. In Advances in Neural Information Processing Systems (Vol. 33, pp. 12756–12767).
Naeini, M. P., Cooper, G., & Hauskrecht, M. (2015). Obtaining well calibrated probabilities using Bayesian binning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 29).
Neal, R. M. (1995). Bayesian learning for neural networks (Doctoral dissertation). University of Toronto, Canada. https://librarysearch.library.utoronto.ca/permalink/01UTORONTO_INST/14bjeso/alma991106438365706196
Oikarinen, T., Das, S., Nguyen, L. M., & Weng, T.-W. (2023). Label-free Concept Bottleneck Models. In The 11th International Conference on Learning Representations. https://openreview.net/forum?id=FlCg47MNvBA
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748–8763).
Sheth, I., Rahman, A. A., Sevyeri, L. R., Havaei, M., & Kahou, S. E. (2022). Learning from uncertain concepts via test time interventions. In Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022. https://openreview.net/forum?id=WVe3vok8Cc3
Shin, S., Jo, Y., Ahn, S., & Lee, N. (2023). A Closer Look at the Intervention Procedure of Concept Bottleneck Models. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 31504–31520). PMLR. https://proceedings.mlr.press/v202/shin23a.html
Silvey, S. (1975). Statistical Inference. Taylor & Francis. https://books.google.ch/books?id=qIKLejbVMf4C
Steinmann, D., Stammer, W., Friedrich, F., & Kersting, K. (2023). Learning to Intervene on Concept Bottlenecks. arXiv:2308.13453. https://doi.org/10.48550/arXiv.2308.13453
Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset.
Yuksekgonul, M., Wang, M., & Zou, J. (2023). Post-hoc Concept Bottleneck Models. In The 11th International Conference on Learning Representations. https://openreview.net/forum?id=nA5AZ8CEyow

Appendix A Dataset Details

In this section, we provide additional details on the datasets used in the experiments.

A.1 Synthetic Data-Generating Mechanism

Here, we describe the data-generating mechanism of the synthetic dataset in more detail. Let $N$, $p$, and $C$ denote the number of independent data points $\left\{\left({\bm{x}}_{n},{\bm{c}}_{n},y_{n}\right)\right\}_{n=1}^{N}$, covariates, and concepts, respectively. We set $N=50{,}000$, $p=1{,}500$, and $C=100$, with a 60%-20%-20% train-validation-test split. The generative process is as follows:

1. Randomly sample ${\bm{W}}\in\mathbb{R}^{C\times 10}$ s.t. $w_{i,j}\sim\mathcal{N}(0,1)$ for $1\leq i\leq C$ and $1\leq j\leq 10$.
2. Generate a positive definite matrix ${\bm{\Sigma}}\in\mathbb{R}^{C\times C}$ s.t. ${\bm{\Sigma}}={\bm{W}}{\bm{W}}^{T}+{\bm{D}}$, where ${\bm{D}}=\operatorname{diag}(\bm{\delta})\in\mathbb{R}^{C\times C}$ with $\delta_{i}\sim\mathcal{U}_{[0,1]}$ for $1\leq i\leq C$.
3. Randomly sample logits ${\bm{H}}\in\mathbb{R}^{N\times C}$ whose rows satisfy $\bm{\eta}_{n}\sim\mathcal{N}(\bm{0},{\bm{\Sigma}})$ for $1\leq n\leq N$.
4. Let $c_{n,i}=\mathbbm{1}_{\left\{\eta_{n,i}\geq 0\right\}}$ for $1\leq n\leq N$ and $1\leq i\leq C$.
5. Let $h:\>\mathbb{R}^{C}\rightarrow\mathbb{R}^{p}$ be a randomly initialized multilayer perceptron with ReLU nonlinearities.
6. Let ${\bm{x}}_{n}=h\left(\bm{\eta}_{n}\right)+\bm{\epsilon}_{n}$ s.t. $\bm{\epsilon}_{n}\sim\mathcal{N}(\bm{0},{\bm{I}})$ for $1\leq n\leq N$.
7. Let $g:\>\mathbb{R}^{C}\rightarrow\mathbb{R}$ be a randomly initialized linear perceptron.
8. Let $y_{n}=\mathbbm{1}_{\left\{g\left({\bm{c}}_{n}\right)\geq y_{med}\right\}}$ for $1\leq n\leq N$, where $y_{med}$ denotes the median of $\left\{g\left({\bm{c}}_{n}\right)\right\}_{n=1}^{N}$.
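The eight steps above translate almost directly into NumPy. This is a minimal sketch that assumes a one-hidden-layer ReLU network with scaled Gaussian weights for h and a Gaussian-weighted linear map with bias for g, since the text specifies neither their depth, width, nor initialization. The function name and the default n (smaller than N = 50,000, for speed) are illustrative.

```python
import numpy as np


def generate_synthetic(n=1_000, p=1_500, c=100, rank=10, hidden=256, seed=0):
    """Sketch of the synthetic generator in Appendix A.1 (h and g are assumptions)."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: Sigma = W W^T + diag(delta), positive definite since delta_i > 0 a.s.
    W = rng.standard_normal((c, rank))
    delta = rng.uniform(0.0, 1.0, size=c)
    sigma = W @ W.T + np.diag(delta)
    # Step 3: rows of H are logits eta_n ~ N(0, Sigma)
    H = rng.multivariate_normal(np.zeros(c), sigma, size=n)
    # Step 4: binary concepts c_{n,i} = 1{eta_{n,i} >= 0}
    concepts = (H >= 0).astype(np.int64)
    # Steps 5-6: hypothetical one-hidden-layer ReLU MLP h, plus N(0, I) noise
    A1 = rng.standard_normal((c, hidden)) / np.sqrt(c)
    A2 = rng.standard_normal((hidden, p)) / np.sqrt(hidden)
    X = np.maximum(H @ A1, 0.0) @ A2 + rng.standard_normal((n, p))
    # Steps 7-8: hypothetical random linear g; y_n = 1{g(c_n) >= median}
    g_w = rng.standard_normal(c)
    g_b = rng.standard_normal()
    scores = concepts @ g_w + g_b
    y = (scores >= np.median(scores)).astype(np.int64)
    return X, concepts, y


X, concepts, y = generate_synthetic(n=2_000)
print(X.shape, concepts.shape, y.mean())
```

Note that the covariates are computed from the continuous logits rather than from the binarized concepts, while the target depends only on the binary concepts; thresholding at the median makes the two target classes approximately balanced.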

A.2 Natural Image Datasets

Caltech-UCSD Birds-200-2011

We evaluate on the Caltech-UCSD Birds-200-2011 (CUB) dataset (Wah et al., 2011), available at https://www.vision.caltech.edu/datasets/cub_200_2011/ (no license available). It comprises 11,788 photographs of 200 distinct bird species annotated with 312 concepts, such as belly color and pattern. We follow the original train-test split and use the revised version of the dataset proposed in the initial CBM work (Koh et al., 2020), in which only the 112 most widespread binary attributes are included and concept labels are shared across all samples of the same class. The images were resized to 224 × 224 pixels. Finally, following the originally proposed augmentations, we applied random horizontal flips, brightness and saturation modifications, and normalization during training.

CIFAR-10

CIFAR-10 (Krizhevsky et al., 2009), available at https://www.cs.toronto.edu/~kriz/cifar.html (no license available), is a natural image benchmark with 60,000 32 × 32 color images in 10 classes. We kept the original train-test split, with 50,000 samples in the train set and a balanced total of 6,000 images per class. We generated 143 concept labels using large language and vision models, as described in Section 4. At training time, as for CUB, we applied augmentations including brightness and saturation modifications, random horizontal flips, and normalization. Images were rescaled to 224 × 224 pixels.

Appendix B Implementation Details

This section provides further implementation details of SCBMs and the evaluated baselines. All methods were implemented using PyTorch (v2.1.1) (Ansel et al., 2024). All models are trained for 150 epochs on the synthetic dataset and 300 epochs on the natural image datasets using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of $10^{-4}$ and a batch size of 64. For the independently trained autoregressive model, we allocate $2/3$ of the training epochs to the concept predictor and $1/3$ to the target predictor. For the methods requiring sampling, the number of Monte-Carlo samples is set to $M=100$. For the synthetic tabular data, we use a fully connected neural network with 3 non-linear layers, batch normalization, and dropout as the backbone. For CUB, we use a pretrained ResNet-18 (He et al., 2016), and for the lower-resolution CIFAR-10, a simple convolutional neural network with 2 convolutional layers followed by ReLU, dropout, and a fully connected layer. For a fair comparison, all baselines share the same architectural choices, and all experiments are performed over $10$ random seeds.
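With $M=100$ Monte-Carlo samples, a prediction under a distributional concept model can be approximated by averaging over sampled concept logits. This is a minimal NumPy sketch assuming jointly Gaussian logits with mean `mu` and covariance `sigma`, sampled via a Cholesky factor, with sigmoid concept probabilities fed to a hypothetical logistic target head (`head_w`, `head_b`). The models in this paper may use a different relaxation of the binary concepts and a different head.

```python
import numpy as np


def mc_target_probability(mu, sigma, head_w, head_b, m=100, seed=0):
    """Monte-Carlo estimate of p(y = 1 | x) from M sampled concept-logit vectors."""
    rng = np.random.default_rng(seed)
    # Reparameterized sampling: eta = mu + L eps with Sigma = L L^T
    chol = np.linalg.cholesky(sigma)
    eps = rng.standard_normal((m, mu.shape[0]))
    eta = mu + eps @ chol.T  # shape (M, C)
    concept_probs = 1.0 / (1.0 + np.exp(-eta))
    # Hypothetical logistic target head applied to each sample
    y_probs = 1.0 / (1.0 + np.exp(-(concept_probs @ head_w + head_b)))
    # Average the M per-sample predictions
    return float(y_probs.mean())


print(mc_target_probability(np.array([1.0, -1.0]), np.eye(2), np.array([2.0, -1.0]), 0.5))
```

Sampling through the Cholesky factor keeps the estimate differentiable with respect to `mu` and `sigma` in an autodiff framework, which is what makes end-to-end training with sampling feasible.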

Resource Usage

For the experiments in the main paper, we used a cluster of mostly GeForce RTX 2080 GPUs with 2 CPU workers. Across all methods, we estimate an average runtime of 8 hours per experiment. This amounts to 5 methods $\times$ 3 datasets $\times$ 10 seeds $\times$ 8 hours $=$ 1,200 hours. The ablation figures required another 40 runs (40 $\times$ 8 hours $=$ 320 hours), for a total of 1,520 hours of compute. Note that we report only the compute needed to generate the final results, not the development time, which we roughly estimate to be about 10 times larger.

Appendix C Further Experiments

In this section, we present additional experiments that provide a more in-depth understanding of SCBMs' effectiveness. We ablate multiple hyperparameters to understand how they influence model performance.

C.1 Random Intervention Policy

In Figure 4, we present the intervention performance of SCBMs and the baseline methods when concepts are selected for intervention at random. Compared to the uncertainty-based intervention policy of Figure 2, the intervention curves of all methods are less steep, confirming the usefulness of the policy proposed by Shin et al. (2023). Consistent with the main results, SCBMs still outperform the baseline methods, with the amortized variant outperforming the global one on the real-world datasets. On CIFAR-10, for the first interventions, an improvement in concept accuracy is not directly reflected in improved target prediction for SCBMs, which is likely due to the low signal-to-noise ratio of the CLIP-inferred concepts.

C.2 Regularization Strength

In Figure 5, we analyze the impact of the regularization strength $\lambda_{2}$ from Equation 6. Due to environmental considerations, we conducted these experiments with only 5 seeds and limited the number of interventions to 20. Our findings indicate that SCBMs are not sensitive to the choice of $\lambda_{2}$, except that the unregularized amortized variant shows slight signs of overfitting.

C.3 Intervention Strategy

In Figure 6, we analyze the effect of the intervention strategy. Our findings indicate that while SCBMs remain effective with the strategy proposed by Koh et al. (2020), which sets the logit of an intervened concept to the 5th (if $c_{i}=0$) or 95th (if $c_{i}=1$) percentile of its training distribution, our proposed strategy based on the confidence region results in stronger intervenability.
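The percentile strategy of Koh et al. (2020) is simple to express. This is a minimal NumPy sketch assuming that the "training distribution" refers to the logits predicted for each concept on the training set; the function and argument names are hypothetical.

```python
import numpy as np


def percentile_intervention(logits, train_logits, concept_ids, true_values):
    """Set intervened logits to the 5th/95th percentile of training-set logits.

    logits:       (C,) predicted concept logits for one test instance
    train_logits: (N_train, C) concept logits predicted on the training set
    concept_ids:  indices of the concepts the user intervenes on
    true_values:  ground-truth binary values c_i for those concepts
    """
    p5 = np.percentile(train_logits, 5, axis=0)
    p95 = np.percentile(train_logits, 95, axis=0)
    edited = np.array(logits, dtype=float, copy=True)
    for i, c in zip(concept_ids, true_values):
        # c_i = 1 -> high (95th) percentile, c_i = 0 -> low (5th) percentile
        edited[i] = p95[i] if c == 1 else p5[i]
    return edited


train = np.tile(np.arange(101.0)[:, None], (1, 2))  # toy training logits 0..100
print(percentile_intervention(np.array([0.3, -0.2]), train, [0], [1]))
```

On its own, this edit changes only the intervened logits; any effect on other concepts has to come from the dependency structure the model has learned.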

C.4 Confidence Region Level

In Figure 7, we analyze the effect of the level $1-\alpha$ of the likelihood-based confidence region. Our findings indicate that SCBMs are not sensitive to the choice of $1-\alpha$, with higher levels performing slightly better.
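For intuition about what the level $1-\alpha$ controls, consider the one-dimensional case of a single normally distributed logit with mean μ and standard deviation σ. Here, −2 log of the likelihood ratio equals (η − μ)²/σ², and the χ²₁ quantile at level 1 − α equals z²₁₋α/₂, so the likelihood-based region is the familiar interval μ ± z₁₋α/₂ σ. The sketch covers only this univariate case, using the standard library's `statistics.NormalDist`; the multivariate region and how the intervention strategy uses it (Section 3.3) are not reproduced, and the function name is hypothetical.

```python
from statistics import NormalDist


def likelihood_interval(mu: float, sd: float, level: float = 0.95):
    """Univariate likelihood-based region {eta : (eta - mu)^2 / sd^2 <= z^2}."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    return mu - z * sd, mu + z * sd


for level in (0.90, 0.95, 0.99):
    print(level, likelihood_interval(0.0, 1.0, level))
```

Higher levels widen the region, which gives an intervention more room to move the logits.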