Stochastic Concept Bottleneck Models

Moritz Vandenhirtz, Sonia Laguna, Ričards Marcinkevičs, Julia E. Vogt
Department of Computer Science
ETH Zurich
Switzerland
Equal contribution. Correspondence to [email protected]
Abstract

Concept Bottleneck Models (CBMs) have emerged as a promising interpretable method whose final prediction is based on intermediate, human-understandable concepts rather than the raw input. Through time-consuming manual interventions, a user can correct wrongly predicted concept values to enhance the model’s downstream performance. We propose Stochastic Concept Bottleneck Models (SCBMs), a novel approach that models concept dependencies. In SCBMs, a single-concept intervention affects all correlated concepts, thereby improving intervention effectiveness. Unlike previous approaches that model the concept relations via an autoregressive structure, we introduce an explicit, distributional parameterization that allows SCBMs to retain the CBMs’ efficient training and inference procedure. Additionally, we leverage the parameterization to derive an effective intervention strategy based on the confidence region. We show empirically on synthetic tabular and natural image datasets that our approach improves intervention effectiveness significantly. Notably, we showcase the versatility and usability of SCBMs by examining a setting with CLIP-inferred concepts, alleviating the need for manual concept annotations.

1 Introduction

In today’s world, machine learning plays a crucial role in making important decisions, from healthcare to finance and law. However, as these algorithms become more complex, understanding how they arrive at their decisions becomes increasingly challenging. This lack of interpretability is a significant concern, especially in situations where trustworthiness, transparency, and accountability are paramount (Lipton, 2016; Doshi-Velez & Kim, 2017). Recent studies have focused on Concept Bottleneck Models (CBMs) (Koh et al., 2020; Havasi et al., 2022; Shin et al., 2023), a class of models that predict human-understandable concepts upon which the final target prediction is based. CBMs offer interpretability since a user can inspect the predicted concept values to understand how the model arrives at its final target prediction. Moreover, if they disagree with a concept prediction, they can intervene by adjusting it to the right value, which in turn affects the target prediction.

For example, consider the yellow warbler in Figure 1 (a), where a user might notice that the binary concept ‘yellow primary color’ is mispredicted. Upon this realization, they can intervene on the CBM by setting its value to $1$, which increases the probability of the class yellow warbler. This way of interacting allows any untrained user to engage with the model to increase its predictive performance.

However, if the user input is that the primary color is yellow, should not the likelihood of a yellow belly increase too? This adaptation would increase the predicted likelihood of the correct class even more, as yellow warblers are characterized by their fully yellow body. Currently, vanilla CBMs do not exhibit this behavior as they do not use the intervened-on concepts to update their remaining concept predictions. This indicates that they suboptimally adapt to the additional knowledge gained. To this end, we propose to extend the concept predictions with the modeling of their dependencies, as depicted in Figure 1 (a,c).

Figure 1: Overview of the proposed method for the CUB dataset. (a) A user intervenes on the concept of ‘primary color: yellow’. Unlike CBMs, our method then uses this information to adjust the predicted probability of correlated concepts, thereby affecting the target prediction. (b) Schematic overview of the intervention procedure. A user’s intervention $\bm{c}'_{\mathcal{S}}$ is used to infer the logits $\bm{\eta}_{\setminus\mathcal{S}}$ of the remaining concepts. (c) Visualization of the learned global dependency structure as a correlation matrix for the 112 concepts of CUB (Wah et al., 2011). Characterization of concepts on the left.

The proposed approach captures the concept dependencies by modeling the concept logits with a learnable non-diagonal normal distribution, which enables efficient, scalable computing of the effect of interventions on other concepts. By integrating concept correlations, we reduce the time and effort of having to laboriously intervene on many correlated variables and increase the efficacy of interventions on the downstream prediction. Thanks to the explicit distributional assumptions, the model is trained end-to-end, retaining the training and inference speed of classic CBMs as well as the benefits of training the concept and target predictor jointly. Moreover, we show that our method excels when querying user interventions based on predicted concept uncertainty (Shin et al., 2023), further highlighting the practical utility of our approach as such policies spare users from manually sifting through the concepts to identify necessary interventions. Lastly, based on the distributional concept parameterization, we propose a novel approach for computing dependency-aware interventions through the likelihood-based confidence region.

Contributions

This work contributes to the line of research on concept bottleneck models in several ways. (i) We propose to capture and model concept dependencies with a multivariate normal distribution. (ii) We derive a novel intervention strategy based on the confidence region of the normal distribution that incorporates concept correlations. Using the learned concept dependencies during the intervention procedure allows for stronger interventional effectiveness. (iii) We provide a thorough empirical assessment of the proposed method on synthetic tabular and natural image data. Additionally, we combine our method with concept discovery where we alleviate the need for annotations by using CLIP-inferred concepts. In particular, we show the proposed method (a) discovers meaningful, interpretable patterns in the form of concept dependencies, (b) allows for fast, scalable inference, and (c) outperforms related work with respect to intervention effectiveness thanks to the proposed concept modeling and intervention strategy.

2 Background & Related Work

Concept bottleneck models (Koh et al., 2020; Lampert et al., 2009; N. Kumar et al., 2009) are typically trained on data points $(\bm{x}, \bm{c}, y)$, comprising the covariates $\bm{x} \in \mathcal{X}$, target $y \in \mathcal{Y}$, and $C$ annotated binary concepts $\bm{c} \in \mathcal{C}$. Consider a neural network $f_{\bm{\theta}}$ parameterized by $\bm{\theta}$ and a slice $\langle g_{\bm{\psi}}, h_{\bm{\phi}} \rangle$ (Leino et al., 2018) s.t. $\hat{y} := f_{\bm{\theta}}(\bm{x}) = g_{\bm{\psi}}(h_{\bm{\phi}}(\bm{x}))$. CBMs enforce a concept bottleneck $\bm{\hat{c}} := h_{\bm{\phi}}(\bm{x})$ such that the model’s final output depends on the covariates $\bm{x}$ solely through the predicted concepts $\bm{\hat{c}}$.

While Koh et al. (2020) propose the soft CBM, where the concept logits parameterize the bottleneck, Havasi et al. (2022) argue that such a representation leads to leakage, where additional unwanted information in the concept representation is used to predict the target (Margeloiu et al., 2021; Mahinpei et al., 2021). Thus, they parameterize the bottleneck by binarized concept predictions and call it the hard CBM. Then, Havasi et al. (2022) equip the hard CBM with an autoregressive structure of the form $c_i \mid \bm{x}, \bm{c}_{<i}$, which is supposed to learn the concept dependencies. As such, the implicit autoregressive modeling of concept dependencies by Havasi et al. (2022) is the most related to the current work. Complementary to our work, Heidemann et al. (2023) analyze how a CBM’s performance is affected by concept correlations. Unlike approaches that restrict the bottleneck to prevent leakage, Concept Embedding Models (CEM) (Espinosa Zarlenga et al., 2022) represent each concept with a predicted embedding vector from which the concept probabilities can be inferred, treating the problem akin to a multi-task setting. E. Kim et al. (2023) model the embedding with a normal distribution, assuming a diagonal covariance matrix, which prevents them from capturing concept dependencies. Recent works explored how a CBM-like structure can be enforced even without a concept-annotated training set. Yuksekgonul et al. (2023) transform a pre-trained model into a CBM via a concept bank from concept activation vectors and multimodal models (B. Kim et al., 2018), while Oikarinen et al. (2023) query GPT-3 (Brown et al., 2020) for the concept set $\mathcal{C}$ and assign the values of the concept activations to each datapoint $\bm{x}$ with CLIP (Radford et al., 2021) similarities. Marcinkevičs et al. (2024) instead relax the need for a concept-labeled training set to a smaller validation set by fine-tuning a pre-trained model.

Intervenability (Marcinkevičs et al., 2024) is a crucial element of CBMs as it allows the user to correct wrongly predicted concepts $\bm{\hat{c}}$ to $\bm{c}'$, which in turn affects the target prediction of the model $\hat{y}'$. If multiple concepts are intervened on, then the order of interventions is important. To this end, Sheth et al. (2022) and Shin et al. (2023) explore multiple policies according to which the order of concepts is determined. Chauhan et al. (2023) propose to combine predefined policies with learnable weighting parameters, while Espinosa Zarlenga et al. (2024) learn the policy itself. Steinmann et al. (2023) argue that instance-specific interventions are costly and store previous interventions in a memory to automatically reapply them for similar data points. Lastly, Collins et al. (2023) explore the advantages of including uncertainty rather than treating humans as oracles.

Our work models concept dependencies by parameterizing the bottleneck with a distribution. In a similar vein, Variational Autoencoders (Kingma & Welling, 2014) parameterize the bottleneck with a normal distribution to model and generate new data. Stochastic Segmentation Networks (Monteiro et al., 2020) parameterize the logits of a segmentation map with a non-diagonal normal distribution to capture the spatial correlations of pixels and model the aleatoric uncertainty. The modeling of uncertainty with a distribution is also explored by Bayesian Neural Networks (Neal, 1995) that learn a probability distribution over the neurons of a neural network.

3 Methods

We propose Stochastic Concept Bottleneck Models (SCBM), a novel concept-based method that relaxes the implicit CBM assumption of independent concepts. SCBM captures the concept dependencies by learning their multivariate distribution. As a result, interventions become more effective and scalable, as a single intervention can influence multiple correlated concepts. A schematic overview of the proposed method is depicted in Figure 1 (b). The code is available in an anonymized repository: https://anonymous.4open.science/r/scbm-A1AA/.

3.1 Model Formulation

To capture the concept dependencies, we model the concept logits $\bm{\eta}$ with a learned multivariate normal distribution. Modeling logits with a normal distribution has proven to be effective in the context of segmentation (Monteiro et al., 2020). While Monteiro et al. (2020) use it to capture the spatial dependencies of pixels, we, instead, model the relations between concepts, where the properties of the normal distribution will prove useful. A neural network is trained to predict the distribution’s parameters $\bm{\eta} \mid \bm{x} \sim \mathcal{N}\left(\bm{\mu}(\bm{x}), \bm{\Sigma}(\bm{x})\right)$, where $\bm{\mu}(\bm{x}) \in \mathbb{R}^{C}$, and $\bm{\Sigma}(\bm{x}) \in \mathbb{R}^{C \times C}$. Thus, the traditional assumption of independent concepts $c_i \perp\!\!\!\perp c_j \mid \bm{x}, \ \forall i \neq j$ is relaxed to $c_i \perp\!\!\!\perp c_j \mid \bm{\eta}, \ \forall i \neq j$, where the assumed normal distribution induces linear concept dependencies. The inductive bias of linearity is useful in practice as it is more robust to overfitting and computationally more scalable with respect to $C$ compared to its nonlinear alternative (Havasi et al., 2022), as we will show in Section 5.

To learn the distribution, we minimize the negative log-likelihood

$$-\log p(\bm{c} \mid \bm{x}) = -\log \int p(\bm{c} \mid \bm{\eta})\, p_{\bm{\phi}}(\bm{\eta} \mid \bm{x})\, d\bm{\eta}, \tag{1}$$

where $\bm{\phi}$ are the parameters of a neural network that predicts the distribution $\bm{\eta} \mid \bm{x} \sim \mathcal{N}\left(\bm{\mu}(\bm{x}), \bm{\Sigma}(\bm{x})\right)$. This integral is intractable due to the softmax operation applied in $p(\bm{c} \mid \bm{\eta})$. Thus, the integral is approximated by $M$ Monte-Carlo samples

$$-\log \int p(\bm{c} \mid \bm{\eta})\, p_{\bm{\phi}}(\bm{\eta} \mid \bm{x})\, d\bm{\eta} \approx -\log \frac{1}{M} \sum_{m=1}^{M} p(\bm{c} \mid \bm{\eta}^{(m)}), \quad \bm{\eta}^{(m)} \mid \bm{x} \sim \mathcal{N}\left(\bm{\mu}(\bm{x}), \bm{\Sigma}(\bm{x})\right). \tag{2}$$

In order to learn $\bm{\phi}$, we make use of the parameterization as a normal distribution and employ the reparameterization trick $\bm{\eta}^{(m)} \mid \bm{x} = \bm{\mu}(\bm{x}) + \mathbf{L}(\bm{x})\bm{\epsilon}^{(m)}$, $\mathbf{L}(\bm{x})\mathbf{L}(\bm{x})^{T} = \bm{\Sigma}(\bm{x})$, $\bm{\epsilon}^{(m)} \sim \mathcal{N}\left(\bm{0}, \bm{I}\right)$ such that gradients can be computed with respect to the parameters. Lastly, we incorporate the new relaxed conditional independence assumption
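The reparameterization step can be sketched in a few lines of NumPy. This is an illustration written for this text, not the authors' code: the function name `sample_logits`, the toy mean and covariance, and the use of NumPy are assumptions. In a trainable model, the mean and the Cholesky factor would be network outputs, and the same computation would run inside an automatic differentiation framework so that gradients reach the network parameters.

```python
import numpy as np

def sample_logits(mu, L, num_samples, rng):
    """Draw eta^(m) = mu + L @ eps^(m) with eps^(m) ~ N(0, I).

    mu: (C,) mean of the concept logits.
    L:  (C, C) lower-triangular factor with L @ L.T == Sigma.
    Returns an array of shape (num_samples, C).
    """
    eps = rng.standard_normal((num_samples, mu.shape[0]))
    # Row m of eps @ L.T is (L @ eps^(m))^T, so each row is one sample.
    return mu + eps @ L.T

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 2.0])                 # toy values (assumed)
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.3],
                  [0.0, 0.3, 0.5]])             # symmetric positive definite
L = np.linalg.cholesky(Sigma)                   # lower-triangular factor
eta = sample_logits(mu, L, num_samples=50_000, rng=rng)

# The empirical moments of the samples should be close to mu and Sigma.
print(eta.mean(axis=0))
print(np.cov(eta, rowvar=False))
```

Because the randomness enters only through `eps`, the samples are a deterministic, differentiable function of the mean and the factor, which is what makes the Monte-Carlo objective trainable.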

$$\log p(\bm{c} \mid \bm{\eta}) = \log \prod_{i=1}^{C} p(c_i \mid \eta_i) = \sum_{i=1}^{C} \log p(c_i \mid \eta_i), \tag{3}$$

where $p(c_i \mid \eta_i)$ describes a Bernoulli distribution parameterized by the sigmoid-transformed logits $\sigma(\eta_i)$. Combining the above considerations results in the following reformulation of the negative log-likelihood:

$$\begin{aligned} -\log p(\bm{c} \mid \bm{x}) &\approx -\log \frac{1}{M} \sum_{m=1}^{M} p(\bm{c} \mid \bm{\eta}^{(m)}) \\ &\propto -\log \sum_{m=1}^{M} \exp \sum_{i=1}^{C} \log p(c_i \mid \eta_i^{(m)}) \\ &= -\log \sum_{m=1}^{M} \exp \sum_{i=1}^{C} \left[ -\mathrm{BCE}(c_i, \sigma(\eta_i^{(m)})) \right], \end{aligned} \tag{4}$$

where BCE stands for Binary Cross Entropy, and the logsumexp trick is used for numerical stability.
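As an illustration of Eq. (4), the following NumPy sketch evaluates the Monte-Carlo concept loss entirely in log space. It is not the authors' implementation: `concept_nll` and its argument names are hypothetical, and the estimate keeps the constant factor 1/M that Eq. (4) drops, which shifts the loss by a constant without changing its gradients.

```python
import numpy as np

def concept_nll(c, eta):
    """Monte-Carlo estimate of -log p(c | x).

    c:   (C,) binary concept labels.
    eta: (M, C) logit samples eta^(m).
    """
    # log p(c_i | eta_i) = -BCE(c_i, sigmoid(eta_i)), written with
    # log(sigmoid(z)) = -logaddexp(0, -z) to avoid overflow for large |z|.
    log_p = -(c * np.logaddexp(0.0, -eta) + (1 - c) * np.logaddexp(0.0, eta))
    s = log_p.sum(axis=1)                      # (M,) log p(c | eta^(m))
    # logsumexp trick: subtract the maximum before exponentiating.
    s_max = s.max()
    log_mean = s_max + np.log(np.mean(np.exp(s - s_max)))
    return -log_mean

c = np.array([1, 0, 1])
eta = np.zeros((4, 3))                         # all-zero logits: p = 0.5 everywhere
print(concept_nll(c, eta))                     # equals 3 * log(2)
```

Subtracting the largest exponent keeps at least one summand equal to one, so the sum never underflows to zero even when every sample assigns a very small likelihood to the observed concepts.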

The distribution-based modeling procedure allows for efficient sampling, thus enabling SCBM to train concept and target predictors jointly, sequentially, or independently. In contrast, the autoregressive alternative (Havasi et al., 2022) requires independent training due to the computational complexity. We adopt a joint training scheme to obtain the benefits of end-to-end learning where concept and target predictors can adjust to each other. To prevent leakage, we follow Havasi et al. (2022) and train the model with the hard $\{0, 1\}$ concept values as bottleneck rather than the logits used in the original CBM (Koh et al., 2020). To this end, we employ the straight-through Gumbel-Softmax trick (Jang et al., 2017; Maddison et al., 2017) that approximates Bernoulli samples while being differentiable. The target predictor $g_{\bm{\psi}}$ is then learned by minimizing the negative log-likelihood
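The forward pass of the binary Gumbel-Softmax relaxation can be sketched as follows. This is only an illustration under stated assumptions: the function name, the temperature default, and the clipping constants are hypothetical, and NumPy has no automatic differentiation, so the straight-through gradient itself is not shown. In an autodiff framework, one common way to obtain it is to use the hard sample in the forward pass while routing gradients through the relaxed sample.

```python
import numpy as np

def sample_hard_concepts(eta, rng, temperature=1.0):
    """Forward pass only: relaxed and hard concept samples from logits eta."""
    u = rng.uniform(low=1e-12, high=1.0 - 1e-12, size=eta.shape)
    # The difference of two independent Gumbel draws is logistic noise.
    logistic_noise = np.log(u) - np.log1p(-u)
    # Relaxed Bernoulli sample in (0, 1); sharper for smaller temperature.
    relaxed = 1.0 / (1.0 + np.exp(-(eta + logistic_noise) / temperature))
    # Thresholding gives values in {0, 1} distributed as Bernoulli(sigmoid(eta)).
    hard = (relaxed > 0.5).astype(float)
    return hard, relaxed

rng = np.random.default_rng(0)
eta = np.full(100_000, 1.0)                    # same logit repeated (toy input)
hard, relaxed = sample_hard_concepts(eta, rng)
print(hard.mean(), 1.0 / (1.0 + np.exp(-1.0)))  # empirical vs. exact probability
```

The hard values are what the target predictor would see as its bottleneck input, which is how the leakage through soft concept values is avoided.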

$$\begin{aligned} -\log p(y \mid \bm{x}) &= -\log \sum_{\bm{c} \in \mathcal{C}} p_{\bm{\psi}}(y \mid \bm{c})\, p(\bm{c} \mid \bm{x}) \\ &\approx -\log \frac{1}{M} \sum_{m=1}^{M} p_{\bm{\psi}}(y \mid \bm{c}^{(m)}), \qquad \bm{c}^{(m)} \sim p(\bm{c} \mid \bm{x}). \end{aligned} \tag{5}$$

Lastly, following Occam’s razor and to prevent overfitting, we regularize the learned dependencies. We take inspiration from the Graphical Lasso (Friedman et al., 2008) and penalize the off-diagonal elements of the precision matrix $\bm{\Sigma}^{-1}$.

By combining concept, target, and precision loss with weighting factors $\lambda_1$ and $\lambda_2$, we arrive at the final loss function

$$-\log \sum_{m=1}^{M} \exp \sum_{i=1}^{C} \left[ -\mathrm{BCE}\left(c_i, \sigma(\eta_i^{(m)})\right) \right] + \lambda_1 \mathrm{CE}\left(y, \frac{1}{M} \sum_{m=1}^{M} g_{\bm{\psi}}(\bm{c}^{(m)})\right) + \lambda_2 \sum_{i \neq j} \bm{\Sigma}(\bm{x})^{-1}_{i,j}. \tag{6}$$

3.2 Covariance Learning

The introduced amortized covariance matrix $\bm{\Sigma}(\bm{x})$ provides the flexibility to tailor its predicted concept dependencies to each data point, making it adaptable to many data-generating mechanisms. For example, in the commonly used CUB (Wah et al., 2011; Koh et al., 2020), it can learn the class-wise concept structure present in the dataset. The explicit dependency representation inferred by the learned covariance matrix is useful as it provides insights into the learned correlations among the concepts, which is important for understanding and interpreting the model behavior.

However, an amortized covariance matrix comes at the price of not being able to visualize and interpret a unified concept structure on a dataset level. Depending on the need of the application, such a global structure might be preferable. Thus, we propose a variation of SCBM, where the covariance matrix is not amortized ($\bm{\Sigma}(\bm{x})$), but learned globally ($\bm{\Sigma}$). An example of the global concept structure learned on CUB is shown in Figure 1 (c). This variation has the inductive bias of assuming a constant covariance matrix, whose utility depends on the underlying data-generating mechanism. We recommend using the more flexible, amortized version by default and only utilizing a global covariance if the strong assumption of fixed dependencies is reasonable. We will explore this empirically in more detail in Section 5.

3.3 Interventions

A distinguishing property of CBM-like methods is the user’s capacity to correct wrongly predicted concepts, which in turn affects the target prediction (Marcinkevičs et al., 2024). For a big concept set, this intervention procedure can become quite laborious as a user has to inspect and manually intervene on each concept separately. SCBMs are designed to alleviate this need by utilizing the learned concept dependencies such that a single intervention affects all related concepts as modeled by the multivariate normal distribution.

The parameterization as a multivariate normal distribution allows for a quick, scalable intervention procedure. Given a set $\mathcal{S} \subset \{1, \ldots, C\}$ of concept interventions, the effect on the remaining concepts $\bm{c}_{\setminus\mathcal{S}}$ is computed via their logits $\bm{\eta}_{\setminus\mathcal{S}}$ by conditioning on the intervention logits $\bm{\eta}'_{\mathcal{S}}$, utilizing the known properties of the normal distribution

$$\begin{aligned} \bm{\eta}_{\setminus\mathcal{S}} \mid \bm{x}, \bm{\eta}'_{\mathcal{S}} &\sim \mathcal{N}\left(\bm{\bar{\mu}}(\bm{x}), \bm{\overline{\Sigma}}(\bm{x})\right), \\ \bm{\bar{\mu}} &= \bm{\mu}_{\setminus\mathcal{S}} + \bm{\Sigma}_{\setminus\mathcal{S},\mathcal{S}} \bm{\Sigma}_{\mathcal{S},\mathcal{S}}^{-1} (\bm{\eta}'_{\mathcal{S}} - \bm{\mu}_{\mathcal{S}}), \\ \bm{\overline{\Sigma}} &= \bm{\Sigma}_{\setminus\mathcal{S},\setminus\mathcal{S}} - \bm{\Sigma}_{\setminus\mathcal{S},\mathcal{S}} \bm{\Sigma}_{\mathcal{S},\mathcal{S}}^{-1} \bm{\Sigma}_{\mathcal{S},\setminus\mathcal{S}}. \end{aligned} \tag{7}$$
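The conditioning in Eq. (7) is the standard formula for a partitioned multivariate normal distribution, and a minimal NumPy sketch of it looks as follows. The function name `condition_on_intervention`, the index-based interface, and the toy numbers are assumptions for illustration and are not taken from the authors' code.

```python
import numpy as np

def condition_on_intervention(mu, Sigma, idx_S, eta_S_new):
    """Mean and covariance of the non-intervened logits given eta'_S (Eq. 7)."""
    idx_S = np.asarray(idx_S)
    eta_S_new = np.asarray(eta_S_new, dtype=float)
    idx_R = np.setdiff1d(np.arange(mu.shape[0]), idx_S)   # remaining concepts
    S_RS = Sigma[np.ix_(idx_R, idx_S)]
    S_SS = Sigma[np.ix_(idx_S, idx_S)]
    S_RR = Sigma[np.ix_(idx_R, idx_R)]
    # K = S_RS @ inv(S_SS), computed with a linear solve instead of an inverse.
    K = np.linalg.solve(S_SS, S_RS.T).T
    mu_bar = mu[idx_R] + K @ (eta_S_new - mu[idx_S])
    Sigma_bar = S_RR - K @ S_RS.T
    return idx_R, mu_bar, Sigma_bar

mu = np.zeros(3)
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
# Intervene on concept 0 by setting its logit to 2.0.
idx_R, mu_bar, Sigma_bar = condition_on_intervention(mu, Sigma, [0], [2.0])
print(idx_R, mu_bar)   # concept 1 moves to 1.6, the uncorrelated concept 2 stays at 0
print(Sigma_bar)
```

In this toy case the correlated concept is pulled towards the intervention and its variance shrinks, while the uncorrelated concept is unchanged. The cost is dominated by a solve with the intervened block, whose size is the number of intervened concepts.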

For a standard CBM (Koh et al., 2020), $\eta_i'$ are set to the 5th (if $c_i = 0$) or 95th (if $c_i = 1$) percentile of the training distribution. Although this strategy is effective for SCBMs (see Appendix C.3), it has limitations that lead to suboptimal intervention performance when interventions affect other concepts. For example, if the initially predicted $\mu_i$ was more extreme than the selected training percentile, the interventional shift guided by $\eta'_i - \mu_i$ would point in the wrong direction. This, in turn, would cause $\bm{\eta}_{\setminus\mathcal{S}}$ to shift incorrectly.
Thus, we pose the desideratum that an appropriate intervention strategy should determine $\eta_i'$ such that $\eta_i' - \mu_i \geq 0$ if $c_i = 1$, and $\eta_i' - \mu_i \leq 0$ if $c_i = 0$. Additionally, $\eta_i' - \mu_i$ should not be “too large”, so that the intervention does not completely disregard the predicted $\bm{\mu}_{\setminus\mathcal{S}}$.
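The failure mode of the percentile strategy can be made concrete with a small sketch. The helper names and the numeric values below are hypothetical and only serve to illustrate the sign condition stated above.

```python
def percentile_target(c_i, p05, p95):
    """Percentile strategy: use the 5th or 95th training percentile as the new logit."""
    return p95 if c_i == 1 else p05


def shift_has_correct_sign(mu_i, eta_i_new, c_i):
    """Check the desideratum: shift >= 0 if c_i = 1, shift <= 0 if c_i = 0."""
    shift = eta_i_new - mu_i
    return shift >= 0 if c_i == 1 else shift <= 0


# Hypothetical training percentiles of a concept logit.
p05, p95 = -3.0, 3.0

# The model is already very confident (mu_i = 4.0) and the concept is present.
eta_new = percentile_target(c_i=1, p05=p05, p95=p95)
print(shift_has_correct_sign(mu_i=4.0, eta_i_new=eta_new, c_i=1))  # False

# A less extreme prediction is shifted in the intended direction.
print(shift_has_correct_sign(mu_i=0.5, eta_i_new=eta_new, c_i=1))  # True
```

In the first case the shift is negative even though the concept is present, so any positively correlated concept would be pushed the wrong way when conditioning on the intervention.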

Here, an additional benefit of the explicit distributional representation becomes apparent: the likelihood-based confidence region (the multivariate generalization of a confidence interval) provides a natural way of capturing the region of possible $\bm{\eta}_{\mathcal{S}}'$ that fulfil our desiderata. Note that the confidence region takes concept dependencies into account when describing the area of possible $\bm{\eta}_{\mathcal{S}}'$. To determine the specific point within this region, we search for the values $\bm{\eta}_{\mathcal{S}}'$ that maximize the log-likelihood of the known, intervened-on concepts $\bm{c}_{\mathcal{S}}$, implicitly focusing on concepts that the model predicts poorly.

\begin{equation}
\begin{split}
\bm{\eta}_{\mathcal{S}}' = \operatorname*{arg\,max}_{\bm{\eta}_{\mathcal{S}}} \; & \log p(\bm{c}_{\mathcal{S}} \mid \bm{\eta}_{\mathcal{S}}) \\
\text{s.t.} \; & -2\left(\log p(\bm{\eta}_{\mathcal{S}} \mid \bm{\mu}_{\mathcal{S}}, \bm{\Sigma}_{\mathcal{S},\mathcal{S}}) - \log p(\bm{\mu}_{\mathcal{S}} \mid \bm{\mu}_{\mathcal{S}}, \bm{\Sigma}_{\mathcal{S},\mathcal{S}})\right) \leq \chi^2_{d,1-\alpha} \\
& \eta_i' - \mu_i \geq 0 \text{ if } c_i = 1, \quad \forall i \in \mathcal{S} \\
& \eta_i' - \mu_i \leq 0 \text{ if } c_i = 0, \quad \forall i \in \mathcal{S},
\end{split}
\tag{8}
\end{equation}

where $d = |\mathcal{S}|$. The first inequality describes the confidence region. It is based on the logarithm of the likelihood ratio, which, after multiplying by $-2$, asymptotically follows a $\chi^2$ distribution (Silvey, 1975). The last two inequalities restrict the region to the desired direction. Note that $\bm{\eta}_{\mathcal{S}}'$ is computed to determine the conditional effect of the interventions on $\bm{\eta}_{\setminus\mathcal{S}}$ using Equation 7. When predicting $\hat{y}'$ under interventions, the logits $\bm{\eta}_{\setminus\mathcal{S}}$ are then used for sampling the binary concept values $\bm{c}_{\setminus\mathcal{S}}$, while the intervened-on concepts $\bm{c}'_{\mathcal{S}}$ are directly set to their known, binary value.
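For intuition, consider the special case of a single intervened concept, $d = 1$. For a Gaussian density, the left-hand side of the confidence-region constraint equals the squared Mahalanobis distance $(\bm{\eta}_{\mathcal{S}} - \bm{\mu}_{\mathcal{S}})^\top \bm{\Sigma}_{\mathcal{S},\mathcal{S}}^{-1} (\bm{\eta}_{\mathcal{S}} - \bm{\mu}_{\mathcal{S}})$, which for $d = 1$ is $(\eta_i - \mu_i)^2 / \Sigma_{ii}$. Assuming that $p(c_i \mid \eta_i)$ is a Bernoulli likelihood with success probability given by the sigmoid of the logit, the objective is monotone in $\eta_i$, so the maximizer lies on the boundary of the region in the direction given by $c_i$. The sketch below computes this boundary point; it is only the one-dimensional special case, not the general constrained optimization, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_1dof(level):
    """Quantile of the chi-squared distribution with one degree of freedom.

    Uses the identity chi2_{1, level} = z_{(1 + level) / 2}^2, where z is the
    standard normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * z


def intervene_single(mu_i, var_i, c_i, level=0.99):
    """Boundary point of the one-dimensional confidence region in the
    direction indicated by the ground-truth concept value c_i."""
    radius = math.sqrt(var_i * chi2_quantile_1dof(level))
    return mu_i + radius if c_i == 1 else mu_i - radius


print(chi2_quantile_1dof(0.99))                       # roughly 6.635
print(intervene_single(mu_i=0.5, var_i=4.0, c_i=1))   # shifted upwards from 0.5
print(intervene_single(mu_i=0.5, var_i=4.0, c_i=0))   # shifted downwards from 0.5
```

By construction, the shift always has the sign required by the last two constraints, and its magnitude scales with the predicted standard deviation, so uncertain concepts are moved further than confident ones. For $d > 1$ the region is an ellipsoid and the problem no longer has this simple form.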

4 Experimental Setup

Datasets and Evaluation

We perform experiments on a variety of datasets to showcase the validity of our method. Inspired by Marcinkevičs et al. (2024), we introduce a synthetic tabular dataset with a data-generating mechanism that contains fixed concept dependencies we can regulate. In particular, the concept logits $\bm{\eta}$ are sampled from a randomly initialized positive definite covariance matrix and generate $\bm{x}$. Binary concept values $\bm{c}$ are inferred from $\bm{\eta}$ and generate the target $y$. We refer to Appendix A.1 for a more detailed description.

As a natural image classification benchmark, we evaluate the Caltech-UCSD Birds-200-2011 dataset (Wah et al., 2011), comprising bird photographs from 200 distinct classes. It includes 112 concepts, such as wing color and beak shape, shared across the same class instances as revised in the original CBM work (Koh et al., 2020). Additionally, we explore another natural image classification task on CIFAR-10 (Krizhevsky et al., 2009) with 10 classes. To mitigate the concept annotation requirement, the concepts are synthetically acquired in a similar fashion to the concept discovery literature. We adopt the 143 concept classes generated via GPT-3 (Brown et al., 2020) in prior work (Oikarinen et al., 2023). To obtain the binary concept values, we use the CLIP model (Radford et al., 2021) to compute the similarity between each image and the text embedding of a specific concept and compare it to the similarity with its negative counterpart, i.e., “not the concept”. Appendix A.2 contains further details about the natural image datasets.
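The labeling rule described above can be sketched as follows. The embeddings here are hand-written toy vectors rather than outputs of CLIP, cosine similarity is assumed as the similarity measure, and the function names are hypothetical; the exact prompts and procedure are described in Appendix A.2.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_label(image_emb, concept_emb, negated_emb):
    """Return 1 if the image is closer to the concept text than to its negation."""
    return 1 if cosine(image_emb, concept_emb) > cosine(image_emb, negated_emb) else 0


# Toy stand-ins for an image embedding and two text embeddings.
image = [1.0, 0.0]
concept = [0.9, 0.1]   # stand-in for the text embedding of the concept
negated = [0.1, 0.9]   # stand-in for the text embedding of its negation
print(concept_label(image, concept, negated))  # 1
```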

To compare methods, we evaluate the model performance based on the concept and target accuracy. We compute test performance before and after intervening on an increasing number of concepts. The order of concepts in the intervention is determined by an uncertainty-based policy (Shin et al., 2023) that selects the concept whose predicted probability is closest to $0.5$. We also show results for a random policy in Appendix C.1. Additionally, we evaluate the calibration of the predicted concept uncertainties that are being used for the uncertainty-based policy, with the Brier score (Brier, 1950) and the Expected Calibration Error (Naeini et al., 2015; A. Kumar et al., 2019).
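A minimal sketch of the selection policy and the two calibration metrics is given below. The ECE shown is one common equal-width binned estimator for binary predictions; the exact estimator and number of bins used in the experiments are not specified in this section, so `n_bins=10` and all function names are assumptions.

```python
def most_uncertain(probs, already_intervened=()):
    """Index of the not-yet-intervened concept whose probability is closest to 0.5."""
    candidates = [i for i in range(len(probs)) if i not in already_intervened]
    return min(candidates, key=lambda i: abs(probs[i] - 0.5))


def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and binary labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted gap between mean predicted probability and the
    empirical frequency of the positive label within each equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls into the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_p - freq)
    return ece


probs = [0.9, 0.55, 0.1, 0.4]
labels = [1, 1, 0, 0]
first = most_uncertain(probs)                                # concept 1 (p = 0.55)
second = most_uncertain(probs, already_intervened={first})   # concept 3 (p = 0.4)
print(first, second)
print(brier_score(probs, labels))
print(expected_calibration_error(probs, labels))
```

Both metrics are lower-is-better, and both depend on the predicted probabilities rather than on thresholded predictions, which is why they complement concept accuracy when judging the quality of the uncertainty estimates that drive the policy.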

Baselines

We evaluate the performance of our method in comparison with state-of-the-art models. Namely, we focus on the vanilla concept bottleneck model (CBM) by Koh et al. (2020) in its hard version (Havasi et al., 2022), trained jointly using the straight-through Gumbel-Softmax trick (Jang et al., 2017; Maddison et al., 2017), as a sensible baseline to our binary modeling of concepts. Additionally, we explore the concept embedding model (CEM) by Espinosa Zarlenga et al. (2022) that learns two concept embeddings, $\bm{\hat{c}}_i^+$ and $\bm{\hat{c}}_i^-$. These representations are used to predict the final concept probability with a learnable scoring function $\hat{p}_i = s(\bm{\hat{c}}_i^+, \bm{\hat{c}}_i^-) = \sigma(\mathbf{W}_s[\bm{\hat{c}}_i^+, \bm{\hat{c}}_i^-]^T + \mathbf{b}_s)$ and are then combined into a final concept embedding $\bm{\hat{c}}_i = (\hat{p}_i \bm{\hat{c}}_i^+ + (1 - \hat{p}_i) \bm{\hat{c}}_i^-)$ that is passed to the target predictor. Interventions are modeled by altering the concept probabilities $\hat{p}_i$. Finally, we evaluate the autoregressive CBM proposed by Havasi et al. (2022), where concept dependencies are learned with an autoregressive structure. Here, each concept $c_i$ is predicted with a separate MLP that takes as input a shared latent representation of the input $f_{\bm{\theta}}(\bm{x})$ and all previous concepts $c_1, \ldots, c_{i-1}$. To obtain a good initialization of the autoregressive structure, it is pretrained for 50 epochs. As the Monte-Carlo sampling from the autoregressive structure is time-consuming, the target predictor $g_{\bm{\psi}}$ is trained independently using the ground-truth concepts as input. At intervention time, a normalized importance sampling algorithm is used to estimate the concept distribution.
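The CEM scoring and mixing step described above can be sketched in a few lines. The weights, embeddings, and function names below are toy assumptions, and setting $\hat{p}_i$ to the ground-truth value at intervention time is one straightforward reading of "altering the concept probabilities".

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cem_concept(c_pos, c_neg, w, b):
    """Score the concatenated embeddings and mix them by the predicted probability."""
    score = sum(wk * xk for wk, xk in zip(w, c_pos + c_neg)) + b
    p_hat = sigmoid(score)
    mixed = [p_hat * a + (1.0 - p_hat) * n for a, n in zip(c_pos, c_neg)]
    return p_hat, mixed


def cem_intervene(c_pos, c_neg, c_true):
    """Intervention: replace the predicted probability by the known concept value."""
    p = float(c_true)
    return [p * a + (1.0 - p) * n for a, n in zip(c_pos, c_neg)]


c_pos = [1.0, 2.0]     # toy positive embedding
c_neg = [-1.0, 0.0]    # toy negative embedding
w = [0.0, 0.0, 0.0, 0.0]
b = 0.0
p_hat, mixed = cem_concept(c_pos, c_neg, w, b)
print(p_hat, mixed)                       # 0.5 and the average of both embeddings
print(cem_intervene(c_pos, c_neg, 1))     # equals the positive embedding
```

Note that an intervention on concept $i$ only changes the mixing weight of that concept's own embeddings; nothing in this step propagates the correction to other concepts.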

Implementation Details

The model architectures consist of a backbone for concept prediction followed by a linear layer as head for an interpretable target prediction. Precise details and training configurations are given in Appendix B. To ensure the positive definiteness of the concept covariance matrix $\bm{\Sigma}$, we parameterize it via its Cholesky decomposition $\bm{\Sigma} = \bm{L}\bm{L}^\top$. Thus, we solely predict the lower triangular Cholesky matrix $\bm{L}$. We evaluate two options for SCBMs: using a global ($\bm{\Sigma}$) or an amortized covariance matrix ($\bm{\Sigma}(\bm{x})$). For the amortized version, we set the weighting terms $\lambda_1$ and $\lambda_2$ of Equation 6 to 1. For the global version, we initialize it with the estimated empirical covariance matrix and set $\lambda_2 = 0$, as we did not observe big differences when varying $\lambda_2$. In Appendix C.2, we provide an ablation study, demonstrating that SCBMs are not very sensitive to the choice of $\lambda_2$. At intervention time, we solve the optimization problem based on the 99%-confidence region with the SLSQP algorithm (Kraft, 1988). In Appendix C.4, we provide an ablation with different confidence levels.
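The Cholesky parameterization can be illustrated with a small sketch that maps unconstrained numbers to a lower triangular factor and then to a covariance matrix. Passing the diagonal through a softplus to keep it strictly positive is an assumption made here for illustration (a common choice); this section does not state how the diagonal is constrained, and the function names are hypothetical.

```python
import math


def softplus(z):
    return math.log1p(math.exp(z))


def build_cholesky(raw, n):
    """Fill an n x n lower triangular matrix from n(n+1)/2 unconstrained values
    (row-major order), with a strictly positive diagonal."""
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            L[i][j] = softplus(raw[k]) if i == j else raw[k]
            k += 1
    return L


def covariance(L):
    """Compute Sigma = L L^T."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


raw = [0.0, 0.5, 0.0]          # hypothetical network outputs for two concepts
L = build_cholesky(raw, n=2)
sigma = covariance(L)
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
print(sigma)
print(det > 0)                 # True: symmetric with positive determinant and diagonal
```

Any choice of the unconstrained values yields a valid covariance matrix, so no projection or clipping is needed during training. The number of predicted values, $n(n+1)/2$, grows quadratically with the number of concepts.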

Table 1: Test-set concept and target accuracy (%) prior to interventions. Results are reported as averages and standard deviations of model performance across ten seeds. For each dataset and metric, the best-performing method is bolded and the runner-up is underlined.
| Dataset | Method | Concept Accuracy | Target Accuracy |
|---|---|---|---|
| Synthetic | Hard CBM | 61.42 ± 0.07 | 58.38 ± 0.39 |
| Synthetic | CEM | 61.42 ± 0.12 | 58.01 ± 0.49 |
| Synthetic | Autoregressive CBM | 62.17 ± 0.11 | 59.60 ± 0.62 |
| Synthetic | Global SCBM | 61.57 ± 0.05 | 58.39 ± 0.53 |
| Synthetic | Amortized SCBM | 62.41 ± 0.20 | 58.96 ± 0.38 |
| CUB | Hard CBM | 94.97 ± 0.07 | 67.72 ± 0.57 |
| CUB | CEM | 95.12 ± 0.07 | 69.60 ± 0.30 |
| CUB | Autoregressive CBM | 95.33 ± 0.07 | 69.24 ± 0.44 |
| CUB | Global SCBM | 94.99 ± 0.09 | 68.19 ± 0.63 |
| CUB | Amortized SCBM | 95.22 ± 0.09 | 69.87 ± 0.56 |
| CIFAR-10 | Hard CBM | 85.51 ± 0.04 | 69.73 ± 0.29 |
| CIFAR-10 | CEM | 85.12 ± 0.14 | 72.24 ± 0.33 |
| CIFAR-10 | Autoregressive CBM | 85.31 ± 0.06 | 68.88 ± 0.47 |
| CIFAR-10 | Global SCBM | 85.86 ± 0.04 | 70.74 ± 0.29 |
| CIFAR-10 | Amortized SCBM | 86.00 ± 0.03 | 71.66 ± 0.25 |

5 Results

Test performance

In Table 1, we report the results of the concept and target accuracy prior to interventions. Overall, SCBM performs on par with the baseline methods, with no clear outperforming or underperforming technique throughout the datasets. This shows that the additional overhead of learning the concept dependencies does not negatively affect the predictive performance. We note that the amortized covariance variant consistently surpasses the globally learned matrix due to its ability to adjust the predicted concept dependency structure and uncertainty on an instance level. On the other hand, the global variant offers a unified understanding of the concept correlations, an example of which is presented in Figure 1 (c).

Table 2: Relative time it takes for one epoch in the CUB dataset when training on the training set, or evaluating on the test set, respectively.
| Method | Training | Inference |
|---|---|---|
| Hard CBM | 5x | 1x |
| CEM | 5x | 1x |
| Autoregressive CBM | 5x | 14x |
| Global SCBM | 5x | 1x |
| Amortized SCBM | 5x | 1x |

Notably, in CIFAR-10, even though the concept performance of CEM is the worst of all methods, it has the best target performance. This might suggest the presence of leakage in CEM’s embeddings, as in CIFAR-10, the concept set alone is not sufficient to predict the target, and learning additional information might be useful. In Table 2, we show the time it takes for training and testing of the methods. Here, it is evident that the autoregressive CBM of Havasi et al. (2022) suffers from a slow sampling process due to its autoregressive structure, while SCBMs retain the efficiency of CBMs.

Figure 2: Performance after intervening on concepts in the order of highest predicted uncertainty, shown for (a) Synthetic, (b) CUB, and (c) CIFAR-10. Concept and target accuracy (%) are shown in the first and second rows, respectively. Results are reported as averages and standard deviations of model performance across ten seeds.

Interventions

In this paragraph, we analyze the intervention performance of SCBMs and their baseline models, focusing on their effectiveness in modeling concept dependencies and improving target accuracy. Figure 2 shows the intervention curves across ten seeds, where the performance is measured based on the concept and target accuracy. The order of concepts to intervene on is determined by an uncertainty-based policy that makes use of the predicted probabilities. In Appendix C.1, we present the intervention performance if concepts were selected randomly. The intervention curves in the first row show that SCBMs are superior in modeling the concept dependencies, as evidenced by their significantly steeper intervention curves compared to the baseline methods. Furthermore, the second row of Figure 2 indicates that the strong concept modeling translates to a significant improvement in downstream performance, partly thanks to the intervention strategy introduced in Section 3.3. We note that especially for the most practical scenario of only a small number of interventions, SCBMs outperform their counterparts. Comparing the SCBM variants, the natural image datasets show an overall better intervention performance with the amortized covariance matrix, following the trend of Table 1, as it can capture the instance-wise correlation structure of the data. Only in the synthetic dataset, where the data-generating covariance matrix is fixed, does the global SCBM slightly outperform the amortized one. Thus, we advocate for the usage of the global variant only if the underlying assumption of a fixed covariance is reasonable. Lastly, the success of SCBMs on CIFAR-10, with CLIP-based concepts, shows our proposed method can work without human-annotated concepts.

Analyzing the performance of the autoregressive CBM, which also captures concept dependencies, we observe that, as expected, it has better intervention performance than the hard vanilla CBM, which does not take correlations into account. However, it becomes evident that, compared to the concept performance of SCBMs, its autoregressive structure does not capture the dependencies to the full extent. This shows in the target accuracy, where it only matches or outperforms SCBMs towards the full set of intervened concepts. We attribute the better performance on the full intervention set to the independent training procedure utilized by autoregressive CBMs, which comes at the cost of lower test performance in CIFAR-10. Arguably, in a realistic use-case scenario, such a high number of instance-level interventions is not sensible, and if it were, SCBMs could also be trained independently. Finally, the CEM shows reduced intervention performance, as the expressive concept embeddings, which are prone to information leakage, seem to adapt suboptimally to the injected concept information.

Table 3: Test-set calibration (%) of concept predictions. Results are reported as averages and standard deviations of model performance across ten seeds. For each dataset and metric, the best-performing method is bolded and the runner-up is underlined. Lower is better.
| Dataset | Method | Brier | ECE |
|---|---|---|---|
| Synthetic | Hard CBM | 28.79 ± 0.09 | 22.38 ± 0.15 |
| Synthetic | CEM | 29.32 ± 0.08 | 23.55 ± 0.09 |
| Synthetic | Autoregressive CBM | 24.84 ± 0.32 | 13.54 ± 0.49 |
| Synthetic | Global SCBM | 27.73 ± 0.09 | 20.10 ± 0.14 |
| Synthetic | Amortized SCBM | 25.58 ± 0.20 | 15.57 ± 0.55 |
| CUB | Hard CBM | 3.93 ± 0.05 | 2.44 ± 0.06 |
| CUB | CEM | 4.04 ± 0.05 | 3.25 ± 0.07 |
| CUB | Autoregressive CBM | 3.75 ± 0.05 | 2.73 ± 0.05 |
| CUB | Global SCBM | 3.87 ± 0.06 | 2.33 ± 0.09 |
| CUB | Amortized SCBM | 3.64 ± 0.07 | 1.85 ± 0.08 |
| CIFAR-10 | Hard CBM | 10.42 ± 0.05 | 4.93 ± 0.17 |
| CIFAR-10 | CEM | 11.06 ± 0.16 | 7.11 ± 0.39 |
| CIFAR-10 | Autoregressive CBM | 10.70 ± 0.05 | 6.07 ± 0.10 |
| CIFAR-10 | Global SCBM | 9.95 ± 0.02 | 2.88 ± 0.11 |
| CIFAR-10 | Amortized SCBM | 9.84 ± 0.02 | 2.22 ± 0.12 |

Figure 3: Intervention performance of SCBMs measured in concept and target accuracy (%) on CUB for random and uncertainty-based policy.

Modeling the concept distribution

A cornerstone of SCBMs is the explicit, distributional parameterization of concepts. This helps in understanding the data correlations and allows for visualization, such as the example in Figure 1 (c). The explicit probabilistic modeling results in improved concept uncertainty estimates compared to the baseline CBM counterparts, as shown in Table 3, where lower metrics imply better estimates. This proves useful for interventions, where the uncertainty estimates can be leveraged for the choice of concept to intervene on, improving the target prediction more effectively and reducing the need for manual user inspection. In Figure 3, we compare the performance of randomly intervening versus intervening based on the predicted uncertainty. We observe a large gap between the two policies, indicating the usefulness of the estimated probabilities. Nevertheless, note that intervening at random remains successful and supports the observations made in the previous paragraph, as shown in Appendix C.1.

6 Conclusion

In this paper, we introduced SCBMs, a new concept-based method that models concept dependencies with a multivariate normal distribution. We proposed a novel, effective intervention strategy that takes concept correlations into account and is based on the confidence region inferred from the distributional parameterization. We showed that our modeling approach retains CBMs’ training and inference speed and can thus harness the benefits of end-to-end concept and target training. Additionally, the explicit parameterization offers the user a clearer understanding of the learned concept dependencies, providing deeper insights into how predictions and interventions are made. Empirically, we demonstrated that by modeling the concept dependencies, SCBMs offer a substantial improvement in intervention effectiveness, in concept as well as target accuracy, compared to related work. We showed that our method excels when iteratively intervening on the most uncertain concept predictions, sparing users from having to manually search through the concept set to identify necessary interventions. Additionally, our results indicate that learning the concept correlations does not decrease performance prior to interventions, in many cases even improving the performance over the baselines. Finally, the versatility of SCBMs is highlighted through their superior performance on CIFAR-10, where concept values are CLIP-based rather than human-annotated.

Limitations & Future Work

This work opens multiple new research avenues. A natural extension is to go beyond binary concepts, such as continuous domains with their corresponding adaptations of modeling the concept distribution. Additionally, addressing the quadratic memory complexity of the covariance matrix is essential for scaling to larger concept sets. Current interventions focus on editing the concept values. However, this work also allows the editing of the learned dependency structure by adjusting the entries of the predicted covariance matrix, which could be explored. Lastly, to model additional information and reduce leakage, Koh et al. (2020) and Havasi et al. (2022) propose the adoption of a side channel. The complementary effectiveness of incorporating the side channel in the covariance structure could be explored in the context of SCBMs.

References

  • Ansel \BOthers. (\APACyear2024) \APACinsertmetastaransel2024pytorch{APACrefauthors}Ansel, J., Yang, E., He, H., Gimelshein, N., Jain, A., Voznesensky, M.\BDBLothers  \APACrefYearMonthDay2024. \BBOQ\APACrefatitlePyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation Pytorch 2: Faster machine learning through dynamic python bytecode transformation and graph compilation.\BBCQ \BIn \APACrefbtitleProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 Proceedings of the 29th acm international conference on architectural support for programming languages and operating systems, volume 2 (\BPGS 929–947). \PrintBackRefs\CurrentBib
  • Brier (\APACyear1950) \APACinsertmetastarbrier1950verification{APACrefauthors}Brier, G\BPBIW.  \APACrefYearMonthDay1950. \BBOQ\APACrefatitleVerification of forecasts expressed in terms of probability Verification of forecasts expressed in terms of probability.\BBCQ \APACjournalVolNumPagesMonthly weather review7811–3. {APACrefURL} https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2 \PrintBackRefs\CurrentBib
  • Brown \BOthers. (\APACyear2020) \APACinsertmetastarbrown2020language{APACrefauthors}Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J\BPBID., Dhariwal, P.\BDBLothers  \APACrefYearMonthDay2020. \BBOQ\APACrefatitleLanguage models are few-shot learners Language models are few-shot learners.\BBCQ \APACjournalVolNumPagesAdvances in neural information processing systems331877–1901. \PrintBackRefs\CurrentBib
  • Chauhan \BOthers. (\APACyear2023) \APACinsertmetastarchauhan2023interactive{APACrefauthors}Chauhan, K., Tiwari, R., Freyberg, J., Shenoy, P.\BCBL \BBA Dvijotham, K.  \APACrefYearMonthDay2023. \BBOQ\APACrefatitleInteractive concept bottleneck models Interactive concept bottleneck models.\BBCQ \BIn \APACrefbtitleProceedings of the AAAI Conference on Artificial Intelligence Proceedings of the aaai conference on artificial intelligence (\BVOL 37, \BPGS 5948–5955). \PrintBackRefs\CurrentBib
  • Collins \BOthers. (\APACyear2023) \APACinsertmetastarCollins2023{APACrefauthors}Collins, K\BPBIM., Barker, M., Zarlenga, M\BPBIE., Raman, N., Bhatt, U., Jamnik, M.\BDBLDvijotham, K.  \APACrefYearMonthDay2023. \BBOQ\APACrefatitleHuman Uncertainty in Concept-Based AI Systems Human uncertainty in concept-based AI systems.\BBCQ \BIn F. Rossi, S. Das, J. Davis, K. Firth-Butterfield\BCBL \BBA A. John (\BEDS), \APACrefbtitleProceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2023, Montréal, QC, Canada, August 8-10, 2023 Proceedings of the 2023 AAAI/ACM conference on ai, ethics, and society, AIES 2023, montréal, qc, canada, august 8-10, 2023 (\BPGS 869–889). \APACaddressPublisherACM. \PrintBackRefs\CurrentBib
  • Doshi-Velez \BBA Kim (\APACyear2017) \APACinsertmetastardoshiRigorousScienceInterpretable2017{APACrefauthors}Doshi-Velez, F.\BCBT \BBA Kim, B.  \APACrefYearMonthDay2017\APACmonth03. \APACrefbtitleTowards A Rigorous Science of Interpretable Machine Learning Towards A Rigorous Science of Interpretable Machine Learning (\BNUM arXiv:1702.08608). \APACaddressPublisherarXiv. {APACrefDOI} \doi10.48550/arXiv.1702.08608 \PrintBackRefs\CurrentBib
  • Espinosa Zarlenga, M., Barbiero, P., Ciravegna, G., Marra, G., Giannini, F., Diligenti, M., et al. (2022). Concept embedding models: Beyond the accuracy-explainability trade-off. In Advances in Neural Information Processing Systems (Vol. 35, pp. 21400–21413).
  • Espinosa Zarlenga, M., Collins, K., Dvijotham, K., Weller, A., Shams, Z., & Jamnik, M. (2024). Learning to Receive Help: Intervention-Aware Concept Embedding Models. Advances in Neural Information Processing Systems, 36.
  • Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.
  • Havasi, M., Parbhoo, S., & Doshi-Velez, F. (2022). Addressing Leakage in Concept Bottleneck Models. In A. H. Oh, A. Agarwal, D. Belgrave, & K. Cho (Eds.), Advances in Neural Information Processing Systems. https://openreview.net/forum?id=tglniD_fn9
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
  • Heidemann, L., Monnet, M., & Roscher, K. (2023). Concept correlation and its effects on concept-based models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 4780–4788).
  • Jang, E., Gu, S., & Poole, B. (2017). Categorical Reparameterization with Gumbel-Softmax. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=rkE3y85ee
  • Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., & Sayres, R. (2018). Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In J. Dy & A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning (Vol. 80, pp. 2668–2677). PMLR. https://proceedings.mlr.press/v80/kim18d.html
  • Kim, E., Jung, D., Park, S., Kim, S., & Yoon, S. (2023). Probabilistic Concept Bottleneck Models. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 16521–16540). PMLR. https://proceedings.mlr.press/v202/kim23g.html
  • Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. http://arxiv.longhoe.net/abs/1412.6980
  • Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. In Y. Bengio & Y. LeCun (Eds.), 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings. http://arxiv.longhoe.net/abs/1312.6114
  • Koh, P. W., Nguyen, T., Tang, Y. S., Mussmann, S., Pierson, E., Kim, B., & Liang, P. (2020). Concept Bottleneck Models. In H. D. III & A. Singh (Eds.), Proceedings of the 37th International Conference on Machine Learning (Vol. 119, pp. 5338–5348). Virtual: PMLR. https://proceedings.mlr.press/v119/koh20a.html
  • Kraft, D. (1988). A software package for sequential quadratic programming. Forschungsbericht- Deutsche Forschungs- und Versuchsanstalt fur Luft- und Raumfahrt.
  • Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images.
  • Kumar, A., Liang, P. S., & Ma, T. (2019). Verified uncertainty calibration. Advances in Neural Information Processing Systems, 32.
  • Kumar, N., Berg, A. C., Belhumeur, P. N., & Nayar, S. K. (2009). Attribute and simile classifiers for face verification. In 2009 IEEE 12th International Conference on Computer Vision (pp. 365–372). Kyoto, Japan: IEEE. https://doi.org/10.1109/ICCV.2009.5459250
  • Lampert, C. H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE. https://doi.org/10.1109/CVPR.2009.5206594
  • Leino, K., Sen, S., Datta, A., Fredrikson, M., & Li, L. (2018). Influence-Directed Explanations for Deep Convolutional Networks. In 2018 IEEE International Test Conference (ITC). IEEE. https://doi.org/10.1109/test.2018.8624792
  • Lipton, Z. C. (2016, June). The Mythos of Model Interpretability. Communications of the ACM, 61(10), 35–43. doi:10.48550/arxiv.1606.03490
  • Maddison, C. J., Mnih, A., & Teh, Y. W. (2017). The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=S1jE5L5gl
  • Mahinpei, A., Clark, J., Lage, I., Doshi-Velez, F., & Pan, W. (2021). Promises and Pitfalls of Black-Box Concept Learning Models. https://doi.org/10.48550/arXiv.2106.13314 (arXiv:2106.13314)
  • Marcinkevičs, R., Laguna, S., Vandenhirtz, M., & Vogt, J. E. (2024). Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? arXiv preprint arXiv:2401.13544.
  • Marcinkevičs, R., Reis Wolfertstetter, P., Klimiene, U., Chin-Cheong, K., Paschke, A., Zerres, J., … Vogt, J. E. (2024). Interpretable and intervenable ultrasonography-based machine learning models for pediatric appendicitis. Medical Image Analysis, 91, 103042. https://www.sciencedirect.com/science/article/pii/S136184152300302X
  • Margeloiu, A., Ashman, M., Bhatt, U., Chen, Y., Jamnik, M., & Weller, A. (2021). Do Concept Bottleneck Models Learn as Intended? https://doi.org/10.48550/arXiv.2105.04289 (arXiv:2105.04289)
  • Monteiro, M., Le Folgoc, L., Coelho de Castro, D., Pawlowski, N., Marques, B., Kamnitsas, K., … Glocker, B. (2020). Stochastic segmentation networks: Modelling spatially correlated aleatoric uncertainty. In Advances in Neural Information Processing Systems (Vol. 33, pp. 12756–12767).
  • Naeini, M. P., Cooper, G., & Hauskrecht, M. (2015). Obtaining well calibrated probabilities using bayesian binning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 29).
  • Neal, R. M. (1995). Bayesian learning for neural networks (PhD thesis). University of Toronto, Canada. https://librarysearch.library.utoronto.ca/permalink/01UTORONTO_INST/14bjeso/alma991106438365706196
  • Oikarinen, T., Das, S., Nguyen, L. M., & Weng, T.-W. (2023). Label-free Concept Bottleneck Models. In The 11th International Conference on Learning Representations. https://openreview.net/forum?id=FlCg47MNvBA
  • Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748–8763).
  • Sheth, I., Rahman, A. A., Sevyeri, L. R., Havaei, M., & Kahou, S. E. (2022). Learning from uncertain concepts via test time interventions. In Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022. https://openreview.net/forum?id=WVe3vok8Cc3
  • Shin, S., Jo, Y., Ahn, S., & Lee, N. (2023). A Closer Look at the Intervention Procedure of Concept Bottleneck Models. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 31504–31520). PMLR. https://proceedings.mlr.press/v202/shin23a.html
  • Silvey, S. (1975). Statistical Inference. Taylor & Francis. https://books.google.ch/books?id=qIKLejbVMf4C
  • Steinmann, D., Stammer, W., Friedrich, F., & Kersting, K. (2023). Learning to Intervene on Concept Bottlenecks. https://doi.org/10.48550/arXiv.2308.13453 (arXiv:2308.13453)
  • Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset.
  • Yuksekgonul, M., Wang, M., & Zou, J. (2023). Post-hoc Concept Bottleneck Models. In The 11th International Conference on Learning Representations. https://openreview.net/forum?id=nA5AZ8CEyow

Appendix A Dataset Details

In this section, we provide additional details on the datasets used in the experiments.

A.1 Synthetic Data-Generating Mechanism

Here, we describe the data-generating mechanism of the synthetic dataset in more detail. Let $N$, $p$, and $C$ denote the number of independent data points $\left\{\left(\bm{x}_n, \bm{c}_n, y_n\right)\right\}_{n=1}^{N}$, covariates, and concepts, respectively. We set $N = 50{,}000$, $p = 1{,}500$, and $C = 100$, with a 60%-20%-20% train-validation-test split. The generative process is as follows:

  1. Randomly sample $\bm{W} \in \mathbb{R}^{C \times 10}$ s.t. $w_{i,j} \sim \mathcal{N}(0, 1)$ for $1 \leq i \leq C$ and $1 \leq j \leq 10$.

  2. Generate a positive definite matrix $\bm{\Sigma} \in \mathbb{R}^{C \times C}$ s.t. $\bm{\Sigma} = \bm{W}\bm{W}^{T} + \bm{D}$. Let $\bm{D} \in \mathbb{R}^{C \times C}$ s.t. $\bm{D} = \bm{\delta}\bm{I}$, where $\delta_i \sim \mathcal{U}_{[0,1]}$ for $1 \leq i \leq C$.

  3. Randomly sample logits $\bm{H} \in \mathbb{R}^{N \times C}$ s.t. $\bm{\eta}_n \sim \mathcal{N}(\bm{0}, \bm{\Sigma})$ for $1 \leq n \leq N$.

  4. Let $c_{n,i} = \mathbbm{1}_{\left\{\eta_{n,i} \geq 0\right\}}$ for $1 \leq n \leq N$ and $1 \leq i \leq C$.

  5. Let $h: \mathbb{R}^{C} \rightarrow \mathbb{R}^{p}$ be a randomly initialized multilayer perceptron with ReLU nonlinearities.

  6. Let $\bm{x}_n = h\left(\bm{\eta}_n\right) + \bm{\epsilon}_n$ s.t. $\bm{\epsilon}_n \sim \mathcal{N}(\bm{0}, \bm{I})$ for $1 \leq n \leq N$.

  7. Let $g: \mathbb{R}^{C} \rightarrow \mathbb{R}$ be a randomly initialized linear perceptron.

  8. Let $y_n = \mathbbm{1}_{\left\{g\left(\bm{c}_n\right) \geq y_{med}\right\}}$ for $1 \leq n \leq N$, where $y_{med}$ denotes the median of $g\left(\bm{c}_n\right)$.
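The eight steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function name `generate_synthetic`, the single hidden layer of width `hidden`, the weight scaling of the random networks, and the smaller default sizes are all assumptions made for the example.

```python
import numpy as np


def generate_synthetic(N=1000, p=50, C=20, rank=10, hidden=64, seed=0):
    """Hypothetical sketch of the synthetic data-generating mechanism."""
    rng = np.random.default_rng(seed)

    # Steps 1-2: low-rank-plus-diagonal covariance Sigma = W W^T + D.
    W = rng.standard_normal((C, rank))
    delta = rng.uniform(0.0, 1.0, size=C)
    Sigma = W @ W.T + np.diag(delta)

    # Step 3: correlated concept logits eta_n ~ N(0, Sigma).
    eta = rng.multivariate_normal(np.zeros(C), Sigma, size=N)

    # Step 4: binary concepts by thresholding the logits at zero.
    c = (eta >= 0).astype(np.int64)

    # Steps 5-6: covariates from a random ReLU MLP plus unit Gaussian noise.
    # The architecture (one hidden layer) is an assumption of this sketch.
    W1 = rng.standard_normal((C, hidden)) / np.sqrt(C)
    W2 = rng.standard_normal((hidden, p)) / np.sqrt(hidden)
    x = np.maximum(eta @ W1, 0.0) @ W2 + rng.standard_normal((N, p))

    # Steps 7-8: random linear map of the concepts, thresholded at its median.
    w_g = rng.standard_normal(C)
    b_g = rng.standard_normal()
    scores = c @ w_g + b_g
    y = (scores >= np.median(scores)).astype(np.int64)

    return x, c, y, Sigma
```

Calling `generate_synthetic(N=50_000, p=1_500, C=100)` would match the sizes stated above; thresholding at the median keeps the two target classes roughly balanced.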

A.2 Natural Image Datasets

Caltech-UCSD Birds-200-2011

We evaluate on the Caltech-UCSD Birds-200-2011 (CUB) dataset (Wah et al., 2011), available at https://www.vision.caltech.edu/datasets/cub_200_2011/ (no license available). It comprises 11,788 photographs from 200 distinct bird species annotated with 312 concepts, such as belly color and pattern. We follow the original train-test split and use the revised version of the dataset proposed in the initial CBM work (Koh et al., 2020). Here, only the 112 most widespread binary attributes are included in the final dataset, and concepts are shared across samples in identical classes. The images were resized to a resolution of 224 × 224 pixels. Finally, following the originally proposed augmentations, we applied random horizontal flips, modified the brightness and saturation, and applied normalization during training.

CIFAR-10

CIFAR-10 (Krizhevsky et al., 2009), available at https://www.cs.toronto.edu/~kriz/cifar.html (no license available), is a natural image benchmark with 60,000 colour images of 32 × 32 pixels and 10 classes. We kept the original train-test split, with 50,000 samples in the train set and a balanced total of 6,000 images per class. We generated 143 concept labels as described in Section 4 using large language and vision models. At training time, as for CUB, we applied augmentations including modifications to brightness and saturation, random horizontal flips, and normalization. Images were rescaled to a size of 224 × 224 pixels.

Appendix B Implementation Details

This section provides further implementation details of SCBM and the evaluated baselines. All methods were implemented using PyTorch (v2.1.1) (Ansel et al., 2024). All models are trained for 150 epochs for the synthetic and 300 epochs for the natural image datasets with the Adam optimizer (Kingma & Ba, 2015) with a learning rate of $10^{-4}$ and a batch size of 64. For the independently trained autoregressive model, we split the training epochs into $2/3$ for the concept predictor and $1/3$ for the target predictor. For the methods requiring sampling, the number of Monte-Carlo samples is set to $M = 100$. For the synthetic tabular data, we use a fully connected neural network as backbone, with 3 non-linear layers, batch normalization, and dropout. For the CUB dataset, we use a pretrained ResNet-18 (He et al., 2016), and for the lower-resolution CIFAR-10 a simple convolutional neural network with 2 convolutional layers followed by ReLU, Dropout, and a fully connected layer. For fairness in the comparisons, all baselines have the same model architecture choices and all experiments are performed over 10 random seeds.
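As a compact summary, the hyperparameters stated above can be collected in a small configuration object. The sketch below is purely illustrative: the names `TrainConfig`, `CONFIGS`, and `independent_split` are hypothetical, and the use of integer division for the 2/3 to 1/3 epoch split is an assumption (it is exact for 150 and 300 epochs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the training settings listed above."""
    dataset: str
    epochs: int
    learning_rate: float = 1e-4   # Adam learning rate
    batch_size: int = 64
    mc_samples: int = 100         # M, Monte-Carlo samples for sampling-based methods
    num_seeds: int = 10

    def independent_split(self):
        """Epochs for (concept predictor, target predictor) under the 2/3 and 1/3 split."""
        concept_epochs = (2 * self.epochs) // 3
        return concept_epochs, self.epochs - concept_epochs


CONFIGS = {
    "synthetic": TrainConfig("synthetic", epochs=150),
    "CUB": TrainConfig("CUB", epochs=300),
    "CIFAR-10": TrainConfig("CIFAR-10", epochs=300),
}
```

Under these settings, the independently trained autoregressive model would spend 100 and 50 epochs on the concept and target predictors for the synthetic data, and 200 and 100 epochs for the image datasets.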

Resource Usage

For the experiments of the main paper, we used a cluster of mostly GeForce RTX 2080 GPUs with 2 CPU workers. Over all methods, we estimate an average runtime of 8 hours per experiment. This amounts to 5 methods × 3 datasets × 10 seeds × 8 hours = 1200 hours. In addition, the ablation figures required another 40 runs, bringing the total to 1520 hours of compute. Please note that we only report the compute needed to generate the final results and not the development time, which we roughly estimate to be around 10 times larger.
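The compute estimate can be reproduced with a few lines of arithmetic. The sketch assumes that each ablation run also takes the average of 8 hours, which is what the reported total implies; the variable names are illustrative.

```python
# Figures taken from the paragraph above.
METHODS, DATASETS, SEEDS, HOURS_PER_RUN = 5, 3, 10, 8
ABLATION_RUNS = 40  # assumed to take HOURS_PER_RUN each

main_hours = METHODS * DATASETS * SEEDS * HOURS_PER_RUN
ablation_hours = ABLATION_RUNS * HOURS_PER_RUN
total_hours = main_hours + ablation_hours

print(main_hours, ablation_hours, total_hours)  # 1200 320 1520
```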

Appendix C Further Experiments

In this section, we show additional experiments to provide a more in-depth understanding of SCBM’s effectiveness. We ablate multiple hyperparameters to provide an understanding of how they influence the model performance.

C.1 Random Intervention Policy

[Figure 4 panels: (a) Synthetic, (b) CUB, (c) CIFAR-10]
Figure 4: Performance after intervening on concepts in random order. Concept and target accuracy (%) are shown in the first and second rows, respectively. Results are reported as averages and standard deviations of model performance across ten seeds.

In Figure 4, we present the intervention performance of SCBM and baseline methods under a random intervention policy. Compared to the uncertainty-based intervention policy of Figure 2, the intervention curves of all methods are less steep, confirming the usefulness of the policy proposed by Shin et al. (2023). Consistent with the earlier results, SCBMs still outperform the baseline methods, with the amortized variant outperforming the global variant on real-world datasets. We observe that in CIFAR-10, for the first interventions, an improvement in concept accuracy is not directly reflected in improved target prediction for SCBMs, which is likely due to the low signal-to-noise ratio of the CLIP-inferred concepts.
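To make the difference between the two policies concrete, the sketch below orders the concepts of a single sample either at random or by predicted uncertainty. It is an illustration under assumptions: the function name `intervention_order` is hypothetical, and the distance of the predicted probability from 0.5 is used as a simple uncertainty proxy, which may differ from the exact score used by Shin et al. (2023).

```python
import numpy as np


def intervention_order(probs, policy="uncertainty", rng=None):
    """Return concept indices in the order in which they would be intervened on.

    probs: predicted concept probabilities for one sample, shape (C,).
    """
    probs = np.asarray(probs, dtype=float)
    if policy == "random":
        rng = np.random.default_rng() if rng is None else rng
        return rng.permutation(probs.shape[0])
    if policy == "uncertainty":
        # Assumed proxy: probabilities closest to 0.5 are the most uncertain,
        # so they come first in the ordering.
        return np.argsort(np.abs(probs - 0.5), kind="stable")
    raise ValueError(f"unknown policy: {policy}")
```

For predicted probabilities `[0.95, 0.55, 0.1, 0.4]`, the uncertainty-based ordering is `[1, 3, 2, 0]`, whereas the random policy ignores the predictions entirely.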

C.2 Regularization Strength

Figure 5: Performance on CUB after intervening on concepts in the order of highest predicted uncertainty with differing regularization strengths. Concept and target accuracy (%) are shown in the first and second columns, respectively. Results are reported as averages and standard deviations of model performance across five seeds. For each SCBM variant, darker colors indicate a higher regularization strength $\lambda_2$.

In Figure 5, we analyze the impact of the regularization strength $\lambda_2$ from Equation 6. Due to environmental considerations, we conducted experiments using only 5 seeds and limited the number of interventions to 20. Our findings indicate that SCBMs are not sensitive to the choice of $\lambda_2$, except that the unregularized amortized variant exhibits slight patterns of overfitting.

C.3 Intervention Strategy

In Figure 6, we analyze the effect of the intervention strategy. Our findings indicate that while SCBMs are still effective with the strategy proposed by Koh et al. (2020), which sets the logits to the 5th (if $c_i = 0$) or 95th (if $c_i = 1$) percentile of the training distribution, our proposed strategy based on the confidence region results in stronger intervenability.
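The percentile-based baseline strategy can be sketched as follows. This is a hedged illustration rather than the original implementation: the function names are hypothetical, and computing the percentiles separately for each concept over the training logits is an assumption of the sketch.

```python
import numpy as np


def fit_percentile_logits(train_logits, low=5.0, high=95.0):
    """Per-concept low/high percentiles of the training logits, each of shape (C,)."""
    lo, hi = np.percentile(train_logits, [low, high], axis=0)
    return lo, hi


def intervene_percentile(logits, true_concepts, mask, lo, hi):
    """Overwrite the intervened logits with the percentile matching the true concept.

    logits, true_concepts, mask: arrays of shape (B, C); mask is True where
    the user intervenes. Logits outside the mask are left unchanged.
    """
    target = np.where(true_concepts == 1, hi, lo)
    return np.where(mask, target, logits)
```

In contrast to this fixed-value replacement, the strategy proposed in this work chooses the intervened values based on the confidence region, which is not reproduced here.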

Figure 6: Performance on CUB after intervening on concepts in the order of highest predicted uncertainty, comparing the proposed intervention strategy to the intervention of Koh et al. (2020), which sets the logits to the 5th or 95th empirical percentile of the training distribution. Concept and target accuracy (%) are shown in the first and second columns, respectively. Results are reported as averages and standard deviations of model performance across five seeds.

C.4 Confidence Region Level

Figure 7: Performance on CUB after intervening on concepts in the order of highest predicted uncertainty with differing levels $1-\alpha$ of the confidence region. Concept and target accuracy (%) are shown in the first and second columns, respectively. Results are reported as averages and standard deviations of model performance across three seeds.

In Figure 7, we analyze the effect of the level $1-\alpha$ of the likelihood-based confidence region. Our findings indicate that SCBMs are not sensitive to the choice of $1-\alpha$, with higher levels performing slightly better.