Semi-supervised Concept Bottleneck Models

Lijie Hu1,2, Tianhao Huang1,2,3, Huanyi Xie1,2,4,
Chenyang Ren1,2,5, Zhengyu Hu1,2,6, Lu Yu7, and Di Wang1,2
1Provable Responsible AI and Data Analytics (PRADA) Lab
2KAUST  3Nankai University  4Harbin Institute of Technology
5Shanghai Jiao Tong University  6HKUST  7Ant Group
Abstract

Concept Bottleneck Models (CBMs) have garnered increasing attention due to their ability to provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts. However, the training of current CBMs relies heavily on the accuracy and richness of the annotated concepts in the dataset. These concept labels are typically provided by experts, which is costly and requires significant resources and effort. Additionally, concept saliency maps frequently misalign with input saliency maps, causing concept predictions to correspond to irrelevant input features, an issue related to annotation alignment. To address these limitations, we propose a new framework called SSCBM (Semi-supervised Concept Bottleneck Model). SSCBM is suited to practical situations where annotated data is scarce. By jointly training on labeled and unlabeled data and aligning the unlabeled data at the concept level, we effectively address these issues. We propose a strategy to generate pseudo labels and an alignment loss. Experiments demonstrate that SSCBM is both effective and efficient. With only 20% labeled data, we achieve 93.19% concept accuracy (96.39% in the fully supervised setting) and 75.51% prediction accuracy (79.82% in the fully supervised setting).

1 Introduction

Recently, deep learning models, such as ResNet he2016deep , often feature complex non-linear architectures, making it difficult for end-users to understand and trust their decisions. This lack of interpretability is a significant obstacle to their adoption, especially in critical fields such as healthcare thirunavukarasu2023large and finance li2023large , where transparency is crucial. Explainable artificial intelligence (XAI) models have been developed to meet the demand for transparency, providing insights into their behavior and internal mechanisms hu2023seat ; hu2023improving ; yang2024human ; hu2024hopfieldian . Concept Bottleneck Models (CBMs) koh2020concept are particularly notable among these XAI models for their ability to clarify the prediction process of end-to-end AI models. CBMs introduce a bottleneck layer that incorporates human-understandable concepts. During prediction, CBMs first predict concept labels from the original input and then use these predicted concepts in the bottleneck layer to determine the final classification label. This approach results in a self-explanatory decision-making process that users can comprehend.

(a) Label Complete and Well-aligned (b) Label Incomplete (c) Misaligned
Figure 1: (a) A sample of sparrow class with complete concept labels. (b) A sample of sparrow class with incomplete concept labels. (c) A sample of misalignment between input features and concepts resulting from existing CBM methods. Our framework simultaneously utilizes both (a) and (b) types of data and addresses the issue of (c) through an alignment loss.

A major issue in original CBMs is the need for expert labeling, which is costly in practice. Some researchers address this problem through unsupervised learning. For example, oikarinen2023label proposes a Label-free CBM that transforms any neural network into an interpretable CBM without requiring labeled concept data while maintaining high accuracy. Similarly, Post-hoc Concept Bottleneck models yuksekgonul2022post can be applied to various neural networks without compromising performance, preserving interpretability advantages. However, these methods have three issues. First, those unsupervised methods heavily rely on large language models like GPT-3, which have reliability issues lai2023faithful . Second, the concepts extracted by these models lack evaluation metrics, undermining their interpretability. Third, the assumption that no concept labels are available is too stringent in practice. In reality, obtaining a small portion of concept labels is feasible and cost-effective. Therefore, we can maximize the use of this small labeled concept dataset. This is the motivation for introducing our framework, which focuses on the semi-supervised setting in CBM.

In this paper, we introduce a framework called the SSCBM (Semi-supervised Concept Bottleneck Model). Compared to a supervised setting, semi-supervised CBMs have two main challenges. First, obtaining concept embeddings requires concept labels, so we need to generate pseudo labels to obtain these concept embeddings. To achieve this, SSCBM uses a KNN-based algorithm to assign pseudo-concept labels for unlabeled data.

While such a simple pseudo-labeling method is effective and has acceptable classification accuracy, we also find that the concept saliency map often misaligns with the input saliency map, meaning concept predictions frequently correspond to irrelevant input features. This misalignment often arises from inaccurate concept annotations or unclear relationships between input features and concepts, which is closely related to the broader issue of annotation alignment. In fact, in the supervised setting, there is a similar misalignment issue furby2024can . Existing research seeks to improve alignment by connecting textual and image information hou2024concept . However, these methods only focus on the supervised setting and cannot be directly applied to our settings because our pseudo-labels are noisy. Our framework achieves excellent performance in both concept accuracy and concept saliency alignment by leveraging joint training on both labeled and unlabeled data and aligning the unlabeled data at the concept level. To achieve this, we leverage the relevance between the input image and the concept and get other pseudo-concept labels based on these similarity scores. Finally, we align these two types of pseudo-concept labels to give the concept encoder the ability to extract useful information from features while also inheriting the ability to align concept embeddings with the input.

Comprehensive experiments on benchmark datasets demonstrate that our SSCBM is both efficient and effective. Our contributions are summarized as follows.

  • We propose the SSCBM, a framework designed to solve the semi-supervised annotation problem, which holds practical significance in real-world applications. To the best of our knowledge, we are the first to tackle these two problems within a single framework, elucidating the behavior of CBMs through semi-supervised alignment.

  • Our framework addresses the semi-supervised annotation problem alongside the concept semantics alignment problem in a simple and clever manner. We first use the KNN algorithm to assign a pseudo-label to each unlabeled sample, an approach that our experiments show to be simple and effective. Then, we compute a heatmap between concept embeddings and the input. After applying a threshold, we obtain the predicted alignment label. Finally, we optimize the alignment loss between these two pseudo-concept labels to mitigate the misalignment issue.

  • Comprehensive experiments demonstrate the superiority of our SSCBM in annotation and concept-saliency alignment, indicating its efficiency and effectiveness. With only 1% labeled data, we achieve 88.38% concept accuracy and 62.19% prediction accuracy. With 20% labeled data, we achieve 93.19% concept accuracy (96.39% in the fully supervised setting) and 75.51% prediction accuracy (79.82% in the fully supervised setting).

2 Related Work

Concept Bottleneck Models.

Concept Bottleneck Model (CBM) koh2020concept is an innovative deep-learning approach for image classification and visual reasoning. By introducing a concept bottleneck layer into deep neural networks, CBMs enhance model generalization and interpretability by learning specific concepts. However, CBMs face two primary challenges: their performance often lags behind that of models without the bottleneck layer due to incomplete information extraction, and they rely heavily on laborious dataset annotation. Researchers have explored various solutions to these challenges. chauhan2023interactive extended CBMs into interactive prediction settings by introducing an interaction policy to determine which concepts to label, thereby improving final predictions. oikarinen2023label addressed CBM limitations by proposing a Label-free CBM, which transforms any neural network into an interpretable CBM without requiring labeled concept data, maintaining high accuracy. Post-hoc Concept Bottleneck models yuksekgonul2022post can be applied to various neural networks without compromising performance, preserving interpretability advantages. Related work in the image domain includes studies havasi2022addressing ; kim2023probabilistic ; keser2023interpretable ; sawada2022concept ; sheth2023auxiliary ; li2024text ; hu2024editable . In the graph concept field, magister2021gcexplainer provide a global interpretation for Graph Neural Networks (GNNs) by mapping graphs into a concept space through clustering and offering a human-in-the-loop evaluation. xuanyuan2023global ; Barbiero2023interpretable extend this approach by incorporating both global and local explanations. For local explanations, they define a concept set, with each neuron represented as a vector with Boolean values indicating concept activation. However, existing works rarely consider semi-supervised settings, which are practical in real-world applications. Our framework addresses these issues effectively.

Semi-supervised Learning.

Semi-supervised learning (SSL) combines the two main tasks of machine learning: supervised learning and unsupervised learning van2020survey . It is typically applied in scenarios where labeled data is scarce. Examples include computer-aided diagnosis zhang2015semi ; 8651980 , medical image analysis HUYNH2022106628 ; CHEPLYGINA2019280 ; FILIPOVYCH20111109 , and drug discovery chen2016nllss . In these cases, collecting detailed expert annotations requires considerable time and effort. However, under suitable assumptions on the data distribution, unlabeled data can also help build better classifiers van2020survey . SSL, known in its earliest forms as self-labeling or self-teaching, involves the model iteratively labeling a portion of the unlabeled data and adding it to the training set for the next round of training ouali2020overview . The expectation-maximization (EM) algorithm moon1996expectation uses both labeled and unlabeled data to produce maximum likelihood estimates of parameters. laine2016temporal and tarvainen2017mean focus on consistency regularization. The Π-model laine2016temporal combines a supervised cross-entropy loss with an unsupervised consistency loss while perturbing the model and data based on the consistency constraint assumption. A temporal ensembling model integrates predictions from models at various time points. Mean teacher tarvainen2017mean addresses the slow updating issue of the temporal ensembling model on large datasets by averaging model weights instead of label predictions. MixMatch NEURIPS2019_1cd138d0 unifies and refines the previous approaches of consistency regularization, entropy minimization, and traditional regularization into a single loss function, achieving excellent results. Pseudo labeling, as an effective tool for reducing the entropy of unlabeled data lee2013pseudo , has attracted increasing attention from researchers in semi-supervised learning. arazo2020pseudo proposes that directly using the model’s predictions as pseudo-labels can achieve good results. FixMatch sohn2020fixmatch generates pseudo-labels from the model’s predictions and retains only the high-confidence ones. pham2021meta continuously adjusts the teacher based on feedback from the student, thereby generating better pseudo-labels. While there has been a plethora of work in semi-supervised learning, semi-supervised concept bottleneck models remain largely unexplored. Our work focuses on this new area.

3 Preliminaries

Concept Bottleneck Models koh2020concept .

We consider a classification task with a concept set denoted as $\mathcal{C}=\{p_{1},\cdots,p_{k}\}$, where each $p_{i}$ is a concept given by experts or LLMs, and a training dataset represented as $\mathcal{D}=\{(x^{(i)},y^{(i)},c^{(i)})\}_{i=1}^{N}$. Here, for $i\in[N]$, $x^{(i)}\in\mathcal{X}\subseteq\mathbb{R}^{d}$ represents the feature vector (e.g., an image’s pixels), $y^{(i)}\in\mathcal{Y}\subseteq\mathbb{R}^{l}$ denotes the label ($l$ is the number of classes), and $c^{(i)}=(c_{i}^{1},\cdots,c_{i}^{k})\in\mathbb{R}^{k}$ represents the concept vector (a binary vector of length $k$, where each value indicates whether the input $x^{(i)}$ contains that concept). In CBMs, the goal is to learn two mappings: one called the concept encoder, which transforms the input space to the concept space, denoted as $g:\mathbb{R}^{d}\to\mathbb{R}^{k}$, and another called the label predictor, which maps the concept space to the downstream prediction space, denoted as $f:\mathbb{R}^{k}\to\mathbb{R}^{l}$. Usually, the map $f$ is linear. For any input $x$, we aim to ensure that its predicted concept vector $\hat{c}=g(x)$ and prediction $\hat{y}=f(g(x))$ are close to their underlying counterparts, thus capturing the essence of the original CBMs.
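To make the composition $\hat{y}=f(g(x))$ concrete, the following is a minimal pure-Python sketch with toy dimensions ($d=3$, $k=2$, $l=2$). The single-layer sigmoid encoder, the weight values, and the function names `concept_encoder` and `label_predictor` are illustrative assumptions rather than the architecture used in this paper; only the structure (input to concepts, then a linear label predictor) follows the description above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def concept_encoder(x, W_g, b_g):
    """g: R^d -> R^k. Assumed here to be one linear layer plus a sigmoid."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_g, b_g)]

def label_predictor(c_hat, W_f, b_f):
    """f: R^k -> R^l. A linear map from concept scores to class logits."""
    return [sum(w * c for w, c in zip(row, c_hat)) + b
            for row, b in zip(W_f, b_f)]

# Toy (hypothetical) parameters.
x = [1.0, 0.0, -1.0]
W_g = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
b_g = [0.0, 0.0]
W_f = [[1.0, -1.0], [-1.0, 1.0]]
b_f = [0.0, 0.0]

c_hat = concept_encoder(x, W_g, b_g)        # predicted concept vector g(x)
y_logits = label_predictor(c_hat, W_f, b_f)  # class logits f(g(x))
y_hat = max(range(len(y_logits)), key=y_logits.__getitem__)
```

Because the label predictor sees only `c_hat`, every class decision can be traced back to the predicted concept scores, which is the property that makes CBMs self-explanatory.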

Concept Embedding Models espinosa2022concept .

As the original CBM relies solely on concept features to determine the model’s predictions, its prediction performance degrades compared to canonical deep neural networks. To further improve the performance of CBMs, CEM was developed by espinosa2022concept . It achieves this by learning interpretable high-dimensional concept representations (i.e., concept embeddings), thus maintaining high task accuracy while obtaining concept representations that contain meaningful semantic information. For CEMs, we use the same setting as that of espinosa2022concept ; ismail2023concept . For each input $x$, the concept encoder learns $k$ concept representations $\hat{c}_{1},\hat{c}_{2},\ldots,\hat{c}_{k}$, each corresponding to one of the $k$ ground truth concepts in the training dataset. In CEMs, each concept $c_{i}$ is represented using two embeddings $\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}\in\mathbb{R}^{m}$, each with specific semantics, i.e., the concept is TRUE (active state) and the concept is FALSE (inactive state), where the hyper-parameter $m$ is the embedding dimension. We use a DNN $\psi(x)$ to learn a latent representation $\bm{h}\in\mathbb{R}^{n_{h}}$, to be used as input to the CEM’s embedding generator, where $n_{h}$ is the dimension of the latent representation. The CEM’s embedding generator $\phi$ feeds $\bm{h}$ into two concept-specific fully connected layers in order to learn two concept embeddings in $\mathbb{R}^{m}$.

$$\bm{\hat{c}}_{i}=\phi_{i}\left(\bm{h}\right)=a\left(W_{i}\bm{h}+\bm{b}_{i}\right).$$

Then we use a differentiable scoring function $s:\mathbb{R}^{2m}\rightarrow\left[0,1\right]$ to align the concept embeddings $\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}$ with the ground-truth concepts $c_{i}$. It is trained to predict the probability $\hat{p}_{i}:=s\left(\left[\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}\right]^{\top}\right)=\sigma\left(W_{s}\left[\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}\right]^{\top}+\bm{b}_{s}\right)$ of concept $c_{i}$ being active in embedding space. We get the final concept embedding $\bm{\hat{c}}_{i}$ as follows:

$$\bm{\hat{c}}_{i}:=\hat{p}_{i}\bm{\hat{c}}_{i}^{+}+(1-\hat{p}_{i})\bm{\hat{c}}_{i}^{-}.$$

At this point, we can obtain high-quality, semantically rich concept embeddings through CEMs. In Section 4, we use these concept representations and further optimize their interpretability through our proposed framework, SSCBM.
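The CEM scoring and mixing step described in this subsection can be sketched in a few lines of pure Python. The embedding values, the weights `w_s`, and the helper names `score` and `mix` are hypothetical and chosen only for illustration; the sketch assumes the scoring function is a sigmoid applied to a linear map of the concatenated embeddings, as in the formula for $\hat{p}_{i}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(c_pos, c_neg, w_s, b_s):
    """s: R^{2m} -> [0, 1], a sigmoid over a linear map of [c_pos, c_neg]."""
    concat = c_pos + c_neg  # list concatenation, length 2m
    return sigmoid(sum(w * v for w, v in zip(w_s, concat)) + b_s)

def mix(c_pos, c_neg, p_hat):
    """Final embedding: p_hat * c_pos + (1 - p_hat) * c_neg."""
    return [p_hat * a + (1.0 - p_hat) * b for a, b in zip(c_pos, c_neg)]

# Toy embeddings with m = 2 (illustrative values only).
c_pos = [1.0, 0.0]
c_neg = [0.0, 1.0]
w_s = [2.0, 0.0, 0.0, 0.0]
b_s = 0.0

p_hat = score(c_pos, c_neg, w_s, b_s)  # probability that the concept is active
c_hat = mix(c_pos, c_neg, p_hat)       # embedding passed on to the label predictor
```

When `p_hat` approaches 1 the mixed embedding approaches the positive embedding, and when it approaches 0 it approaches the negative one, so the mixture stays differentiable while still encoding the concept's predicted state.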

Semi-supervised Setting.

Now, we consider the setting of semi-supervised learning for concept bottleneck models. As mentioned earlier, a typical training dataset for CBMs can be represented as $\mathcal{D}=\{(x^{(i)},y^{(i)},c^{(i)})\}_{i=1}^{N}$, where $x^{(i)}\in\mathcal{X}$ represents the input feature. However, in semi-supervised learning tasks, the set of feature vectors typically consists of two parts, $\mathcal{X}=\{\mathcal{X}_{L},\mathcal{X}_{U}\}$, where $\mathcal{X}_{L}$ represents a small subset of labeled data and $\mathcal{X}_{U}$ represents the remaining unlabeled data. Generally we assume $|\mathcal{X}_{L}|\ll|\mathcal{X}_{U}|$. We assume that each $x^{(j)}\in\mathcal{X}_{L}$ is labeled with a concept vector $c^{(j)}$ and a label $y^{(j)}$, whereas for a general $x^{(i)}\in\mathcal{X}$ only the corresponding class label $y^{(i)}\in\mathcal{Y}$ is guaranteed to exist. Note that our method can be directly extended to the fully semi-supervised case where even the classification labels for feature vectors in $\mathcal{X}_{U}$ are unknown.

Under these settings, given a training dataset $\mathcal{D}=\mathcal{D}_{L}\cup\mathcal{D}_{U}$ that includes both labeled and unlabeled data, the goal is to train a CBM using both the labeled data $\mathcal{D}_{L}$ and unlabeled data $\mathcal{D}_{U}$. This aims to get better mappings $g:\mathbb{R}^{d}\to\mathbb{R}^{k}$ and $f:\mathbb{R}^{k}\to\mathbb{R}^{l}$ than those trained by using only labeled data, ultimately achieving higher task accuracy together with the corresponding concept-based explanation.

4 Semi-supervised Concept Bottleneck Models

In this section, we elaborate on the details of the proposed SSCBM framework, which is shown in Figure 2. SSCBM follows the main idea of CEM. Specifically, to learn a good concept encoder, we use different processing methods for labeled and unlabeled data. Labeled data first passes through a feature extractor $\psi$ to be transformed into a latent representation $\bm{h}$, which then enters the concept embedding extractor to obtain the concept embeddings and the predicted concept vector $\bm{\hat{c}}$ for the labeled data. This vector is compared with the ground truth concepts to compute the concept loss. Additionally, the label predictor predicts $\bm{\hat{y}}$ based on $\bm{\hat{c}}$, and the task loss is calculated from this prediction.

For unlabeled data, we first extract image features $V$ from the input using an image encoder. Then, we use the KNN algorithm to assign a pseudo-label $\bm{\hat{c}_{img}}$ to each unlabeled sample, an approach that has been experimentally shown to be simple and effective. In the second step, we compute a heatmap between concept embeddings and the input. After applying a threshold, we obtain the predicted alignment label $\bm{\hat{c}_{align}}$. Finally, we compute the alignment loss between $\bm{\hat{c}_{img}}$ and $\bm{\hat{c}_{align}}$. During each training epoch, we simultaneously compute these losses and update the model parameters based on the gradients.
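As a rough illustration of the KNN pseudo-labeling step, the sketch below finds the labeled samples whose image features are closest in cosine distance and combines their concept vectors. The aggregation rule (an unweighted mean of the neighbours' concept vectors, thresholded at 0.5), the toy features, and the names `cosine_distance`, `knn_pseudo_concepts`, and `n_neighbors` are assumptions made for this example; the text here does not fix those details.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def knn_pseudo_concepts(feat, labeled_feats, labeled_concepts,
                        n_neighbors=3, threshold=0.5):
    """Return (pseudo concept vector, indices of the nearest labeled samples)."""
    order = sorted(range(len(labeled_feats)),
                   key=lambda j: cosine_distance(feat, labeled_feats[j]))
    nearest = order[:n_neighbors]
    n_concepts = len(labeled_concepts[0])
    # Assumption: unweighted mean of the neighbours' binary concept vectors.
    mean = [sum(labeled_concepts[j][i] for j in nearest) / len(nearest)
            for i in range(n_concepts)]
    return [1 if m >= threshold else 0 for m in mean], nearest

# Toy image features and concept labels for four labeled samples.
labeled_feats = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0]]
labeled_concepts = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [0, 1, 0]]

pseudo, nearest = knn_pseudo_concepts([1.0, 0.05], labeled_feats, labeled_concepts)
```

In this toy case the three nearest labeled samples are the first three, and two of them carry the third concept, so the pseudo-label marks that concept as present.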

Figure 2: Overall framework of our proposed SSCBM.

4.1 Label Anchor: Concept Embedding Encoder

Concept Embeddings. As described in Section 3, we obtain high-dimensional concept representations with meaningful semantics based on CEMs. Thus, our concept encoder should extract useful information from both labeled and unlabeled data.

For the labeled training data $\mathcal{D}_{L}=\{(x^{(i)},y^{(i)},c^{(i)})\}_{i=1}^{|\mathcal{D}_{L}|}$, we follow the original CEM espinosa2022concept , i.e., we use a backbone network (e.g., ResNet50) to extract features $\bm{h}=\{\psi(x^{(i)})\}_{i=1}^{|\mathcal{D}_{L}|}$. Each feature then passes through an embedding generator to obtain concept embeddings $\bm{\hat{c}}_{i}\in\mathbb{R}^{m\times k}$ for $i\in[k]$. After passing through fully connected layers and activation layers, we obtain the predicted binary concept vector $\bm{\hat{c}}\in\mathbb{R}^{k}$ for the labeled data. The specific process can be represented by the following expression:

$$\bm{\hat{c}}^{(j)}_{i},h^{(j)}=\sigma(\phi(\psi(x^{(j)}))),\quad i=1,\ldots,k,\quad j=1,\ldots,|\mathcal{D}_{L}|,$$

where $\psi$, $\phi$ and $\sigma$ represent the backbone, embedding generator and activation function, respectively.

To enhance the interpretability of concept embeddings, we use the binary cross-entropy to optimize the accuracy of concept predictions, computing the concept loss $\mathcal{L}_{c}$ from the predicted binary concept vector $\bm{\hat{c}}$ and the ground truth concept labels $\bm{c}$:

$$\mathcal{L}_{c}=BCE(\bm{\hat{c}},\bm{c}),\tag{1}$$

where $BCE$ is the binary cross entropy loss.
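For reference, the concept loss in Equation (1) can be computed for a single sample as in the sketch below. Averaging over the $k$ concepts and clipping probabilities with a small `eps` are assumptions of this example (common defaults in deep learning libraries), not details stated in the text.

```python
import math

def bce(c_hat, c, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 labels.

    Assumption: the loss is averaged over the concepts of one sample.
    """
    total = 0.0
    for p, t in zip(c_hat, c):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(c)

# Predicted concept probabilities vs. ground-truth binary concept labels.
loss = bce([0.9, 0.2, 0.5], [1, 0, 1])
```

The loss is close to zero when every predicted probability matches its label and grows as confident predictions disagree with the annotations.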

Task Loss. Since our ultimate task is classification, we also need to incorporate the task loss for the final prediction performance. After obtaining the predicted concept 𝒄^bold-^𝒄\bm{\hat{c}}overbold_^ start_ARG bold_italic_c end_ARG, we use a label predictor to predict the final class 𝒚^bold-^𝒚\bm{\hat{y}}overbold_^ start_ARG bold_italic_y end_ARG. Then we can define the task loss function using the categorical cross-entropy loss to train our classification model as follows:

$$\mathcal{L}_{task}=\mathrm{CE}(\bm{\hat{y}},\bm{y}). \tag{2}$$

Note that for the unlabeled data, we can also calculate the task loss since their class labels are known, in order to make full use of the data.
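As a minimal illustration of Eqs. (1) and (2), the following NumPy sketch computes both losses for a toy batch. The function names `bce_loss` and `ce_loss`, the clipping constant `eps`, and all toy values are assumptions made for illustration and are not taken from the authors' implementation.

```python
import numpy as np

def bce_loss(c_hat, c, eps=1e-7):
    """Mean binary cross-entropy between concept probabilities and binary labels."""
    c_hat = np.clip(c_hat, eps, 1.0 - eps)  # avoid log(0); eps is an assumed constant
    return float(-np.mean(c * np.log(c_hat) + (1.0 - c) * np.log(1.0 - c_hat)))

def ce_loss(logits, y):
    """Mean categorical cross-entropy from class logits and integer class labels."""
    z = logits - logits.max(axis=1, keepdims=True)            # stabilise the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(y)), y]))

# Toy batch: 2 samples, 3 concepts, 4 classes (hypothetical values).
c_hat = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.4]])
c_true = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
logits = np.array([[2.0, 0.1, -1.0, 0.3], [0.0, 0.5, 3.0, -0.2]])
y_true = np.array([0, 2])

loss_c = bce_loss(c_hat, c_true)      # concept loss, Eq. (1)
loss_task = ce_loss(logits, y_true)   # task loss, Eq. (2)
```

Because class labels are available for every sample, `ce_loss` can be evaluated on labeled and unlabeled batches alike, whereas `bce_loss` against ground truth concepts applies only to the labeled subset.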

4.2 Unlabeled Data Alignment: Image-Textual Semantics Alignment

Pseudo Labeling. Unlike the supervised data, CEM cannot directly extract useful information from the unlabeled data as the concept encoder is a supervised training architecture. Thus, in practical situations lacking labeled data, one direct approach is to get high-quality pseudo concept labels to train the model. Below, we will introduce the method we use to obtain pseudo concept labels.

First, it is natural to measure the similarity between images by their cosine distance. Based on this idea, we can assign pseudo labels to unlabeled data by finding labeled data with similar image features. Specifically, for each unlabeled training sample $x\in\mathcal{D}_{U}=\{(x^{(i)},y^{(i)})\}_{i=1}^{|\mathcal{D}_{U}|}$, we calculate its cosine distance to all labeled data points $x^{(j)}\in\mathcal{D}_{L}=\{(x^{(j)},y^{(j)},c^{(j)})\}_{j=1}^{|\mathcal{D}_{L}|}$ and select the $k$ samples with the smallest distances:

$$\operatorname{dist}(x,x^{(j)})=1-\frac{x\cdot x^{(j)}}{\|x\|_{2}\cdot\|x^{(j)}\|_{2}},\quad j=1,\ldots,|\mathcal{D}_{L}|.$$

Then, we normalize the reciprocals of the cosine distances between $x$ and its $k$ nearest data points and use them as weights. With these weights, we compute a weighted average of the concept labels of the $k$ data points, obtaining the pseudo concept label for $x$. In this way, we obtain pseudo concept labels $\bm{\hat{c}_{img}}$ for all $x\in\mathcal{D}_{U}$.
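The KNN pseudo-labeling step described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: distances are computed on precomputed feature vectors, the helper name `knn_pseudo_labels` and the small constant `eps` (which guards against division by zero) are hypothetical, and the argument `k` here is the number of neighbors rather than the number of concepts.

```python
import numpy as np

def knn_pseudo_labels(feat_u, feat_l, concepts_l, k=2, eps=1e-8):
    """Weighted k-NN pseudo concept labels for unlabeled samples.

    feat_u: (n_u, d) unlabeled features; feat_l: (n_l, d) labeled features;
    concepts_l: (n_l, n_concepts) binary concept labels of the labeled data.
    """
    u = feat_u / np.linalg.norm(feat_u, axis=1, keepdims=True)
    l = feat_l / np.linalg.norm(feat_l, axis=1, keepdims=True)
    dist = 1.0 - u @ l.T                                  # cosine distance, (n_u, n_l)
    nn_idx = np.argsort(dist, axis=1)[:, :k]              # k nearest labeled samples
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)
    w = 1.0 / (nn_dist + eps)                             # reciprocal of the distance
    w = w / w.sum(axis=1, keepdims=True)                  # normalise weights to sum to 1
    # Weighted average of the neighbours' concept labels: (n_u, n_concepts).
    return np.einsum("uk,ukc->uc", w, concepts_l[nn_idx])

# Toy data (hypothetical): 3 labeled samples with 2 concepts, 1 unlabeled sample.
feat_l = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
concepts_l = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
feat_u = np.array([[1.0, 0.1]])
c_img = knn_pseudo_labels(feat_u, feat_l, concepts_l, k=2)
```

In this toy case the two nearest neighbors both have the first concept active, so the first pseudo label is 1, while the second is a small soft value dominated by the much closer neighbor.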

In our experiments, we find that directly feeding the pseudo concept labels generated by KNN to CEM yields satisfactory performance. However, this simple labeling method can lead to alignment issues in the concept embeddings learned by the CEM's concept encoder. Specifically, the predicted concepts $\bm{\hat{c}}$ might have no relation to the corresponding features $V$ in the image, hindering the effectiveness of CEM as a reliable interpretability tool. Moreover, this misalignment also degrades the prediction performance of the concept encoder. In the following, we aim to address this misalignment issue.

Generating Concept Heatmaps. The pseudo concept labels obtained via KNN already contain useful information for prediction. Our goal is therefore to provide these labels with further information about their relation to the corresponding image features. To achieve this, we first obtain another pseudo concept label from these relations, which are calculated by the similarity between the concept embeddings and the input image, namely concept heatmaps. Specifically, given an image $x$, its feature map $V\in\mathbb{R}^{H\times W\times m}$ is extracted by $V=\Omega(x)$, where $\Omega$ is the visual encoder, and $H$ and $W$ are the height and width of the feature map.

Given $V$ and the $i$-th concept embedding $\bm{e}_{i}$, we can obtain a heatmap $\mathcal{H}_{i}$, i.e., a similarity matrix that measures the similarity between the concept and the image, by computing their cosine similarity:

$$\mathcal{H}_{p,q,i}=\frac{\bm{e}_{i}^{\top}V_{p,q}}{\|\bm{e}_{i}\|\cdot\|V_{p,q}\|},\quad p=1,\ldots,H,\quad q=1,\ldots,W,$$

where $p,q$ index the $p$-th and $q$-th positions in the heatmap, and $\mathcal{H}_{p,q,i}$ represents a local similarity score between $V$ and $\bm{e}_{i}$. Intuitively, $\mathcal{H}_{i}$ represents the relation of each part of the image to the $i$-th concept. We then derive heatmaps for all concepts, denoted as $\{\mathcal{H}_{1},\mathcal{H}_{2},\ldots,\mathcal{H}_{k}\}$.
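The heatmap computation can be sketched as a batched cosine similarity between every spatial position of the feature map and every concept embedding. The sketch below assumes that each concept embedding is an $m$-dimensional vector matching the channel dimension of $V$; the function name `concept_heatmaps`, the constant `eps`, and the random toy tensors are hypothetical.

```python
import numpy as np

def concept_heatmaps(V, E, eps=1e-8):
    """Cosine similarity between feature-map positions and concept embeddings.

    V: (H, W, m) feature map; E: (k, m) concept embeddings.
    Returns heatmaps of shape (H, W, k), one (H, W) map per concept.
    """
    V_norm = V / (np.linalg.norm(V, axis=-1, keepdims=True) + eps)
    E_norm = E / (np.linalg.norm(E, axis=-1, keepdims=True) + eps)
    return np.einsum("hwm,km->hwk", V_norm, E_norm)

# Toy tensors (hypothetical sizes): 7x7 feature map, 16 channels, 5 concepts.
rng = np.random.default_rng(0)
V = rng.normal(size=(7, 7, 16))
E = rng.normal(size=(5, 16))
heatmaps = concept_heatmaps(V, E)
```

Each entry `heatmaps[p, q, i]` lies in $[-1, 1]$ and plays the role of the local similarity score between position $(p, q)$ and concept $i$.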

Calculating Concept Scores and Concept Labels. As average pooling performs better in downstream classification tasks yan2023robust , we apply average pooling to the heatmaps to deduce the connection between the image and concepts: $s_{i}=\frac{1}{H\cdot W}\sum_{p=1}^{H}\sum_{q=1}^{W}\mathcal{H}_{p,q,i}$. Intuitively, $s_{i}$ is the refined similarity score between the image and concept $\bm{e}_{i}$. Thus, a concept vector $\bm{s}$ can be obtained, representing the similarity between an image input $x$ and the set of concepts: $\bm{s}=(s_{1},\ldots,s_{k})^{\top}$.

$\bm{s}$ can be considered a soft concept label obtained from similarity. Next, we transform it into a hard concept label $\bm{\hat{c}_{align}}$. We determine the presence of a concept attribute in an image using a threshold value chosen experimentally. If the value $s_{i}$ exceeds this threshold, we consider the image to possess that specific concept attribute and set the concept label to True. In this way, we obtain predicted concept labels for all unlabeled data. We set the threshold to 0.6.
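The pooling and thresholding steps can be illustrated with a short NumPy sketch. The function names and the toy heatmap values are hypothetical; only the average pooling over spatial positions and the threshold of 0.6 come from the text.

```python
import numpy as np

def concept_scores(heatmaps):
    """Average-pool heatmaps of shape (H, W, k) over spatial positions -> (k,)."""
    return heatmaps.mean(axis=(0, 1))

def hard_concept_labels(scores, threshold=0.6):
    """Mark a concept as present (1.0) when its score exceeds the threshold."""
    return (scores > threshold).astype(np.float32)

# Toy 2x2 heatmaps for 3 concepts (hypothetical values).
heatmaps = np.stack(
    [
        np.full((2, 2), 0.9),                 # concept 0: uniformly high similarity
        np.full((2, 2), 0.3),                 # concept 1: uniformly low similarity
        np.array([[0.5, 0.7], [0.6, 0.8]]),   # concept 2: mixed, mean 0.65
    ],
    axis=-1,
)
s = concept_scores(heatmaps)        # soft scores: approximately [0.9, 0.3, 0.65]
c_align = hard_concept_labels(s)    # hard labels: [1, 0, 1]
```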

Alignment of Image. Based on the above discussion, on the one hand, the concept encoder should learn information from $\bm{\hat{c}_{img}}$. On the other hand, it should also produce concept embeddings that yield good similarity-based concept labels $\bm{\hat{c}_{align}}$ for alignment with the input image. Thus, we need a further alignment loss to achieve these two goals. Specifically, we compute the alignment loss as follows:

$$\mathcal{L}_{align}=\mathrm{BCE}(\bm{\hat{c}_{img}},\bm{\hat{c}_{align}}). \tag{3}$$

4.3 Final Objective

In this section, we describe the overall optimization objective of the network. First, we have the concept loss $\mathcal{L}_{c}$ in (1) for enhancing the interpretability of concept embeddings. Second, as the concept embeddings are input into the label predictor to output the final prediction, we have a task loss between the predictions given by the concept bottleneck and the ground truth, as shown in (2). For binary classification tasks, we employ binary cross-entropy (BCE) as the task loss; for multi-class classification tasks, we use cross-entropy. Finally, to align the images with concept labels, we compute the alignment loss in (3). Formally, the overall loss function of our approach can be formulated as:

$$\mathcal{L}=\mathcal{L}_{task}+\lambda_{1}\cdot\mathcal{L}_{c}+\lambda_{2}\cdot\mathcal{L}_{align}, \tag{4}$$

where $\lambda_{1},\lambda_{2}$ are hyperparameters for the trade-off between interpretability and accuracy.
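To make the combination in Eq. (4) concrete, the sketch below evaluates the alignment loss of Eq. (3) on toy pseudo labels and adds it to placeholder task and concept losses. All numeric values, the helper names `bce` and `total_loss`, and the choice to treat $\bm{\hat{c}_{img}}$ as the prediction and $\bm{\hat{c}_{align}}$ as the target inside BCE are assumptions for illustration only.

```python
import numpy as np

def bce(p, t, eps=1e-7):
    """Mean binary cross-entropy with p as prediction and t as target."""
    p = np.clip(p, eps, 1.0 - eps)  # eps is an assumed constant to avoid log(0)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

def total_loss(loss_task, loss_c, loss_align, lam1=1.0, lam2=1.0):
    """Overall objective: L = L_task + lam1 * L_c + lam2 * L_align."""
    return loss_task + lam1 * loss_c + lam2 * loss_align

# Hypothetical pseudo labels for one unlabeled sample with 3 concepts.
c_img = np.array([[0.98, 0.02, 0.60]])    # soft KNN-based pseudo labels
c_align = np.array([[1.0, 0.0, 1.0]])     # hard similarity-based labels

loss_align = bce(c_img, c_align)          # alignment loss, Eq. (3)
# Placeholder task/concept losses and lambda values (not from the paper).
loss = total_loss(loss_task=0.8, loss_c=0.3, loss_align=loss_align, lam1=1.0, lam2=0.5)
```

Setting `lam2=0` removes the alignment term entirely, leaving only the task and concept losses.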

5 Experiments

In this section, we conduct experimental studies on the performance of our framework. Specifically, we evaluate its utility and interpretability. Details and additional results are in Appendix A due to space limits.

5.1 Experimental Settings

Datasets. We evaluate our methods on three real-world image tasks: CUB, CelebA, and AwA2. See Appendix A.1 for a detailed introduction.

Baseline Models. As there is no ground truth in the supervised setting, here we compare our SSCBM with two baselines in the supervised setting mentioned in Section 3. Concept Bottleneck Model (CBM) koh2020concept : We adopt the same setting and architecture as in the original CBM. Concept Embedding Model (CEM): We follow the same setting as in  espinosa2022concept .

Evaluation Metrics. To evaluate the utility, we consider the accuracy for both class and concept label prediction. Specifically, concept accuracy measures the model's prediction accuracy for concepts: $\mathcal{C}_{acc}=\frac{1}{N}\cdot\frac{1}{k}\sum_{i=1}^{N}\sum_{j=1}^{k}\mathbb{I}(\hat{c}_{j}^{(i)}=c_{j}^{(i)})$. Task accuracy measures the model's performance in predicting downstream task classes: $\mathcal{A}_{acc}=\frac{1}{N}\sum_{i=1}^{N}\mathbb{I}(\bm{\hat{y}}^{(i)}=\bm{y}^{(i)})$.
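Both metrics reduce to averaging indicator functions, as the following minimal NumPy sketch shows. The function names and toy arrays are hypothetical, and the concept predictions are assumed to be already binarized.

```python
import numpy as np

def concept_accuracy(c_hat, c):
    """Fraction of matching binary concepts over all N samples and k concepts."""
    return float(np.mean(c_hat == c))

def task_accuracy(y_hat, y):
    """Fraction of samples whose predicted class equals the true class."""
    return float(np.mean(y_hat == y))

# Toy predictions (hypothetical): N = 2 samples, k = 3 concepts.
c_hat = np.array([[1, 0, 1], [0, 1, 1]])
c = np.array([[1, 0, 0], [0, 1, 1]])
y_hat = np.array([3, 1])
y = np.array([3, 2])

c_acc = concept_accuracy(c_hat, c)   # 5 of 6 concepts correct
a_acc = task_accuracy(y_hat, y)      # 1 of 2 classes correct
```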

To evaluate the interpretability, besides the concept accuracy (due to the structure of CBM, concept accuracy can also reveal the interpretability), similar to previous work koh2020concept ; espinosa2022concept , we will show visualization results. Moreover, we will evaluate the performance of the test-time intervention.

Implementation Details. All experiments are conducted on a Tesla V100S PCIe 32 GB GPU and an Intel Xeon Processor (Skylake, IBRS) CPU. See Appendix A.2 for more details.

5.2 Evaluation Results on Utility

Performance of SSCBM. We first study the performance of SSCBM. As we focus on the semi-supervised setting, we use different ratios (ranging from 1% to 80%) of all data samples as the labeled data, and the remaining samples as the unlabeled data. The results are shown in Table 1. For the CUB dataset, the concept accuracy improves as the labeled ratio increases, peaking at 95.04% when the labeled ratio is 0.8. The class accuracy shows a consistent upward trend from 62.19% at a 1% labeled ratio to 79.27% at a 60% ratio, with a slight dip to 79.20% at 80%. Notably, with only 40% of the data labeled, our SSCBM (94.37% concept accuracy, 78.05% task accuracy) already achieves performance comparable to CEM trained on the whole dataset (96.39% and 79.82%).

Compared to CUB, the performance on CelebA and AwA2 fluctuates, i.e., both concept accuracy and task accuracy can decrease when the labeled ratio increases. This is potentially because the CUB dataset has a more diverse set of concepts (112 selected) compared to CelebA (only 6 selected espinosa2022concept ) and AwA2 (85 selected), together with factors such as higher variability in the data or more complex feature relationships. Nevertheless, these results remain close to those of CEM under fully supervised training.

| Labeled Ratio | CUB Concept | CUB Task | CelebA Concept | CelebA Task | AwA2 Concept | AwA2 Task |
|---|---|---|---|---|---|---|
| 0.01 | 88.38% | 62.19% | 85.21% | 30.69% | 89.74% | 84.74% |
| 0.05 | 90.69% | 70.43% | 83.51% | 28.97% | 91.90% | 84.61% |
| 0.1 | 91.98% | 73.02% | 84.86% | 29.47% | 92.97% | 84.20% |
| 0.2 | 93.19% | 75.51% | 84.97% | 28.91% | 89.81% | 85.01% |
| 0.4 | 94.37% | 78.05% | 84.77% | 29.35% | 89.81% | 84.47% |
| 0.6 | 94.83% | 79.27% | 84.81% | 28.73% | 94.82% | 83.40% |
| 0.8 | 95.04% | 79.20% | 84.86% | 29.09% | 95.07% | 83.53% |
| CBM (fully supervised training) | 93.99% | 67.33% | 85.68% | 31.16% | 96.48% | 88.71% |
| CEM (fully supervised training) | 96.39% | 79.82% | 85.46% | 30.66% | 95.91% | 87.00% |

Table 1: Results of concept and task accuracy for different datasets with different portions of supervised data.

| Method | CUB Concept | CUB Class | CelebA Concept | CelebA Class | AwA2 Concept | AwA2 Class |
|---|---|---|---|---|---|---|
| CBM | 88.73% | 36.66% | 68.10% | 7.76% | 70.48% | 8.71% |
| CEM | 87.06% | 30.36% | 70.95% | 13.98% | 84.36% | 45.74% |
| SSCBM | **91.98%** | **73.02%** | **84.86%** | **29.47%** | **92.97%** | **84.20%** |

Table 2: The accuracy of predicting concepts and class labels across different datasets with labeled ratio = 0.1. We have bolded the best results.

Comparison with Baselines. We then compare both the concept and class accuracy with the baselines on the three datasets when the labeled ratio is 10%. Results in Table 2 show that our method significantly outperforms the baselines when labeled data is scarce. These results also confirm that the classical CBM and CEM are unsuitable for the semi-supervised setting. We attribute the success of SSCBM to the alignment loss, which increases the concept prediction accuracy and thus the class accuracy; this indicates that leveraging joint training on both labeled and unlabeled data and aligning the unlabeled data at the concept level provides more information.

5.3 Interpretability Evaluation

For interpretability, the previous section shows that the concept accuracy of SSCBM with a small labeled ratio is very close to that of CEM, indicating that it inherits CEM's interpretability. Besides the concept accuracy, note that SSCBM also has a pseudo label derived from the alignment between the concept embedding and the input saliency map, and we use the alignment loss to inherit such alignment. Thus, we evaluate the alignment performance here to show the faithfulness of the interpretability given by SSCBM. We measure the alignment performance by comparing the concept saliency map with the concept semantics in Figure 3. See Appendix A.2 for detailed experimental procedures on generating saliency maps. Results show that our concept saliency map matches the concept semantics, indicating the effectiveness of our alignment loss. In Appendix C, we provide additional interpretability evaluation in Figures 7-18.

Figure 3: The concept saliency map for the CUB dataset (savannah sparrow) demonstrates that our proposed SSCBM achieves meaningful alignment between the ground truth concepts and the input image features. The first image on the left is the original input image. The three images on the right show the aligned regions for different concepts using SSCBM. The text above each image indicates the specific concept, the ground truth concept label, and the prediction result given by SSCBM.

5.4 Test-time Intervention

Test-time intervention enables human users to interact with the model at inference time. We evaluate it by correcting 10% to 100% of the concept labels in the concept predictor. We adopt individual intervention for CelebA and AwA2, as they have no grouped concepts. For CUB, we perform group intervention, i.e., we intervene jointly on concepts that describe the same attribute. For example, breast color::yellow, breast color::black, and breast color::white belong to the same concept group, so we only need to correct the concept labels within that group. We expect the model performance to increase steadily with the ratio of intervened concepts, indicating that the model uses the corrected label information and automatically corrects other labels.
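A group-level intervention of this kind can be sketched as replacing the predicted concepts of selected groups with their ground truth values before they reach the label predictor. The sketch below is an assumption-laden illustration: the helper `intervene`, the random selection of groups, and the toy grouping are hypothetical, and individual intervention corresponds to using singleton groups.

```python
import numpy as np

def intervene(c_hat, c_true, groups, ratio, rng):
    """Replace predictions with ground truth for a random `ratio` of concept groups.

    c_hat, c_true: (n, k) predicted and ground truth concepts;
    groups: list of lists of concept indices (singletons for individual intervention).
    """
    n_pick = int(round(ratio * len(groups)))
    picked = rng.choice(len(groups), size=n_pick, replace=False)
    c_new = c_hat.copy()                      # leave the original predictions untouched
    for g in picked:
        idx = groups[g]
        c_new[:, idx] = c_true[:, idx]        # correct every concept in the group
    return c_new

# Toy example (hypothetical): 4 concepts, the first three form one group.
rng = np.random.default_rng(0)
c_hat = np.array([[0.2, 0.7, 0.1, 0.9]])
c_true = np.array([[1.0, 0.0, 0.0, 1.0]])
groups = [[0, 1, 2], [3]]
half = intervene(c_hat, c_true, groups, ratio=0.5, rng=rng)
```

The corrected concept vector would then be passed to the label predictor in place of the raw predictions, and task accuracy recorded for each intervention ratio.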

Figure 4: Left: Performance with different ratios of intervened concepts on CUB dataset. Right: An example of successful intervention.

Results in Figure 4 (left) demonstrate our model's robustness: performance increases as more concepts are intervened on, indicating that the model makes use of the corrected concept information, which supports both its interpretability and its prediction performance. We attribute this to the alignment loss, which helps the model learn correct concept information from both unlabeled and labeled data. Figure 4 (right) also shows that by changing the wing color to brown, we successfully cause the model to predict the Great Crested Flycatcher instead of the Swainson Warbler. More results are in Appendix B.

5.5 Ablation Study

| Model | CUB Concept | CUB Class | CelebA Concept | CelebA Class | AwA2 Concept | AwA2 Class |
|---|---|---|---|---|---|---|
| SSCBM (full model) | 91.98% | 73.02% | 84.86% | 29.47% | 92.97% | 84.20% |
| SSCBM (w/o $\bm{\hat{c}_{img}}$) | 88.07% | 68.16% | 69.82% | 26.72% | 63.45% | 70.74% |
| SSCBM (w/o $\bm{\hat{c}_{align}}$) | 89.25% | 68.67% | 70.61% | 27.61% | 76.62% | 69.68% |

Table 3: Ablation study on the proposed SSCBM with the labeled data ratio = 0.1.

Finally, we conduct ablation experiments. Note that when $\lambda_{2}=0$ in (4), SSCBM reduces to the original CEM. Thus, by comparing SSCBM with CEM in Table 2, we confirm that the alignment loss is essential for our method. We then give a finer study of the two kinds of pseudo-labels to demonstrate that each one plays an indispensable role in bolstering the efficacy of SSCBM. Specifically, based on our method in Section 4.2, we have three types of labels available: the pseudo concept labels $\bm{\hat{c}_{img}}$ and $\bm{\hat{c}_{align}}$, and the label $\bm{\hat{c}}$ predicted by the concept embedding. We first remove $\bm{\hat{c}_{img}}$ and calculate the alignment loss using $\bm{\hat{c}_{align}}$ and $\bm{\hat{c}}$; conversely, we remove $\bm{\hat{c}_{align}}$ and calculate the alignment loss using $\bm{\hat{c}_{img}}$ and $\bm{\hat{c}}$.

From the results presented in Table 3, it is evident that removing the $\bm{\hat{c}_{img}}$ component significantly degrades the performance of SSCBM at both the concept level and the class level. This indicates that the pseudo concept labels obtained via KNN contain necessary information about the ground truth, and that the concept encoder needs to extract information from these labels to achieve better performance. The situation is similar when removing the $\bm{\hat{c}_{align}}$ component, indicating that aligning the concept embedding with the input saliency map further extracts useful information from the input image and thus improves performance. These observations underscore that the two kinds of pseudo concept labels are jointly effective within our objective function, collectively enhancing both class prediction and concept label prediction.

6 Conclusion

The training of current CBMs heavily relies on the accuracy and richness of annotated concepts in the dataset. These concept labels are typically provided by experts, which can be costly and require significant resources and effort. Additionally, concept saliency maps frequently misalign with input saliency maps, causing concept predictions to correspond to irrelevant input features, an issue related to annotation alignment. In this paper, we propose SSCBM, which combines a strategy to generate pseudo labels with an alignment loss to address these two problems. Experimental results demonstrate its effectiveness.

References

  • [1] Eric Arazo, Diego Ortego, Paul Albert, Noel E O’Connor, and Kevin McGuinness. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In 2020 International joint conference on neural networks (IJCNN), pages 1–8. IEEE, 2020.
  • [2] Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Mateo Espinosa Zarlenga, Lucie Charlotte Magister, Alberto Tonda, Pietro Lio, Frédéric Precioso, Mateja Jamnik, and Giuseppe Marra. Interpretable neural-symbolic concept reasoning. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 1801–1825. PMLR, 2023.
  • [3] David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. Mixmatch: A holistic approach to semi-supervised learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • [4] Kushal Chauhan, Rishabh Tiwari, Jan Freyberg, Pradeep Shenoy, and Krishnamurthy Dvijotham. Interactive concept bottleneck models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37(5), pages 5948–5955, 2023.
  • [5] Asma Chebli, Akila Djebbar, and Hayet Farida Marouani. Semi-supervised learning for medical application: A survey. In 2018 International Conference on Applied Smart Systems (ICASS), pages 1–9, 2018.
  • [6] Xing Chen, Biao Ren, Ming Chen, Quanxin Wang, Lixin Zhang, and Guiying Yan. Nllss: predicting synergistic drug combinations based on semi-supervised learning. PLoS computational biology, 12(7):e1004975, 2016.
  • [7] Veronika Cheplygina, Marleen de Bruijne, and Josien P.W. Pluim. Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical Image Analysis, 54:280–296, 2019.
  • [8] Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, et al. Concept embedding models: Beyond the accuracy-explainability trade-off. Advances in Neural Information Processing Systems, 35:21400–21413, 2022.
  • [9] Roman Filipovych and Christos Davatzikos. Semi-supervised pattern classification of medical images: Application to mild cognitive impairment (mci). NeuroImage, 55(3):1109–1119, 2011.
  • [10] Jack Furby, Daniel Cunnington, Dave Braines, and Alun Preece. Can we constrain concept bottleneck models to learn semantically meaningful input features? arXiv preprint arXiv:2402.00912, 2024.
  • [11] Marton Havasi, Sonali Parbhoo, and Finale Doshi-Velez. Addressing leakage in concept bottleneck models. Advances in Neural Information Processing Systems, 35:23386–23397, 2022.
  • [12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [13] Xiangteng He and Yuxin Peng. Fine-grained visual-textual representation learning. IEEE Transactions on Circuits and Systems for Video Technology, 30(2):520–531, 2019.
  • [14] Junlin Hou, Jilan Xu, and Hao Chen. Concept-attention whitening for interpretable skin lesion diagnosis. arXiv preprint arXiv:2404.05997, 2024.
  • [15] Lijie Hu, Liang Liu, Shu Yang, Xin Chen, Hongru Xiao, Mengdi Li, Pan Zhou, Muhammad Asif Ali, and Di Wang. A hopfieldian view-based interpretation for chain-of-thought reasoning. arXiv preprint arXiv:2406.12255, 2024.
  • [16] Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. Improving faithfulness for vision transformers. arXiv preprint arXiv:2311.17983, 2023.
  • [17] Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. Seat: stable and explainable attention. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37(11), pages 12907–12915, 2023.
  • [18] Lijie Hu, Chenyang Ren, Zhengyu Hu, Cheng-Long Wang, and Di Wang. Editable concept bottleneck models. arXiv preprint arXiv:2405.15476, 2024.
  • [19] Tri Huynh, Aiden Nibali, and Zhen He. Semi-supervised learning for medical image classification using imbalanced training data. Computer Methods and Programs in Biomedicine, 216:106628, 2022.
  • [20] Aya Abdelsalam Ismail, Julius Adebayo, Hector Corrada Bravo, Stephen Ra, and Kyunghyun Cho. Concept bottleneck generative models. In The Twelfth International Conference on Learning Representations, 2023.
  • [21] Mert Keser, Gesina Schwalbe, Azarm Nowzad, and Alois Knoll. Interpretable model-agnostic plausibility verification for 2d object detectors using domain-invariant concept bottleneck models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3890–3899, 2023.
  • [22] Eunji Kim, Dahuin Jung, Sangha Park, Siwon Kim, and Sungroh Yoon. Probabilistic concept bottleneck models. arXiv preprint arXiv:2306.01574, 2023.
  • [23] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International conference on machine learning, pages 5338–5348. PMLR, 2020.
  • [24] Songning Lai, Lijie Hu, Junxiao Wang, Laure Berti-Equille, and Di Wang. Faithful vision-language interpretation via concept bottleneck models. In The Twelfth International Conference on Learning Representations, 2023.
  • [25] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
  • [26] Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on challenges in representation learning, ICML, volume 3(2), page 896. Atlanta, 2013.
  • [27] Jia Li, Lijie Hu, Zhixian He, Jingfeng Zhang, Tianhang Zheng, and Di Wang. Text guided image editing with automatic concept locating and forgetting. arXiv preprint arXiv:2405.19708, 2024.
  • [28] Yinheng Li, Shaofei Wang, Han Ding, and Hang Chen. Large language models in finance: A survey. In Proceedings of the Fourth ACM International Conference on AI in Finance, pages 374–382, 2023.
  • [29] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
  • [30] Lucie Charlotte Magister, Dmitry Kazhdan, Vikash Singh, and Pietro Liò. Gcexplainer: Human-in-the-loop concept-based explanations for graph neural networks. arXiv preprint arXiv:2107.11889, 2021.
  • [31] Todd K Moon. The expectation-maximization algorithm. IEEE Signal processing magazine, 13(6):47–60, 1996.
  • [32] Li Niu, Qingtao Tang, Ashok Veeraraghavan, and Ashutosh Sabharwal. Learning from noisy web data with category-level supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7689–7698, 2018.
  • [33] Tuomas Oikarinen, Subhro Das, Lam Nguyen, and Lily Weng. Label-free concept bottleneck models. In International Conference on Learning Representations, 2023.
  • [34] Yassine Ouali, Céline Hudelot, and Myriam Tami. An overview of deep semi-supervised learning. arXiv preprint arXiv:2006.05278, 2020.
  • [35] Hieu Pham, Zihang Dai, Qizhe Xie, and Quoc V Le. Meta pseudo labels. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11557–11568, 2021.
  • [36] Yoshihide Sawada and Keigo Nakamura. Concept bottleneck model with additional unsupervised concepts. IEEE Access, 10:41758–41765, 2022.
  • [37] Ivaxi Sheth and Samira Ebrahimi Kahou. Auxiliary losses for learning generalizable concept-based models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [38] Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in neural information processing systems, 33:596–608, 2020.
  • [39] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in neural information processing systems, 30, 2017.
  • [40] Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. Large language models in medicine. Nature medicine, 29(8):1930–1940, 2023.
  • [41] Jesper E Van Engelen and Holger H Hoos. A survey on semi-supervised learning. Machine learning, 109(2):373–440, 2020.
  • [42] Han Xuanyuan, Pietro Barbiero, Dobrik Georgiev, Lucie Charlotte Magister, and Pietro Liò. Global concept-based interpretability for graph neural networks via neuron analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37(9), pages 10675–10683, 2023.
  • [43] An Yan, Yu Wang, Yiwu Zhong, Zexue He, Petros Karypis, Zihan Wang, Chengyu Dong, Amilcare Gentili, Chun-Nan Hsu, Jingbo Shang, et al. Robust and interpretable medical image classifiers via concept bottleneck models. arXiv preprint arXiv:2310.03182, 2023.
  • [44] Shu Yang, Lijie Hu, Lu Yu, Muhammad Asif Ali, and Di Wang. Human-ai interactions in the communication era: Autophagy makes large models achieving local optima. arXiv preprint arXiv:2402.11271, 2024.
  • [45] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In The Eleventh International Conference on Learning Representations, 2023.
  • [46] Gang Zhang, Shan-Xing Ou, Yong-Hui Huang, and Chun-Ru Wang. Semi-supervised learning methods for large scale healthcare data analysis. International Journal of Computers in Healthcare, 2(2):98–110, 2015.

Appendix A Details of Experimental Setup

A.1 Datasets

We evaluate our methods on three real-world image tasks: CUB, CelebA, and AwA2.

  • CUB [13]: the Caltech-UCSD Birds-200-2011 (CUB) dataset, which includes a total of 11,788 avian images. The objective is to accurately categorize these birds into one of 200 distinct species. Following [8], we use k = 112 binary bird attributes representing wing color, beak shape, etc.

  • CelebA [29]: the Large-scale CelebFaces Attributes dataset. In the CelebA task, there are 6 balanced incomplete concept annotations, and each image belongs to one of 256 classes.

  • AwA2 [32]: Animals with Attributes 2 consists of 37,322 images in total, distributed across 50 animal categories. AwA2 also provides a category-attribute matrix, which contains an 85-dimensional attribute vector (e.g., color, stripe, furry, size, and habitat) for each category.

Statistic    CUB       CelebA     AwA2
#Images      11,788    202,599    37,322
#Classes     200       10,177     50
#Concepts    312       40         85
Table 4: Statistics of the datasets used in our experiments.

A.2 Implementation Details

First, we resize the images to an input size of 299 x 299 (64 x 64 for CelebA). Subsequently, we employ ResNet34 [12] as the backbone to transform the input into latent code, followed by a fully connected layer to convert it into concept embeddings of size 16 (32 for CUB). During pseudo-labeling, we also utilize ResNet34 with the KNN algorithm with k = 2. We set $\lambda_{1}=1$ and $\lambda_{2}=0.1$ and utilize the SGD optimizer with a learning rate of 0.05 and a regularization coefficient of 5e-6. We train SSCBM for 100 epochs with a batch size of 256 (for AwA2, the batch size is 32 due to the large size of individual images). We repeat each experiment 5 times and report the average results.
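
The pseudo-labeling step can be illustrated with a minimal sketch. The appendix states only that ResNet34 features are used with KNN and k = 2; the cosine distance, the inverse-distance weighting, and the function names below are assumptions made for illustration, and the feature vectors are toy values rather than real ResNet34 outputs.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def knn_pseudo_concepts(query, labeled_feats, labeled_concepts, k=2, eps=1e-8):
    """Soft pseudo concept labels for one unlabeled sample.

    Assumption: the k nearest labeled samples vote with weights
    proportional to the inverse of their cosine distance to the query.
    """
    dists = [cosine_distance(query, f) for f in labeled_feats]
    # Indices of the k labeled samples closest to the query.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    inv = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(inv)
    weights = [w / total for w in inv]
    n_concepts = len(labeled_concepts[0])
    # Weighted average of the neighbours' binary concept vectors.
    return [
        sum(w * labeled_concepts[i][j] for w, i in zip(weights, nearest))
        for j in range(n_concepts)
    ]

# Toy usage: three labeled samples with two binary concepts each.
feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
concepts = [[1, 0], [0, 1], [1, 1]]
pseudo = knn_pseudo_concepts([1.0, 1.0], feats, concepts, k=2)
```

In this toy case the query is equally close to the first two labeled samples, so each pseudo concept label is the plain average of their annotations.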

To construct the concept saliency map, we first upsample the heatmaps $\{\mathcal{H}_{1},\mathcal{H}_{2},\ldots,\mathcal{H}_{k}\}$ calculated in Section 4.2 to the size $H\times W$ (the original image size). Then, we create a mask based on the value intensities, with higher values corresponding to darker colors.
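
The two steps above can be sketched as follows. The interpolation mode is not stated here, so bilinear interpolation with aligned corners is an assumption, as are the min-max normalization and the 0 to 255 grey scale; the function names are hypothetical.

```python
def upsample_bilinear(heatmap, out_h, out_w):
    """Upsample a 2D heatmap (list of rows) to out_h x out_w.

    Assumption: bilinear interpolation with corner pixels aligned.
    """
    in_h, in_w = len(heatmap), len(heatmap[0])
    out = []
    for i in range(out_h):
        # Fractional source row for this output row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = heatmap[y0][x0] * (1 - fx) + heatmap[y0][x1] * fx
            bottom = heatmap[y1][x0] * (1 - fx) + heatmap[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def intensity_mask(heatmap):
    """Map heatmap values to grey levels so that higher values are darker."""
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[round(255 * (1 - (v - lo) / span)) for v in row] for row in heatmap]

# Toy usage: a 2 x 2 heatmap upsampled to a 3 x 3 "image".
small = [[0.0, 1.0], [2.0, 3.0]]
big = upsample_bilinear(small, 3, 3)
mask = intensity_mask(big)
```

In the mask, a grey level of 0 (black) marks the strongest activation and 255 (white) the weakest, matching the convention that higher values appear darker.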

Appendix B Test-time Intervention

The results in Figure 5 demonstrate our model's robustness and an increasing trend in its ability to learn concept information, which reflects both its interpretability and its prediction performance. We attribute this to the alignment loss, which helps the model learn the correct correspondences between unlabeled and labeled data.

Figure 5: Test-time Intervention on CUB and AwA2 dataset.

Here, we present some successful examples of test-time intervention, illustrated in Figure 6. The two examples on the left are from the CUB dataset. In the top left image, by changing the wing color to brown, we successfully caused the model to predict the Great Crested Flycatcher instead of the Swainson Warbler. In the bottom left, because the model initially failed to notice that the upper part of the bird was black, it misclassified the bird as a Vesper Sparrow. Through test-time intervention, we successfully made it predict that the bird was a Grasshopper Sparrow. The examples on the right are from the AwA2 dataset, where we again made the model predict correctly by modifying concepts at test time. For example, in the top right image, by modifying the concept 'fierce' for the orca, we prevented it from being predicted as a horse. In the bottom right, we successfully made the model recognize the bat through its color.

Figure 6: Examples of Test-time Intervention.
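
The mechanics of a test-time intervention can be sketched with a toy example. This is not the SSCBM architecture: the linear label predictor, its weights, and the two concepts and two classes below are hypothetical values chosen only to show how overwriting a predicted concept can change the final prediction.

```python
def predict_label(concepts, weights, biases):
    """Return the index of the highest-scoring class for a concept vector."""
    scores = [
        b + sum(w * c for w, c in zip(ws, concepts))
        for ws, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def intervene(predicted, corrections):
    """Copy the predicted concepts and overwrite the corrected ones."""
    updated = list(predicted)
    for idx, value in corrections.items():
        updated[idx] = value
    return updated

# Hypothetical concepts: index 0 = "brown wing", index 1 = "black upper part".
weights = [[2.0, -1.0],   # class 0 favours a brown wing
           [-1.0, 2.0]]   # class 1 favours a black upper part
biases = [0.0, 0.0]

predicted = [0.2, 0.6]                      # concept predictor output
before = predict_label(predicted, weights, biases)

# An expert corrects both concepts at test time.
corrected = intervene(predicted, {0: 1.0, 1: 0.0})
after = predict_label(corrected, weights, biases)
```

With these toy numbers the prediction switches from class 1 to class 0 once the concepts are corrected, mirroring how changing the wing color changes the predicted species in Figure 6.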

Appendix C Additional Interpretability Evaluation

We provide additional interpretability evaluation results in Figures 7 to 18 below.

Figure 7: Concept saliency map on CUB dataset (cardinal) shows reasonable localization of the ground truth concept regions in the input image.
Figure 8: Concept saliency map on CUB dataset (clark) shows reasonable localization of the ground truth concept regions in the input image.
Figure 9: Concept saliency map on CUB dataset (bobolink) shows reasonable localization of the ground truth concept regions in the input image.
Figure 10: Concept saliency map on CUB dataset (cape glossy starling) shows reasonable localization of the ground truth concept regions in the input image.
Figure 11: Concept saliency map on CUB dataset (elegant tern) shows reasonable localization of the ground truth concept regions in the input image.
Figure 12: Concept saliency map on CUB dataset (heermann gull) shows reasonable localization of the ground truth concept regions in the input image.
Figure 13: Concept saliency map on CUB dataset (horned puffin) shows reasonable localization of the ground truth concept regions in the input image.
Figure 14: Concept saliency map on CUB dataset (nashville warbler) shows reasonable localization of the ground truth concept regions in the input image.
Figure 15: Concept saliency map on CUB dataset (slaty backed gull) shows reasonable localization of the ground truth concept regions in the input image.
Figure 16: Concept saliency map on CUB dataset (white breasted nuthatch) shows reasonable localization of the ground truth concept regions in the input image.
Figure 17: Concept saliency map on CUB dataset (white necked raven) shows reasonable localization of the ground truth concept regions in the input image.
Figure 18: Concept saliency map on CUB dataset (white crowned sparrow) shows reasonable localization of the ground truth concept regions in the input image.

Appendix D Limitations

While semi-supervised learning allows us to address part of the annotation problem, semi-supervised models may not be suitable for all types of tasks or datasets. They are most effective when the data distribution is smooth. However, this is a limitation of semi-supervised learning in general rather than of our method specifically.

Appendix E Broader Impact

The training of current CBMs heavily relies on the accuracy and richness of annotated concepts in the dataset. These concept labels are typically provided by experts, which can be costly and require significant resources and effort. Additionally, concept saliency maps frequently misalign with input saliency maps, causing concept predictions to correspond to irrelevant input features - an issue related to annotation alignment. To address these two problems, we propose SSCBM, which combines a strategy for generating pseudo labels with an alignment loss. Our experimental results demonstrate its effectiveness, and the method is of practical use in real-world settings where annotated data is scarce.