Semi-supervised Concept Bottleneck Models

Lijie Hu^1,2, Tianhao Huang^1,2,3, Huanyi Xie^1,2,4,
Chenyang Ren^1,2,5, Zhengyu Hu^1,2,6, Lu Yu⁷, and Di Wang^1,2
¹Provable Responsible AI and Data Analytics (PRADA) Lab
²KAUST ³Nankai University ⁴Harbin Institute of Technology
⁵Shanghai Jiao Tong University ⁶HKUST ⁷Ant Group

Abstract

Concept Bottleneck Models (CBMs) have garnered increasing attention due to their ability to provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts. However, the training of current CBMs relies heavily on the accuracy and richness of the annotated concepts in the dataset. These concept labels are typically provided by experts, which is costly and requires significant resources and effort. Additionally, concept saliency maps frequently misalign with input saliency maps, causing concept predictions to correspond to irrelevant input features, an issue related to annotation alignment. To address these limitations, we propose a new framework called SSCBM (Semi-supervised Concept Bottleneck Model). SSCBM is suited to practical situations where annotated data is scarce. By leveraging joint training on both labeled and unlabeled data and aligning the unlabeled data at the concept level, we address both issues. Specifically, we propose a strategy to generate pseudo labels and an alignment loss. Experiments demonstrate that SSCBM is both effective and efficient. With only 20% labeled data, we achieve 93.19% concept accuracy (96.39% in the fully supervised setting) and 75.51% prediction accuracy (79.82% in the fully supervised setting).

1 Introduction

Recently, deep learning models, such as ResNet he2016deep , often feature complex non-linear architectures, making it difficult for end-users to understand and trust their decisions. This lack of interpretability is a significant obstacle to their adoption, especially in critical fields such as healthcare thirunavukarasu2023large and finance li2023large , where transparency is crucial. Explainable artificial intelligence (XAI) models have been developed to meet the demand for transparency, providing insights into their behavior and internal mechanisms hu2023seat ; hu2023improving ; yang2024human ; hu2024hopfieldian . Concept Bottleneck Models (CBMs) koh2020concept are particularly notable among these XAI models for their ability to clarify the prediction process of end-to-end AI models. CBMs introduce a bottleneck layer that incorporates human-understandable concepts. During prediction, CBMs first predict concept labels from the original input and then use these predicted concepts in the bottleneck layer to determine the final classification label. This approach results in a self-explanatory decision-making process that users can comprehend.

Figure 1: (a) A sample of sparrow class with complete concept labels. (b) A sample of sparrow class with incomplete concept labels. (c) A sample of misalignment between input features and concepts resulting from existing CBM methods. Our framework simultaneously utilizes both (a) and (b) types of data and addresses the issue of (c) through an alignment loss.

A major issue in original CBMs is the need for expert labeling, which is costly in practice. Some researchers address this problem through unsupervised learning. For example, oikarinen2023label proposes a Label-free CBM that transforms any neural network into an interpretable CBM without requiring labeled concept data while maintaining high accuracy. Similarly, Post-hoc Concept Bottleneck models yuksekgonul2022post can be applied to various neural networks without compromising performance, preserving interpretability advantages. However, these methods have three issues. First, those unsupervised methods heavily rely on large language models like GPT-3, which have reliability issues lai2023faithful . Second, the concepts extracted by these models lack evaluation metrics, undermining their interpretability. Third, the assumption that no concept labels are available is too stringent in practice. In reality, obtaining a small portion of concept labels is feasible and cost-effective. Therefore, we can maximize the use of this small labeled concept dataset. This is the motivation for introducing our framework, which focuses on the semi-supervised setting in CBM.

In this paper, we introduce a framework called the SSCBM (Semi-supervised Concept Bottleneck Model). Compared to a supervised setting, semi-supervised CBMs have two main challenges. First, obtaining concept embeddings requires concept labels, so we need to generate pseudo labels to obtain these concept embeddings. To achieve this, SSCBM uses a KNN-based algorithm to assign pseudo-concept labels for unlabeled data.

While such a simple pseudo-labeling method is effective and has acceptable classification accuracy, we also find that the concept saliency map often misaligns with the input saliency map, meaning concept predictions frequently correspond to irrelevant input features. This misalignment often arises from inaccurate concept annotations or unclear relationships between input features and concepts, which is closely related to the broader issue of annotation alignment. In fact, in the supervised setting, there is a similar misalignment issue furby2024can . Existing research seeks to improve alignment by connecting textual and image information hou2024concept . However, these methods only focus on the supervised setting and cannot be directly applied to our settings because our pseudo-labels are noisy. Our framework achieves excellent performance in both concept accuracy and concept saliency alignment by leveraging joint training on both labeled and unlabeled data and aligning the unlabeled data at the concept level. To achieve this, we leverage the relevance between the input image and the concept and get other pseudo-concept labels based on these similarity scores. Finally, we align these two types of pseudo-concept labels to give the concept encoder the ability to extract useful information from features while also inheriting the ability to align concept embeddings with the input.

Comprehensive experiments on benchmark datasets demonstrate that our SSCBM is both efficient and effective. Our contributions are summarized as follows.

• We propose SSCBM, a framework designed to solve the semi-supervised annotation problem, which is of practical significance in real-world applications. To the best of our knowledge, we are the first to tackle the semi-supervised annotation problem and the concept-saliency misalignment problem within a single framework, elucidating the behavior of CBMs through semi-supervised alignment.

• Our framework addresses the semi-supervised annotation problem alongside the concept semantics alignment problem in a simple manner. We first use the KNN algorithm to assign a pseudo-label to each unlabeled sample, which we find experimentally to be simple and effective. Then, we compute a heatmap between concept embeddings and the input. After applying a threshold, we obtain the predicted alignment label. Finally, we optimize the alignment loss between these two pseudo-concept labels to mitigate the misalignment issue.

• Comprehensive experiments demonstrate the superiority of our SSCBM in annotation and concept-saliency alignment, indicating its efficiency and effectiveness. With only 1% labeled data, we achieve 88.38% concept accuracy and 62.19% prediction accuracy. With 20% labeled data, we achieve 93.19% concept accuracy (96.39% in the fully supervised setting) and 75.51% prediction accuracy (79.82% in the fully supervised setting).

2 Related Work

Concept Bottleneck Models.

Concept Bottleneck Model (CBM) koh2020concept is an innovative deep-learning approach for image classification and visual reasoning. By introducing a concept bottleneck layer into deep neural networks, CBMs enhance model generalization and interpretability by learning specific concepts. However, CBMs face two primary challenges: their performance often lags behind that of models without the bottleneck layer due to incomplete information extraction, and they rely heavily on laborious dataset annotation. Researchers have explored various solutions to these challenges. chauhan2023interactive extended CBMs into interactive prediction settings by introducing an interaction policy to determine which concepts to label, thereby improving final predictions. oikarinen2023label addressed CBM limitations by proposing a Label-free CBM, which transforms any neural network into an interpretable CBM without requiring labeled concept data, maintaining high accuracy. Post-hoc Concept Bottleneck models yuksekgonul2022post can be applied to various neural networks without compromising performance, preserving interpretability advantages. Related work in the image domain includes studies havasi2022addressing ; kim2023probabilistic ; keser2023interpretable ; sawada2022concept ; sheth2023auxiliary ; li2024text ; hu2024editable . In the graph concept field, magister2021gcexplainer provide a global interpretation for Graph Neural Networks (GNNs) by mapping graphs into a concept space through clustering and offering a human-in-the-loop evaluation. xuanyuan2023global ; Barbiero2023interpretable extend this approach by incorporating both global and local explanations. For local explanations, they define a concept set, with each neuron represented as a vector with Boolean values indicating concept activation. However, existing works rarely consider semi-supervised settings, which are practical in real-world applications. Our framework addresses these issues effectively.

Semi-supervised Learning.

Semi-supervised learning (SSL) combines the two main tasks of machine learning: supervised learning and unsupervised learning van2020survey . It is typically applied in scenarios where labeled data is scarce. Examples include computer-aided diagnosis zhang2015semi ; 8651980 , medical image analysis HUYNH2022106628 ; CHEPLYGINA2019280 ; FILIPOVYCH20111109 , and drug discovery chen2016nllss . In these cases, collecting detailed annotated data by experts requires considerable time and effort. However, under suitable assumptions on the data distribution, unlabeled data can also assist in building better classifiers van2020survey . SSL, also known as self-labeling or self-teaching in its earliest forms, involves the model iteratively labeling a portion of the unlabeled data and adding it to the training set for the next round of training ouali2020overview . The expectation-maximization (EM) algorithm proposed by moon1996expectation uses both labeled and unlabeled data to produce maximum likelihood estimates of parameters. laine2016temporal and tarvainen2017mean focus on consistency regularization. The Π-model laine2016temporal combines a supervised cross-entropy loss with an unsupervised consistency loss while perturbing the model and data based on the consistency constraint assumption. A temporal ensembling model integrates predictions from models at various time points. Mean teacher tarvainen2017mean addresses the slow updating issue of the temporal ensembling model on large datasets by averaging model weights instead of label predictions. MixMatch NEURIPS2019_1cd138d0 unifies and refines the previous approaches of consistency regularization, entropy minimization, and traditional regularization into a single loss function, achieving excellent results. Pseudo labeling, as an effective tool for reducing the entropy of unlabeled data lee2013pseudo , has been attracting increasing attention from researchers in the field of semi-supervised learning. arazo2020pseudo proposes that directly using the model’s predictions as pseudo-labels can achieve good results. FixMatch sohn2020fixmatch retains only the model’s high-confidence predictions as pseudo-labels. pham2021meta continuously adjusts the teacher based on feedback from the student, thereby generating better pseudo-labels. While there has been a plethora of work in the semi-supervised learning field, semi-supervised concept bottleneck models remain largely unexplored. Our work focuses on this new area.

3 Preliminaries

Concept Bottleneck Models koh2020concept .

We consider a classification task with a concept set $\mathcal{C}=\{p_{1},\cdots,p_{k}\}$, where each $p_{i}$ is a concept given by experts or LLMs, and a training dataset $\mathcal{D}=\{(x^{(i)},y^{(i)},c^{(i)})\}_{i=1}^{N}$. Here, for $i\in[N]$, $x^{(i)}\in\mathcal{X}\subseteq\mathbb{R}^{d}$ represents the feature vector (e.g., an image’s pixels), $y^{(i)}\in\mathcal{Y}\subseteq\mathbb{R}^{l}$ denotes the label ($l$ is the number of classes), and ${c}^{(i)}=(c_{i}^{1},\cdots,c_{i}^{k})\in\mathbb{R}^{k}$ represents the concept vector (a binary vector of length $k$, where each value indicates whether the input $x^{(i)}$ contains that concept). In CBMs, the goal is to learn two mappings: a concept encoder that transforms the input space to the concept space, denoted as $g:\mathbb{R}^{d}\to\mathbb{R}^{k}$, and a label predictor that maps the concept space to the downstream prediction space, denoted as $f:\mathbb{R}^{k}\to\mathbb{R}^{l}$. Usually, the map $f$ is linear. For any input $x$, we aim to ensure that its predicted concept vector $\hat{c}=g(x)$ and prediction $\hat{y}=f(g(x))$ are close to their underlying counterparts, thus capturing the essence of the original CBMs.

Concept Embedding Models espinosa2022concept .

As the original CBM relies solely on concept features to determine the model’s predictions, its prediction performance degrades relative to canonical deep neural networks. To further improve the performance of CBMs, CEM was developed by espinosa2022concept . It achieves this by learning interpretable high-dimensional concept representations (i.e., concept embeddings), thus maintaining high task accuracy while obtaining concept representations that contain meaningful semantic information. For CEMs, we use the same setting as that of espinosa2022concept ; ismail2023concept . For each input $x$, the concept encoder learns $k$ concept representations $\hat{c}_{1},\hat{c}_{2},\ldots,\hat{c}_{k}$, each corresponding to one of the $k$ ground truth concepts in the training dataset. In CEMs, each concept $c_{i}$ is represented using two embeddings $\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}\in\mathbb{R}^{m}$, each with specific semantics, i.e., the concept is TRUE (active state) and the concept is FALSE (inactive state), where the hyper-parameter $m$ is the embedding dimension. We use a DNN $\psi(x)$ to learn a latent representation $\bm{h}\in\mathbb{R}^{n_{h}}$, to be used as input to the CEM’s embedding generator, where $n_{h}$ is the dimension of the latent representation. CEM’s embedding generator $\phi$ feeds $\bm{h}$ into two concept-specific fully connected layers in order to learn two concept embeddings in $\mathbb{R}^{m}$.

$$\bm{\hat{c}}_{i}=\phi_{i}\left(\bm{h}\right)=a\left(W_{i}\bm{h}+\bm{b}_{i}\right).$$

Then we use a differentiable scoring function $s:\mathbb{R}^{2m}\rightarrow\left[0,1\right]$ to align the concept embeddings $\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}$ with the ground-truth concept $c_{i}$. It is trained to predict the probability $\hat{p}_{i}:=s\left(\left[\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}\right]^{\top}\right)=\sigma\left(W_{s}\left[\bm{\hat{c}}_{i}^{+},\bm{\hat{c}}_{i}^{-}\right]^{\top}+\bm{b}_{s}\right)$ of concept $c_{i}$ being active in the embedding space. We obtain the final concept embedding $\bm{\hat{c}}_{i}$ as follows:

$$\bm{\hat{c}}_{i}:=\hat{p}_{i}\bm{\hat{c}}_{i}^{+}+(1-\hat{p}_{i})\bm{\hat{c}}_{i}^{-}.$$
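As an illustration of the mixing step, the following NumPy sketch computes the active and inactive embeddings of a single concept, scores them, and forms the final embedding. The leaky-ReLU choice for the activation $a$, the function and parameter names, and the random weights are assumptions made for this example only; they are not taken from the CEM or SSCBM implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaky_relu(z, slope=0.01):
    # Assumed activation a(.); the text does not specify which one is used.
    return np.where(z > 0, z, slope * z)

def cem_concept_embedding(h, W_pos, b_pos, W_neg, b_neg, W_s, b_s):
    """Mix the active/inactive embeddings of one concept.

    h: (n_h,) latent vector; W_pos, W_neg: (m, n_h); b_pos, b_neg: (m,);
    W_s: (2m,) scoring weights; b_s: scalar scoring bias.
    """
    c_pos = leaky_relu(W_pos @ h + b_pos)   # embedding for "concept active"
    c_neg = leaky_relu(W_neg @ h + b_neg)   # embedding for "concept inactive"
    # Probability that the concept is active, from the concatenated embeddings.
    p_hat = sigmoid(W_s @ np.concatenate([c_pos, c_neg]) + b_s)
    # Final embedding: convex combination weighted by p_hat.
    c_hat = p_hat * c_pos + (1.0 - p_hat) * c_neg
    return c_hat, p_hat

rng = np.random.default_rng(0)
n_h, m = 8, 4  # hypothetical latent and embedding sizes
h = rng.normal(size=n_h)
c_hat, p_hat = cem_concept_embedding(
    h,
    rng.normal(size=(m, n_h)), np.zeros(m),
    rng.normal(size=(m, n_h)), np.zeros(m),
    rng.normal(size=2 * m), 0.0,
)
```

Because $\hat{p}_{i}$ weights the two embeddings, a confident prediction pushes the final embedding toward one of the two states, which is what later makes concept-level interventions possible.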

CEMs thus provide high-quality, semantically rich concept embeddings. In Section 4, we utilize these concept representations and further optimize their interpretability through our proposed framework, SSCBM.

Semi-supervised Setting.

Now, we consider the setting of semi-supervised learning for concept bottleneck models. As mentioned earlier, a typical training dataset for CBMs can be represented as $\mathcal{D}=\{(x^{(i)},y^{(i)},c^{(i)})\}_{i=1}^{N}$, where $x^{(i)}\in\mathcal{X}$ represents the input feature. However, in semi-supervised learning tasks, the set of feature vectors typically consists of two parts, $\mathcal{X}=\{\mathcal{X}_{L},\mathcal{X}_{U}\}$, where $\mathcal{X}_{L}$ represents a small subset of labeled data and $\mathcal{X}_{U}$ represents the remaining unlabeled data. Generally, we assume $|\mathcal{X}_{L}|\ll|\mathcal{X}_{U}|$. We assume that each $x^{(j)}\in\mathcal{X}_{L}$ is labeled with both a concept vector $c^{(j)}$ and a class label $y^{(j)}$, whereas each $x^{(i)}\in\mathcal{X}_{U}$ has only a class label $y^{(i)}\in\mathcal{Y}$ and no concept annotation. Note that our method can be directly extended to the fully semi-supervised case where even the classification labels for feature vectors in $\mathcal{X}_{U}$ are unknown.

Under these settings, given a training dataset $\mathcal{D}=\mathcal{D}_{L}\cup\mathcal{D}_{U}$ that includes both labeled and unlabeled data, the goal is to train a CBM using both the labeled data $\mathcal{D}_{L}$ and unlabeled data $\mathcal{D}_{U}$. This aims to get better mappings $g:\mathbb{R}^{d}\to\mathbb{R}^{k}$ and $f:\mathbb{R}^{k}\to\mathbb{R}^{l}$ than those trained by using only labeled data, ultimately achieving higher task accuracy and its corresponding concept-based explanation.

4 Semi-supervised Concept Bottleneck Models

In this section, we will elaborate on the details of the proposed SSCBM framework, which is shown in Figure 2. SSCBM follows the main idea of CEM. Specifically, to learn a good concept encoder, we use different processing methods for labeled and unlabeled data. Labeled data first passes through a feature extractor $\psi$ to be transformed into a latent representation $\bm{h}$ , which then enters the concept embedding extractor to obtain the concept embeddings and predicted concept vector $\bm{\hat{c}}$ for the labeled data, which is compared with the ground truth concept to compute the concept loss. Additionally, the label predictor predicts $\bm{\hat{y}}$ based on $\bm{\hat{c}}$ , and calculates the task loss.

For unlabeled data, we first extract image features $V$ from the input using an image encoder. Then, we use the KNN algorithm to assign a pseudo-label $\bm{\hat{c}_{img}}$ to each unlabeled sample, which we find experimentally to be simple and effective. In the second step, we compute a heatmap between concept embeddings and the input. After applying a threshold, we obtain the predicted alignment label $\bm{\hat{c}_{align}}$. Finally, we compute the alignment loss between $\bm{\hat{c}_{img}}$ and $\bm{\hat{c}_{align}}$. During each training epoch, we simultaneously compute these losses and update the model parameters based on the gradients.

4.1 Label Anchor: Concept Embedding Encoder

Concept Embeddings. As described in Section 3, we obtain high-dimensional concept representations with meaningful semantics based on CEMs. Thus, our concept encoder should extract useful information from both labeled and unlabeled data.

For the labeled training data $\mathcal{D}_{L}=\{(x^{(i)},y^{(i)},c^{(i)})\}_{i=1}^{|\mathcal{D}_{L}|}$, we follow the original CEM espinosa2022concept , i.e., we use a backbone network (e.g., ResNet50) to extract features $\bm{h}=\{\psi(x^{(i)})\}_{i=1}^{|\mathcal{D}_{L}|}$. Each feature then passes through an embedding generator to obtain concept embeddings $\bm{\hat{c}}_{i}\in\mathbb{R}^{m}$ for $i\in[k]$. After passing through fully connected layers and activation layers, we obtain the predicted binary concept vector $\bm{\hat{c}}\in\mathbb{R}^{k}$ for the labeled data. The specific process can be represented by the following expression:

$$\bm{\hat{c}}^{(j)}_{i},h^{(j)}=\sigma(\phi(\psi(x^{(j)}))),\quad i=1,\ldots,k,\quad j=1,\ldots,|\mathcal{D}_{L}|,$$

where $\psi$ , $\phi$ and $\sigma$ represent the backbone, embedding generator and activation function, respectively.

To enhance the interpretability of concept embeddings, we use binary cross-entropy as the concept loss to optimize the accuracy of concept predictions, computing $\mathcal{L}_{c}$ from the predicted binary concept vector $\bm{\hat{c}}$ and the ground truth concept labels $\bm{c}$:

$$\mathcal{L}_{c}=BCE(\bm{\hat{c}},\bm{c}), \tag{1}$$

where $BCE$ is the binary cross entropy loss.

Task Loss. Since our ultimate task is classification, we also need to incorporate the task loss for the final prediction performance. After obtaining the predicted concept $\bm{\hat{c}}$ , we use a label predictor to predict the final class $\bm{\hat{y}}$ . Then we can define the task loss function using the categorical cross-entropy loss to train our classification model as follows:

$$\mathcal{L}_{task}=CE(\bm{\hat{y}},\bm{y}). \tag{2}$$

Note that for the unlabeled data, we can also calculate the task loss since their class labels are known, in order to make full use of the data.

4.2 Unlabel Alignment: Image-Textual Semantics Alignment

Pseudo Labeling. Unlike the supervised data, CEM cannot directly extract useful information from the unlabeled data as the concept encoder is a supervised training architecture. Thus, in practical situations lacking labeled data, one direct approach is to get high-quality pseudo concept labels to train the model. Below, we will introduce the method we use to obtain pseudo concept labels.

First, a natural way to measure the similarity between images is the cosine distance between their features. Based on this idea, we assign pseudo labels to unlabeled data by finding labeled data with similar image features. Specifically, for each unlabeled training sample $x\in\mathcal{D}_{U}=\{(x^{(i)},y^{(i)})\}_{i=1}^{|\mathcal{D}_{U}|}$, we calculate its cosine distance to all labeled data points $x^{(j)}\in\mathcal{D}_{L}=\{(x^{(j)},y^{(j)},c^{(j)})\}_{j=1}^{|\mathcal{D}_{L}|}$ and select the $k$ samples with the smallest distances (here $k$ denotes the number of neighbors, not the number of concepts):

$$\operatorname{dist}(x,x^{(j)})=1-\frac{x\cdot x^{(j)}}{\|x\|_{2}\cdot\|x^{(j)}\|_{2}},\quad j=1,\ldots,|\mathcal{D}_{L}|.$$

Then, we normalize the reciprocal of the cosine distance between the nearest $k$ data points and $x$ as weights. We use these weights to compute a weighted average of the concept labels of these $k$ data points, obtaining the pseudo concept label for $x$ . In this way, we can obtain pseudo concept labels $\bm{\hat{c}_{img}}$ for all $x\in\mathcal{D}_{U}$ .
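The pseudo-labeling step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function name, the `n_neighbors` parameter (used here instead of $k$ to avoid confusion with the number of concepts), and the small constant `eps` that guards against division by zero are all introduced for this example.

```python
import numpy as np

def knn_pseudo_concepts(feats_u, feats_l, concepts_l, n_neighbors=2, eps=1e-8):
    """Soft pseudo concept labels for unlabeled samples.

    feats_u: (n_u, d) features of unlabeled samples
    feats_l: (n_l, d) features of labeled samples
    concepts_l: (n_l, k) binary concept labels of the labeled samples
    """
    # Row-normalize so that a dot product equals cosine similarity.
    u = feats_u / (np.linalg.norm(feats_u, axis=1, keepdims=True) + eps)
    l = feats_l / (np.linalg.norm(feats_l, axis=1, keepdims=True) + eps)
    dist = 1.0 - u @ l.T                                # cosine distance, (n_u, n_l)
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]     # nearest labeled samples
    near = np.take_along_axis(dist, idx, axis=1)        # their distances
    w = 1.0 / (near + eps)                              # reciprocal distances
    w = w / w.sum(axis=1, keepdims=True)                # normalize to weights
    # Weighted average of the neighbors' concept vectors, shape (n_u, k).
    return np.einsum("un,unk->uk", w, concepts_l[idx])

feats_l = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
concepts_l = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
feats_u = np.array([[1.0, 0.1]])
pseudo = knn_pseudo_concepts(feats_u, feats_l, concepts_l, n_neighbors=2)
```

In this toy example the unlabeled point lies closest to the first labeled sample, so its pseudo label is dominated by that sample's concept vector. The output is a soft label in $[0,1]^{k}$; whether and how it is binarized is not specified in this section.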

In our experiments, we find that directly feeding pseudo-concept labels generated by KNN to CEM has satisfactory performance. However, this simple labeling method can lead to alignment issues in the concept embeddings learned by the CEM’s concept encoder. Specifically, the predicted concepts $\bm{\hat{c}}$ might have no relation to the corresponding features $V$ in the image, hindering the effectiveness of CEM as a reliable interpretability tool. Moreover, due to misalignment, this will also degrade the prediction performance of the concept encoder. In the following, we aim to address such a misalignment issue.

Generating Concept Heatmaps. The pseudo-concept labels obtained via KNN already contain useful information for prediction. Our goal is therefore to enrich them with information about their relation to the corresponding input features. To achieve this, we derive a second pseudo-concept label from these relations, which are measured by the similarity between each concept embedding and the input image, namely concept heatmaps. Specifically, given an image $x$, we first extract its feature map $V\in\mathbb{R}^{H\times W\times m}$ by $V=\Omega(x)$, where $\Omega$ is the visual encoder and $H$ and $W$ are the height and width of the feature map.

Given $V$ and the $i$-th concept embedding $\bm{e}_{i}$, we obtain a heatmap $\mathcal{H}_{i}$, i.e., a similarity matrix that measures the similarity between the concept and the image, by computing their cosine similarity:

$$\mathcal{H}_{p,q,i}=\frac{\bm{e}_{i}^{\top}V_{p,q}}{||\bm{e}_{i}||\cdot||V_{p,q}||},\quad p=1,\ldots,H,\quad q=1,\ldots,W,$$

where $p,q$ index the $p$-th and $q$-th positions in the heatmap, and $\mathcal{H}_{p,q,i}$ represents a local similarity score between $V$ and $\bm{e}_{i}$. Intuitively, $\mathcal{H}_{i}$ represents the relation of each part of the image with the $i$-th concept. Then, we derive heatmaps for all concepts, denoted as $\{\mathcal{H}_{1},\mathcal{H}_{2},\ldots,\mathcal{H}_{k}\}$.

Calculating Concept Scores and Concept Labels. As average pooling performs better in downstream classification tasks yan2023robust , we apply average pooling to the heatmaps to deduce the connection between the image and concepts: $s_{i}=\frac{1}{H\cdot W}\sum_{p=1}^{H}\sum_{q=1}^{W}\mathcal{H}_{p,q,i}$. Intuitively, $s_{i}$ is the refined similarity score between the image and concept $\bm{e}_{i}$. Thus, a concept vector $\bm{s}$ can be obtained, representing the similarity between an image input $x$ and the set of concepts: $\bm{s}=(s_{1},\ldots,s_{k})^{\top}$.

$\bm{s}$ can be regarded as a soft concept label obtained from similarity. Next, we transform it into a hard concept label $\bm{\hat{c}_{align}}$. We determine the presence of a concept attribute in an image using a threshold chosen experimentally, which we set to 0.6. If $s_{i}$ exceeds this threshold, we consider the image to possess that concept attribute and set the corresponding concept label to True. In this way, we obtain predicted concept labels for all unlabeled data.
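The heatmap, pooling, and thresholding steps can be sketched together in NumPy. The function names and the `eps` stabilizer are assumptions for illustration; the 0.6 threshold is the value stated above.

```python
import numpy as np

def concept_heatmaps(V, E, eps=1e-8):
    """Cosine similarity between every spatial feature and every concept.

    V: (H, W, m) feature map; E: (k, m) concept embeddings.
    Returns an array of shape (H, W, k).
    """
    Vn = V / (np.linalg.norm(V, axis=-1, keepdims=True) + eps)
    En = E / (np.linalg.norm(E, axis=-1, keepdims=True) + eps)
    return Vn @ En.T

def alignment_labels(V, E, threshold=0.6):
    heat = concept_heatmaps(V, E)
    scores = heat.mean(axis=(0, 1))                   # average pooling over H and W
    labels = (scores > threshold).astype(np.float32)  # hard concept labels
    return scores, labels

# Toy input: every spatial location points along the first embedding axis.
V = np.zeros((2, 2, 2))
V[..., 0] = 1.0
E = np.array([[1.0, 0.0], [0.0, 1.0]])
scores, labels = alignment_labels(V, E)
```

Here the first concept embedding is parallel to every local feature and the second is orthogonal, so only the first concept passes the threshold.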

Alignment of Image. Based on our above discussions, on the one hand, the concept encoder should learn information from $\bm{\hat{c}_{img}}$ . On the other hand, it should also get concept embeddings which can get good similarity-based concept labels $\bm{\hat{c}_{align}}$ for alignment with the input image. Thus, we need a further alignment loss to achieve these two goals. Specifically, we compute the alignment loss as follows:

$$\mathcal{L}_{align}=BCE(\bm{\hat{c}_{img}},\bm{\hat{c}_{align}}). \tag{3}$$

4.3 Final Objective

In this section, we describe the overall training objective. First, we have the concept loss $\mathcal{L}_{c}$ in (1) for enhancing the interpretability of concept embeddings. Also, as the concept embeddings are input into the label predictor to output the final prediction, we have a task loss between the predictions given by the concept bottleneck and the ground truth, which is shown in (2). For binary classification tasks, we employ binary cross-entropy (BCE) as the task loss; for multi-class classification tasks, we use cross-entropy. Finally, to align the images with concept labels, we compute the alignment loss in (3). Formally, the overall loss function of our approach can be formulated as:

$$\mathcal{L}=\mathcal{L}_{task}+\lambda_{1}\cdot\mathcal{L}_{c}+\lambda_{2}\cdot\mathcal{L}_{align}, \tag{4}$$

where $\lambda_{1},\lambda_{2}$ are hyperparameters for the trade-off between interpretability and accuracy.
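A minimal NumPy sketch of how the three terms in (4) could be combined is given below. The helper names, the clipping constant, and the default values of `lam1` and `lam2` are placeholders chosen for illustration; the text does not state the weights used in practice. The alignment term follows the argument order of (3) literally.

```python
import numpy as np

def bce(p, t, eps=1e-7):
    """Mean binary cross-entropy between probabilities p and targets t."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

def cross_entropy(logits, y):
    """Mean categorical cross-entropy from raw logits and integer labels."""
    z = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(y)), y]))

def sscbm_loss(logits, y, c_hat_l, c_l, c_img, c_align, lam1=1.0, lam2=1.0):
    l_task = cross_entropy(logits, y)   # Eq. (2), labeled and unlabeled samples
    l_c = bce(c_hat_l, c_l)             # Eq. (1), labeled samples only
    l_align = bce(c_img, c_align)       # Eq. (3), unlabeled samples only
    return l_task + lam1 * l_c + lam2 * l_align

logits = np.zeros((2, 3))
y = np.array([0, 1])
half = np.full((2, 4), 0.5)
target = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
total = sscbm_loss(logits, y, half, target, half, target, lam1=1.0, lam2=0.5)
```

With uniform logits and all probabilities at 0.5, the total reduces to $\ln 3 + 1.5\ln 2$, which makes the weighting of the three terms easy to check by hand.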

5 Experiments

In this section, we conduct experimental studies on the performance of our framework. Specifically, we evaluate its utility and interpretability. Details and additional results are in Appendix A due to space limits.

5.1 Experimental Settings

Datasets. We evaluate our methods on three real-world image tasks: CUB, CelebA, and AwA2. See Appendix A.1 for a detailed introduction.

Baseline Models. As there are no existing baselines for the semi-supervised setting, we compare our SSCBM with the two supervised baselines introduced in Section 3. Concept Bottleneck Model (CBM) koh2020concept : We adopt the same setting and architecture as in the original CBM. Concept Embedding Model (CEM): We follow the same setting as in espinosa2022concept .

Evaluation Metrics. To evaluate the utility, we consider the accuracy for both class and concept label prediction. Specifically, concept accuracy measures the model’s prediction accuracy for concepts: $\mathcal{C}_{acc}=\frac{1}{N}\cdot\frac{1}{k}\sum_{i=1}^{N}\sum_{j=1}^{k}\mathbb{I}(\hat{c}_{j}^{(i)}=c_{j}^{(i)}).$ Task accuracy measures the model’s performance in predicting downstream task classes: $\mathcal{A}_{acc}=\frac{1}{N}\sum_{i=1}^{N}\mathbb{I}(\bm{\hat{y}}^{(i)}=\bm{y}^{(i)}).$
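Both metrics are straightforward to compute; the sketch below shows one way in NumPy. The 0.5 cut-off used to binarize predicted concept probabilities and the function names are assumptions for this example.

```python
import numpy as np

def concept_accuracy(c_prob, c_true, threshold=0.5):
    """Fraction of (sample, concept) pairs predicted correctly."""
    c_pred = (c_prob >= threshold).astype(int)   # assumed binarization rule
    return float((c_pred == c_true).mean())

def task_accuracy(logits, y_true):
    """Fraction of samples whose arg-max class matches the label."""
    return float((logits.argmax(axis=1) == y_true).mean())

c_prob = np.array([[0.9, 0.2], [0.4, 0.7]])
c_true = np.array([[1, 0], [1, 1]])
logits = np.array([[2.0, 1.0], [0.0, 3.0]])
y_true = np.array([0, 0])

c_acc = concept_accuracy(c_prob, c_true)   # 3 of 4 concept entries match
t_acc = task_accuracy(logits, y_true)      # 1 of 2 class predictions match
```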

To evaluate the interpretability, besides the concept accuracy (due to the structure of CBM, concept accuracy can also reveal the interpretability), similar to previous work koh2020concept ; espinosa2022concept , we will show visualization results. Moreover, we will evaluate the performance of the test-time intervention.

Implementation Details. All experiments are conducted on a Tesla V100S PCIe 32 GB GPU and an Intel Xeon Processor (Skylake, IBRS) CPU. See Appendix A.2 for more details.

5.2 Evaluation Results on Utility

Performance of SSCBM. We first study the performance of SSCBM. As we focus on the semi-supervised setting, we use different ratios (ranging from 1% to 80%) of all data samples as the labeled data and treat the remaining ones as unlabeled data. The results are shown in Table 1. For the CUB dataset, the concept accuracy improves as the labeled ratio increases, peaking at 95.04% when the labeled ratio is 0.8. The class accuracy shows a consistent upward trend from 62.19% at a 1% labeled ratio to 79.27% at a 60% ratio, with a slight dip to 79.20% at 80%. Notably, with only 40% labeled data, SSCBM achieves concept accuracy (94.37%) and task accuracy (78.05%) comparable to those of CEM trained on the fully labeled dataset (96.39% and 79.82%).

Compared to CUB, the performance on CelebA and AwA2 fluctuates, i.e., both concept accuracy and task accuracy can decrease when the labeled ratio increases. This is potentially because the CUB dataset has a more diverse set of concepts (112 selected) than CelebA (only 6 selected espinosa2022concept ) and AwA2 (85 selected), as well as factors such as higher variability in the data or more complex feature relationships. Nevertheless, these results remain close to those of CEM under fully supervised training.

Labeled Ratio	CUB Concept	CUB Task	CelebA Concept	CelebA Task	AwA2 Concept	AwA2 Task
0.01	88.38%	62.19%	85.21%	30.69%	89.74%	84.74%
0.05	90.69%	70.43%	83.51%	28.97%	91.90%	84.61%
0.1	91.98%	73.02%	84.86%	29.47%	92.97%	84.20%
0.2	93.19%	75.51%	84.97%	28.91%	89.81%	85.01%
0.4	94.37%	78.05%	84.77%	29.35%	89.81%	84.47%
0.6	94.83%	79.27%	84.81%	28.73%	94.82%	83.40%
0.8	95.04%	79.20%	84.86%	29.09%	95.07%	83.53%
CBM (fully supervised training)	93.99%	67.33%	85.68%	31.16%	96.48%	88.71%
CEM (fully supervised training)	96.39%	79.82%	85.46%	30.66%	95.91%	87.00%

Table 1: Results of concept and task accuracy for different datasets with different portions of supervised data.

Method	CUB Concept	CUB Class	CelebA Concept	CelebA Class	AwA2 Concept	AwA2 Class
CBM	88.73%	36.66%	68.10%	7.76%	70.48%	8.71%
CEM	87.06%	30.36%	70.95%	13.98%	84.36%	45.74%
SSCBM	91.98%	73.02%	84.86%	29.47%	92.97%	84.20%

Table 2: The accuracy of predicting concepts and class labels across different datasets with labeled ratio = 0.1. SSCBM attains the best result in every column.

Comparison with Baselines. We then compare both the concept and class accuracy with the baselines on the three datasets when the labeled ratio is 10%. Results in Table 2 show that our method significantly outperforms the baselines when labeled data is scarce. These results also confirm that the classical CBM and CEM are unsuitable for the semi-supervised setting. We attribute the success of SSCBM to the alignment loss, which increases the concept prediction accuracy and in turn the class accuracy; this indicates that joint training on both labeled and unlabeled data, together with aligning the unlabeled data at the concept level, extracts more information from the data.

5.3 Interpretability Evaluation

For interpretability, the previous section showed that the concept accuracy of SSCBM with a small labeled ratio is very close to that of CEM, indicating that it inherits CEM’s interpretability. Besides the concept accuracy, note that SSCBM also has a pseudo label derived from the alignment between the concept embedding and the input saliency map, and we use the alignment loss to preserve this alignment. Thus, we evaluate the alignment performance here to show the faithfulness of the interpretability given by SSCBM. We measure alignment performance by comparing the concept saliency map with the concept semantics in Figure 3. See Appendix A.2 for the detailed procedure for generating saliency maps. Results show that our concept saliency map matches the concept semantics, indicating the effectiveness of our alignment loss. In Appendix C, we provide additional interpretability evaluation in Figures 7-18.

5.4 Test-time Intervention

Test-time intervention enables human users to interact with the model at inference time. We test our test-time intervention by correcting 10% to 100% of the concept labels in the concept predictor. We adopt individual intervention for CelebA and AwA2, as they have no grouped concepts. For CUB, we perform group intervention, i.e., we intervene jointly on the concepts associated with the same attribute. For example, breast color::yellow, breast color::black, and breast color::white belong to the same concept group, so we only need to correct the concept labels within that group. We expect the model performance to increase steadily with the ratio of concept intervention, indicating that the model has learned the correct label information and automatically corrects other labels.
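The intervention protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact implementation used in our experiments: the function name `intervene_groups`, the dictionary layout of the concept groups, and the random choice of which groups to correct are all hypothetical. Individual intervention (CelebA, AwA2) corresponds to the special case where every group contains a single concept.

```python
import random

def intervene_groups(pred, truth, groups, ratio, seed=0):
    """Return a copy of `pred` in which a fraction `ratio` of the concept
    groups has been overwritten with the ground-truth concept labels.

    pred, truth: lists of concept values (predicted / ground truth).
    groups: dict mapping a group name to the concept indices it contains.
    Assumption: groups to correct are sampled uniformly at random.
    """
    rng = random.Random(seed)
    names = sorted(groups)
    n_fix = round(ratio * len(names))      # number of groups to correct
    chosen = rng.sample(names, n_fix)
    out = list(pred)
    for name in chosen:
        for idx in groups[name]:           # correct every concept in the group
            out[idx] = truth[idx]
    return out

# Toy example with two hypothetical CUB-style groups.
groups = {"breast color": [0, 1, 2], "wing color": [3, 4]}
pred = [0.2, 0.7, 0.1, 0.6, 0.4]
truth = [1, 0, 0, 0, 1]
print(intervene_groups(pred, truth, groups, ratio=1.0))
```

The corrected concept vector would then be passed to the label predictor in place of the original concept predictions; sweeping `ratio` from 0.1 to 1.0 gives one intervention curve.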

Results in Figure 4 (a) demonstrate our model’s robustness and show that its performance increases as more concept labels are corrected, supporting both its interpretability and its prediction performance. This stems from our alignment loss, which effectively learns the correct information pairs in unlabeled and labeled data. Results in Figure 4 (b) also show that by changing the wing color to brown, we successfully caused the model to predict the Great Crested Flycatcher instead of the Swainson Warbler. More results are in Appendix B.

5.5 Ablation Study

Model	CUB		CelebA		AwA2
	Concept	Class	Concept	Class	Concept	Class
SSCBM (full model)	91.98%	73.02%	84.86%	29.47%	92.97%	84.20%
SSCBM (w/o $\bm{\hat{c}_{img}}$ )	88.07%	68.16%	69.82%	26.72%	63.45%	70.74%
SSCBM (w/o $\bm{\hat{c}_{align}}$ )	89.25%	68.67%	70.61%	27.61%	76.62%	69.68%

Table 3: Ablation study on the proposed SSCBM with the labeled data ratio = 0.1.

Finally, we conduct ablation experiments. Note that when $\lambda_{2}=0$ in (4), SSCBM reduces to the original CEM. Thus, by comparing SSCBM with CEM in Table 2, we confirm that the alignment loss is essential for our method. We then give a finer study of the two kinds of pseudo-labels to demonstrate that each one plays an indispensable role in bolstering the efficacy of SSCBM. Specifically, based on our method in Section 4.2, we have three types of labels available: the pseudo concept labels $\bm{\hat{c}_{img}}$ and $\bm{\hat{c}_{align}}$ , and the label $\bm{\hat{c}}$ predicted by the concept embedding. We first remove $\bm{\hat{c}_{img}}$ and calculate our alignment loss using $\bm{\hat{c}_{align}}$ and $\bm{\hat{c}}$ ; conversely, we remove $\bm{\hat{c}_{align}}$ and calculate the alignment loss using $\bm{\hat{c}_{img}}$ and $\bm{\hat{c}}$ .

From the results presented in Table 3, it is evident that removing the $\bm{\hat{c}_{img}}$ component significantly degrades the performance of SSCBM at both the concept level and the class level. This indicates that the pseudo-concept labels obtained via KNN contain necessary information about the ground truth, and the concept encoder needs to extract information from such labels to perform better. The situation is similar when removing the $\bm{\hat{c}_{align}}$ component, indicating that aligning the concept embedding and the input saliency map can extract further useful information from the input image and is thus beneficial for improving performance. Our observations underscore that the two kinds of pseudo-concept labels are jointly effective within our objective function, collectively enhancing both class prediction and concept label prediction.

6 Conclusion

The training of current CBMs heavily relies on the accuracy and richness of annotated concepts in the dataset. These concept labels are typically provided by experts, which can be costly and require significant resources and effort. Additionally, concept saliency maps frequently misalign with input saliency maps, causing concept predictions to correspond to irrelevant input features - an issue related to annotation alignment. In this paper, we propose SSCBM, which uses a strategy to generate pseudo labels and an alignment loss to solve these two problems. Experiments show that SSCBM is effective.

References

[1] Eric Arazo, Diego Ortego, Paul Albert, Noel E O’Connor, and Kevin McGuinness. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In 2020 International joint conference on neural networks (IJCNN), pages 1–8. IEEE, 2020.
[2] Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Mateo Espinosa Zarlenga, Lucie Charlotte Magister, Alberto Tonda, Pietro Lio, Frédéric Precioso, Mateja Jamnik, and Giuseppe Marra. Interpretable neural-symbolic concept reasoning. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 1801–1825. PMLR, 2023.
[3] David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. Mixmatch: A holistic approach to semi-supervised learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[4] Kushal Chauhan, Rishabh Tiwari, Jan Freyberg, Pradeep Shenoy, and Krishnamurthy Dvijotham. Interactive concept bottleneck models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37(5), pages 5948–5955, 2023.
[5] Asma Chebli, Akila Djebbar, and Hayet Farida Marouani. Semi-supervised learning for medical application: A survey. In 2018 International Conference on Applied Smart Systems (ICASS), pages 1–9, 2018.
[6] Xing Chen, Biao Ren, Ming Chen, Quanxin Wang, Lixin Zhang, and Guiying Yan. Nllss: predicting synergistic drug combinations based on semi-supervised learning. PLoS computational biology, 12(7):e1004975, 2016.
[7] Veronika Cheplygina, Marleen de Bruijne, and Josien P.W. Pluim. Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical Image Analysis, 54:280–296, 2019.
[8] Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, et al. Concept embedding models: Beyond the accuracy-explainability trade-off. Advances in Neural Information Processing Systems, 35:21400–21413, 2022.
[9] Roman Filipovych and Christos Davatzikos. Semi-supervised pattern classification of medical images: Application to mild cognitive impairment (mci). NeuroImage, 55(3):1109–1119, 2011.
[10] Jack Furby, Daniel Cunnington, Dave Braines, and Alun Preece. Can we constrain concept bottleneck models to learn semantically meaningful input features? arXiv preprint arXiv:2402.00912, 2024.
[11] Marton Havasi, Sonali Parbhoo, and Finale Doshi-Velez. Addressing leakage in concept bottleneck models. Advances in Neural Information Processing Systems, 35:23386–23397, 2022.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[13] Xiangteng He and Yuxin Peng. Fine-grained visual-textual representation learning. IEEE Transactions on Circuits and Systems for Video Technology, 30(2):520–531, 2019.
[14] Junlin Hou, Jilan Xu, and Hao Chen. Concept-attention whitening for interpretable skin lesion diagnosis. arXiv preprint arXiv:2404.05997, 2024.
[15] Lijie Hu, Liang Liu, Shu Yang, Xin Chen, Hongru Xiao, Mengdi Li, Pan Zhou, Muhammad Asif Ali, and Di Wang. A hopfieldian view-based interpretation for chain-of-thought reasoning. arXiv preprint arXiv:2406.12255, 2024.
[16] Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. Improving faithfulness for vision transformers. arXiv preprint arXiv:2311.17983, 2023.
[17] Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. Seat: stable and explainable attention. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37(11), pages 12907–12915, 2023.
[18] Lijie Hu, Chenyang Ren, Zhengyu Hu, Cheng-Long Wang, and Di Wang. Editable concept bottleneck models. arXiv preprint arXiv:2405.15476, 2024.
[19] Tri Huynh, Aiden Nibali, and Zhen He. Semi-supervised learning for medical image classification using imbalanced training data. Computer Methods and Programs in Biomedicine, 216:106628, 2022.
[20] Aya Abdelsalam Ismail, Julius Adebayo, Hector Corrada Bravo, Stephen Ra, and Kyunghyun Cho. Concept bottleneck generative models. In The Twelfth International Conference on Learning Representations, 2023.
[21] Mert Keser, Gesina Schwalbe, Azarm Nowzad, and Alois Knoll. Interpretable model-agnostic plausibility verification for 2d object detectors using domain-invariant concept bottleneck models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3890–3899, 2023.
[22] Eunji Kim, Dahuin Jung, Sangha Park, Siwon Kim, and Sungroh Yoon. Probabilistic concept bottleneck models. arXiv preprint arXiv:2306.01574, 2023.
[23] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International conference on machine learning, pages 5338–5348. PMLR, 2020.
[24] Songning Lai, Lijie Hu, Junxiao Wang, Laure Berti-Equille, and Di Wang. Faithful vision-language interpretation via concept bottleneck models. In The Twelfth International Conference on Learning Representations, 2023.
[25] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
[26] Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on challenges in representation learning, ICML, volume 3(2), page 896. Atlanta, 2013.
[27] Jia Li, Lijie Hu, Zhixian He, Jingfeng Zhang, Tianhang Zheng, and Di Wang. Text guided image editing with automatic concept locating and forgetting. arXiv preprint arXiv:2405.19708, 2024.
[28] Yinheng Li, Shaofei Wang, Han Ding, and Hang Chen. Large language models in finance: A survey. In Proceedings of the Fourth ACM International Conference on AI in Finance, pages 374–382, 2023.
[29] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
[30] Lucie Charlotte Magister, Dmitry Kazhdan, Vikash Singh, and Pietro Liò. Gcexplainer: Human-in-the-loop concept-based explanations for graph neural networks. arXiv preprint arXiv:2107.11889, 2021.
[31] Todd K Moon. The expectation-maximization algorithm. IEEE Signal processing magazine, 13(6):47–60, 1996.
[32] Li Niu, Qingtao Tang, Ashok Veeraraghavan, and Ashutosh Sabharwal. Learning from noisy web data with category-level supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7689–7698, 2018.
[33] Tuomas Oikarinen, Subhro Das, Lam Nguyen, and Lily Weng. Label-free concept bottleneck models. In International Conference on Learning Representations, 2023.
[34] Yassine Ouali, Céline Hudelot, and Myriam Tami. An overview of deep semi-supervised learning. arXiv preprint arXiv:2006.05278, 2020.
[35] Hieu Pham, Zihang Dai, Qizhe Xie, and Quoc V Le. Meta pseudo labels. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11557–11568, 2021.
[36] Yoshihide Sawada and Keigo Nakamura. Concept bottleneck model with additional unsupervised concepts. IEEE Access, 10:41758–41765, 2022.
[37] Ivaxi Sheth and Samira Ebrahimi Kahou. Auxiliary losses for learning generalizable concept-based models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[38] Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in neural information processing systems, 33:596–608, 2020.
[39] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in neural information processing systems, 30, 2017.
[40] Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. Large language models in medicine. Nature medicine, 29(8):1930–1940, 2023.
[41] Jesper E Van Engelen and Holger H Hoos. A survey on semi-supervised learning. Machine learning, 109(2):373–440, 2020.
[42] Han Xuanyuan, Pietro Barbiero, Dobrik Georgiev, Lucie Charlotte Magister, and Pietro Liò. Global concept-based interpretability for graph neural networks via neuron analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37(9), pages 10675–10683, 2023.
[43] An Yan, Yu Wang, Yiwu Zhong, Zexue He, Petros Karypis, Zihan Wang, Chengyu Dong, Amilcare Gentili, Chun-Nan Hsu, Jingbo Shang, et al. Robust and interpretable medical image classifiers via concept bottleneck models. arXiv preprint arXiv:2310.03182, 2023.
[44] Shu Yang, Lijie Hu, Lu Yu, Muhammad Asif Ali, and Di Wang. Human-ai interactions in the communication era: Autophagy makes large models achieving local optima. arXiv preprint arXiv:2402.11271, 2024.
[45] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In The Eleventh International Conference on Learning Representations, 2023.
[46] Gang Zhang, Shan-Xing Ou, Yong-Hui Huang, and Chun-Ru Wang. Semi-supervised learning methods for large scale healthcare data analysis. International Journal of Computers in Healthcare, 2(2):98–110, 2015.

Appendix A Details of Experimental Setup

A.1 Datasets

We evaluate our methods on three real-world image tasks: CUB, CelebA, and AwA2.

• CUB [13]: the Caltech-UCSD Birds-200-2011 (CUB) dataset, which includes a total of 11,788 avian images. The objective is to accurately categorize these birds into one of 200 distinct species. Following [8], we use k = 112 binary bird attributes representing wing color, beak shape, etc.
• CelebA [29]: the Large-scale CelebFaces Attributes dataset. In the CelebA task, there are 6 balanced incomplete concept annotations, and each image belongs to one of 256 classes.
• AwA2 [32]: Animals with Attributes 2 consists of 37,322 images in total, distributed across 50 animal categories. AwA2 also provides a category-attribute matrix, which contains an 85-dim attribute vector (e.g., color, stripe, furry, size, and habitat) for each category.

	CUB	CelebA	AwA2
#Images	11,788	202,599	37,322
#Classes	200	10,177	50
#Concepts	312	40	85

Table 4: Statistics of the datasets used in our experiments.

A.2 Implementation Details

First, we resize the images to an input size of 299 x 299 (64 x 64 for CelebA). Subsequently, we employ ResNet34 [12] as the backbone to transform the input into latent code, followed by a fully connected layer to convert it into concept embeddings of size 16 (32 for CUB). During pseudo-labeling, we also utilize ResNet34 with the KNN algorithm with k = 2. We set $\lambda_{1}=1$ and $\lambda_{2}=0.1$ and utilize the SGD optimizer with a learning rate of 0.05 and a regularization coefficient of 5e-6. We train SSCBM for 100 epochs with a batch size of 256 (for AwA2, the batch size is 32 due to the large size of individual images). We repeat each experiment 5 times and report the average results.
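As an illustration of the KNN pseudo-labeling step with k = 2, the following minimal sketch assigns each unlabeled sample the average concept vector of its nearest labeled neighbors in feature space. It rests on assumptions that are not stated in this section: Euclidean distance, an unweighted mean over neighbors, and the hypothetical function name `knn_pseudo_labels`; the features stand in for the ResNet34 outputs and are given here as plain lists.

```python
import math

def knn_pseudo_labels(unlabeled_feats, labeled_feats, labeled_concepts, k=2):
    """Pseudo concept labels for unlabeled samples via k-nearest neighbors.

    Assumptions (illustrative only): Euclidean distance between feature
    vectors and an unweighted average of the neighbors' concept labels.
    """
    n_concepts = len(labeled_concepts[0])
    pseudo = []
    for x in unlabeled_feats:
        # Sort labeled samples by distance to x and keep the k closest.
        ranked = sorted((math.dist(x, z), i) for i, z in enumerate(labeled_feats))
        nearest = [i for _, i in ranked[:k]]
        pseudo.append([
            sum(labeled_concepts[i][j] for i in nearest) / len(nearest)
            for j in range(n_concepts)
        ])
    return pseudo

# Toy example: three labeled samples with two binary concepts each.
labeled_feats = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
labeled_concepts = [[1, 0], [1, 1], [0, 0]]
print(knn_pseudo_labels([[0.0, 0.4]], labeled_feats, labeled_concepts, k=2))
```

In this toy example the two nearest labeled samples agree on the first concept and disagree on the second, so the pseudo label is a soft value for the second concept.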

To construct the concept saliency map, we first upsample the heatmaps $\{\mathcal{H}_{1},\mathcal{H}_{2},\ldots,\mathcal{H}_{k}\}$ calculated in Section 4.2 to the size $H\times W$ (the original image size). Then, we create a mask based on the value intensities, with higher values corresponding to darker colors.
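The two steps above (upsampling, then intensity-based masking) can be sketched in a few lines. This is only an illustration under assumptions not fixed by the text: nearest-neighbor interpolation, min-max normalization, and an 8-bit grayscale mask in which 0 is the darkest value; the function names `upsample_nearest` and `intensity_mask` are hypothetical.

```python
def upsample_nearest(heatmap, height, width):
    """Upsample a 2D heatmap (list of rows) to height x width.
    Assumption: nearest-neighbor interpolation."""
    h, w = len(heatmap), len(heatmap[0])
    return [
        [heatmap[r * h // height][c * w // width] for c in range(width)]
        for r in range(height)
    ]

def intensity_mask(heatmap):
    """Map heatmap values to gray levels in [0, 255] so that higher
    values become darker (closer to 0). Assumption: min-max scaling."""
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0                 # avoid division by zero
    return [[round(255 * (1 - (v - lo) / span)) for v in row] for row in heatmap]

# Toy 2x2 heatmap upsampled to a 4x4 "image".
heat = [[0.0, 1.0], [0.5, 0.25]]
up = upsample_nearest(heat, 4, 4)
mask = intensity_mask(up)
print(mask[0])
```

The resulting mask can be overlaid on the input image to visualize which regions a given concept attends to.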

Appendix B Test-time Intervention

Results in Figure 5 demonstrate our model’s robustness and show that its performance increases as more concept labels are corrected, supporting both its interpretability and its prediction performance. This stems from our alignment loss, which effectively learns the correct information pairs in unlabeled and labeled data.

Here, we present some successful examples of test-time intervention, illustrated in Figure 6. The two examples on the left are from the CUB dataset. In the top left image, by changing the wing color to brown, we successfully caused the model to predict the Great Crested Flycatcher instead of the Swainson Warbler. In the bottom left, because the model initially failed to notice that the upper part of the bird was black, it misclassified the bird as a Vesper Sparrow. Through test-time intervention, we successfully made it predict that the bird was a Grasshopper Sparrow. The examples on the right are from the AwA2 dataset, where we again made the model predict correctly by modifying concepts at test time. For example, in the top right image, by modifying the concept ’fierce’ for the orca, we prevented it from being predicted as a horse. In the bottom right, we successfully made the model recognize the bat through its color.

Appendix C Additional Interpretability Evaluation

We provide our additional interpretability evaluation in Figures 7-18 as follows.

Appendix D Limitations

While we address part of the annotation problem through semi-supervised learning, semi-supervised models may not be suitable for all types of tasks or datasets. They are more effective when the data distribution is smooth. However, this is a limitation of semi-supervised learning in general, not of our method specifically.


[Figure panels: (a) Label Complete and Well-aligned; (b) Label Incomplete; (c) Misaligned]