LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models
Abstract
Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. By perturbing latent variables and interpreting changes in generated data, the framework provides a systematic approach to understanding and controlling the data generation process, enhancing the transparency and interpretability of deep generative models. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations of latent variables.
1 Introduction
Deep generative models, such as Variational Autoencoders (VAEs) Kingma and Welling (2013) and diffusion models Rombach et al. (2022), have become a state-of-the-art approach in various generation tasks Razavi et al. (2019); Ho et al. (2022); Yang et al. (2023). These models leverage latent variables to learn underlying data distributions and generate high-quality samples by capturing the underlying structure of high-dimensional data in a low-dimensional semantic space Vahdat et al. (2021). To enhance performance, inductive biases are often enforced over latent variables. For instance, disentanglement is a rule of thumb which enforces orthogonality among different latent variables Ding et al. (2020). Moreover, latent variables can sometimes be grouped, leading to combination bias Klys et al. (2018); Guo et al. (2020). More recently, controllability of deep generative models has become desirable, where latent variables are bound to specific properties of interest Wang et al. (2024). As the latent variables represent all the information in a lower dimension, they can be considered an effective abstraction of data into key factors. Hence, it is highly beneficial to interpret the meaning of the latent variables to better understand the key factors of the data and to control the data generation process more effectively. For example, as shown in Figure 1, by interpreting the latent variables of deep generative models such as VAEs for images, we can elicit key factors of the images for better understanding and control of the data generation.
The field of explainable artificial intelligence (XAI) has extensively investigated the interpretation of machine learning models Adadi and Berrada (2018). However, interpreting latent variables in deep generative models remains underexplored. Machine learning model explanation, a.k.a., post-hoc explanation, can be categorized into global and local explanations Gao et al. (2024). Global explanations focus on elucidating the entire model, while local explanations target the reasoning behind specific predictions. Global explanations are more challenging, with existing work mostly emphasizing attributions to identify which features are most important for model decision-making Saleem et al. (2022). The missing piece is understanding the meaning of features when they are unknown, which is very common in deep generative models. More recently, one category of global explanation methods, called concept-based explanations, aims to generate more human-understandable concepts as explanations Poeta et al. (2023). However, current concept-based methods often rely on human heuristics or predefined concept and feature space, limiting the expressiveness of the explanations and falling short of achieving truly automatic explanation generation Kim et al. (2018); Koh et al. (2020); Bai et al. (2022).
Despite the progress in XAI, interpreting latent variables in deep generative models presents significant challenges. First, these variables are not grounded in real-world concepts, and the black-box nature of the models prevents us from inferring the meaning of latent variables from observations. Second, explanations must adhere to the inductive biases imposed on the latent variables, which is essential yet difficult to ensure. For example, in disentangled latent variables, the semantic meanings should be orthogonal. Third, different latent variables have varying degrees of explainability. Some may be trivial to data generation and intrinsically lack semantic meaning. It is crucial to identify which latent variables are explainable and which do not need explanation.
To address the aforementioned challenges, we propose LatentExplainer, a novel and generic framework that automatically generates semantically meaningful explanations of latent variables in deep generative models. To explain these variables and work around the black-box nature of the models (Challenge 1), we propose to perturb each latent variable and explain the resulting changes in the generated data. Specifically, we perturb and decode each manipulated latent variable to produce the corresponding sequence of generated data samples. The trend in the sequence is leveraged to reflect the semantics of the latent variable to be explained. To align explanations with the intrinsic nature of the deep generative models (Challenge 2), we design a generic framework that formulates inductive biases on the Bayesian network of latent variable models into textual prompts. These prompts are understandable to both humans and large foundation models, so user-provided inductive bias formulas are converted into corresponding prompts that a multimodal large language model can easily understand. To handle the varying degrees of explainability in latent variables (Challenge 3), we propose to probe the confidence of the explanations by estimating their uncertainty, which helps avoid hallucinations. This approach assesses whether the latent representations are interpretable and selects the most consistent explanations, ensuring accurate and meaningful interpretation of the latent variables.
2 Related Work
Deep Generative Models. Deep generative models are essential for modeling complex data distributions. Variational Autoencoders (VAEs), introduced by Kingma and Welling (2013), are prominent in this area. VAEs encode input data into a latent space and decode it back, optimizing a balance between reconstruction error and the Kullback-Leibler divergence Rezende et al. (2014). They have diverse applications, including image generation Yan et al. (2016), semi-supervised learning Kingma et al. (2014), and anomaly detection An and Cho (2015).
Diffusion models, proposed by Ho et al. (2020), generate data through a diffusion process that gradually adds noise to the data and then learns to reverse this process to recover the original data. These models have achieved high-fidelity image generation, surpassing GANs in quality and diversity. Latent diffusion models operate in a lower-dimensional space, which significantly reduces computational requirements while maintaining the quality of the generated samples Rombach et al. (2022). Advances have made them applicable to text-to-image synthesis Nichol and Dhariwal (2021), audio generation Kong et al. (2020), and molecular design Xu et al. (2022).
Image Manipulation. Image manipulation using latent variables in generative models like VAEs and diffusion models is an important technique for modifying and enhancing generated images. A key method is latent traverse, which involves traversing different values of latent variables to achieve diverse manipulations in the generated outputs. This technique allows for precise control over the attributes in generated images, enabling adjustments Chen et al. (2016); Higgins et al. (2017). For example, latent traverse has been effectively employed to disentangle and control various attributes in generated images Brock et al. (2016); Zhu et al. (2016). Although latent traverse is used for visualization and editing purposes, it has not yet been widely explored as a tool for explaining the underlying generative processes in natural language.
Multi-modal Large Language Models. Multi-modal Large Language Models (MLLMs) integrate diverse data modalities, enhancing their ability to understand and generate complex information Yin et al. (2023); Bai et al. (2024). Notable models include CLIP, which learns visual and textual representations Radford et al. (2021), GPT-4o, which extends GPT-4v with better visual capabilities OpenAI (2024), and Gemini, a family of highly capable multimodal models Team et al. (2023). These models can perform complex tasks that require understanding context across different modalities.
3 Preliminaries and Problem Formulation
Deep Generative Models. These are a class of models that learn a mapping between observations $x$ and key underlying factors $z$. Such mappings underlie deep generative models such as VAEs and latent diffusion models. VAEs, for instance, introduce a probabilistic approach to encoding data by maximizing the evidence lower bound (ELBO) Kingma and Welling (2013):
$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right),$$
where KL stands for Kullback–Leibler divergence. Latent diffusion models further refine the generation process by iteratively refining noise into structured data Rombach et al. (2022). These models effectively capture the underlying structure of data in a low-dimensional semantic space.
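To make the two terms of the ELBO concrete, the following minimal sketch evaluates them for a single sample with a diagonal-Gaussian encoder and a standard-normal prior, for which the KL term has a closed form. The function names and the unit-variance Gaussian decoder (which reduces the reconstruction term to a negative squared error, up to a constant) are assumptions made for illustration and are not taken from the models used in this paper.

```python
import math

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def elbo(x, x_recon, mu, log_var):
    """ELBO for one sample, assuming a unit-variance Gaussian decoder (additive constants dropped)."""
    recon_log_lik = -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon_log_lik - gaussian_kl(mu, log_var)

# Toy values: perfect reconstruction, posterior mean shifted by 1 in one latent dimension.
x = [0.0, 1.0]
x_recon = [0.0, 1.0]
mu = [1.0]
log_var = [0.0]
value = elbo(x, x_recon, mu, log_var)
print(value)
```

In this toy case the reconstruction term is zero, so the ELBO equals the negative KL term: the bound is lowered purely by the posterior moving away from the prior.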
Latent variable manipulation of diffusion models aims at traversing the latent representation along a semantic latent direction. The perturbed vector is obtained by moving the latent representation along this direction, where a hyper-parameter controls the strength of the perturbation Park et al. (2023). An image sequence can then be generated from the perturbed vectors. The perturbations in the semantic latent direction lead to semantic changes in the generated image sequence.
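The traversal described above can be sketched as follows. This is a minimal illustration that assumes the common additive form, in which the perturbed vector is the original latent plus the strength times the direction; the names `traverse_latent`, `direction`, `strengths`, and `decode` are hypothetical, and the identity decoder stands in for a real generative model.

```python
from typing import Callable, List, Sequence

def traverse_latent(z: Sequence[float],
                    direction: Sequence[float],
                    strengths: Sequence[float],
                    decode: Callable[[List[float]], object]) -> list:
    """Decode one output per strength, using the assumed update z + gamma * direction."""
    outputs = []
    for gamma in strengths:
        z_perturbed = [zi + gamma * di for zi, di in zip(z, direction)]
        outputs.append(decode(z_perturbed))
    return outputs

# Toy usage: an identity "decoder" so the perturbed latents are directly visible.
z = [0.5, -1.0, 2.0]
direction = [0.0, 1.0, 0.0]  # hypothetical semantic direction along the second coordinate
sequence = traverse_latent(z, direction, strengths=[-2.0, -1.0, 0.0, 1.0, 2.0],
                           decode=lambda v: v)
print(sequence)
```

With a real model, `decode` would be the decoder or the reverse diffusion process, and the returned list would be the image sequence whose trend is later explained.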
Inductive Bias in Latent Variables. Inductive biases are commonly imposed on latent variables to enhance the performance and interpretability of deep generative models. These biases can be categorized into several types:
- Disentanglement Bias: It enforces orthogonality among different latent variables, ensuring that each latent variable captures a distinct factor of variation in the data Ding et al. (2020).
- Combination Bias: Sometimes latent variables are grouped, leading to biases in how they interact and combine to represent complex data structures Klys et al. (2018).
- Conditional Bias: It emphasizes the relationship between specific properties of interest and the corresponding latent variables Guo et al. (2020).
- Other Biases: Various other biases can be introduced depending on the specific requirements of the model, such as sparsity or hierarchical organization, to guide the learning process more effectively.
Interpreting latent variables in deep generative models is extremely challenging due to: (1) their lack of grounding in real-world concepts and the models' black-box nature, which hinders meaning inference; (2) the necessity for explanations to conform to inductive biases, such as orthogonality in disentangled variables, which is difficult to ensure; and (3) the varying degrees of explainability among latent variables, with some being trivial to data generation and lacking semantic meaning, making it essential to identify which variables require explanation.
Problem Formulation. We assume a dataset , where each sample consists of or , with and as properties of . The dataset is generated by M latent variables , where . can be a single latent variable or a group of correlated latent variables. Suppose we are given a generative model with a set of formulas with respect to , where represents an inductive bias that the generative model must satisfy. Our goal is to derive a textual sequence that explains the semantic meanings of the latent variable .
4 Proposed Method
4.1 Overview of LatentExplainer
This paper focuses on explaining the semantics of latent variables in deep generative models. To interpret these semantics and work around the black-box nature of deep generative models, we propose LatentExplainer, which perturbs each latent variable and explains the change it imposes on the generated data. We perturb and subsequently decode the manipulated latent variables into generated data that are perceptible by humans, such as images. Through a series of perturbations on a latent variable, a sequence of generated data samples can be obtained to reflect the changes in that variable (Figure 2(a)). When explaining latent variable models, it is crucial to fully leverage and align with the prior knowledge about them. To do this, we design a generic framework that can automatically formulate the inductive biases of generative models into textual prompts. Specifically, we summarize three common inductive biases and design their one-to-one symbol-to-word prompts (Section 4.2). Our scheme can adaptively convert any user-provided inductive bias formula into a corresponding prompt to provide more accurate explanations of the latent representations (Figure 2(b), Section 4.3). Finally, the explanations are selected through an uncertainty quantification approach that assesses whether the latent representations are interpretable and selects the most consistent explanations (Figure 2(c), Section 4.4).
4.2 Inductive-bias-guided Prompt Framework
4.2.1 Generic Framework
In this section, we propose a generic framework that can verbalize the inductive bias in deep generative models into prompts for better latent variable explanations. The prevalent inductive biases in deep generative models are categorized into three types: disentanglement bias, conditional bias, and combination bias. Intrinsically, they are mathematical expressions of the conditional independence among the latent variables, which can confine their explanations but cannot be directly input into large foundation models.
Framework: Our framework proposes a principled, automatic way to translate mathematical expressions into textual prompts. The prompts include adaptive prompts and fixed prompts. The adaptive prompts are converted from the inductive bias formulas. Each formula contains mathematical symbols that consist of mathematical variables and mathematical operators. We use the same color to represent the correspondence between mathematical symbols in the formulas and the text in the prompts. The translation mechanism is shown in Table 1.
Grammar # | Symbol | Prompt |
---|---|---|
1 | ( ) | pattern of change |
2 | ), | other variations |
3 | | property of interest |
4 | | a group |
5 | | associated with |
6 | | not associated with |
7 | = | same |
8 | | change |
Fixed Ending: What is the pattern of change? Write in a sentence. If there is no clear pattern, just write "No clear explanation".
4.2.2 From Disentanglement Bias to Prompts
Disentanglement bias refers to the model’s ability to separate independent factors in the data Ding et al. (2020); Wu et al. (2023). The formula representing this bias focuses on ensuring that different latent variables correspond to different independent underlying factors. Independent factors would be invariant with respect to one another (Ridgeway, 2016). By disentangling these factors, researchers can better understand the underlying structure of the data and improve the model’s performance on tasks such as representation learning.
Formula:
The above formula is translated into the following prompt using grammar rules #1, 2, and 7.
Prompt: These two rows of images show the same pattern of change despite other variations.
4.2.3 From Combination Bias to Prompts
Combination bias involves understanding how different latent variables interact within groups and remain independent across groups Klys et al. (2018). This bias is significant as it helps in identifying how combinations of factors contribute to the overall data generation process. Recognizing these interactions enables researchers to design models that can generate more complex and realistic data by capturing intricate relationships within the data.
- No inter-group correlation:
Formula:
The above formula is translated into the following prompt using grammar rules #1, 2, 4, 5, and 7.
Prompt: The pattern of change is associated with a group. The first two rows of images show the same pattern of change despite other variations in another group.
- Intra-group correlation:
The above formula is translated into the following prompt using grammar rules #1, 2, 4, 5, and 8.
Prompt: The pattern of change is associated with a group. The pattern of change in the last two rows of images should change given other variations.
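The disentanglement and combination prompts above can be assembled as in the following minimal sketch. The adaptive prompt strings and the fixed ending are copied from the text; the dictionary keys, the function name `build_prompt`, and the choice to join the adaptive prompt and the fixed ending with a single space are assumptions for illustration.

```python
# Fixed ending appended to every adaptive prompt (text taken from Section 4.2.1).
FIXED_ENDING = ('What is the pattern of change? Write in a sentence. '
                'If there is no clear pattern, just write "No clear explanation".')

# Adaptive prompts from Sections 4.2.2 and 4.2.3; the keys are hypothetical labels.
ADAPTIVE_PROMPTS = {
    "disentanglement": (
        "These two rows of images show the same pattern of change "
        "despite other variations."
    ),
    "combination_inter_group": (
        "The pattern of change is associated with a group. The first two rows of images "
        "show the same pattern of change despite other variations in another group."
    ),
    "combination_intra_group": (
        "The pattern of change is associated with a group. The pattern of change in the "
        "last two rows of images should change given other variations."
    ),
}

def build_prompt(bias: str) -> str:
    """Return the adaptive prompt for the given bias followed by the fixed ending."""
    if bias not in ADAPTIVE_PROMPTS:
        raise KeyError(f"unknown inductive bias: {bias!r}")
    return f"{ADAPTIVE_PROMPTS[bias]} {FIXED_ENDING}"

prompt = build_prompt("disentanglement")
print(prompt)
```

Keeping the adaptive part separate from the fixed ending mirrors the split between adaptive and fixed prompts in the framework, so a new bias only needs a new adaptive entry.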
4.2.4 From Conditional Bias to Prompts
Conditional bias focuses on the relationship between specific properties of interest and the corresponding latent variables. This bias is important because it allows models to generate data conditioned on particular attributes, enhancing the model’s ability to produce targeted and controlled outputs.
Formula:
The above formula is translated into the following prompt using grammar rules #1, 2, 3, 4, 5, and 8.
Prompt: If the pattern of change is associated with the group of the property of interest, this image sequence will change as other variations in .
There may exist a latent variable that is independent of :
The above formula is translated into the following prompt using grammar rules #1, 2, 3, 4, 6, and 7.
Prompt: If the pattern of change is not associated with the group of the property of interest, this image sequence will remain constant despite other variations in .
4.3 Automatic In-context Prompt Generation
By leveraging these three common inductive biases identified in generative models, we can automatically generate prompts that align with a given inductive bias formula using in-context learning. The pseudo-code for this section is given in Algorithm 1.
Our approach starts by extracting mathematical symbols from the given formula using the ExtractSymbols function (line 2). This function traverses the formula to identify and isolate the mathematical symbols.
Next, the algorithm initializes an empty dictionary semantics to store the semantic representations of these symbols (line 3). For each symbol, the pre-trained LLM extracts its semantic meaning based on few-shot examples and the optional input symbol information (lines 4-6).
Finally, the algorithm generates the prompt using the formula, the few-shot examples, and the gathered semantics (line 8). This step-by-step reasoning process keeps the generated prompts contextually relevant and aligned with the underlying formulas, which could reduce hallucination and enhance model performance and interpretability.
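The three steps described above can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' implementation: the regex-based `extract_symbols` is a hypothetical stand-in for ExtractSymbols, `llm` is any callable mapping a query string to a response string, the query wording is invented, and the formula in the usage example is a toy expression rather than one from the paper.

```python
import re
from typing import Callable, Dict, List, Optional

def extract_symbols(formula: str) -> List[str]:
    """Hypothetical ExtractSymbols: identifiers and single-character operators, deduplicated in order."""
    tokens = re.findall(r"[A-Za-z_]\w*|\S", formula)
    return list(dict.fromkeys(tokens))

def generate_prompt(formula: str,
                    few_shot_examples: str,
                    llm: Callable[[str], str],
                    symbol_info: Optional[Dict[str, str]] = None) -> str:
    """Extract symbols, ask the LLM for each symbol's meaning, then ask it for the final prompt."""
    symbols = extract_symbols(formula)
    semantics: Dict[str, str] = {}
    for symbol in symbols:
        hint = (symbol_info or {}).get(symbol, "")
        semantics[symbol] = llm(
            f"Examples:\n{few_shot_examples}\nHint: {hint}\n"
            f"What does the symbol '{symbol}' mean?"
        )
    return llm(
        f"Examples:\n{few_shot_examples}\nFormula: {formula}\n"
        f"Symbol semantics: {semantics}\nWrite the corresponding prompt."
    )

# Toy usage with a stub LLM that only records the queries it receives.
calls: List[str] = []

def stub_llm(query: str) -> str:
    calls.append(query)
    return f"response-{len(calls)}"

result = generate_prompt("f(z_i, z_j) = f(z_i)", "formula -> prompt pairs go here", stub_llm)
print(result, len(calls))
```

Passing the LLM as a callable keeps the sketch independent of any particular API client; a real deployment would replace `stub_llm` with a function that queries the chosen model.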
Model | Dataset | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
---|---|---|---|---|---|---|
DDPM w/ Inductive Bias Prompts | CelebA-HQ | 18.5 | 35.1 | 32.8 | 18.8 | 71.1 |
DDPM w/ Inductive Bias Prompts | LSUN-Church | 30.9 | 44.3 | 36.4 | 29.2 | 78.9 |
DDPM w/ Inductive Bias Prompts | AFHQ-Dog | 25.9 | 38.0 | 41.5 | 29.9 | 78.9 |
DDPM w/o Inductive Bias Prompts | CelebA-HQ | 0.2 | 4.1 | 4.3 | 1.8 | 13.8 |
DDPM w/o Inductive Bias Prompts | LSUN-Church | 8.9 | 28.1 | 20.6 | 14.5 | 55.6 |
DDPM w/o Inductive Bias Prompts | AFHQ-Dog | 2.9 | 13.3 | 13.8 | 9.0 | 39.1 |
β-TCVAE w/ Inductive Bias Prompts | CelebA-HQ | 29.9 | 55.4 | 44.3 | 30.2 | 83.3 |
β-TCVAE w/ Inductive Bias Prompts | 3DShapes | 25.4 | 49.1 | 32.7 | 22.8 | 88.9 |
β-TCVAE w/ Inductive Bias Prompts | dSprites | 12.5 | 37.3 | 29.9 | 21.7 | 73.3 |
β-TCVAE w/o Inductive Bias Prompts | CelebA-HQ | 6.1 | 32.7 | 25.3 | 14.0 | 50.0 |
β-TCVAE w/o Inductive Bias Prompts | 3DShapes | 5.4 | 31.2 | 22.3 | 10.1 | 88.9 |
β-TCVAE w/o Inductive Bias Prompts | dSprites | 0.0 | 18.7 | 14.4 | 7.5 | 13.3 |
Model | Dataset | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
---|---|---|---|---|---|---|
CSVAE w/ Inductive Bias Prompts | CelebA-HQ | 25.3 | 45.3 | 39.8 | 27.0 | 93.3 |
CSVAE w/ Inductive Bias Prompts | 3DShapes | 36.2 | 43.3 | 36.6 | 36.8 | 80.0 |
CSVAE w/ Inductive Bias Prompts | dSprites | 16.0 | 35.9 | 29.2 | 21.8 | 66.7 |
CSVAE w/o Inductive Bias Prompts | CelebA-HQ | 12.0 | 37.1 | 28.5 | 17.5 | 80.0 |
CSVAE w/o Inductive Bias Prompts | 3DShapes | 14.2 | 28.6 | 25.1 | 11.8 | 50.0 |
CSVAE w/o Inductive Bias Prompts | dSprites | 0.0 | 16.4 | 10.1 | 7.3 | 25.0 |
Model | Dataset | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
---|---|---|---|---|---|---|
Stable Diffusion w/ Inductive Bias Prompts | CelebA-HQ | 18.5 | 40.9 | 33.9 | 23.9 | 82.7 |
Stable Diffusion w/ Inductive Bias Prompts | LSUN-Church | 18.3 | 38.1 | 32.5 | 25.0 | 97.3 |
Stable Diffusion w/ Inductive Bias Prompts | AFHQ-Cat | 13.3 | 29.7 | 23.6 | 20.1 | 88.0 |
Stable Diffusion w/o Inductive Bias Prompts | CelebA-HQ | 5.9 | 24.3 | 21.1 | 12.2 | 45.3 |
Stable Diffusion w/o Inductive Bias Prompts | LSUN-Church | 10.5 | 29.8 | 26.4 | 17.0 | 76.0 |
Stable Diffusion w/o Inductive Bias Prompts | AFHQ-Cat | 10.2 | 28.6 | 24.6 | 17.3 | 68.0 |
CSVAE w/ Inductive Bias Prompts | CelebA-HQ | 24.2 | 42.9 | 36.2 | 19.6 | 100.0 |
CSVAE w/ Inductive Bias Prompts | 3DShapes | 25.7 | 39.5 | 31.6 | 20.7 | 73.3 |
CSVAE w/ Inductive Bias Prompts | dSprites | 21.2 | 35.0 | 28.7 | 20.7 | 53.3 |
CSVAE w/o Inductive Bias Prompts | CelebA-HQ | 6.1 | 32.7 | 25.3 | 14.0 | 53.3 |
CSVAE w/o Inductive Bias Prompts | 3DShapes | 5.4 | 31.2 | 22.3 | 10.1 | 80.0 |
CSVAE w/o Inductive Bias Prompts | dSprites | 0.0 | 18.7 | 14.4 | 7.5 | 13.3 |
4.4 Uncertainty-aware Explanations
Uncertainty-aware methods can be applied to large language model responses Lin et al. (2023) and image explanations Zhao et al. (2024). To measure the uncertainty of the responses from GPT-4o, we sample GPT-4o multiple times to generate a set of responses. The certainty score of the explanation is the average pairwise cosine similarity of the responses. We further apply a threshold to this score to assess the interpretability of the latent variables. The implementation details are in Appendix B.
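The certainty score can be sketched as below. The averaging of pairwise cosine similarities follows the description above; everything else is an assumption for illustration: the responses are represented by embedding vectors supplied by the caller (the embedding model is not specified here), the most consistent response is taken to be the one with the highest total similarity to the others, and a latent variable below the threshold is reported as "No clear explanation".

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def certainty_score(embeddings: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of response embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two responses")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def select_explanation(responses: List[str],
                       embeddings: Sequence[Sequence[float]],
                       threshold: float) -> Tuple[str, float]:
    """Return (explanation, score); falls back to 'No clear explanation' below the threshold."""
    score = certainty_score(embeddings)
    if score < threshold:
        return "No clear explanation", score
    # Assumed selection rule: the response most similar to all the others.
    best = max(
        range(len(responses)),
        key=lambda i: sum(cosine(embeddings[i], e)
                          for j, e in enumerate(embeddings) if j != i),
    )
    return responses[best], score

# Toy usage with hand-made 2-D "embeddings" (two agreeing responses, one outlier).
responses = ["hair color changes", "hair colour changes", "background shifts"]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
explanation, score = select_explanation(responses, embeddings, threshold=0.3)
print(explanation, round(score, 3))
```

Raising the threshold makes the procedure more conservative: with the same toy inputs and a threshold of 0.5, the latent variable would be reported as having no clear explanation.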
5 Experiment
The experiments are conducted on a 64-bit machine with 24-core Intel 13th Gen Core i9-13900K @ 5.80GHz, 32GB memory and NVIDIA GeForce RTX 4090. We use GPT-4o as our MLLM and set and The code is available at https://github.com/mengdanzhu/LatentExplainer.
5.1 Dataset
We utilized five datasets to evaluate the performance of different generative models under three inductive biases: CelebA-HQ (Karras et al., 2018), AFHQ-Dog (Choi et al., 2020), and LSUN-Church (Yu et al., 2015) for the unconditional and conditional diffusion models, and CelebA-HQ (Karras et al., 2018), 3DShapes (Burgess and Kim, 2018), and dSprites (Matthey et al., 2017) for the VAE models. The CelebA-HQ dataset is a high-quality version of the CelebA dataset, consisting of 30K images of celebrities, divided into 28K for training and 2K for testing. The LSUN-Church dataset contains large-scale images of church buildings. The AFHQ dataset includes high-quality images of animals, divided into three categories: cats, dogs, and wild animals. 3DShapes is a synthetic dataset that contains images of 3D shapes with six factors of variation: floor hue, wall hue, object hue, object scale, object shape, and wall orientation, divided into 384K for training and 96K for testing. dSprites consists of 2D shapes (hearts, squares, ellipses) generated with five factors of variation: shape, scale, orientation, position X, and position Y, divided into 516K for training and 221K for testing.
5.2 Generative Models under Inductive Biases
We explore the latent space in generative models that satisfy the aforementioned three types of inductive biases. For each type, we present the relevant generative models that align with the corresponding inductive bias: (1) Disentanglement Bias: β-TCVAE (Chen et al., 2018) explicitly penalizes the total correlation of the latent variables to disentangle the latent representations. The Denoising Diffusion Probabilistic Model (DDPM) (Ho et al., 2020) adds Gaussian noise independently at each timestep in the forward process and eventually transforms the data into pure Gaussian noise, in which the covariance matrix is diagonal. This assumes the latent factors are independent; (2) Combination Bias: CSVAE (Klys et al., 2018) has two groups of latent variables, where the two groups are uncorrelated with each other and the latent variables within each group are correlated; (3) Conditional Bias: CSVAE also satisfies conditional bias because one group of latent variables is associated with the properties while the other group of latent variables minimizes the mutual information with the properties. Stable Diffusion (Rombach et al., 2022) is a latent diffusion model that generates images conditioned on prompts.
Model | Method | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
---|---|---|---|---|---|---|
DDPM | LatentExplainer | 25.1 | 39.1 | 36.9 | 26.0 | 76.3 |
DDPM | w/o Inductive Bias Prompts | 4.0 | 15.2 | 12.9 | 8.4 | 32.4 |
DDPM | w/o Uncertainty Quantification | 16.8 | 32.0 | 29.0 | 19.8 | 70.0 |
DDPM | w/o IB and w/o UQ | 2.9 | 13.8 | 11.8 | 7.1 | 32.2 |
β-TCVAE | LatentExplainer | 22.6 | 47.3 | 35.6 | 24.9 | 82.4 |
β-TCVAE | w/o Inductive Bias Prompts | 3.8 | 27.5 | 20.7 | 10.5 | 52.9 |
β-TCVAE | w/o Uncertainty Quantification | 18.7 | 41.0 | 33.8 | 22.6 | 54.9 |
β-TCVAE | w/o IB and w/o UQ | 3.5 | 27.5 | 19.4 | 10.7 | 47.1 |
Model | Method | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
---|---|---|---|---|---|---|
CSVAE | LatentExplainer | 25.8 | 41.5 | 35.2 | 28.5 | 81.0 |
CSVAE | w/o Inductive Bias Prompts | 8.7 | 27.4 | 21.2 | 12.2 | 54.6 |
CSVAE | w/o Uncertainty Quantification | 16.2 | 35.6 | 27.6 | 22.8 | 66.7 |
CSVAE | w/o IB and w/o UQ | 5.8 | 27.3 | 20.2 | 10.6 | 52.4 |
Model | Method | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
---|---|---|---|---|---|---|
Stable Diffusion | LatentExplainer | 16.7 | 36.2 | 30.0 | 23.0 | 89.3 |
Stable Diffusion | w/o Inductive Bias Prompts | 8.9 | 27.6 | 24.0 | 15.5 | 63.1 |
Stable Diffusion | w/o Uncertainty Quantification | 14.9 | 32.8 | 27.0 | 22.0 | 88.4 |
Stable Diffusion | w/o IB and w/o UQ | 7.1 | 25.7 | 21.8 | 14.7 | 55.6 |
CSVAE | LatentExplainer | 23.7 | 39.1 | 32.2 | 20.3 | 75.6 |
CSVAE | w/o Inductive Bias Prompts | 11.0 | 29.0 | 22.8 | 12.2 | 48.9 |
CSVAE | w/o Uncertainty Quantification | 11.8 | 29.5 | 22.1 | 14.2 | 57.8 |
CSVAE | w/o IB and w/o UQ | 6.5 | 23.0 | 17.0 | 8.5 | 44.4 |
5.3 Quantitative Analysis
For the quantitative evaluation of explanations, we use BLEU Papineni et al. (2002), ROUGE-L Lin (2004), METEOR Banerjee and Lavie (2005), and SPICE Anderson et al. (2016) as automated metrics, together with human-annotated accuracy, to assess the explanations of latent variables. In Table 2, the explanations generated with the inductive bias prompts outperform their counterparts without inductive bias prompts on every dataset and metric, apart from a tie in accuracy for β-TCVAE on 3DShapes. Similar results can be found in Table 3 and Table 4, with two exceptions under conditional bias: METEOR for Stable Diffusion on AFHQ-Cat and accuracy for CSVAE on 3DShapes are slightly higher without the prompts. These largely consistent results indicate that inductive bias is important when explaining the latent representations of generative models. They also show that our framework can effectively design prompts for different inductive biases in generative models to improve the accuracy of latent explanations.
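As an illustration of how one of these automated metrics compares a generated explanation with a reference, the sketch below computes a simplified ROUGE-L score from the longest common subsequence (LCS) of the two token sequences. It assumes lowercased whitespace tokenization, a single reference, and a balanced F-measure (beta = 1); standard evaluation toolkits may differ in tokenization and in the weighting of precision and recall, and the example sentences are invented rather than taken from the experiments.

```python
from typing import List

def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Simplified ROUGE-L F-measure between one candidate and one reference sentence."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

# Toy usage: the candidate drops one word of the reference.
score = rouge_l("the hair becomes darker", "the hair color becomes darker")
print(round(score, 3))
```

Here the LCS has four tokens, so precision is 4/4 and recall is 4/5, giving an F-measure of 8/9.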
5.4 Qualitative Evaluation
To analyze the explanations for DDPM under disentanglement bias, we manipulate the latent representation along each latent direction and compare the result with the one obtained by first traversing along another latent direction and then traversing along the same latent direction. In Figure 3, for each latent direction, we pass two image sequences perturbed along that latent direction, together with the inductive bias prompt based on the disentanglement formula, to the MLLM to obtain the common latent explanation through uncertainty quantification. In comparison, the explanations generated without the inductive bias prompt, as shown in Figure 4, tend to be "no clear explanation" or explanations that align with one image sequence but do not reflect the common pattern in both image sequences. Adding the inductive bias prompts helps rule out the variation effects of other latent variables.
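The construction of the two image sequences compared under disentanglement bias can be sketched as follows. This is a minimal illustration under the assumption of additive perturbations; the names `paired_traversals`, `target_dir`, `other_dir`, and `other_strength` are hypothetical, and the identity decoder stands in for the generative model.

```python
from typing import Callable, List, Sequence, Tuple

def paired_traversals(z: Sequence[float],
                      target_dir: Sequence[float],
                      other_dir: Sequence[float],
                      strengths: Sequence[float],
                      other_strength: float,
                      decode: Callable[[List[float]], object]) -> Tuple[list, list]:
    """Return two rows: a traversal from z, and the same traversal after shifting z along other_dir."""
    shifted = [zi + other_strength * oi for zi, oi in zip(z, other_dir)]

    def row(start: Sequence[float]) -> list:
        # Traverse the target direction from the given starting latent.
        return [decode([s + g * t for s, t in zip(start, target_dir)]) for g in strengths]

    return row(z), row(shifted)

# Toy usage with an identity "decoder" and two orthogonal directions.
row_a, row_b = paired_traversals(
    z=[0.0, 0.0],
    target_dir=[1.0, 0.0],
    other_dir=[0.0, 1.0],
    strengths=[-1.0, 0.0, 1.0],
    other_strength=2.0,
    decode=lambda v: v,
)
print(row_a)
print(row_b)
```

In the toy output, both rows change in the same way along the first coordinate while differing in the second, which is the kind of shared pattern the disentanglement prompt asks the MLLM to describe.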
We also qualitatively evaluate the explanations for CSVAE under combination bias. We traverse each latent dimension and compare the result with the one that first changes another latent variable in another group and then traverses the same latent variable, as well as the ones that first change another latent variable in the same group and then traverse the same latent variable. As Figure 10 depicts, our model can clearly explain the latent variables as the color of the ground, the background color, and the shape of the object. The effect of removing inductive bias prompts is similar to that observed under disentanglement bias.
In Figure 11, we provide the "young appearance" prompt to Stable Diffusion under conditional bias, and all three top latent directions are associated with the prompt. Compared with the one without the inductive bias prompts in Figure 12, the addition of inductive bias prompts can better identify the relation with the property of interest in the explanations as we can see the suggesting meaning with the property in addition to the description of the pattern in both and . More evaluation results can be found in Appendix C.
5.5 Ablation Study
To understand the impact of various components in our proposed LatentExplainer, we conducted a comprehensive ablation study across different models (DDPM, β-TCVAE, Stable Diffusion, and CSVAE) and inductive biases (disentanglement, combination, and conditional). Specifically, we evaluated the removal of inductive bias prompts (IB), uncertainty quantification (UQ), and both components. The results are summarized in Tables 5, 6, and 7. Removing inductive bias prompts leads to a substantial drop in all generative models, indicating the importance of inductive bias in generating high-quality explanations. Likewise, the removal of uncertainty quantification also results in decreased performance in all generative models. Overall, the ablation studies confirm the necessity of both inductive bias prompts and uncertainty quantification in our LatentExplainer framework, demonstrating their significant contributions to improving explanatory performance across various generative models.
6 Conclusion
In this paper, we introduced LatentExplainer, a framework designed to generate semantically meaningful explanations of latent variables in deep generative models. Our experiments quantitatively revealed that the inclusion of inductive bias prompts significantly enhances the quality of explanations generated by LatentExplainer. In qualitative evaluations, LatentExplainer provided clear explanations for the latent variables, effectively reflecting superior performance when inductive bias prompts were utilized. The ablation studies further confirmed the necessity of inductive bias prompts and uncertainty quantification in producing accurate and consistent explanations.
7 Limitation
In this work, we use GPT-4o as our MLLM agent to explain the latent representations. Although GPT-4o has improved vision capabilities across most tasks, we still find it incapable of identifying intricate facial features, positions, and orientations. It is necessary to enhance visual understanding to improve the explanation quality for MLLM.
References
- Adadi and Berrada (2018) Amina Adadi and Mohammed Berrada. 2018. Peeking inside the black-box: a survey on explainable artificial intelligence (xai). IEEE access, 6:52138–52160.
- An and Cho (2015) Jinwon An and Sungzoon Cho. 2015. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE, 2(1):1–18.
- Anderson et al. (2016) Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2016. Spice: Semantic propositional image caption evaluation. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V 14, pages 382–398. Springer.
- Bai et al. (2022) Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil YC Lin, and Cho-Jui Hsieh. 2022. Concept gradient: Concept-based interpretation without linear assumption. arXiv preprint arXiv:2208.14966.
- Bai et al. (2024) Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, et al. 2024. Beyond efficiency: A systematic survey of resource-efficient large language models. arXiv preprint arXiv:2401.00625.
- Banerjee and Lavie (2005) Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan. Association for Computational Linguistics.
- Brock et al. (2016) Andrew Brock, Theodore Lim, James M Ritchie, and Nick Weston. 2016. Neural photo editing with introspective adversarial networks. In International Conference on Learning Representations.
- Burgess and Kim (2018) Chris Burgess and Hyunjik Kim. 2018. 3d shapes dataset. https://github.com/deepmind/3dshapes-dataset/.
- Chang et al. (2023) Kai-Po Chang, Chi-Pin Huang, Wei-Yuan Cheng, Fu-En Yang, Chien-Yi Wang, Yung-Hsuan Lai, and Yu-Chiang Frank Wang. 2023. Rapper: Reinforced rationale-prompted paradigm for natural language explanation in visual question answering. In The Twelfth International Conference on Learning Representations.
- Chen et al. (2018) Ricky TQ Chen, Xuechen Li, Roger B Grosse, and David K Duvenaud. 2018. Isolating sources of disentanglement in variational autoencoders. Advances in neural information processing systems, 31.
- Chen et al. (2016) Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. arXiv preprint arXiv:1606.03657.
- Choi et al. (2020) Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. 2020. Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8188–8197.
- Ding et al. (2020) Zheng Ding, Yifan Xu, Weijian Xu, Gaurav Parmar, Yang Yang, Max Welling, and Zhuowen Tu. 2020. Guided variational autoencoder for disentanglement learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7920–7929.
- Gao et al. (2024) Yuyang Gao, Siyi Gu, Junji Jiang, Sungsoo Ray Hong, Dazhou Yu, and Liang Zhao. 2024. Going beyond xai: A systematic survey for explanation-guided learning. ACM Computing Surveys, 56(7):1–39.
- Guo et al. (2020) Xiaojie Guo, Yuanqi Du, and Liang Zhao. 2020. Property controllable variational autoencoder via invertible mutual dependence. In International Conference on Learning Representations.
- Higgins et al. (2017) Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-vae: Learning basic visual concepts with a constrained variational framework. International Conference on Learning Representations.
- Ho et al. (2020) Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239.
- Ho et al. (2022) Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. 2022. Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research, 23(47):1–33.
- Karras et al. (2018) Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations.
- Kim et al. (2018) Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 2018. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR.
- Kingma et al. (2014) Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pages 3581–3589.
- Kingma and Welling (2013) Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
- Klys et al. (2018) Jack Klys, Jake Snell, and Richard Zemel. 2018. Learning latent subspaces in variational autoencoders. Advances in neural information processing systems, 31.
- Koh et al. (2020) Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. 2020. Concept bottleneck models. In International conference on machine learning, pages 5338–5348. PMLR.
- Kong et al. (2020) Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. 2020. Diffwave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761.
- Lin (2004) Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
- Lin et al. (2023) Zhen Lin, Shubhendu Trivedi, and Jimeng Sun. 2023. Generating with confidence: Uncertainty quantification for black-box large language models. arXiv preprint arXiv:2305.19187.
- Matthey et al. (2017) Loic Matthey, Irina Higgins, Demis Hassabis, and Alexander Lerchner. 2017. dsprites: Disentanglement testing sprites dataset. https://github.com/deepmind/dsprites-dataset/.
- Nichol and Dhariwal (2021) Alexander Nichol and Prafulla Dhariwal. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741.
- OpenAI (2024) OpenAI. 2024. Gpt-4o. https://openai.com/index/hello-gpt-4o/.
- Papineni et al. (2002) Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318.
- Park et al. (2023) Yong-Hyun Park, Mingi Kwon, Jaewoong Choi, Junghyo Jo, and Youngjung Uh. 2023. Understanding the latent space of diffusion models through the lens of riemannian geometry. Advances in Neural Information Processing Systems, 36:24129–24142.
- Poeta et al. (2023) Eleonora Poeta, Gabriele Ciravegna, Eliana Pastor, Tania Cerquitelli, and Elena Baralis. 2023. Concept-based explainable artificial intelligence: A survey. arXiv preprint arXiv:2312.12936.
- Radford et al. (2021) Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763.
- Razavi et al. (2019) Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. 2019. Generating diverse high-fidelity images with vq-vae-2. Advances in neural information processing systems, 32.
- Rezende et al. (2014) Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082.
- Ridgeway (2016) Karl Ridgeway. 2016. A survey of inductive biases for factorial representation-learning. arXiv preprint arXiv:1612.05299.
- Rombach et al. (2022) Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695.
- Saleem et al. (2022) Rabia Saleem, Bo Yuan, Fatih Kurugollu, Ashiq Anjum, and Lu Liu. 2022. Explaining deep neural networks: A survey on the global interpretation methods. Neurocomputing, 513:165–180.
- Sammani et al. (2022) Fawaz Sammani, Tanmoy Mukherjee, and Nikos Deligiannis. 2022. Nlx-gpt: A model for natural language explanations in vision and vision-language tasks. In proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8322–8332.
- Team et al. (2023) Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, et al. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.
- Vahdat et al. (2021) Arash Vahdat, Karsten Kreis, and Jan Kautz. 2021. Score-based generative modeling in latent space. Advances in neural information processing systems, 34:11287–11302.
- Wang et al. (2024) Shiyu Wang, Yuanqi Du, Xiaojie Guo, Bo Pan, Zhaohui Qin, and Liang Zhao. 2024. Controllable data generation by deep learning: A review. ACM Computing Surveys, 56(9):1–38.
- Wu et al. (2023) Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, and Shiyu Chang. 2023. Uncovering the disentanglement capability in text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1900–1910.
- Xu et al. (2022) Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. 2022. Geodiff: A geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations.
- Yan et al. (2016) Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. 2016. Attribute2image: Conditional image generation from visual attributes. In European Conference on Computer Vision, pages 776–791.
- Yang et al. (2023) Ruihan Yang, Prakhar Srivastava, and Stephan Mandt. 2023. Diffusion probabilistic modeling for video generation. Entropy, 25(10):1469.
- Yin et al. (2023) Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. 2023. A survey on multimodal large language models. arXiv preprint arXiv:2306.13549.
- Yu et al. (2015) Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. 2015. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365.
- Zhao et al. (2024) Qilong Zhao, Yifei Zhang, Mengdan Zhu, Siyi Gu, Yuyang Gao, Xiaofeng Yang, and Liang Zhao. 2024. Due: Dynamic uncertainty-aware explanation supervision via 3d imputation. arXiv preprint arXiv:2403.10831.
- Zhu et al. (2016) Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. 2016. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597–613.
- Zhu et al. (2024) Mengdan Zhu, Zhenke Liu, Bo Pan, Abhinav Angirekula, and Liang Zhao. 2024. Explaining latent representations of generative models with large multimodal models. arXiv preprint arXiv:2402.01858.
Appendix A Human Annotations
The ground-truth annotations of the explanations were produced by four annotators from the United States and China, and the results were aggregated to calculate the automated evaluation metrics. To obtain the accuracy, each annotator selects one of four choices (yes, weak yes, weak no, or no) in response to whether the explanation justifies the pattern of change in the image sequence. The four ratings are mapped to scores of 1, 2/3, 1/3, and 0, respectively. The scores are then averaged over the data samples in each dataset to give the accuracy values, following the evaluation process in Sammani et al. (2022); Chang et al. (2023).
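The rating-to-score mapping described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `accuracy` and the input format (one aggregated rating string per data sample) are assumptions, since the text does not specify how ratings from the four annotators are combined before averaging.

```python
from statistics import mean

# Mapping stated in the text: yes -> 1, weak yes -> 2/3, weak no -> 1/3, no -> 0.
RATING_TO_SCORE = {
    "yes": 1.0,
    "weak yes": 2 / 3,
    "weak no": 1 / 3,
    "no": 0.0,
}


def accuracy(ratings):
    """Average the mapped scores over data samples.

    `ratings` is a hypothetical input format: one rating string per sample.
    """
    return mean(RATING_TO_SCORE[r] for r in ratings)


if __name__ == "__main__":
    sample_ratings = ["yes", "weak yes", "no", "yes"]
    print(f"accuracy = {accuracy(sample_ratings):.3f}")
```

Using a dictionary keeps the four-level scale in one place, so a different scale would only require changing the mapping.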
Appendix B Selection of Threshold of the Certainty Scores
We follow our previous work Zhu et al. (2024) to find the threshold. We denote the true label of the interpretability of each latent variable as 1 if at least two annotators out of three can see a clear pattern in the generated images, and as 0 otherwise. The certainty score of each response is its predicted probability. We then compute the AUC (area under the ROC curve) for different thresholds and choose the threshold with the highest AUC. If the certainty score is greater than or equal to the threshold, our final output is the response that has the highest mean pairwise cosine similarity with the other responses. Otherwise, the output is "no clear explanation".
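The final selection step can be sketched as below. It covers only the last part of the procedure (thresholding and choosing the most central response), not the threshold search itself. The names `cosine` and `select_explanation` are hypothetical, and the sketch assumes each response has already been turned into a non-zero embedding vector by some text encoder, which this appendix does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_explanation(responses, embeddings, certainty, threshold):
    """Return the response with the highest mean pairwise cosine similarity
    to the other responses, or a fallback string when the certainty score
    is below the threshold.

    `embeddings[i]` is an assumed vector representation of `responses[i]`.
    """
    if certainty < threshold:
        return "no clear explanation"
    if len(responses) == 1:
        return responses[0]

    def mean_similarity(i):
        # Average similarity of response i to every other response.
        others = [
            cosine(embeddings[i], embeddings[j])
            for j in range(len(responses))
            if j != i
        ]
        return sum(others) / len(others)

    best = max(range(len(responses)), key=mean_similarity)
    return responses[best]


if __name__ == "__main__":
    responses = ["hair gets longer", "hair length increases", "background darkens"]
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy vectors
    print(select_explanation(responses, embeddings, certainty=0.8, threshold=0.5))
    print(select_explanation(responses, embeddings, certainty=0.3, threshold=0.5))
```

Picking the response closest on average to all others favors the explanation that the sampled responses agree on most, which matches the consistency criterion described above.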
Appendix C More Qualitative Evaluation Results
Appendix D Visualization of the Generated Explanations
In this section, we provide visualization examples of the explanations generated by our proposed LatentExplainer framework under different inductive biases: combination bias with the inductive bias prompt in Figure 10, and conditional bias with the inductive bias prompt in Figure 11 and without it in Figure 12. Each figure shows the patterns of change in generated image sequences along specific latent directions, accompanied by the corresponding explanations.