LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models

Mengdan Zhu1  Raasikh Kanjiani1  Jiahui Lu2  Andrew Choi1  Qirui Ye1  Liang Zhao1
1Emory University  2University of Southern California
{mengdan.zhu,rkanji2,ajchoi5,qirui.ye,liang.zhao}@emory.edu
{jiahuilu}@usc.edu
Abstract

Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. By perturbing latent variables and interpreting changes in generated data, the framework provides a systematic approach to understanding and controlling the data generation process, enhancing the transparency and interpretability of deep generative models. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations of latent variables.

1 Introduction

Deep generative models, such as Variational Autoencoders (VAEs) Kingma and Welling (2013) and diffusion models Rombach et al. (2022), have become a state-of-the-art approach in various generation tasks Razavi et al. (2019); Ho et al. (2022); Yang et al. (2023). These models leverage latent variables to learn underlying data distributions and generate high-quality samples by capturing the underlying structure of high-dimensional data in a low-dimensional semantic space Vahdat et al. (2021). To enhance performance, inductive biases are often enforced over latent variables. For instance, disentanglement is a rule of thumb which enforces orthogonality among different latent variables Ding et al. (2020). Moreover, latent variables can sometimes be grouped, leading to combination bias Klys et al. (2018); Guo et al. (2020). More recently, controllability of deep generative models has become desirable, where latent variables are bound to specific properties of interest Wang et al. (2024). As the latent variables represent all the information in a lower dimension, they can be considered an effective abstraction of data into key factors. Hence, it is highly beneficial to interpret the meaning of the latent variables, both to understand the key factors of the data better and to control the data generation process more effectively. For example, as shown in Figure 1, by interpreting the latent variables of deep generative models such as VAEs for images, we can elicit key factors of the images for better understanding and control of the data generation.

Figure 1: Latent variables of deep generative models represent data distribution in a low-dimensional space. This figure demonstrates how the interpretation of latent variables’ semantic meaning helps to generate data samples with controllable properties.

The field of explainable artificial intelligence (XAI) has extensively investigated the interpretation of machine learning models Adadi and Berrada (2018). However, interpreting latent variables in deep generative models remains underexplored. Machine learning model explanation, a.k.a., post-hoc explanation, can be categorized into global and local explanations Gao et al. (2024). Global explanations focus on elucidating the entire model, while local explanations target the reasoning behind specific predictions. Global explanations are more challenging, with existing work mostly emphasizing attributions to identify which features are most important for model decision-making Saleem et al. (2022). The missing piece is understanding the meaning of features when they are unknown, which is very common in deep generative models. More recently, one category of global explanation methods, called concept-based explanations, aims to generate more human-understandable concepts as explanations Poeta et al. (2023). However, current concept-based methods often rely on human heuristics or predefined concept and feature space, limiting the expressiveness of the explanations and falling short of achieving truly automatic explanation generation Kim et al. (2018); Koh et al. (2020); Bai et al. (2022).

Despite the progress in XAI, interpreting latent variables in deep generative models presents significant challenges. First, these variables are not grounded in real-world concepts, and the black-box nature of the models prevents us from inferring the meaning of latent variables from observations. Second, explanations must adhere to the inductive biases imposed on the latent variables, which is essential yet difficult to ensure. For example, in disentangled latent variables, the semantic meanings should be orthogonal. Third, different latent variables have varying degrees of explainability. Some may be trivial to data generation and intrinsically lack semantic meaning. It is crucial to identify which latent variables are explainable and which do not need explanation.

To address the aforementioned challenges, we propose LatentExplainer, a novel and generic framework that automatically generates semantically meaningful explanations of latent variables in deep generative models. To explain these variables and work around the black-box nature of the models (Challenge 1), we propose to perturb each latent variable and explain the resulting changes in the generated data. Specifically, we perturb and decode each manipulated latent variable to produce the corresponding sequence of generated data samples. The trend in the sequence is leveraged to reflect the semantics of the latent variable to be explained. To align explanations with the intrinsic nature of the deep generative models (Challenge 2), we design a generic framework that formulates inductive biases on the Bayesian network of latent variable models into textual prompts. These prompts are understandable to both humans and large foundation models, so user-provided inductive bias formulas are converted into corresponding prompts that a multimodal large language model can easily understand. To handle the varying degrees of explainability in latent variables (Challenge 3), we propose to probe the confidence of the explanations by estimating their uncertainty, which helps avoid hallucinations. This approach assesses whether the latent representations are interpretable and selects the most consistent explanations, ensuring accurate and meaningful interpretation of the latent variables.

2 Related Work

Deep Generative Models. Deep generative models are essential for modeling complex data distributions. Variational Autoencoders (VAEs), introduced by Kingma and Welling (2013), are prominent in this area. VAEs encode input data into a latent space and decode it back, optimizing a balance between reconstruction error and the Kullback-Leibler divergence Rezende et al. (2014). They have diverse applications, including image generation Yan et al. (2016), semi-supervised learning Kingma et al. (2014), and anomaly detection An and Cho (2015).

Diffusion models, proposed by Ho et al. (2020), generate data through a diffusion process that gradually adds noise to the data and then learns to reverse this process to recover the original data. These models have achieved high-fidelity image generation, surpassing GANs in quality and diversity. Latent diffusion models operate in a lower-dimensional space, which significantly reduces computational requirements while maintaining the quality of the generated samples Rombach et al. (2022). Advances have made them applicable to text-to-image synthesis Nichol and Dhariwal (2021), audio generation Kong et al. (2020), and molecular design Xu et al. (2022).

Image Manipulation. Image manipulation using latent variables in generative models like VAEs and diffusion models is an important technique for modifying and enhancing generated images. A key method is latent traversal, which involves traversing different values of latent variables to achieve diverse manipulations in the generated outputs. This technique allows for precise control over, and adjustment of, the attributes in generated images Chen et al. (2016); Higgins et al. (2017). For example, latent traversal has been effectively employed to disentangle and control various attributes in generated images Brock et al. (2016); Zhu et al. (2016). Although latent traversal is used for visualization and editing purposes, it has not yet been widely explored as a tool for explaining the underlying generative processes in natural language.

Multi-modal Large Language Models. Multi-modal Large Language Models (MLLMs) integrate diverse data modalities, enhancing their ability to understand and generate complex information Yin et al. (2023); Bai et al. (2024). Notable models include CLIP, which learns visual and textual representations Radford et al. (2021); GPT-4o, which extends GPT-4v with better visual capabilities OpenAI (2024); and Gemini, a family of highly capable multimodal models Team et al. (2023) that perform complex tasks requiring an understanding of context across different modalities.

Figure 2: The overview of our proposed framework LatentExplainer. (a) Inductive-bias-guided Data Manipulation generates image sequences by manipulating latent variables with predefined biases; (b) Automatic Prompt Generation with symbol-to-word mapping uses these images and formulas to create prompts for an MLLM to produce explanations; (c) Uncertainty-aware Explanation Generation evaluates multiple responses from the MLLM, selecting the most consistent explanation with a certainty score.

3 Preliminaries and Problem Formulation

Deep Generative Models. These are a class of models that learn a mapping between observations $\mathbf{x}$ and key underlying factors $\mathbf{z}$. Such latent variable formulations underlie deep generative models such as VAEs and latent diffusion models. VAEs, for instance, introduce a probabilistic approach to encoding data by maximizing the evidence lower bound (ELBO) Kingma and Welling (2013):

$$\mathcal{L}_{\text{ELBO}}=\mathbb{E}_{q(\mathbf{z}|\mathbf{x})}[\log p(\mathbf{x}|\mathbf{z})]-\text{KL}(q(\mathbf{z}|\mathbf{x})\|p(\mathbf{z})),$$

where KL stands for Kullback–Leibler divergence. Latent diffusion models further refine the generation process by iteratively refining noise into structured data Rombach et al. (2022). These models effectively capture the underlying structure of data in a low-dimensional semantic space.
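To make the two ELBO terms concrete, the following is a minimal numerical sketch rather than the paper's implementation. It assumes a diagonal Gaussian posterior $q(\mathbf{z}|\mathbf{x})$, a standard normal prior $p(\mathbf{z})$, and a Bernoulli likelihood over binary pixels. None of these choices are stated in the text, and the function names are illustrative.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def bernoulli_log_likelihood(x, x_prob, eps=1e-7):
    """log p(x | z) for binary pixels, summed over data dims (assumed likelihood)."""
    x_prob = np.clip(x_prob, eps, 1.0 - eps)
    return np.sum(x * np.log(x_prob) + (1.0 - x) * np.log(1.0 - x_prob), axis=-1)

def elbo(x, x_prob, mu, log_var):
    """Single-sample estimate of the ELBO: reconstruction term minus KL term."""
    return bernoulli_log_likelihood(x, x_prob) - gaussian_kl(mu, log_var)

# Toy batch: 2 examples, 4 pixels, 3 latent dimensions (all values made up).
x = np.array([[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0]])
x_prob = np.array([[0.9, 0.1, 0.8, 0.7], [0.2, 0.1, 0.6, 0.3]])  # decoder outputs
mu = np.zeros((2, 3))       # posterior means
log_var = np.zeros((2, 3))  # posterior log-variances
values = elbo(x, x_prob, mu, log_var)
```

With `mu` and `log_var` set to zero the posterior equals the prior, so the KL term vanishes and the ELBO reduces to the reconstruction log-likelihood.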

Latent variable manipulation of diffusion models aims at traversing the latent representation $\mathbf{z}$ along the semantic latent direction $\mathbf{v}$. The perturbed vector is $\tilde{\mathbf{z}}=\mathbf{z}+\gamma\left[\mathcal{G}(\mathbf{z}+\mathbf{v})-\mathcal{G}(\mathbf{z})\right]$, where $\gamma$ is a hyper-parameter controlling the strength Park et al. (2023). An image sequence $\mathcal{S}$ can then be generated by $\mathcal{G}(\tilde{\mathbf{z}})$. The perturbations in the semantic latent direction lead to semantic changes in the generated image sequence.
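The perturbation rule can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Park et al. (2023): $\mathcal{G}$ is replaced by a hypothetical linear map whose output has the same shape as $\mathbf{z}$ (needed for the addition in the formula to be well defined), and the sequence is obtained by sweeping $\gamma$ over a small grid.

```python
import numpy as np

def perturb(z, v, gamma, G):
    """z_tilde = z + gamma * (G(z + v) - G(z)), as in the formula above."""
    return z + gamma * (G(z + v) - G(z))

def image_sequence(z, v, gammas, G):
    """Decode one output per perturbation strength to form the sequence S."""
    return [G(perturb(z, v, g, G)) for g in gammas]

# Hypothetical stand-in for G: a fixed linear map from R^4 to R^4.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
G = lambda z: W @ z

z = rng.normal(size=4)               # latent representation
v = np.array([0.1, 0.0, 0.0, 0.0])   # assumed semantic direction
gammas = np.linspace(-2.0, 2.0, 5)   # perturbation strengths
seq = image_sequence(z, v, gammas, G)
```

At $\gamma = 0$ the perturbed vector equals $\mathbf{z}$, so the middle element of `seq` is the unperturbed output, and the remaining elements move away from it in both directions.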

Inductive Bias in Latent Variables. Inductive biases are commonly imposed on latent variables to enhance the performance and interpretability of deep generative models. These biases can be categorized into several types. Disentanglement bias: It enforces orthogonality among different latent variables, ensuring that each latent variable captures a distinct factor of variation in the data Ding et al. (2020). Combination bias: Sometimes latent variables are grouped, leading to biases in how they interact and combine to represent complex data structures Klys et al. (2018). Conditional bias: It emphasizes the relationship between specific properties of interest and the corresponding latent variables Guo et al. (2020). Other biases: Various other biases can be introduced depending on the specific requirements of the model, such as sparsity or hierarchical organization, to guide the learning process more effectively.

Interpreting latent variables in deep generative models is extremely challenging due to: 1). Their lack of grounding in real-world concepts and the models’ black-box nature, which hinders meaning inference. 2). The necessity for explanations to conform to inductive biases, such as orthogonality in disentangled variables, which is difficult to ensure. 3). The varying degrees of explainability among latent variables, with some being trivial to data generation and lacking semantic meaning, making it essential to identify which variables require explanation.

Problem Formulation. We assume a dataset $\mathcal{D}$, where each sample consists of $x$ or $(x, y)$, with $x\in\mathbb{R}^{N}$ and $y=\{y_{k}\in\mathbb{R}\}_{k=1}^{K}$ as $K$ properties of $x$. The dataset $\mathcal{D}$ is generated by $M$ latent variables $z_{i}$, where $i\in\{1,\ldots,M\}$. $z_{i}$ can be a single latent variable or a group of correlated latent variables. Suppose we are given a generative model with a set of formulas $\mathcal{F}$ with respect to $z_{i}$, where $\mathcal{F}$ represents an inductive bias that the generative model must satisfy. Our goal is to derive a textual sequence that explains the semantic meanings of the latent variable $z_{i}$.

4 Proposed Method

4.1 Overview of LatentExplainer

This paper focuses on explaining the semantics of latent variables $\{z_{i}\}_{i=1}^{M}$ in deep generative models. To interpret these semantics and work around the black-box nature of deep generative models, we propose LatentExplainer, which perturbs each latent variable and explains the change it imposes on the generated data. We perturb $z_{i}$ and subsequently decode the manipulated latent variables into generated data that are perceptible by humans, such as images. Through a series of perturbations on $z_{i}$, a sequence of generated data samples is obtained that reflects the changes in $z_{i}$ (Figure 2(a)). When explaining latent variable models, it is crucial to fully leverage and align with the prior knowledge about them. To do this, we design a generic framework that automatically formulates the inductive biases of generative models into textual prompts. Specifically, we summarize three common inductive biases and design their one-to-one symbol-to-word prompts $\mathcal{P}$ (Section 4.2). Our scheme can adaptively convert any user-provided inductive bias formulas $\mathcal{F}$ into a corresponding prompt $\mathcal{P}$ to provide more accurate explanations of the latent representations (Figure 2(b), Section 4.3). Finally, the explanations are selected through an uncertainty quantification approach that assesses whether the latent representations are interpretable and selects the most consistent explanations (Figure 2(c), Section 4.4).

4.2 Inductive-bias-guided Prompt Framework

4.2.1 Generic Framework

In this section, we propose a generic framework that can verbalize the inductive bias in deep generative models into prompts for better latent variable explanations. The prevalent inductive biases in deep generative models are categorized into three types: disentanglement bias, conditional bias, and combination bias. Intrinsically, they are mathematical expressions of the conditional independence among the latent variables, which can confine their explanations but cannot be directly input into large foundation models.

Framework: Our framework proposes a principled, automatic way to translate mathematical expressions into textual prompts. The prompts include adaptive prompts and fixed prompts. The adaptive prompts are converted from the inductive bias formulas. Each formula contains mathematical symbols, which consist of mathematical variables and mathematical operators. We use the same color to represent the correspondence between mathematical symbols in the formulas and the text in the prompts. The translation mechanism is shown in Table 1.

| Grammar # | Symbol | Prompt |
|---|---|---|
| 1 | $p(z_{i} \mid \cdot)$ | pattern of change |
| 2 | $p(z_{i} \mid z_{i'}),\ \forall i \neq i'$ | other variations |
| 3 | $p_{k}$ | property of interest |
| 4 | $G$ | a group |
| 5 | $\in$ | associated with |
| 6 | $\notin$ | not associated with |
| 7 | $=$ | same |
| 8 | $\neq$ | change |

Table 1: Lookup table for symbol-to-word mapping.

Fixed Ending: What is the pattern of change? Write in a sentence. If there is no clear pattern, just write "No clear explanation".
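The lookup in Table 1 and the fixed ending can be represented as plain data. The sketch below is one possible encoding, not the authors' code: `GRAMMAR`, `phrases`, and `build_prompt` are hypothetical names, and the check that an adaptive prompt contains every phrase for its grammar rules is an added sanity test rather than part of the described framework.

```python
# Table 1 as a lookup: grammar rule number -> (LaTeX symbol, phrase).
GRAMMAR = {
    1: (r"p(z_i \mid \cdot)", "pattern of change"),
    2: (r"p(z_i \mid z_{i'}), \forall i \neq i'", "other variations"),
    3: (r"p_k", "property of interest"),
    4: (r"G", "a group"),
    5: (r"\in", "associated with"),
    6: (r"\notin", "not associated with"),
    7: (r"=", "same"),
    8: (r"\neq", "change"),
}

FIXED_ENDING = (
    "What is the pattern of change? Write in a sentence. "
    'If there is no clear pattern, just write "No clear explanation".'
)

def phrases(rule_ids):
    """Return the phrase attached to each grammar rule, in order."""
    return [GRAMMAR[r][1] for r in rule_ids]

def build_prompt(adaptive_prompt, rule_ids):
    """Verify the adaptive prompt uses every listed phrase, then append the fixed ending."""
    missing = [p for p in phrases(rule_ids) if p not in adaptive_prompt]
    if missing:
        raise ValueError(f"adaptive prompt is missing phrases: {missing}")
    return adaptive_prompt + " " + FIXED_ENDING

# Example with the disentanglement prompt, which uses rules 1, 2, and 7.
disentanglement = build_prompt(
    "These two rows of images show the same pattern of change despite other variations.",
    rule_ids=[1, 2, 7],
)
```

Keeping the table as data makes it straightforward to add rows for further inductive biases without touching the assembly logic.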

4.2.2 From Disentanglement Bias to Prompts

Disentanglement bias refers to the model’s ability to separate independent factors in the data Ding et al. (2020); Wu et al. (2023). The formula representing this bias focuses on ensuring that different latent variables correspond to different independent underlying factors. Independent factors would be invariant with respect to one another (Ridgeway, 2016). By disentangling these factors, researchers can better understand the underlying structure of the data and improve the model’s performance on tasks such as representation learning.

Formula:

$$p(z_{i}\mid z_{i'}=\alpha)=p(z_{i}\mid z_{i'}=\beta),\quad\forall i\neq i',\ \alpha\neq\beta.$$

The above formula is translated into the following prompt using grammar rules #1, 2, and 7.

Prompt: These two rows of images show the same pattern of change despite other variations.
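The "two rows" in this prompt can be produced by traversing $z_{i}$ twice while holding another variable $z_{i'}$ at two different values $\alpha$ and $\beta$. The following sketch builds only the latent codes. The decoder call is left as a comment because the model is not specified here, and the function names and traversal range are assumptions.

```python
import numpy as np

def traverse(z, i, values):
    """Return one copy of z per value, with coordinate i set to that value."""
    rows = np.tile(z, (len(values), 1))
    rows[:, i] = values
    return rows

def two_row_latents(z, i, i_prime, values, alpha, beta):
    """Two traversals of z_i that differ only in z_{i'} (alpha versus beta)."""
    if i == i_prime or alpha == beta:
        raise ValueError("the formula requires i != i' and alpha != beta")
    base_a, base_b = z.copy(), z.copy()
    base_a[i_prime], base_b[i_prime] = alpha, beta
    return traverse(base_a, i, values), traverse(base_b, i, values)

z = np.zeros(5)                      # base latent code (toy value)
values = np.linspace(-3.0, 3.0, 7)   # assumed traversal range for z_i
row_a, row_b = two_row_latents(z, i=2, i_prime=4, values=values,
                               alpha=-1.0, beta=1.0)
# images_a = decoder(row_a); images_b = decoder(row_b)  # decoder is hypothetical
```

If the model is disentangled in the sense of the formula, the decoded rows should show the same change along $z_{i}$ even though $z_{i'}$ differs between them.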

4.2.3 From Combination Bias to Prompts

Combination bias involves understanding how different latent variables interact within groups and remain independent across groups Klys et al. (2018). This bias is significant as it helps in identifying how combinations of factors contribute to the overall data generation process. Recognizing these interactions enables researchers to design models that can generate more complex and realistic data by capturing intricate relationships within the data.

  • No inter-group correlation:

    Formula:

    $$p(z_{i}\mid z_{j}=\alpha)=p(z_{i}\mid z_{j}=\beta),\quad\forall z_{i}\in G,\ z_{j}\in G',\ G\neq G',\ \alpha\neq\beta.$$

The above formula is translated into the following prompt using grammar rules #1, 2, 4, 5, and 7.

Prompt: The pattern of change is associated with a group. The first two rows of images show the same pattern of change despite other variations in another group.

  • Intra-group correlation:

    Formula:

    $$p(z_{i}\mid z_{i'}=\alpha)\neq p(z_{i}\mid z_{i'}=\beta),\quad\forall z_{i},z_{i'}\in G,\ i\neq i',\ \alpha\neq\beta.$$

The above formula is translated into the following prompt using grammar rules #1, 2, 4, 5, and 8.

Prompt: The pattern of change is associated with a group. The pattern of change in the last two rows of images should change given other variations.

4.2.4 From Conditional Bias to Prompts

Conditional bias focuses on the relationship between specific properties of interest and the corresponding latent variables. This bias is important because it allows models to generate data conditioned on particular attributes, enhancing the model’s ability to produce targeted and controlled outputs.

Formula:

$$p(z_{i}\mid p_{k}=\alpha)\neq p(z_{i}\mid p_{k}=\beta),\quad\forall z_{i}\in G_{k},\ \alpha\neq\beta.$$

The above formula is translated into the following prompt using grammar rules #1, 2, 3, 4, 5, and 8.

Prompt: If the pattern of change is associated with the group of the property of interest, this image sequence will change as other variations in $[\text{property}_{k}]$.

There may exist latent variables $z_{j}$ that are independent of $p_{k}$:

$$p(z_{j}\mid p_{k}=\alpha)=p(z_{j}\mid p_{k}=\beta),\quad\forall z_{j}\notin G_{k},\ \alpha\neq\beta.$$

The above formula is translated into the following prompt using grammar rules #1, 2, 3, 4, 6, and 7.

Prompt: If the pattern of change is not associated with the group of the property of interest, this image sequence will remain constant despite other variations in $[\text{property}_{k}]$.
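Both conditional-bias prompts contain a slot for the name of property $k$. A small sketch of filling that slot is shown below. The template constants and the `conditional_prompts` helper are hypothetical, and the property name "smiling" is only an example value.

```python
# Templates copied from the two conditional-bias prompts; {prop} marks the slot.
IN_GROUP_TEMPLATE = (
    "If the pattern of change is associated with the group of the property of "
    "interest, this image sequence will change as other variations in {prop}."
)
NOT_IN_GROUP_TEMPLATE = (
    "If the pattern of change is not associated with the group of the property of "
    "interest, this image sequence will remain constant despite other variations "
    "in {prop}."
)

def conditional_prompts(property_name: str) -> dict:
    """Fill the property slot in both templates and return them by case."""
    slot = f"[{property_name}]"
    return {
        "in_group": IN_GROUP_TEMPLATE.format(prop=slot),
        "not_in_group": NOT_IN_GROUP_TEMPLATE.format(prop=slot),
    }

prompts = conditional_prompts("smiling")  # example property name, not from the paper
```

The two cases correspond to $z_{i}\in G_{k}$ and $z_{j}\notin G_{k}$ in the formulas above.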

Algorithm 1 Generate Prompt from Inductive Bias by LatentExplainer
Input: Inductive bias formula(s) $\mathcal{F}$, few-shot examples $\mathcal{H}$, pre-trained LLM $\pi_{\theta}$, an optional input of the information about the symbol $\mathcal{I}$.
Output: Generated prompt $\mathcal{P}$
1: function GeneratePrompt($\mathcal{F}$, $\mathcal{H}$, $\pi_{\theta}$)
2:     symbols $\leftarrow$ ExtractSymbols($\mathcal{F}$)
3:     semantics $\leftarrow$ { }
4:     for symbol in symbols do
5:         semantic $\leftarrow$ $\pi_{\theta}([\text{symbol},\mathcal{H},\mathcal{I}])$
6:         semantics[symbol] $\leftarrow$ semantic
7:     end for
8:     $\mathcal{P}\leftarrow\pi_{\theta}([\mathcal{F},\mathcal{H},\text{semantics}])$
9:     return $\mathcal{P}$
10: end function

4.3 Automatic In-context Prompt Generation

By leveraging these three common inductive biases identified in generative models, we can automatically generate prompts 𝒫𝒫\mathcal{P}caligraphic_P that align with \mathcal{F}caligraphic_F using in-context learning. The pseudo-code of this section can be found at Algorithm 1.

Our approach starts by extracting mathematical symbols from the given formula \mathcal{F}caligraphic_F using the ExtractSymbols function (line 2). This function traverses the formula to identify and isolate the mathematical symbols.

Next, the algorithm initializes an empty dictionary semantics to store the semantic representations of these symbols (line 3). For each symbol, the pre-trained LLM $\pi_\theta$ extracts its semantic meaning based on few-shot examples $\mathcal{H}$ and the optional input symbol information $\mathcal{I}$ (lines 4-6).

Finally, the algorithm generates the prompt $\mathcal{P}$ using the formula $\mathcal{F}$, the few-shot examples $\mathcal{H}$, and the gathered semantics (line 8). This step-by-step reasoning process ensures that the generated prompts are contextually relevant and aligned with the underlying formulas, which could reduce hallucination and enhance model performance and interpretability.
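The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the released implementation: the LLM $\pi_\theta$ is modelled as any callable from text to text, `extract_symbols` uses an illustrative regular expression in place of ExtractSymbols, and the prompt templates, `echo_llm`, and all function names are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical LLM interface: takes a text query and returns a text completion.
LLM = Callable[[str], str]


def extract_symbols(formula: str) -> List[str]:
    """Return distinct LaTeX-style symbols in order of first appearance.

    Illustrative stand-in for ExtractSymbols: matches commands such as
    \\perp and single letters with an optional subscript such as z_i.
    """
    pattern = r"\\[A-Za-z]+|[A-Za-z](?:_\{?[A-Za-z0-9]+\}?)?"
    seen: Dict[str, None] = {}
    for token in re.findall(pattern, formula):
        seen.setdefault(token, None)
    return list(seen)


def generate_prompt(formula: str, few_shot: str, llm: LLM,
                    symbol_info: Optional[str] = None) -> str:
    """Mirror Algorithm 1: one LLM call per symbol, then one final call."""
    semantics: Dict[str, str] = {}
    for symbol in extract_symbols(formula):
        # Assumed query template; the paper does not specify its wording.
        query = (f"{few_shot}\nSymbol: {symbol}\n"
                 f"Info: {symbol_info or 'n/a'}\nMeaning:")
        semantics[symbol] = llm(query)
    meanings = "\n".join(f"{s}: {m}" for s, m in semantics.items())
    final_query = (f"{few_shot}\nFormula: {formula}\n"
                   f"Symbol meanings:\n{meanings}\nPrompt:")
    return llm(final_query)


def echo_llm(query: str) -> str:
    """Toy stand-in for the LLM: echoes the last line of the query."""
    return f"<{query.splitlines()[-1]}>"


if __name__ == "__main__":
    print(extract_symbols(r"z_i \perp z_j"))
    print(generate_prompt(r"z_i \perp z_j", "few-shot examples", echo_llm))
```

Passing the LLM as a callable keeps the sketch independent of any particular API client; a real client would replace `echo_llm`.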

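| Model | Dataset | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
|---|---|---|---|---|---|---|
| DDPM w/ Inductive Bias Prompts | CelebA-HQ | 18.5 | 35.1 | 32.8 | 18.8 | 71.1 |
| DDPM w/ Inductive Bias Prompts | LSUN-Church | 30.9 | 44.3 | 36.4 | 29.2 | 78.9 |
| DDPM w/ Inductive Bias Prompts | AFHQ-Dog | 25.9 | 38.0 | 41.5 | 29.9 | 78.9 |
| DDPM w/o Inductive Bias Prompts | CelebA-HQ | 0.2 | 4.1 | 4.3 | 1.8 | 13.8 |
| DDPM w/o Inductive Bias Prompts | LSUN-Church | 8.9 | 28.1 | 20.6 | 14.5 | 55.6 |
| DDPM w/o Inductive Bias Prompts | AFHQ-Dog | 2.9 | 13.3 | 13.8 | 9.0 | 39.1 |
| $\beta$-TCVAE w/ Inductive Bias Prompts | CelebA-HQ | 29.9 | 55.4 | 44.3 | 30.2 | 83.3 |
| $\beta$-TCVAE w/ Inductive Bias Prompts | 3DShapes | 25.4 | 49.1 | 32.7 | 22.8 | 88.9 |
| $\beta$-TCVAE w/ Inductive Bias Prompts | dSprites | 12.5 | 37.3 | 29.9 | 21.7 | 73.3 |
| $\beta$-TCVAE w/o Inductive Bias Prompts | CelebA-HQ | 6.1 | 32.7 | 25.3 | 14.0 | 50.0 |
| $\beta$-TCVAE w/o Inductive Bias Prompts | 3DShapes | 5.4 | 31.2 | 22.3 | 10.1 | 88.9 |
| $\beta$-TCVAE w/o Inductive Bias Prompts | dSprites | 0.0 | 18.7 | 14.4 | 7.5 | 13.3 |

Table 2: Quantitative explanation comparison of the disentanglement bias.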
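
| Model | Dataset | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
|---|---|---|---|---|---|---|
| CSVAE w/ Inductive Bias Prompts | CelebA-HQ | 25.3 | 45.3 | 39.8 | 27.0 | 93.3 |
| CSVAE w/ Inductive Bias Prompts | 3DShapes | 36.2 | 43.3 | 36.6 | 36.8 | 80.0 |
| CSVAE w/ Inductive Bias Prompts | dSprites | 16.0 | 35.9 | 29.2 | 21.8 | 66.7 |
| CSVAE w/o Inductive Bias Prompts | CelebA-HQ | 12.0 | 37.1 | 28.5 | 17.5 | 80.0 |
| CSVAE w/o Inductive Bias Prompts | 3DShapes | 14.2 | 28.6 | 25.1 | 11.8 | 50.0 |
| CSVAE w/o Inductive Bias Prompts | dSprites | 0.0 | 16.4 | 10.1 | 7.3 | 25.0 |

Table 3: Quantitative explanation comparison of the combination bias.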
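
| Model | Dataset | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
|---|---|---|---|---|---|---|
| Stable Diffusion w/ Inductive Bias Prompts | CelebA-HQ | 18.5 | 40.9 | 33.9 | 23.9 | 82.7 |
| Stable Diffusion w/ Inductive Bias Prompts | LSUN-Church | 18.3 | 38.1 | 32.5 | 25.0 | 97.3 |
| Stable Diffusion w/ Inductive Bias Prompts | AFHQ-Cat | 13.3 | 29.7 | 23.6 | 20.1 | 88.0 |
| Stable Diffusion w/o Inductive Bias Prompts | CelebA-HQ | 5.9 | 24.3 | 21.1 | 12.2 | 45.3 |
| Stable Diffusion w/o Inductive Bias Prompts | LSUN-Church | 10.5 | 29.8 | 26.4 | 17.0 | 76.0 |
| Stable Diffusion w/o Inductive Bias Prompts | AFHQ-Cat | 10.2 | 28.6 | 24.6 | 17.3 | 68.0 |
| CSVAE w/ Inductive Bias Prompts | CelebA-HQ | 24.2 | 42.9 | 36.2 | 19.6 | 100.0 |
| CSVAE w/ Inductive Bias Prompts | 3DShapes | 25.7 | 39.5 | 31.6 | 20.7 | 73.3 |
| CSVAE w/ Inductive Bias Prompts | dSprites | 21.2 | 35.0 | 28.7 | 20.7 | 53.3 |
| CSVAE w/o Inductive Bias Prompts | CelebA-HQ | 6.1 | 32.7 | 25.3 | 14.0 | 53.3 |
| CSVAE w/o Inductive Bias Prompts | 3DShapes | 5.4 | 31.2 | 22.3 | 10.1 | 80.0 |
| CSVAE w/o Inductive Bias Prompts | dSprites | 0.0 | 18.7 | 14.4 | 7.5 | 13.3 |

Table 4: Quantitative explanation comparison of the conditional bias.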

4.4 Uncertainty-aware Explanations

Uncertainty-aware methods can be applied to large language model responses Lin et al. (2023) and image explanations Zhao et al. (2024). To measure the uncertainty of the responses from GPT-4o, we sample $n$ responses from GPT-4o, $\mathcal{R}=\{r_1,r_2,r_3,\ldots,r_n\}$. The certainty score of the explanation is the average pairwise cosine similarity of the responses in $\mathcal{R}$: $\frac{1}{C}\cdot\sum_{i=1}^{n}\sum_{j=1,\, i\neq j}^{n}\text{sim}(r_i,r_j)$, where $C=n\cdot(n-1)$. We further apply a threshold to the certainty score to assess the interpretability of the latent variables. The implementation details are in Appendix B.
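The certainty score can be computed directly from the formula. The sketch below assumes each response $r_i$ has already been mapped to a non-zero embedding vector; the choice of embedding model is not specified here, and the function names `cosine` and `certainty_score` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def certainty_score(embeddings: List[Sequence[float]]) -> float:
    """Average of sim(r_i, r_j) over all ordered pairs with i != j."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("at least two responses are required")
    total = sum(
        cosine(embeddings[i], embeddings[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))  # C = n * (n - 1)


if __name__ == "__main__":
    # Two identical responses and one orthogonal response.
    print(certainty_score([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

In the example, only two of the six ordered pairs have similarity 1, so the score is 2/6.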

5 Experiment

The experiments are conducted on a 64-bit machine with a 24-core Intel 13th Gen Core i9-13900K @ 5.80GHz, 32GB memory, and an NVIDIA GeForce RTX 4090. We use GPT-4o as our MLLM and set temperature = 1 and top_p = 1. The code is available at https://github.com/mengdanzhu/LatentExplainer.

5.1 Dataset

We utilized five datasets to evaluate the performance of different generative models under three inductive biases: CelebA-HQ (Karras et al., 2018), AFHQ-Dog (Choi et al., 2020), and LSUN-Church (Yu et al., 2015) for the unconditional and conditional diffusion models, and CelebA-HQ (Karras et al., 2018), 3DShapes (Burgess and Kim, 2018), and dSprites (Matthey et al., 2017) for the VAE models. The CelebA-HQ dataset is a high-quality version of the CelebA dataset, consisting of 30K images of celebrities, divided into 28K for training and 2K for testing. The LSUN-Church dataset contains large-scale images of church buildings. The AFHQ dataset includes high-quality images of animals, divided into three categories: cats, dogs, and wild animals. 3DShapes is a synthetic dataset containing images of 3D shapes with six factors of variation: floor hue, wall hue, object hue, object scale, object shape, and wall orientation, divided into 384K for training and 96K for testing. dSprites consists of 2D shapes (hearts, squares, ellipses) generated with five factors of variation: shape, scale, orientation, position X, and position Y, divided into 516K for training and 221K for testing.

Figure 3: Visualization of the generated explanations with the inductive bias prompt for the disentanglement bias.
Figure 4: Visualization of the generated explanations w/o the inductive bias prompt for the disentanglement bias.

5.2 Generative Models under Inductive Biases

We explore the latent space in generative models that satisfy the aforementioned three types of inductive biases. For each type, we present the relevant generative models that align with the corresponding inductive bias: (1) Disentanglement Bias: $\beta$-TCVAE (Chen et al., 2018) explicitly penalizes the total correlation of the latent variables to disentangle the latent representations. The Denoising Diffusion Probabilistic Model (DDPM) (Ho et al., 2020) adds Gaussian noise independently at each timestep in the forward process and eventually transforms the data into pure Gaussian noise, whose covariance matrix is diagonal. This assumes the latent factors are independent; (2) Combination Bias: CSVAE (Klys et al., 2018) has two groups of latent variables $z$ and $w$, where $z$ and $w$ are uncorrelated and the latent variables within each group are correlated; (3) Conditional Bias: CSVAE also satisfies conditional bias because one group of latent variables $w$ is associated with the properties while the other group of latent variables $z$ minimizes the mutual information with the properties. Stable Diffusion (Rombach et al., 2022) is a latent diffusion model that generates images conditioned on prompts.

| Model | Method | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
|---|---|---|---|---|---|---|
| DDPM | LatentExplainer | 25.1 | 39.1 | 36.9 | 26.0 | 76.3 |
| DDPM | w/o Inductive Bias Prompts | 4.0 | 15.2 | 12.9 | 8.4 | 32.4 |
| DDPM | w/o Uncertainty Quantification | 16.8 | 32.0 | 29.0 | 19.8 | 70.0 |
| DDPM | w/o IB and w/o UQ | 2.9 | 13.8 | 11.8 | 7.1 | 32.2 |
| $\beta$-TCVAE | LatentExplainer | 22.6 | 47.3 | 35.6 | 24.9 | 82.4 |
| $\beta$-TCVAE | w/o Inductive Bias Prompts | 3.8 | 27.5 | 20.7 | 10.5 | 52.9 |
| $\beta$-TCVAE | w/o Uncertainty Quantification | 18.7 | 41.0 | 33.8 | 22.6 | 54.9 |
| $\beta$-TCVAE | w/o IB and w/o UQ | 3.5 | 27.5 | 19.4 | 10.7 | 47.1 |

Table 5: Ablation study of the proposed LatentExplainer on inductive bias prompts and the selection of explanations based on the uncertainty for the disentanglement bias.

| Model | Method | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
|---|---|---|---|---|---|---|
| CSVAE | LatentExplainer | 25.8 | 41.5 | 35.2 | 28.5 | 81.0 |
| CSVAE | w/o Inductive Bias Prompts | 8.7 | 27.4 | 21.2 | 12.2 | 54.6 |
| CSVAE | w/o Uncertainty Quantification | 16.2 | 35.6 | 27.6 | 22.8 | 66.7 |
| CSVAE | w/o IB and w/o UQ | 5.8 | 27.3 | 20.2 | 10.6 | 52.4 |

Table 6: Ablation study of the proposed LatentExplainer on inductive bias prompts and the selection of explanations based on the uncertainty for the combination bias.

| Model | Method | BLEU | ROUGE-L | METEOR | SPICE | Accuracy |
|---|---|---|---|---|---|---|
| Stable Diffusion | LatentExplainer | 16.7 | 36.2 | 30.0 | 23.0 | 89.3 |
| Stable Diffusion | w/o Inductive Bias Prompts | 8.9 | 27.6 | 24.0 | 15.5 | 63.1 |
| Stable Diffusion | w/o Uncertainty Quantification | 14.9 | 32.8 | 27.0 | 22.0 | 88.4 |
| Stable Diffusion | w/o IB and w/o UQ | 7.1 | 25.7 | 21.8 | 14.7 | 55.6 |
| CSVAE | LatentExplainer | 23.7 | 39.1 | 32.2 | 20.3 | 75.6 |
| CSVAE | w/o Inductive Bias Prompts | 11.0 | 29.0 | 22.8 | 12.2 | 48.9 |
| CSVAE | w/o Uncertainty Quantification | 11.8 | 29.5 | 22.1 | 14.2 | 57.8 |
| CSVAE | w/o IB and w/o UQ | 6.5 | 23.0 | 17.0 | 8.5 | 44.4 |

Table 7: Ablation study of the proposed LatentExplainer on inductive bias prompts and the selection of explanations based on the uncertainty for the conditional bias.

5.3 Quantitative Analysis

For the quantitative explanation evaluation, we use BLEU Papineni et al. (2002), ROUGE-L Lin (2004), METEOR Banerjee and Lavie (2005), and SPICE Anderson et al. (2016) as the automated metrics, together with human-annotated accuracy, to assess the explanations of latent variables. In Table 2, the explanations generated with the inductive bias prompts match or outperform their counterparts without inductive bias prompts on every dataset and metric; the only tie is the accuracy of $\beta$-TCVAE on 3DShapes (88.9 in both settings). Table 3 shows the same trend on all datasets and metrics, and Table 4 shows it with two exceptions (METEOR for Stable Diffusion on AFHQ-Cat and accuracy for CSVAE on 3DShapes). These results indicate that inductive bias is important when explaining the latent representations of generative models. They also show that our framework can effectively design prompts for different inductive biases in generative models to improve the accuracy of latent explanations.

5.4 Qualitative Evaluation

To analyze the explanations for DDPM under disentanglement bias, we manipulate the latent representation along each latent direction and compare the result with a sequence that first traverses along another latent direction and then traverses along the same latent direction. In Figure 3, for each latent direction, we pass two image sequences perturbed along that latent direction, together with the inductive bias prompt based on the disentanglement formula, to the MLLM to obtain the common latent explanation through the uncertainty quantification. In comparison, the explanations generated without the inductive bias prompt, as shown in Figure 4, tend to show "no clear explanation" or explanations that align with one image sequence but do not reflect the common pattern in both image sequences. Adding the inductive bias prompts helps rule out the variation effects of other latent variables.
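The construction of the two latent sequences compared above can be sketched as follows. This is an illustrative sketch only: latent codes are plain lists of floats, the directions `d_i` and `d_j`, the step sizes, and the offset are assumed to be given, and decoding each latent code into an image (with DDPM or any other generator) is left out.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def shift(z: Sequence[float], direction: Sequence[float], step: float) -> Vector:
    """Move the latent code z by `step` along `direction`."""
    return [a + step * d for a, d in zip(z, direction)]


def traversal(z: Sequence[float], direction: Sequence[float],
              steps: Sequence[float]) -> List[Vector]:
    """Latent codes obtained by perturbing z along one direction."""
    return [shift(z, direction, s) for s in steps]


def paired_traversals(z: Sequence[float], d_i: Sequence[float],
                      d_j: Sequence[float], steps: Sequence[float],
                      offset: float) -> Tuple[List[Vector], List[Vector]]:
    """Return (direct, shifted) sequences along d_i.

    direct:  traverse d_i starting from z.
    shifted: first move along another direction d_j, then traverse d_i.
    """
    direct = traversal(z, d_i, steps)
    shifted = traversal(shift(z, d_j, offset), d_i, steps)
    return direct, shifted


if __name__ == "__main__":
    direct, shifted = paired_traversals(
        z=[0.0, 0.0], d_i=[1.0, 0.0], d_j=[0.0, 1.0],
        steps=[-1.0, 0.0, 1.0], offset=2.0,
    )
    print(direct)
    print(shifted)
```

If the latent directions are disentangled, decoding the two sequences should show the same pattern of change, which is the common pattern the MLLM is asked to describe.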

We also qualitatively evaluate the explanations for CSVAE under combination bias. We traverse each latent dimension and compare it with the one that first changes another latent variable in another group and then traverses the same latent variable, as well as the ones that first change another latent variable in the same group and then traverse the same latent variable. As Figure 10 depicts, our model can clearly show the explanations of latent variables $z_1$, $z_2$, $z_3$ as the color of the ground, the background color, and the shape of the object. The effect of removing inductive bias prompts is similar to that observed for the disentanglement bias.

In Figure 11, we provide the "young appearance" prompt to Stable Diffusion under conditional bias, and all three top latent directions are associated with the prompt. Compared with the explanations without the inductive bias prompts in Figure 12, the explanations with inductive bias prompts better identify the relation with the property of interest: they state the suggested meaning with respect to the property in addition to describing the pattern in both $v_1$ and $v_2$. More evaluation results can be found in Appendix C.

5.5 Ablation Study

To understand the impact of various components in our proposed LatentExplainer, we conducted a comprehensive ablation study across different models (DDPM, $\beta$-TCVAE, Stable Diffusion, and CSVAE) and inductive biases (disentanglement, combination, and conditional). Specifically, we evaluated the removal of inductive bias prompts (IB), uncertainty quantification (UQ), and both components. The results are summarized in Tables 5, 6, and 7. Removing inductive bias prompts leads to a substantial drop in performance for all generative models, indicating the importance of inductive bias in generating high-quality explanations. Likewise, the removal of uncertainty quantification also results in decreased performance for all generative models. Overall, the ablation studies confirm the necessity of both inductive bias prompts and uncertainty quantification in our LatentExplainer framework, demonstrating their significant contributions to improving explanatory performance across various generative models.

6 Conclusion

In this paper, we introduced LatentExplainer, a framework designed to generate semantically meaningful explanations of latent variables in deep generative models. Our experiments quantitatively revealed that the inclusion of inductive bias prompts significantly enhances the quality of explanations generated by LatentExplainer. In qualitative evaluations, LatentExplainer provided clear explanations for the latent variables, effectively reflecting superior performance when inductive bias prompts were utilized. The ablation studies further confirmed the necessity of inductive bias prompts and uncertainty quantification in producing accurate and consistent explanations.

7 Limitation

In this work, we use GPT-4o as our MLLM agent to explain the latent representations. Although GPT-4o has improved vision capabilities across most tasks, we still find it incapable of identifying intricate facial features, positions, and orientations. Enhancing the visual understanding of MLLMs is necessary to improve the explanation quality.

References

  • Adadi and Berrada (2018) Amina Adadi and Mohammed Berrada. 2018. Peeking inside the black-box: a survey on explainable artificial intelligence (xai). IEEE access, 6:52138–52160.
  • An and Cho (2015) Jinwon An and Sungzoon Cho. 2015. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE, 2(1):1–18.
  • Anderson et al. (2016) Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2016. Spice: Semantic propositional image caption evaluation. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V 14, pages 382–398. Springer.
  • Bai et al. (2022) Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil YC Lin, and Cho-Jui Hsieh. 2022. Concept gradient: Concept-based interpretation without linear assumption. arXiv preprint arXiv:2208.14966.
  • Bai et al. (2024) Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, et al. 2024. Beyond efficiency: A systematic survey of resource-efficient large language models. arXiv preprint arXiv:2401.00625.
  • Banerjee and Lavie (2005) Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan. Association for Computational Linguistics.
  • Brock et al. (2016) Andrew Brock, Theodore Lim, James M Ritchie, and Nick Weston. 2016. Neural photo editing with introspective adversarial networks. In International Conference on Learning Representations.
  • Burgess and Kim (2018) Chris Burgess and Hyunjik Kim. 2018. 3d shapes dataset. https://github.com/deepmind/3dshapes-dataset/.
  • Chang et al. (2023) Kai-Po Chang, Chi-Pin Huang, Wei-Yuan Cheng, Fu-En Yang, Chien-Yi Wang, Yung-Hsuan Lai, and Yu-Chiang Frank Wang. 2023. Rapper: Reinforced rationale-prompted paradigm for natural language explanation in visual question answering. In The Twelfth International Conference on Learning Representations.
  • Chen et al. (2018) Ricky TQ Chen, Xuechen Li, Roger B Grosse, and David K Duvenaud. 2018. Isolating sources of disentanglement in variational autoencoders. Advances in neural information processing systems, 31.
  • Chen et al. (2016) Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. arXiv preprint arXiv:1606.03657.
  • Choi et al. (2020) Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. 2020. Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8188–8197.
  • Ding et al. (2020) Zheng Ding, Yifan Xu, Weijian Xu, Gaurav Parmar, Yang Yang, Max Welling, and Zhuowen Tu. 2020. Guided variational autoencoder for disentanglement learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7920–7929.
  • Gao et al. (2024) Yuyang Gao, Siyi Gu, Junji Jiang, Sungsoo Ray Hong, Dazhou Yu, and Liang Zhao. 2024. Going beyond xai: A systematic survey for explanation-guided learning. ACM Computing Surveys, 56(7):1–39.
  • Guo et al. (2020) Xiaojie Guo, Yuanqi Du, and Liang Zhao. 2020. Property controllable variational autoencoder via invertible mutual dependence. In International Conference on Learning Representations.
  • Higgins et al. (2017) Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-vae: Learning basic visual concepts with a constrained variational framework. International Conference on Learning Representations.
  • Ho et al. (2020) Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239.
  • Ho et al. (2022) Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. 2022. Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research, 23(47):1–33.
  • Karras et al. (2018) Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations.
  • Kim et al. (2018) Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 2018. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR.
  • Kingma et al. (2014) Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pages 3581–3589.
  • Kingma and Welling (2013) Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  • Klys et al. (2018) Jack Klys, Jake Snell, and Richard Zemel. 2018. Learning latent subspaces in variational autoencoders. Advances in neural information processing systems, 31.
  • Koh et al. (2020) Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. 2020. Concept bottleneck models. In International conference on machine learning, pages 5338–5348. PMLR.
  • Kong et al. (2020) Zhaojiang Kong, Wei Ping, Jianfu Huang, Kexin Zhao, and Bryan Catanzaro. 2020. Diffwave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761.
  • Lin (2004) Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
  • Lin et al. (2023) Zhen Lin, Shubhendu Trivedi, and Jimeng Sun. 2023. Generating with confidence: Uncertainty quantification for black-box large language models. arXiv preprint arXiv:2305.19187.
  • Matthey et al. (2017) Loic Matthey, Irina Higgins, Demis Hassabis, and Alexander Lerchner. 2017. dsprites: Disentanglement testing sprites dataset. https://github.com/deepmind/dsprites-dataset/.
  • Nichol and Dhariwal (2021) Alexander Nichol and Prafulla Dhariwal. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741.
  • OpenAI (2024) OpenAI. 2024. Gpt-4o. https://openai.com/index/hello-gpt-4o/.
  • Papineni et al. (2002) Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318.
  • Park et al. (2023) Yong-Hyun Park, Mingi Kwon, Jaewoong Choi, Junghyo Jo, and Youngjung Uh. 2023. Understanding the latent space of diffusion models through the lens of riemannian geometry. Advances in Neural Information Processing Systems, 36:24129–24142.
  • Poeta et al. (2023) Eleonora Poeta, Gabriele Ciravegna, Eliana Pastor, Tania Cerquitelli, and Elena Baralis. 2023. Concept-based explainable artificial intelligence: A survey. arXiv preprint arXiv:2312.12936.
  • Radford et al. (2021) Alec Radford, Jong Wook Kim, Aditya Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. International Conference on Machine Learning, pages 8748–8763.
  • Razavi et al. (2019) Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. 2019. Generating diverse high-fidelity images with vq-vae-2. Advances in neural information processing systems, 32.
  • Rezende et al. (2014) Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082.
  • Ridgeway (2016) Karl Ridgeway. 2016. A survey of inductive biases for factorial representation-learning. arXiv preprint arXiv:1612.05299.
  • Rombach et al. (2022) Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695.
  • Saleem et al. (2022) Rabia Saleem, Bo Yuan, Fatih Kurugollu, Ashiq Anjum, and Lu Liu. 2022. Explaining deep neural networks: A survey on the global interpretation methods. Neurocomputing, 513:165–180.
  • Sammani et al. (2022) Fawaz Sammani, Tanmoy Mukherjee, and Nikos Deligiannis. 2022. Nlx-gpt: A model for natural language explanations in vision and vision-language tasks. In proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8322–8332.
  • Team et al. (2023) Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, et al. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.
  • Vahdat et al. (2021) Arash Vahdat, Karsten Kreis, and Jan Kautz. 2021. Score-based generative modeling in latent space. Advances in neural information processing systems, 34:11287–11302.
  • Wang et al. (2024) Shiyu Wang, Yuanqi Du, Xiaojie Guo, Bo Pan, Zhaohui Qin, and Liang Zhao. 2024. Controllable data generation by deep learning: A review. ACM Computing Surveys, 56(9):1–38.
  • Wu et al. (2023) Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, and Shiyu Chang. 2023. Uncovering the disentanglement capability in text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1900–1910.
  • Xu et al. (2022) Mengzhao Xu, Lantao Zhang, Shitong Luo, Wei Liu, Pan Zhou, Heng Xie, and Jian Wu. 2022. Geodiff: A geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations.
  • Yan et al. (2016) Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. 2016. Attribute2image: Conditional image generation from visual attributes. In European Conference on Computer Vision, pages 776–791.
  • Yang et al. (2023) Ruihan Yang, Prakhar Srivastava, and Stephan Mandt. 2023. Diffusion probabilistic modeling for video generation. Entropy, 25(10):1469.
  • Yin et al. (2023) Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. 2023. A survey on multimodal large language models. arXiv preprint arXiv:2306.13549.
  • Yu et al. (2015) Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. 2015. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365.
  • Zhao et al. (2024) Qilong Zhao, Yifei Zhang, Mengdan Zhu, Siyi Gu, Yuyang Gao, Xiaofeng Yang, and Liang Zhao. 2024. Due: Dynamic uncertainty-aware explanation supervision via 3d imputation. arXiv preprint arXiv:2403.10831.
  • Zhu et al. (2016) Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. 2016. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597–613.
  • Zhu et al. (2024) Mengdan Zhu, Zhenke Liu, Bo Pan, Abhinav Angirekula, and Liang Zhao. 2024. Explaining latent representations of generative models with large multimodal models. arXiv preprint arXiv:2402.01858.

Appendix A Human Annotations

The ground-truth annotations of explanations are performed by four different annotators from the United States and China, and the results are then aggregated to calculate the automated evaluation metrics. To obtain the accuracy, the annotators select one of four choices (yes, weak yes, weak no, and no) in response to whether the explanation justifies the pattern of change in the image sequence. The four rating levels are then mapped to the scores of 1, 2/3, 1/3, and 0, respectively. The scores are then averaged over the data samples in each dataset to obtain the accuracy values, following the evaluation process in Sammani et al. (2022); Chang et al. (2023).
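The mapping from ratings to accuracy amounts to a small calculation, sketched below. The function name `accuracy` and the flat list of ratings are assumptions for illustration; the scaling by 100 is also an assumption, made so that the result is on the same scale as the accuracy columns in the tables.

```python
from typing import Dict, List

# Scores for the four rating levels, as described in this appendix.
RATING_SCORES: Dict[str, float] = {
    "yes": 1.0,
    "weak yes": 2.0 / 3.0,
    "weak no": 1.0 / 3.0,
    "no": 0.0,
}


def accuracy(ratings: List[str]) -> float:
    """Average rating score over the samples, reported as a percentage."""
    if not ratings:
        raise ValueError("at least one rating is required")
    total = sum(RATING_SCORES[r] for r in ratings)
    return 100.0 * total / len(ratings)


if __name__ == "__main__":
    # (1 + 2/3 + 0) / 3 = 5/9, i.e. about 55.6 percent.
    print(round(accuracy(["yes", "weak yes", "no"]), 1))
```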

Appendix B Selection of Threshold of the Certainty Scores

We follow our previous work Zhu et al. (2024) to find the threshold. We denote the true label of the interpretability of each latent variable as 1 if at least two annotators out of three can see a clear pattern in the generated images; otherwise, we denote it as 0. The certainty score of each response is its predicted probability. We can then compute its AUC (area under the ROC curve) based on different thresholds and choose the one with the highest AUC as our threshold $\varepsilon$. Our final output $\mathcal{T}$ is the response that has the highest mean pairwise cosine similarity with the other responses if the certainty score is greater than or equal to the threshold $\varepsilon$. Otherwise, $\mathcal{T}$ will be "no clear explanation".
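The selection of the final output $\mathcal{T}$ can be sketched as follows, assuming the threshold $\varepsilon$ has already been chosen and each response comes with a non-zero embedding vector. The function names and the embedding inputs are hypothetical; the sketch does not cover the threshold search itself.

```python
import math
from typing import List, Sequence

NO_EXPLANATION = "no clear explanation"


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_explanation(responses: List[str],
                       embeddings: List[Sequence[float]],
                       threshold: float) -> str:
    """Return the most central response, or the fallback if uncertain."""
    n = len(responses)
    if n < 2 or n != len(embeddings):
        raise ValueError("need at least two responses, one embedding each")
    # Mean similarity of each response to all the other responses.
    mean_sim = [
        sum(cosine(embeddings[i], embeddings[j])
            for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Averaging the per-response means gives the average pairwise similarity.
    certainty = sum(mean_sim) / n
    if certainty < threshold:
        return NO_EXPLANATION
    best = max(range(n), key=mean_sim.__getitem__)
    return responses[best]


if __name__ == "__main__":
    answers = ["hair color", "hair colour changes", "camera angle"]
    vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
    print(select_explanation(answers, vectors, threshold=0.3))
    print(select_explanation(answers, vectors, threshold=0.9))
```

With the toy vectors above, the second response is the most similar to the others on average, so it is returned when the threshold is low; a high threshold yields the fallback string.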

Appendix C More Qualitative Evaluation Results

We provide more qualitative evaluation results: (1) for the disentanglement bias in Figure 5 and Figure 6; (2) for the combination bias in Figure 7; (3) for the conditional bias in Figure 8 and Figure 9.

Figure 5: Example of generated explanations for DDPM under the disentanglement bias.
Figure 6: Example of generated explanations for $\beta$-TCVAE under the disentanglement bias.
Figure 7: Example of generated explanations for CSVAE under the combination bias.
Figure 8: Example of generated explanations for Stable Diffusion under conditional bias.
Figure 9: Example of generated explanations for CSVAE under conditional bias.
Figure 10: Visualization of the generated explanations with the inductive bias prompt for the combination bias.
Figure 11: Visualization of the generated explanations with the inductive bias prompt for the conditional bias.
Figure 12: Visualization of the generated explanations without the inductive bias prompt for the conditional bias.

Appendix D Visualization of the Generated Explanations

In this section, we provide visualization examples of the explanations generated by our proposed LatentExplainer framework under different inductive biases: combination bias with the inductive bias prompt in Figure 10, and conditional bias with the inductive bias prompt in Figure 11 and without the inductive bias prompt in Figure 12. Each figure demonstrates the patterns of change in generated image sequences along specific latent directions, accompanied by the corresponding explanations.