License: CC BY 4.0
arXiv:2401.13171v2 [cs.LG] 11 Mar 2024

Compositional Generative Inverse Design

Tailin Wu¹†, Takashi Maruyama²*, Long Wei¹*, Tao Zhang¹*, Yilun Du³*,
Gianluca Iaccarino⁴, Jure Leskovec⁵
¹Dept. of Engineering, Westlake University, ²NEC Laboratories Europe,
³Dept. of Computer Science, MIT, ⁴Dept. of Mechanical Engineering, Stanford University,
⁵Dept. of Computer Science, Stanford University
*Equal contribution. †Corresponding author.
Abstract

Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem that arises in fields ranging from mechanical engineering to aerospace engineering. Inverse design is typically formulated as an optimization problem, with recent works leveraging optimization across learned dynamics models. However, as models are optimized, they tend to fall into adversarial modes, preventing effective sampling. We illustrate that by instead optimizing over the learned energy function captured by a diffusion model, we can avoid such adversarial examples and significantly improve design performance. We further illustrate how such a design system is compositional, enabling us to combine multiple different diffusion models representing subcomponents of our desired system to design systems with every specified component. In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion models at test time, our method allows us to design initial states and boundary shapes that are more complex than those in the training data. Our method generalizes to more objects for the N-body dataset and discovers formation flying to minimize drag in the multi-airfoil design task. Project website and code can be found at https://github.com/AI4Science-WestlakeU/cindm.

1 Introduction

The problem of inverse design, i.e., finding a set of high-dimensional design parameters (e.g., boundary and initial conditions) for a system to optimize a set of specified objectives and constraints, occurs across many engineering domains such as mechanical, materials, and aerospace engineering, with important applications such as jet engine design (Athanasopoulos et al., 2009), nanophotonic design (Molesky et al., 2018), shape design for underwater robots (Saghafi & Lavimi, 2020), and battery design (Bhowmik et al., 2019). Such inverse design problems are extremely challenging since they typically involve simulating the full trajectory of complicated physical dynamics as an inner loop, have high-dimensional design spaces, and require out-of-distribution generalization at test time.

Deep learning has recently made promising progress on inverse design. A notable work is by Allen et al. (2022), which addresses inverse design by first learning a neural surrogate model to approximate the forward physical dynamics, and then backpropagating through the full simulation trajectory to optimize design parameters such as the boundary shape. Compared with standard sampling-based optimization methods with classical simulators, it shows comparable and sometimes better performance, establishing deep learning as a viable technique for inverse design.

However, an underlying issue with backpropagation through surrogate models is over-optimization: since learned models have adversarial minima, excessive optimization with respect to a learned forward model leads to adversarial design parameters that perform poorly (Zhao et al., 2022). A root cause is that the forward model has no measure of data likelihood and does not know which design parameters are in or out of its training distribution, so optimization can easily drift outside the distribution of design parameters seen during training.

Figure 1: CinDM schematic. By composing generative models specified over subsets of inputs, we present an approach that can design materials significantly more complex than those seen during training.

To address this issue, we view the inverse design problem from an energy optimization perspective, where constraints of the simulation model are implicitly captured by the generative energy function of a diffusion model trained on design parameters and simulator outputs. Designing parameters subject to constraints then corresponds to finding design parameters that jointly minimize the generative energy function and the associated design objective functions. The generative energy function prevents design parameters from drifting out of distribution.

An essential aspect of inverse design is the ability to construct new structures subject to different constraints at test time. When inverse design is formulated as optimizing a generative energy function trained on existing designs, a natural concern is that this constrains design parameters to be roughly those seen in the training data. We circumvent this issue by using a set of generative energy functions, where each generative model captures a subset of the design parameters governing the system. Each individual generative energy function ensures that designs do not fall out of distribution locally, and their composition ensures that the inferred design parameters are roughly “locally” in distribution. At the same time, designs from this compositional set of generative energy functions may be significantly different from the training data, as designs are not constrained to globally follow the observed data (Liu et al., 2022; Du et al., 2023), achieving compositional generalization in design.

We illustrate the promise of such compositional energy functions across a variety of settings. We show that by temporally composing multiple energy functions, we can design sequences of outputs that are significantly longer than those seen in training. Similarly, we can design systems with many more objects and more complex shapes than those seen in training.

Concretely, we contribute the following: (1) We propose a novel formulation of inverse design as an energy optimization problem. (2) We introduce the Compositional Inverse Design with Diffusion Models (CinDM) method, which enables us to generalize to out-of-distribution and more complex design inputs than those seen in training. (3) We present a set of benchmarks for inverse design in 1D and 2D. Our method generalizes to more objects for the N-body dataset and discovers formation flying to minimize drag in the multi-airfoil design task.

2 Related Work

Inverse Design. Inverse design plays a key role across science and engineering, including mechanical engineering (Coros et al., 2013), materials science (Dijkstra & Luijten, 2021), nanophotonics (Molesky et al., 2018), robotics (Saghafi & Lavimi, 2020), chemical engineering (Bhowmik et al., 2019), and aerospace engineering (Athanasopoulos et al., 2009; Anderson & Venkatakrishnan, 1999). Classical methods for inverse design rely on slow classical solvers; they are accurate but prohibitively inefficient (e.g., sampling-based methods like CEM (Rubinstein & Kroese, 2004)). Recently, deep learning-based inverse design has made promising progress. Allen et al. (2022) introduced backpropagation through the full trajectory with surrogate models. Wu et al. (2022a) introduced backpropagation through latent dynamics to improve efficiency and accuracy. For Stokes systems, Du et al. (2020a) introduced an inverse design method under different types of boundary conditions. While the above methods typically learn a surrogate model for the dynamics and use it as an inner loop during inverse design, we introduce a novel generative perspective that learns an energy function over the joint variable of trajectory and boundary. This brings the important benefits of out-of-distribution generalization and compositionality. Ren et al. (2020), Trabucco et al. (2021), Ansari et al. (2022), and Chen et al. (2023) benchmarked a variety of deep learning-based methods on a wide range of inverse design tasks.

Compositional Models. A large body of recent work has explored how multiple instances of generative models can be compositionally combined for applications such as 2D image synthesis (Du et al., 2020b; Liu et al., 2021; Nie et al., 2021; Liu et al., 2022; Wu et al., 2022b; Du et al., 2023; Wang et al., 2023), 3D synthesis (Po & Wetzstein, 2023), video synthesis (Yang et al., 2023a), trajectory planning (Du et al., 2019; Urain et al., 2021; Gkanatsios et al., 2023; Yang et al., 2023b), multimodal perception (Li et al., 2022), and hierarchical decision making (Ajay et al., 2023). Technically, the product of experts is an effective approach for combining the predictive distributions of local experts (Hinton, 2002; Cohen et al., 2020; Gordon et al., 2023; Tautvaišas & Žilinskas, 2023). To the best of our knowledge, we are the first to introduce a compositional generative perspective and method to inverse design, and to show how compositional models can enable generalization to design spaces that are much more complex than those seen at training time.

3 Method

In this section, we detail our method of Compositional INverse design with Diffusion Models (CinDM). We first introduce the problem setup in Section 3.1. In Section 3.2, we introduce generative inverse design, a novel generative paradigm for solving the inverse design problem. In Section 3.3, we detail how our method allows for test-time composition of the design variables.

3.1 Problem setup

We formalize the inverse design problem using a setup similar to that of Zhang et al. (2023). Concretely, let $u(x,t;\gamma)$ be the state of a dynamical system at time $t$ and location $x$, where the dynamics are described by a partial differential equation (PDE) or an ordinary differential equation (ODE). (In the case of an ODE, the position $x$ is neglected and the trajectory is $u(t;\gamma)$, where $\gamma$ only includes the initial state $u_0$.) For more background on PDEs, see Brandstetter et al. (2022). Here $\gamma=(u_0,\mathcal{B})\in\Gamma$ consists of the initial state $u_0$ and the boundary condition $\mathcal{B}$, $\Gamma$ is the design space, and we call $\gamma$ the “boundary” for simplicity, since $\mathcal{B}$ is the boundary in space and the initial state $u_0$ can be seen as the “boundary” in time. Given a PDE or ODE, a specific $\gamma$ uniquely determines a specific trajectory $u_{[0,T]}(\gamma):=\{u(x,t;\gamma)\mid t\in[0,T]\}$, where we have written the dependence of $u_{[0,T]}$ on $\gamma$ explicitly. Let $\mathcal{J}$ be the design objective, which evaluates the quality of the design. Typically, $\mathcal{J}$ is a function of a subset of the trajectory $u_{[0,T]}$ and of $\gamma$ (especially the boundary shape). The inverse design problem is to find an optimized design $\hat{\gamma}$ that minimizes the design objective $\mathcal{J}$:

$$\hat{\gamma}=\operatorname*{arg\,min}_{\gamma}\,\mathcal{J}\big(u_{[0,T]}(\gamma),\gamma\big) \tag{1}$$

We see that $\mathcal{J}$ depends on $\gamma$ through two routes. On the one hand, $\gamma$ influences the future trajectory of the dynamical system, on which $\mathcal{J}$ is evaluated. On the other hand, $\gamma$ can directly influence $\mathcal{J}$ at future times, since the design objective may depend directly on the boundary shape.

Typically, we do not have access to the ground-truth model of the dynamical system, but instead only observe the trajectories $u_{[0,T]}(\gamma)$ at discrete time steps and locations, for a limited diversity of boundaries $\gamma\in\Gamma$. We denote this discrete version of the trajectory as $U_{[0,T]}(\gamma)=(U_0,U_1,\dots,U_T)$ across time steps $t=0,1,\dots,T$. Given the observed trajectories $U_{[0,T]}(\gamma),\ \gamma\in\Gamma$, a straightforward method for inverse design is to use them to train a neural surrogate model $f_\theta$ for forward modeling, so that the trajectory can be autoregressively simulated by $f_\theta$:

$$\hat{U}_t(\gamma)=f_\theta\big(\hat{U}_{t-1}(\gamma),\gamma\big),\quad \hat{U}_0:=U_0,\ \gamma=(U_0,\mathcal{B}). \tag{2}$$

Here we use $\hat{U}_t$ to denote the prediction by $f_\theta$, to differentiate it from the actual observed state $U_t$. At test time, the goal is to optimize $\mathcal{J}(\hat{U}_{[0,T]}(\gamma),\gamma)$ w.r.t. $\gamma$, which includes the autoregressive rollout with $f_\theta$ as an inner loop, as introduced in Allen et al. (2022). In general inverse design, the trajectory length $T$, the state dimension $\dim(U_{[0,T]}(\gamma))$, and the complexity of $\gamma$ may be much larger than in training, requiring significant out-of-distribution generalization.
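The surrogate-based pipeline just described (Eq. 2 followed by gradient-based optimization of the design objective through the rollout) can be sketched as follows. This is a toy illustration under stated assumptions, not the code of Allen et al. (2022): a hypothetical linear map `A` stands in for the neural surrogate $f_\theta$, the design $\gamma$ is only the initial state (the ODE case), the objective is an assumed squared distance of $U_T$ to a target, and backpropagation through the rollout is written out by hand instead of using autodiff.

```python
import numpy as np


def rollout(A, u0, T):
    """Autoregressive rollout in the spirit of Eq. 2, with a hypothetical linear surrogate f(U) = A @ U."""
    states = [u0]
    for _ in range(T):
        states.append(A @ states[-1])
    return states


def design_objective(states, target):
    """Hypothetical design objective J: squared distance of the final state U_T to a target."""
    return float(np.sum((states[-1] - target) ** 2))


def grad_wrt_u0(A, states, target):
    """Backpropagate dJ/dU_T through all T steps of the rollout (dU_t/dU_{t-1} = A)."""
    g = 2.0 * (states[-1] - target)
    for _ in range(len(states) - 1):
        g = A.T @ g
    return g


def optimize_initial_state(A, u0, target, T, lr=0.05, steps=500):
    """Gradient descent on the design (here only the initial state), re-simulating every step."""
    u0 = np.array(u0, dtype=float)
    for _ in range(steps):
        states = rollout(A, u0, T)  # inner loop: full autoregressive simulation
        u0 = u0 - lr * grad_wrt_u0(A, states, target)
    return u0


theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
target = np.array([1.0, -0.5])
u0_opt = optimize_initial_state(A, np.zeros(2), target, T=5)
print(design_objective(rollout(A, u0_opt, 5), target))
```

Every gradient step re-runs the full $T$-step rollout. With a learned, nonlinear $f_\theta$, nothing in this loop keeps the design inside the region where the surrogate is accurate, which is the failure mode discussed next.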

3.2 Generative Inverse Design

Directly optimizing Eq. 1 with respect to $\gamma$ using a learned surrogate model $f_\theta$ is often problematic, as the optimization procedure on $\gamma$ often leads to a trajectory $U_{[0,T]}$ that is out-of-distribution or adversarial to the surrogate model $f_\theta$, resulting in poor performance, as observed in Zhao et al. (2022). A major cause is that $f_\theta$ has no inherent measure of uncertainty and therefore cannot prevent the optimization from entering regions of the design space where its performance is not guaranteed.

To circumvent this issue, we propose a generative perspective on inverse design: during the inverse design process, we jointly optimize both the design objective $\mathcal{J}$ and a generative objective $E_\theta$,

$$\hat{\gamma}=\operatorname*{arg\,min}_{\gamma,\,U_{[0,T]}}\Big[E_\theta\big(U_{[0,T]},\gamma\big)+\lambda\cdot\mathcal{J}\big(U_{[0,T]},\gamma\big)\Big], \tag{3}$$

where $E_\theta$ is an energy-based model (EBM), $p(U_{[0,T]},\gamma)\propto e^{-E_\theta(U_{[0,T]},\gamma)}$ (LeCun et al., 2006; Du & Mordatch, 2019), trained over the joint distribution of trajectories $U_{[0,T]}$ and boundaries $\gamma$, and $\lambda$ is a hyperparameter. Both $U_{[0,T]}$ and $\gamma$ are jointly optimized, and the energy function $E_\theta$ is minimized when $U_{[0,T]}$ and $\gamma$ are consistent with each other; it thus serves the purpose of the surrogate model $f_\theta$ in approximating the simulator dynamics. The joint optimization updates all steps of the trajectory $U_{[0,T]}$ and the boundary $\gamma$ simultaneously, which also removes the time-consuming autoregressive rollout used as an inner loop in Allen et al. (2022), significantly improving inference efficiency. In addition to approximating simulator dynamics, the generative objective also serves as a measure of uncertainty. Essentially, $E_\theta$ in Eq. 3 encourages the trajectory $U_{[0,T]}$ and boundary $\gamma$ to be physically consistent, while $\mathcal{J}$ encourages them to optimize the design objective.
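As a toy illustration of Eq. 3 (an assumption-laden sketch, not the paper's model), the block below replaces the learned $E_\theta$ with a hand-written quadratic energy that penalizes violations of hypothetical linear dynamics $U_t = A U_{t-1}$ with $U_0=\gamma$, and uses an assumed final-state objective for $\mathcal{J}$. Plain gradient descent then moves $\gamma$ and every $U_t$ at once, so no autoregressive rollout sits inside the loop. All function names are illustrative.

```python
import numpy as np


def residuals(A, gamma, U):
    """Dynamics violations r_t = U_t - A U_{t-1} for t = 1..T, with U_0 := gamma; U has shape (T, d)."""
    prev = np.vstack([gamma[None, :], U[:-1]])
    return U - prev @ A.T


def toy_energy(A, gamma, U):
    """Hand-written stand-in for E_theta: zero exactly when (gamma, U) is a consistent trajectory."""
    return float(np.sum(residuals(A, gamma, U) ** 2))


def final_state_objective(U, target):
    """Hypothetical design objective J acting on the last state U_T."""
    return float(np.sum((U[-1] - target) ** 2))


def joint_grad(A, gamma, U, target, lam):
    """Gradient of toy_energy + lam * J with respect to gamma and every U_t."""
    r = residuals(A, gamma, U)
    g_U = 2.0 * r                       # U_t enters r_t with coefficient +I
    g_U[:-1] -= 2.0 * r[1:] @ A         # ... and r_{t+1} with coefficient -A
    g_gamma = -2.0 * r[0] @ A           # gamma only enters r_1
    g_U[-1] += 2.0 * lam * (U[-1] - target)
    return g_gamma, g_U


def joint_descent(A, target, T, lam=1.0, lr=0.1, steps=5000, seed=0):
    """Update gamma and all U_t together; no rollout appears inside the loop."""
    rng = np.random.default_rng(seed)
    gamma = rng.standard_normal(target.shape[0])
    U = rng.standard_normal((T, target.shape[0]))
    for _ in range(steps):
        g_gamma, g_U = joint_grad(A, gamma, U, target, lam)
        gamma = gamma - lr * g_gamma
        U = U - lr * g_U
    return gamma, U


theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
target = np.array([1.0, -0.5])
gamma, U = joint_descent(A, target, T=3)
print(toy_energy(A, gamma, U), final_state_objective(U, target))
```

Because this toy energy and objective can both reach zero, the minimizer is a trajectory that is consistent with the assumed dynamics and also hits the target. With a learned $E_\theta$, the energy term instead pulls $(U_{[0,T]},\gamma)$ toward the training distribution.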

To train $E_\theta$, we use a diffusion objective: we learn a denoising network $\epsilon_\theta$ that denoises all variables in the design optimization, $z=U_{[0,T]}\oplus\gamma$, supervised with the training loss

$$\mathcal{L}_{\text{MSE}}=\big\|\epsilon-\epsilon_\theta\big(\sqrt{1-\beta_s}\,z+\sqrt{\beta_s}\,\epsilon,\,s\big)\big\|_2^2,\qquad \epsilon\sim\mathcal{N}(0,I). \tag{4}$$
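The training loss in Eq. 4 can be estimated on a batch as sketched below. The noise-level schedule `betas`, the batch shapes, and the trivial zero-output denoiser are assumptions for illustration; in CinDM, $\epsilon_\theta$ is a trained neural network and each row of `z` would be a flattened trajectory concatenated with its boundary.

```python
import numpy as np


def diffusion_loss(denoiser, z, betas, rng):
    """Monte-Carlo estimate of the loss in Eq. 4 for a batch z of shape (B, D).

    Each row of z is one flattened design variable (trajectory concatenated with boundary);
    betas[s] is the noise level beta_s for diffusion step s (0-indexed here).
    """
    B = z.shape[0]
    s = rng.integers(0, len(betas), size=B)      # one random diffusion step per sample
    beta = betas[s][:, None]
    eps = rng.standard_normal(z.shape)           # epsilon ~ N(0, I)
    z_noisy = np.sqrt(1.0 - beta) * z + np.sqrt(beta) * eps
    pred = denoiser(z_noisy, s)
    return float(np.mean(np.sum((eps - pred) ** 2, axis=1)))


def zero_denoiser(z_noisy, s):
    """Trivial baseline that always predicts zero noise."""
    return np.zeros_like(z_noisy)


betas = np.linspace(1e-3, 0.999, 100)    # hypothetical noise-level schedule
rng = np.random.default_rng(0)
z = rng.standard_normal((4096, 6))        # stand-in batch of flattened design variables
# E||eps||^2 = D, so the zero predictor scores about 6 here; a trained eps_theta should score lower.
print(diffusion_loss(zero_denoiser, z, betas, rng))
```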

As discussed in Liu et al. (2022), the denoising network $\epsilon_\theta$ corresponds to the gradient $\nabla_z E_\theta(z)$ of an EBM that represents the distribution over all optimization variables, $p(z)\propto e^{-E_\theta(z)}$. To optimize Eq. 3 using a Langevin sampling procedure, we can initialize an optimization variable $z_S$ from Gaussian noise $\mathcal{N}(0,I)$ and iteratively run

$$z_{s-1}=z_s-\eta\Big(\nabla_z\big(E_\theta(z_s)+\lambda\,\mathcal{J}(z_s)\big)\Big)+\xi,\qquad \xi\sim\mathcal{N}\big(0,\sigma_s^2 I\big), \tag{5}$$

for $s=S,S-1,\dots,1$. This procedure is implemented with diffusion models by optimizing the following update (an additional scaling term is also applied to the sample $z_s$ during the diffusion sampling procedure; we omit it below for clarity but implement it in practice):

$$z_{s-1}=z_s-\eta\big(\epsilon_\theta(z_s,s)+\lambda\nabla_z\mathcal{J}(z_s)\big)+\xi,\qquad \xi\sim\mathcal{N}\big(0,\sigma_s^2 I\big), \tag{6}$$

where $\sigma_s^2$ and $\eta$ correspond to the noise schedule and scaling factors used in the diffusion process. To further improve performance, we run additional steps of Langevin dynamics optimization at a given noise level, following Du et al. (2023).
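For intuition about Eq. 5 on its own, here is a minimal Langevin sketch on explicit toy energies rather than a learned model: $E(z)=\tfrac12\|z\|^2$ (a standard normal "data" prior) and $\mathcal{J}(z)=\tfrac12\|z-c\|^2$ for a hypothetical target $c$. The noise variance $\sigma^2=2\eta$ is the standard unadjusted Langevin choice and is an assumption here; the paper uses a noise schedule $\sigma_s$. Under these assumptions the chains approximately sample $p(z)\propto e^{-(E+\lambda\mathcal{J})}$, a Gaussian with mean $\lambda c/(1+\lambda)$, so the design is pulled toward $c$ but held back by the prior.

```python
import numpy as np


def langevin(grad_E, grad_J, z, lam, eta, n_steps, rng):
    """Unadjusted Langevin iterations on E + lam * J (cf. Eq. 5), run in parallel over the rows of z."""
    sigma = np.sqrt(2.0 * eta)           # assumed noise scale; the paper uses a schedule sigma_s
    for _ in range(n_steps):
        xi = sigma * rng.standard_normal(z.shape)
        z = z - eta * (grad_E(z) + lam * grad_J(z)) + xi
    return z


c = np.array([2.0, -1.0])                # hypothetical design target
lam = 3.0
rng = np.random.default_rng(0)
chains = langevin(
    grad_E=lambda z: z,                  # gradient of E(z) = 0.5 * ||z||^2
    grad_J=lambda z: z - c,              # gradient of J(z) = 0.5 * ||z - c||^2
    z=rng.standard_normal((4000, 2)),
    lam=lam, eta=0.01, n_steps=1000, rng=rng,
)
print(chains.mean(axis=0), lam * c / (1.0 + lam))   # empirical vs. analytic mean
```

The injected noise keeps each chain spread around the optimum, so every chain ends at a sample of plausible designs rather than a single point estimate.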

Intuitively, the above diffusion procedure starts from a random variable $z_S=(U_{[0,T],S}\oplus\gamma_S)\sim\mathcal{N}(0,I)$, follows the denoising network $\epsilon_\theta(z_s,s)$ and the gradient $\nabla_z\mathcal{J}(z_s)$, and step by step arrives at a final $z_0=U_{[0,T],0}\oplus\gamma_0$ that approximately minimizes the objective in Eq. 3.
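The substitution of $\nabla_z E_\theta$ in Eq. 5 by $\epsilon_\theta$ in Eq. 6 can be motivated by a standard denoising score matching identity, stated here as background under the noising process of Eq. 4 rather than taken from the paper: the MSE-optimal denoiser is the posterior mean of the noise, which is proportional to the gradient of the energy $E_s$ of the noised marginal $p_s$.

```latex
\begin{aligned}
z_s &= \sqrt{1-\beta_s}\,z + \sqrt{\beta_s}\,\epsilon, \qquad
\nabla_{z_s}\log p(z_s \mid z) = -\frac{z_s-\sqrt{1-\beta_s}\,z}{\beta_s} = -\frac{\epsilon}{\sqrt{\beta_s}},\\
\epsilon^{*}(z_s,s) &= \mathbb{E}[\epsilon \mid z_s]
 = -\sqrt{\beta_s}\,\nabla_{z_s}\log p_s(z_s)
 = \sqrt{\beta_s}\,\nabla_{z_s}E_s(z_s), \qquad p_s(z_s)\propto e^{-E_s(z_s)}.
\end{aligned}
```

So a well-trained $\epsilon_\theta$ approximates the energy gradient up to the factor $\sqrt{\beta_s}$, which can be absorbed into the step size $\eta$; $E_s$ approaches the data energy $E_\theta$ of Eq. 3 as $\beta_s \to 0$.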

Algorithm 1 Algorithm for Compositional Inverse Design with Diffusion Models (CinDM)
1:  Require: compositional set of diffusion models $\epsilon^i_\theta(z_s,s),\ i=1,2,\dots,N$; design objective $\mathcal{J}(\cdot)$; covariance matrix $\sigma_s^2 I$; hyperparameters $\lambda, S, K$
2:  Initialize optimization variables $z_S\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ // optimize across diffusion steps $S$:
3:  for $s=S,\ldots,1$ do
4:     // optimize $K$ steps of Langevin sampling at diffusion step $s$:
5:     for $k=1,\ldots,K$ do
6:        $\xi\sim\mathcal{N}(0,\sigma_s^2 I)$
7:        // run a single Langevin sampling step:
8:        $z_s\leftarrow z_s-\eta\,\frac{1}{N}\sum_{i=1}^{N}\big(\epsilon^i_\theta(z_s^i,s)+\lambda\nabla_z\mathcal{J}(z_s)\big)+\xi$
9:     end for
10:     $\xi\sim\mathcal{N}(0,\sigma_s^2 I)$
11:     // scale sample to transition to next diffusion step:
12:     $z_{s-1}\leftarrow z_s-\eta\,\frac{1}{N}\sum_{i=1}^{N}\big(\epsilon^i_\theta(z_s^i,s)+\lambda\nabla_z\mathcal{J}(z_s)\big)+\xi$
13:  end for
14:  $\gamma, U_{[0,T]} = z_0$
15:  return $\gamma$
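A minimal NumPy sketch of Algorithm 1 follows, under stated assumptions: each "diffusion model" is a hypothetical callable `eps_i(z_sub, s)` acting on an index subset of the flattened design variable `z`; per-subset predictions are written back into a full-length vector before averaging over the $N$ models (one reasonable reading of $\epsilon^i_\theta(z_s^i,s)$ in line 8); $\eta$ and the noise scales $\sigma_s$ are passed in directly; and, as in the algorithm, the extra scaling term applied in practice is omitted. The toy quadratic "experts" in the usage part are stand-ins for trained denoisers.

```python
import numpy as np


def composed_eps(eps_models, index_sets, z, s):
    """(1/N) * sum_i eps_i(z^i, s), writing each subset prediction back to its positions in z."""
    total = np.zeros_like(z)
    for eps_i, idx in zip(eps_models, index_sets):
        total[idx] += eps_i(z[idx], s)
    return total / len(eps_models)


def cindm_sample(eps_models, index_sets, grad_J, dim, lam, eta, sigmas, K, rng):
    """Sketch of Algorithm 1; sigmas[s - 1] plays the role of sigma_s for s = 1..S."""

    def step(z, s):
        xi = sigmas[s - 1] * rng.standard_normal(dim)
        # (1/N) * sum_i (eps_i + lam * grad J) equals composed_eps + lam * grad J, as in line 8
        return z - eta * (composed_eps(eps_models, index_sets, z, s) + lam * grad_J(z)) + xi

    z = rng.standard_normal(dim)              # line 2: z_S ~ N(0, I)
    for s in range(len(sigmas), 0, -1):       # line 3: s = S, ..., 1
        for _ in range(K):                    # lines 5-9: K Langevin steps at level s
            z = step(z, s)
        z = step(z, s)                        # lines 10-12: move from z_s to z_{s-1}
    return z                                  # z_0; line 14 would split it into gamma and U


def make_expert(m_i):
    """Hypothetical expert whose output is the gradient of 0.5 * ||z_sub - m_i||^2."""
    return lambda z_sub, s: z_sub - m_i


m = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
index_sets = [np.array([0, 1, 2]), np.array([2, 3, 4])]   # overlapping subsets covering z
experts = [make_expert(m[idx]) for idx in index_sets]
z0 = cindm_sample(experts, index_sets, grad_J=lambda z: np.zeros_like(z), dim=5,
                  lam=0.0, eta=0.5, sigmas=np.zeros(50), K=5, rng=np.random.default_rng(0))
print(z0)   # converges to m, since noise is switched off and lam = 0
```

With noise switched off and quadratic toy experts, the sampler reduces to gradient descent on the summed energies; the shared coordinate (index 2) receives feedback from both experts.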

3.3 Compositional Generative Inverse Design

A key challenge in inverse design is that the boundary $\gamma$ or the trajectory $U_{[0,T]}$ can be substantially different from those seen during training. To enable generalization across such design variables, we propose to compositionally represent the design variable $z=U_{[0,T]}\oplus\gamma$ using a composition of different energy functions $E_\theta$ (Du et al., 2020b) on subsets $z_i\subset z$ of the design variable. Each such $E_\theta$ on a subset $z_i$ provides a physical consistency constraint on $z_i$, encouraging each $z_i$ to be internally physically consistent. We also make sure that the different $z_i$, $i=1,2,\dots,N$, overlap with each other and together cover $z$ (see Fig. 1), so that the full $z$ is physically consistent. Thus, energy functions defined over subsets $z_i\subset z$ can be composed at test time to generalize to new values of the design variable $z$ that are substantially different from those seen during training, by exploiting shared local structure in $z$.
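One way to write this composition explicitly, assuming each subset $z_i$ is selected from $z$ by a 0/1 selection matrix $P_i$ (notation introduced here for illustration, not used in the paper), is as a product of experts whose energies add. The gradient of the sum is then a sum of per-subset gradients written back into $z$ by $P_i^{\top}$, so coordinates shared by overlapping subsets receive feedback from every energy that covers them.

```latex
p(z) \propto \prod_{i=1}^{N} p_i(z_i) = \exp\Big(-\sum_{i=1}^{N} E_\theta^{i}(z_i)\Big), \qquad z_i = P_i z,
\qquad
\nabla_z \sum_{i=1}^{N} E_\theta^{i}(P_i z) = \sum_{i=1}^{N} P_i^{\top}\,\nabla_{z_i} E_\theta^{i}(z_i).
```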

Below, we illustrate three different ways in which compositional inverse design enables generalization to design variables $z$ that are much more complex than those seen during training.

I. Generalization to more time steps. At test time, the trajectory length $T$ may be much longer than the trajectory length $T^{\text{tr}}$ seen in training. To allow generalization to longer trajectories, the energy function over the design variable can be written as a composition of $N$ energy functions over subsets of the trajectory with overlapping states:

$$E_\theta(U_{[0,T]}, \gamma) = \sum_{i=1}^{N} E_\theta\big(U_{[(i-1)\cdot t_q,\ (i-1)\cdot t_q + T^{\text{tr}}]}, \gamma\big). \qquad (7)$$

Here $z_i := U_{[(i-1)\cdot t_q,\ (i-1)\cdot t_q + T^{\text{tr}}]} \oplus \gamma$ is a subset of the design variable $z := U_{[0,T]} \oplus \gamma$, and each window has the same length $T^{\text{tr}}$ as the training trajectories. $t_q \in \{1, 2, \dots, T-1\}$ is the stride between consecutive time windows, and we let $T = (N-1)\cdot t_q + T^{\text{tr}}$, so that the last window ends at $T$ (e.g., $t_q = 10$ and $T^{\text{tr}} = 23$ give the windows $U_{[0,23]}, U_{[10,33]}, \dots$ used in Sec. 4.1).
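The window bookkeeping behind Eq. 7 can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code: `time_windows` and `composed_energy` are hypothetical helper names, and `energy_fn` is a placeholder for a trained window-level energy $E_\theta$. The windows follow the ones listed in Sec. 4.1 ($U_{[0,23]}, U_{[10,33]}, \dots$), so every window has the same length as a training trajectory.

```python
import numpy as np

def time_windows(num_windows, stride, t_tr):
    """Inclusive index ranges of the overlapping windows: window i (1-based)
    covers U[(i-1)*stride, (i-1)*stride + t_tr], i.e. t_tr + 1 states."""
    return [((i - 1) * stride, (i - 1) * stride + t_tr) for i in range(1, num_windows + 1)]

def composed_energy(energy_fn, U, gamma, stride, t_tr):
    """Sum a window-level energy over all overlapping windows of the full trajectory.

    U holds the states U_0..U_T along axis 0; energy_fn(window, gamma) is a
    hypothetical stand-in for the learned E_theta on a training-length window.
    """
    T = U.shape[0] - 1
    num_windows = (T - t_tr) // stride + 1
    if (num_windows - 1) * stride + t_tr != T:
        raise ValueError("T must equal (N - 1) * stride + t_tr")
    return sum(energy_fn(U[start:end + 1], gamma)
               for start, end in time_windows(num_windows, stride, t_tr))
```

With a stride of 10 and $T^{\text{tr}} = 23$, a 54-step trajectory ($T = 53$) is covered by four windows, which corresponds to the longest setting in Sec. 4.1.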

II. Generalization to more interacting bodies. Many inverse design applications require generalizing a trained model to more interacting bodies in a dynamical system, which is far more difficult than generalizing to more time steps. Our method enables such generalization by composing energy functions of few-body interactions into an energy over more interacting bodies. We illustrate this with 2-body to N-body generalization. Suppose that only trajectories of a 2-body interaction are given, i.e., $U^{(i)}_{[0,T]} = (U^{(i)}_0, U^{(i)}_1, \dots, U^{(i)}_T)$ for bodies $i \in \{1, 2\}$. We can learn an energy function $E_\theta\big((U^{(1)}_{[0,T]}, U^{(2)}_{[0,T]}), \gamma\big)$ from these trajectories.
At test time, given $N > 2$ interacting bodies subject to the same pairwise interactions, the energy function for the combined trajectory $U_{[0,T]} = (U^{(1)}_{[0,T]}, \dots, U^{(N)}_{[0,T]})$ of the $N$ bodies is given by:

$$E_\theta(U_{[0,T]}, \gamma) = \sum_{i<j} E_\theta\big((U^{(i)}_{[0,T]}, U^{(j)}_{[0,T]}), \gamma\big) \qquad (8)$$
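Eq. 8 amounts to summing a 2-body energy over all unordered pairs of bodies. A minimal sketch under stated assumptions: `pair_energy` is a placeholder for the learned 2-body energy, and the array layout `(num_bodies, T + 1, features)` with pairs passed as a stacked `(2, T + 1, features)` array is our choice for illustration.

```python
from itertools import combinations

import numpy as np

def pairwise_composed_energy(pair_energy, U, gamma):
    """Eq. 8: sum a 2-body energy over all pairs (i, j) with i < j.

    U: array of shape (num_bodies, T + 1, features), one trajectory per body.
    pair_energy(pair_traj, gamma): hypothetical learned 2-body energy taking
    an array of shape (2, T + 1, features).
    """
    num_bodies = U.shape[0]
    return sum(pair_energy(U[[i, j]], gamma)  # fancy indexing stacks bodies i and j
               for i, j in combinations(range(num_bodies), 2))
```

For $N$ bodies the sum has $N(N-1)/2$ terms, e.g., 28 terms for 8 bodies.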

III. Generalization from parts to whole for boundaries. Real-life inverse design typically involves designing shapes consisting of multiple parts that form an integral whole. For example, a plane consists of wings, a body, a rudder, and many other parts. The shape of the whole may be more complex and further out of distribution than the parts seen in training. To generalize from parts to the whole, we can again compose energy functions over subsets of the design variable $z$. Concretely, suppose that we have trajectories $U^{(i)}_{[0,T]}$ corresponding to parts $\gamma^i$, $i = 1, 2, \dots, N$; we can then learn energy functions $E_{\theta_i}(U^{(i)}_{[0,T]}, \gamma^i)$, $i = 1, 2, \dots, N$, corresponding to the dynamics around each part. For example, $\gamma^i$ may represent the shape of one part of the plane, and $U^{(i)}_{[0,T]}$ the full fluid state around part $\gamma^i$ when no other parts are present.
At test time, when we need to generalize to a whole boundary $\gamma$ consisting of these $N$ parts $\gamma^i$, $i = 1, 2, \dots, N$, we have

$$E_\theta(U_{[0,T]}, \gamma) = \sum_{i=1}^{N} E_{\theta_i}(U_{[0,T]}, \gamma^i) \qquad (9)$$

Note that in this composition, all parts $\gamma^i$ share the same trajectory $U_{[0,T]}$. Intuitively, in the plane example, all parts of the plane influence the same full fluid state around the plane. The composition of energy functions in Eq. 9 means that the full energy $E_\theta(U_{[0,T]}, \gamma)$ is low when the trajectory $U_{[0,T]}$ is consistent with all parts $\gamma^i$.
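The key point of Eq. 9 is that every part energy receives the same full trajectory but only its own part boundary. The sketch below illustrates this with hypothetical quadratic toy energies standing in for the learned $E_{\theta_i}$; it is an illustration of the composition rule, not the paper’s model.

```python
import numpy as np

def part_composed_energy(part_energies, U, part_boundaries):
    """Eq. 9: E(U, gamma) = sum_i E_i(U, gamma^i); the same U is passed to every part."""
    if len(part_energies) != len(part_boundaries):
        raise ValueError("need one energy function per part boundary")
    return sum(E_i(U, gamma_i) for E_i, gamma_i in zip(part_energies, part_boundaries))

# Hypothetical toy part energy: low when the shared state U "agrees" with this part.
def toy_part_energy(U, gamma_i):
    return float(np.sum((U - gamma_i) ** 2))

parts = [np.zeros(3), np.ones(3)]
energies = [toy_part_energy, toy_part_energy]
print(part_composed_energy(energies, np.full(3, 0.5), parts))  # 1.5
print(part_composed_energy(energies, np.zeros(3), parts))      # 3.0
```

The state that compromises between both parts has lower composed energy than the state preferred by only one part, mirroring the requirement that $U_{[0,T]}$ be consistent with all parts at once.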

Compositional Generative Inverse Design. Given the above compositions of energy functions, we can learn each energy function over the design variable $z = U_{[0,T]} \oplus \gamma$ by training a corresponding diffusion model over the subset of design variables $z^i \subset z$. Given the set of energy functions $\{E_i(z^i)\}_{i=1:N}$, our overall sampling update is then given by

$$z_{s-1} = z_s - \eta\,\frac{1}{N}\sum_{i=1}^{N}\Big(\epsilon^i_\theta(z_s^i, s) + \lambda\,\nabla_z \mathcal{J}(z_s)\Big) + \xi, \quad \xi \sim \mathcal{N}(0, \sigma_s^2 I), \qquad (10)$$

for $s = S, S-1, \dots, 1$. As before, we can additionally run multiple steps of Langevin dynamics at each noise level, following Du et al. (2023), to improve performance. We provide the overall pseudo-code of our method in the compositional setting in Algorithm 1.
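Eq. 10 can be read as the single update below. This is a hedged sketch, not the released CinDM code: `composed_denoise_step` is a hypothetical name, the per-subset noise predictors `eps_models` are placeholders for the trained diffusion models, and we assume each subset prediction $\epsilon^i_\theta(z_s^i, s)$ is zero-padded back to the shape of $z$ before the $1/N$ average, which Eq. 10 leaves implicit. The extra Langevin steps per noise level are omitted.

```python
import numpy as np

def composed_denoise_step(z_s, s, eps_models, index_sets, grad_J, eta, lam, sigma_s, rng):
    """One step z_s -> z_{s-1} of Eq. 10 (sketch).

    z_s        : full design variable at noise level s, array of shape (D,)
    eps_models : N callables eps_i(z_sub, s) returning an array shaped like z_sub
    index_sets : N integer index arrays selecting the subset z^i from z
    grad_J     : callable returning the gradient of the design objective at z_s
    """
    N = len(eps_models)
    guidance = lam * grad_J(z_s)              # lambda * grad_z J(z_s), shared by every term
    total = np.zeros_like(z_s)
    for eps_i, idx in zip(eps_models, index_sets):
        eps_full = np.zeros_like(z_s)         # zero-pad the subset prediction (assumption)
        eps_full[idx] = eps_i(z_s[idx], s)
        total += eps_full + guidance
    xi = rng.normal(0.0, sigma_s, size=z_s.shape)  # xi ~ N(0, sigma_s^2 I)
    return z_s - eta * total / N + xi
```

With two overlapping index sets, a coordinate covered by both subsets receives the sum of both predictions before the $1/N$ average, while the guidance term, being added once per subset, averages back to a single $\lambda \nabla_z \mathcal{J}$.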

Our paradigm of generative inverse design in Section 3.2 (consisting of the design objective in Eq. 3 and the training objective in Eq. 4), together with the compositional inverse design method in Section 3.3, constitutes our full method of Compositional INverse design with Diffusion Models (CinDM). Our approach differs from a product of experts (Hinton, 2002) in that CinDM learns distributions over subspaces during training, from which it infers distributions over much higher-dimensional spaces at inference time. Below, we test our method’s capability in compositional inverse design.

4 Experiments

In the experiments, we aim to answer the following questions: (1) Can CinDM generalize to more complex designs at test time using its compositional capability? (2) Compared with backpropagation through surrogate models and other strong baselines, can CinDM improve the design objective or prediction accuracy? (3) Can CinDM handle high-dimensional design spaces? To answer these questions, we perform experiments in three scenarios: compositional inverse design in the time dimension (Sec. 4.1), compositional inverse design generalizing to more objects (Sec. 4.2), and 2D compositional design of multiple airfoils with Navier-Stokes flow (Sec. 4.3). Each of these experiments represents an important scenario in inverse design with significant implications in science and engineering. In each experiment, we compare CinDM with the state-of-the-art deep learning-based inverse design method proposed by Allen et al. (2022), which we term Backprop, and with the cross-entropy method (CEM) (Rubinstein & Kroese, 2004), a standard sampling-based optimization method typically used in classical inverse design. Additionally, we compare CinDM with two other inverse design methods: the neural adjoint method with the boundary loss function (NABL) and the conditional invertible neural network (cINN) method (Ren et al., 2020; Ansari et al., 2022). Details and results are provided in Appendix H, where we adopt an experimental setting better suited to these additional baselines. All baselines and our model have similar numbers of parameters in each comparison for a fair evaluation. To evaluate each inverse design method, we feed its output (i.e., the optimized initial or boundary states) to the ground-truth solver, roll out the solver, and evaluate the design objective on the resulting trajectory. 
We do not use the surrogate model to perform rollout since the trained surrogate models may not be faithful to the ground-truth dynamics and can overestimate the design objective. By evaluating using a ground-truth solver, all inverse design methods can be evaluated fairly.

4.1 Compositional inverse design in time

In this experiment, we test each method’s ability to generalize to more forward time steps than seen during training. This is important because, at test time, inverse design methods are typically used over longer prediction horizons than in training. We use an N-body interaction environment in which each ball, with a radius of 0.1, bounces in a 1 × 1 box. The balls exchange momentum when elastically colliding with each other or with the walls. The design task is to identify the initial state (positions and velocities of the balls) such that the end state optimizes a given objective (e.g., being as close as possible to a target). This setting is a simplified version of many real-life scenarios such as billiards, bowling, and ice hockey. Since collisions preserve kinetic energy but change the speed and direction of each ball, and multiple collisions can happen over a long horizon, this is a non-trivial inverse design problem with abrupt changes in the design space. During training, we provide each method with trajectories of 24 steps, and at test time, we let it roll out for a total of 24, 34, 44, and 54 steps. The design objective is to minimize the Euclidean distance between the last-step positions and the center $(x, y) = (0.5, 0.5)$. As baselines, we compare with CEM (Rubinstein & Kroese, 2004) and Backprop (Allen et al., 2022). Each baseline uses either the Graph Network Simulator (GNS; Sanchez-Gonzalez et al. (2020), a state-of-the-art method for modeling N-body interactions) or U-Net (Ronneberger et al., 2015) as its backbone architecture, predicting either 1 step or 23 steps in a single forward pass. For our method, we use the same U-Net backbone architecture for diffusion. 
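To make the environment concrete, the sketch below implements a simple simulator step for balls of radius 0.1 in the unit box with elastic wall and ball-ball collisions, as described above. It is not the ground-truth solver used in the paper: equal masses, the explicit time step `dt`, and resolving each contact once per step are our simplifying assumptions, and `step` and `rollout` are hypothetical names.

```python
import numpy as np

RADIUS = 0.1  # ball radius stated in the text; the box is [0, 1] x [0, 1]

def step(pos, vel, dt):
    """One explicit time step for equal-mass balls with elastic collisions.

    pos, vel: arrays of shape (n_balls, 2). Returns new (pos, vel); inputs are not modified.
    """
    pos = pos + vel * dt
    vel = vel.copy()
    # Walls: make the offending velocity component point back into the box.
    for d in range(2):
        low = pos[:, d] < RADIUS
        high = pos[:, d] > 1.0 - RADIUS
        vel[low, d] = np.abs(vel[low, d])
        vel[high, d] = -np.abs(vel[high, d])
    # Ball-ball: equal masses swap their velocity components along the line of centers.
    n = pos.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            delta = pos[j] - pos[i]
            dist = np.linalg.norm(delta)
            if 0.0 < dist < 2 * RADIUS:
                normal = delta / dist
                rel = np.dot(vel[j] - vel[i], normal)
                if rel < 0.0:  # resolve only if the pair is approaching
                    vel[i] += rel * normal
                    vel[j] -= rel * normal
    return pos, vel

def rollout(pos0, vel0, num_steps, dt=0.01):
    """Return positions of shape (num_steps, n_balls, 2), including the initial state."""
    positions = [pos0]
    pos, vel = pos0, vel0
    for _ in range(num_steps - 1):
        pos, vel = step(pos, vel, dt)
        positions.append(pos)
    return np.stack(positions)
```

Because contacts are resolved once per step without positional correction, balls may overlap slightly when `dt` is large; a more careful solver would sub-step or use a physics engine.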
To perform time composition, we superimpose $N$ EBMs $E_\theta(U_{[0,T]}, \gamma)$ on states with overlapping time ranges $U_{[0,23]}, U_{[10,33]}, U_{[20,43]}, \dots, U_{[10(N-1),\,10(N-1)+23]}$, as in Eq. 7, and use Eq. 10 to perform denoising diffusion. Besides the design objective ($\mathcal{J}$), we also use the mean absolute error (MAE) between the predicted trajectory and the trajectory generated by the ground-truth solver to evaluate how faithful each method’s prediction is. Each design scenario is run 500 times, and the average performance is reported in Table 1. We show example trajectories of our method in Fig. 2. Details of the architecture and training are provided in Appendix A. We also compare with a simple baseline that performs diffusion over 44 steps directly, without time composition; details and results are presented in Appendix C.
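The two evaluation metrics of this section can be sketched as follows; per the evaluation protocol above, the design objective is computed on the ground-truth solver’s rollout of the designed initial state. Averaging the last-step distance over balls, and averaging the MAE over all steps, balls, and coordinates, are our assumptions: the text only states the last step’s Euclidean distance to the center and the mean absolute error.

```python
import numpy as np

TARGET = np.array([0.5, 0.5])  # design target (x, y) = (0.5, 0.5) from the text

def design_objective(traj):
    """Mean Euclidean distance of last-step positions to the target (lower is better).

    traj: positions of shape (steps, n_balls, 2), e.g. a ground-truth solver rollout.
    """
    return float(np.mean(np.linalg.norm(traj[-1] - TARGET, axis=-1)))

def mae(pred_traj, true_traj):
    """Mean absolute error between a method's predicted trajectory and the solver rollout."""
    return float(np.mean(np.abs(pred_traj - true_traj)))
```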

Table 1: Compositional Generalization Across Time. Experiment on compositional inverse design in time. Confidence intervals are given in Table 6 in Appendix B due to page limits. Bold font denotes the best model.
Method | 2-body, 24 steps: design obj | 2-body, 24 steps: MAE | 2-body, 34 steps: design obj | 2-body, 34 steps: MAE | 2-body, 44 steps: design obj | 2-body, 44 steps: MAE | 2-body, 54 steps: design obj | 2-body, 54 steps: MAE
CEM, GNS (1-step) | 0.2622 | 0.13963 | 0.2204 | 0.15378 | 0.2701 | 0.21277 | 0.2773 | 0.21706
CEM, GNS | 0.2699 | 0.12746 | 0.3142 | 0.14637 | 0.3056 | 0.18155 | 0.3124 | 0.20266
CEM, U-Net (1-step) | 0.2364 | 0.07720 | 0.2391 | 0.09701 | 0.2744 | 0.11885 | 0.2729 | 0.12992
CEM, U-Net | 0.1762 | 0.03597 | 0.1639 | 0.03094 | 0.1816 | 0.03900 | 0.1887 | 0.04350
Backprop, GNS (1-step) | 0.1452 | 0.04339 | 0.1497 | 0.03806 | 0.1511 | 0.03621 | 0.1851 | 0.04104
Backprop, GNS | 0.2407 | 0.09788 | 0.2678 | 0.11017 | 0.2762 | 0.12395 | 0.2952 | 0.13963
Backprop, U-Net (1-step) | 0.2182 | 0.07554 | 0.2445 | 0.08278 | 0.2536 | 0.08487 | 0.2751 | 0.10599
Backprop, U-Net | 0.1228 | 0.01974 | 0.1171 | 0.01236 | 0.1143 | 0.00970 | 0.1289 | 0.01067
CinDM (ours) | 0.1160 | 0.01264 | 0.1288 | 0.00917 | 0.1447 | 0.00959 | 0.1650 | 0.01064

From Table 1, we see that our method is competitive in design objective and outperforms every baseline in MAE. In the “2-body 24 steps” scenario, which matches the training setting and involves no composition, our method outperforms the strongest baseline on both design objective and MAE. With more prediction steps, our method still achieves the best MAE of all methods, while in design objective it is outperformed only by the strongest baseline (Backprop, U-Net). Specifically, our method’s MAE improves on the best baseline by 36.0%, 25.8%, 1.1%, and 0.3% for 24-, 34-, 44-, and 54-step predictions, respectively, an average improvement of 15.8%. Our method’s design objective improves on the best baseline by 5.5% for 24 steps. This shows two advantages of our method. First, even with the same backbone architecture, our diffusion method can roll out stably and accurately for much longer than the baselines: during design, the forward surrogate models in the baselines may encounter out-of-distribution and adversarial inputs that they do not know how to evolve properly, whereas our diffusion-based method is trained to denoise and favor inputs consistent with the underlying physics. Second, our compositional method allows our model to generalize to longer time horizons with stable rollouts. An example trajectory designed by CinDM is shown in Fig. 2 (a). It matches the ground-truth simulation closely, captures the bouncing off walls and other balls, and the end positions of the bodies tend towards the center, showing the effectiveness of our method. We also see that Backprop generally outperforms the sampling-based CEM, consistent with Allen et al. (2022).

4.2 Compositional inverse design generalizing to more objects

Figure 2: Example trajectories for the N-body dataset with compositional inverse design in time (a) and bodies (b). The circles indicate the CinDM-designed trajectories of the balls, drawn every 2 steps, with darker colors indicating later states. The central star indicates the design target, which the end state should be as close to as possible. “+” indicates the ground-truth trajectory simulated by the solver.

In this experiment, we test each method’s inverse design performance on larger state dimensions than in training. We use the N-body simulation environment of Sec. 4.1, but instead of longer trajectories, we test on more bodies than in training. This setting is also inspired by real-life scenarios where the dynamics at test time involve more interacting objects than in training (e.g., in astronomical simulation and biophysics). Specifically, all methods are trained only on 2-body interactions with 24 time steps, and tested on 4-body and 8-body interactions for 24 and 44 time steps using Eq. 8. This task is markedly more challenging than generalizing to more time steps, since the methods need to generalize to a much larger state space than in training. For N-body interactions, there are $N(N-1)/2$ pairs of 2-body interactions. The 44-step case adds difficulty by testing generalization in both state size and time (composing $28 \times 3 = 84$ diffusion models for CinDM in the 8-body case).
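The model count quoted above follows from multiplying the number of body pairs (Eq. 8) by the number of time windows (Eq. 7). A small check, assuming the same 24-step windows with stride 10 as in Sec. 4.1, which reproduces the $28 \times 3 = 84$ figure; `num_composed_models` is a hypothetical helper name.

```python
def num_composed_models(num_bodies, num_steps, train_steps=24, stride=10):
    """Number of composed diffusion-model terms: body pairs times time windows."""
    num_pairs = num_bodies * (num_bodies - 1) // 2          # Eq. 8: one term per pair
    num_windows = (num_steps - train_steps) // stride + 1   # Eq. 7: overlapping windows
    return num_pairs * num_windows

for bodies in (4, 8):
    for steps in (24, 44):
        print(bodies, steps, num_composed_models(bodies, steps))
# 4 24 6
# 4 44 18
# 8 24 28
# 8 44 84
```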

For the base network architecture, the U-Net in Backprop cannot generalize to more bodies because of its fixed feature dimension. Thus, we use only GNS as the backbone architecture for the baselines. In contrast, although our CinDM method also uses U-Net as its base architecture, it can generalize to more bodies thanks to the compositional capability of diffusion models. The results are reported in Table 2.

Table 2: Compositional Generalization Across Objects. Experiment on compositional inverse design generalizing to more objects. Confidence intervals are given in Table 7 in Appendix B due to page limits.
Method | 4-body, 24 steps: design obj | 4-body, 24 steps: MAE | 4-body, 44 steps: design obj | 4-body, 44 steps: MAE | 8-body, 24 steps: design obj | 8-body, 24 steps: MAE | 8-body, 44 steps: design obj | 8-body, 44 steps: MAE
CEM, GNS (1-step) | 0.3173 | 0.23293 | 0.3307 | 0.53521 | 0.3323 | 0.38632 | 0.3306 | 0.53839
CEM, GNS | 0.3314 | 0.25325 | 0.3313 | 0.28375 | 0.3314 | 0.25325 | 0.3313 | 0.28375
Backprop, GNS (1-step) | 0.2947 | 0.06008 | 0.2933 | 0.30416 | 0.3280 | 0.46541 | 0.3317 | 0.72814
Backprop, GNS | 0.3221 | 0.09871 | 0.3195 | 0.15745 | 0.3251 | 0.15917 | 0.3299 | 0.21489
CinDM (ours) | 0.2034 | 0.03928 | 0.2254 | 0.03163 | 0.3062 | 0.09241 | 0.3212 | 0.09249

From Table 2, we see that our CinDM method outperforms all baselines in both design objective and MAE in every setting, by a wide margin in MAE. On average, our method improves on the best baseline by 15.6% in design objective and by 53.4% in MAE. In Fig. 2 (b), our method captures the interactions of the 4 bodies with the walls and with each other well, and all bodies tend towards the center at the end. These results again demonstrate the strong compositional capability of our method: it can generalize to a much larger state space than seen in training.
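The averages quoted above can be reproduced from Table 2 by taking, for each column, the best (lowest) baseline value and computing CinDM’s relative improvement over it. The short check below copies the table values verbatim; reading “improvement” as (best baseline − CinDM) / best baseline, averaged over the four settings, is our interpretation.

```python
# Table 2 values; columns: 4-body 24, 4-body 44, 8-body 24, 8-body 44 steps.
BASELINE_OBJ = {
    "CEM, GNS (1-step)": [0.3173, 0.3307, 0.3323, 0.3306],
    "CEM, GNS": [0.3314, 0.3313, 0.3314, 0.3313],
    "Backprop, GNS (1-step)": [0.2947, 0.2933, 0.3280, 0.3317],
    "Backprop, GNS": [0.3221, 0.3195, 0.3251, 0.3299],
}
BASELINE_MAE = {
    "CEM, GNS (1-step)": [0.23293, 0.53521, 0.38632, 0.53839],
    "CEM, GNS": [0.25325, 0.28375, 0.25325, 0.28375],
    "Backprop, GNS (1-step)": [0.06008, 0.30416, 0.46541, 0.72814],
    "Backprop, GNS": [0.09871, 0.15745, 0.15917, 0.21489],
}
CINDM_OBJ = [0.2034, 0.2254, 0.3062, 0.3212]
CINDM_MAE = [0.03928, 0.03163, 0.09241, 0.09249]

def mean_relative_improvement(baselines, ours):
    """Average over columns of (best baseline - ours) / best baseline; lower is better."""
    columns = zip(*baselines.values())
    gains = [(min(col) - o) / min(col) for col, o in zip(columns, ours)]
    return sum(gains) / len(gains)

print(f"design obj: {100 * mean_relative_improvement(BASELINE_OBJ, CINDM_OBJ):.1f}%")  # 15.6%
print(f"MAE: {100 * mean_relative_improvement(BASELINE_MAE, CINDM_MAE):.1f}%")         # 53.4%
```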

4.3 2D compositional design for multiple airfoils

In this experiment, we test the methods’ ability to perform inverse design in a high-dimensional space, for multiple 2D airfoils. We train the methods on flow around a single randomly sampled shape, and at test time ask them to perform inverse design for one or more airfoils. The standard goal of airfoil design is to maximize the ratio between total lift force and total drag force, thus improving aerodynamic performance and reducing cost. The multi-airfoil case represents an important real-life engineering scenario in which the boundary shape to be designed is more complicated and further out of distribution than in training, but can be constructed by composing multiple parts. Moreover, when there are multiple flying agents, they may use formation flying to minimize drag, as observed in nature for migrating birds (Lissaman & Shollenberger, 1970; Hummel, 1995) and adopted by humans in aerodynamics (Venkataramanan et al., 2003). As the ground-truth solver for generating the training set and performing evaluation, we use Lily-Pad (Weymouth, 2015). The fluid state $U_t$ at each time step $t$ is represented on $64 \times 64$ grid cells, where each cell has three dynamic features: fluid velocities $v_x$, $v_y$, and pressure. The boundary $\gamma$ is represented by a $64 \times 64 \times 3$ tensor with three features per grid cell: a binary mask indicating whether the cell is inside the boundary (1) or in the fluid (0), and the relative position ($\Delta x$, $\Delta y$) from the cell center to the closest point on the boundary. 
Therefore, the boundary has $64 \times 64 \times 3 = 12288$ dimensions, making the inverse design task especially challenging.
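One possible way to build such a $64 \times 64 \times 3$ boundary tensor from a polygonal shape is sketched below. The domain $[0,1]^2$, the cell-center convention $(k + 0.5)/64$, the sign convention of $(\Delta x, \Delta y)$ (boundary point minus cell center), and approximating the closest boundary point by densely sampling the polygon edges are all our assumptions, not details given in the paper; `encode_boundary` is a hypothetical name.

```python
import numpy as np

GRID = 64

def inside_polygon(points, poly):
    """Even-odd ray casting: True where points (shape (P, 2)) lie inside polygon poly (shape (V, 2))."""
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    x1, y1 = poly[:, 0], poly[:, 1]
    x2, y2 = np.roll(x1, -1), np.roll(y1, -1)
    for a, b, c, d in zip(x1, y1, x2, y2):
        crosses = (b > y) != (d > y)  # edge straddles the horizontal line through the point
        x_cross = a + (y - b) * (c - a) / np.where(d != b, d - b, 1.0)
        inside ^= crosses & (x < x_cross)
    return inside

def encode_boundary(poly, samples_per_edge=50):
    """Return a (64, 64, 3) array holding [mask, dx, dy] for every grid cell."""
    centers = (np.arange(GRID) + 0.5) / GRID
    xx, yy = np.meshgrid(centers, centers, indexing="ij")
    cells = np.stack([xx.ravel(), yy.ravel()], axis=1)                     # (4096, 2)
    t = np.linspace(0.0, 1.0, samples_per_edge, endpoint=False)[:, None]
    nxt = np.roll(poly, -1, axis=0)
    boundary = np.concatenate([p + t * (q - p) for p, q in zip(poly, nxt)])  # dense edge samples
    diff = boundary[None, :, :] - cells[:, None, :]                        # (4096, M, 2)
    nearest = np.argmin(np.sum(diff ** 2, axis=-1), axis=1)
    offset = diff[np.arange(len(cells)), nearest]  # cell center -> closest sampled boundary point
    mask = inside_polygon(cells, poly).astype(float)
    return np.concatenate([mask[:, None], offset], axis=1).reshape(GRID, GRID, 3)
```

For a real airfoil, `poly` would be a densely sampled closed contour; the dense-sampling approximation of the closest point becomes more accurate as the contour resolution increases.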

Figure 3: Discovered formation flying. In the 2-airfoil case, our model’s designed boundary forms a “leader” and “follower” formation (a), reducing the drag by 53.6% and increasing the lift-to-drag ratio by 66.1% compared to each airfoil flying separately (b)(c). Colors represent fluid vorticity.

For CinDM, we use U-Net as the backbone architecture and train it to denoise the trajectory and boundary. At test time, we use Eq. 9 to compose multiple airfoils into a formation. For both CEM and Backprop, we use the state-of-the-art architectures FNO (Li et al., 2021) and LE-PDE (Wu et al., 2022a). For all methods, to improve design stability, we use the design objective $\mathcal{J} = -\text{lift} + \text{drag}$ and evaluate both this design objective and the lift-to-drag ratio. The results are in Table 3. Details are provided in Appendix D.

Table 3: Generalization Across Airfoils. Experiment results for multi-airfoil compositional design.
Method | 1 airfoil: design obj ↓ | 1 airfoil: lift-to-drag ratio ↑ | 2 airfoils: design obj ↓ | 2 airfoils: lift-to-drag ratio ↑
CEM, FNO | 0.0932 | 1.4005 | 0.3890 | 1.0914
CEM, LE-PDE | 0.0794 | 1.4340 | 0.1691 | 1.0568
Backprop, FNO | 0.0281 | 1.3300 | 0.1837 | 0.9722
Backprop, LE-PDE | 0.1072 | 1.3203 | 0.0891 | 0.9866
CinDM (ours) | 0.0797 | 2.1770 | 0.1986 | 1.4216

The table shows that although CinDM’s design objective is within the range of the baselines’ (though not the best), it achieves a much higher lift-to-drag ratio than all baselines, especially in the compositional 2-airfoil case. Fig. 9 and Fig. 10 show examples of the designed initial state and boundary for the 2-airfoil scenario for our model and the “CEM, FNO” baseline, respectively. While CinDM designs a smooth initial state and reasonable boundaries, the baseline falls into adversarial modes. A surprising finding is that our model discovers formation flying (Fig. 3), which reduces the drag by 53.6% and increases the lift-to-drag ratio by 66.1% compared to each airfoil flying separately. These results demonstrate the capability of CinDM to effectively design boundaries that are more complex than those in training and to achieve much better design performance.

5 Conclusion

In this work, we have introduced Compositional Inverse Design with Diffusion Models (CinDM), a novel paradigm and method for compositional generative inverse design. By composing trained diffusion models on subsets of the design variables and jointly optimizing the trajectory and the boundary, CinDM can generalize to designing systems much more complex than those seen in training. We have demonstrated our model’s compositional inverse design capability on N-body and 2D multi-airfoil tasks, and we believe that the techniques presented in this paper are general (Appendix J) and may be further applied to other settings such as material, drug, and molecule design.

Acknowledgments

We thank Boai Sun and Haodong Feng for suggestions on Lily-Pad simulation. We thank the anonymous reviewers for providing valuable feedback on our manuscript. We also gratefully acknowledge the support of Westlake University Research Center for Industries of the Future and Westlake University Center for High-performance Computing.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding entities.

References

  • Ajay et al. (2023) Anurag Ajay, Seungwook Han, Yilun Du, Shuang Li, Abhi Gupta, Tommi Jaakkola, Josh Tenenbaum, Leslie Kaelbling, Akash Srivastava, and Pulkit Agrawal. Compositional foundation models for hierarchical planning. arXiv preprint arXiv:2309.08587, 2023.
  • Allen et al. (2022) Kelsey Allen, Tatiana Lopez-Guevara, Kimberly L Stachenfeld, Alvaro Sanchez Gonzalez, Peter Battaglia, Jessica B Hamrick, and Tobias Pfaff. Inverse design for fluid-structure interactions using graph network simulators. In Advances in Neural Information Processing Systems, volume 35, pp.  13759–13774. Curran Associates, Inc., 2022.
  • Anderson & Venkatakrishnan (1999) W Kyle Anderson and V Venkatakrishnan. Aerodynamic design optimization on unstructured grids with a continuous adjoint formulation. Computers & Fluids, 28(4-5):443–480, 1999.
  • Ansari et al. (2022) Navid Ansari, Hans-Peter Seidel, Nima Vahidi Ferdowsi, and Vahid Babaei. Autoinverse: Uncertainty aware inversion of neural networks. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=dNyCj1AbOb.
  • Athanasopoulos et al. (2009) Michael Athanasopoulos, Hassan Ugail, and Gabriela González Castro. Parametric design of aircraft geometry using partial differential equations. Advances in Engineering Software, 40(7):479–486, 2009.
  • Bhowmik et al. (2019) Arghya Bhowmik, Ivano E Castelli, Juan Maria Garcia-Lastra, Peter Bjørn Jørgensen, Ole Winther, and Tejs Vegge. A perspective on inverse design of battery interphases using multi-scale modelling, experiments and generative deep learning. Energy Storage Materials, 21:446–456, 2019.
  • Blomqvist (2007) Victor Blomqvist. Pymunk tutorials, 2007. URL http://www.pymunk.org/en/latest/tutorials.html.
  • Brandstetter et al. (2022) Johannes Brandstetter, Daniel E. Worrall, and Max Welling. Message passing neural PDE solvers. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=vSix3HPYKSU.
  • Chen et al. (2023) Can Chen, Yingxue Zhang, Xue Liu, and Mark Coates. Bidirectional learning for offline model-based biological sequence design, 2023. URL https://openreview.net/forum?id=luEG3j9LW5-.
  • Cohen et al. (2020) Samuel Cohen, Rendani Mbuvha, Tshilidzi Marwala, and Marc Deisenroth. Healing products of Gaussian process experts. In International Conference on Machine Learning, pp. 2068–2077. PMLR, 2020.
  • Coros et al. (2013) Stelian Coros, Bernhard Thomaszewski, Gioacchino Noris, Shinjiro Sueda, Moira Forberg, Robert W Sumner, Wojciech Matusik, and Bernd Bickel. Computational design of mechanical characters. ACM Transactions on Graphics (TOG), 32(4):1–12, 2013.
  • Dijkstra & Luijten (2021) Marjolein Dijkstra and Erik Luijten. From predictive modelling to machine learning and reverse engineering of colloidal self-assembly. Nature Materials, 20(6):762–773, 2021.
  • Du et al. (2020a) Tao Du, Kui Wu, Andrew Spielberg, Wojciech Matusik, Bo Zhu, and Eftychios Sifakis. Functional optimization of fluidic devices with differentiable Stokes flow. ACM Transactions on Graphics (TOG), 39(6):1–15, 2020a.
  • Du & Mordatch (2019) Yilun Du and Igor Mordatch. Implicit generation and generalization in energy-based models. arXiv preprint arXiv:1903.08689, 2019.
  • Du et al. (2019) Yilun Du, Toru Lin, and Igor Mordatch. Model based planning with energy based models. CoRL, 2019.
  • Du et al. (2020b) Yilun Du, Shuang Li, and Igor Mordatch. Compositional visual generation with energy based models. In Advances in Neural Information Processing Systems, 2020b.
  • Du et al. (2023) Yilun Du, Conor Durkan, Robin Strudel, Joshua B Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, and Will Grathwohl. Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and mcmc. arXiv preprint arXiv:2302.11552, 2023.
  • Gkanatsios et al. (2023) Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, and Katerina Fragkiadaki. Energy-based models as zero-shot planners for compositional scene rearrangement. arXiv preprint arXiv:2304.14391, 2023.
  • Gordon et al. (2023) Spencer L Gordon, Manav Kant, Eric Ma, Leonard J Schulman, and Andrei Staicu. Identifiability of product of experts models. arXiv preprint arXiv:2310.09397, 2023.
  • Hinton (2002) Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
  • Hummel (1995) Dietrich Hummel. Formation flight as an energy-saving mechanism. Israel Journal of Ecology and Evolution, 41(3):261–278, 1995.
  • Kingma & Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • LeCun et al. (2006) Yann LeCun, Sumit Chopra, Raia Hadsell, M Ranzato, and F Huang. A tutorial on energy-based learning. Predicting structured data, 1(0), 2006.
  • Li et al. (2022) Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems, 35:31199–31212, 2022.
  • Li et al. (2021) Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=c8P9NQVtmnO.
  • Lissaman & Shollenberger (1970) Peter BS Lissaman and Carl A Shollenberger. Formation flight of birds. Science, 168(3934):1003–1005, 1970.
  • Liu et al. (2021) Nan Liu, Shuang Li, Yilun Du, Josh Tenenbaum, and Antonio Torralba. Learning to compose visual relations. Advances in Neural Information Processing Systems, 34:23166–23178, 2021.
  • Liu et al. (2022) Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, and Joshua B Tenenbaum. Compositional visual generation with composable diffusion models. arXiv preprint arXiv:2206.01714, 2022.
  • Molesky et al. (2018) Sean Molesky, Zin Lin, Alexander Y Piggott, Weiliang Jin, Jelena Vučković, and Alejandro W Rodriguez. Inverse design in nanophotonics. Nature Photonics, 12(11):659–670, 2018.
  • Nie et al. (2021) Weili Nie, Arash Vahdat, and Anima Anandkumar. Controllable and compositional generation with latent-space energy-based models. Advances in Neural Information Processing Systems, 34, 2021.
  • Po & Wetzstein (2023) Ryan Po and Gordon Wetzstein. Compositional 3D scene generation using locally conditioned diffusion. arXiv preprint arXiv:2303.12218, 2023.
  • Ren et al. (2020) Simiao Ren, Willie Padilla, and Jordan Malof. Benchmarking deep inverse models over time, and the neural-adjoint method. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  38–48. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/007ff380ee5ac49ffc34442f5c2a2b86-Paper.pdf.
  • Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pp.  234–241. Springer, 2015.
  • Rubinstein & Kroese (2004) Reuven Y Rubinstein and Dirk P Kroese. The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning, volume 133. Springer, 2004.
  • Saghafi & Lavimi (2020) Mohammad Saghafi and Roham Lavimi. Optimal design of nose and tail of an autonomous underwater vehicle hull to reduce drag force using numerical simulation. Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment, 234(1):76–88, 2020.
  • Sanchez-Gonzalez et al. (2020) Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter Battaglia. Learning to simulate complex physics with graph networks. In International Conference on Machine Learning, pp. 8459–8468. PMLR, 2020.
  • Shinners (2000) Pete Shinners. Pygame: A set of Python modules designed for writing video games, 2000. URL https://www.pygame.org/.
  • Tautvaišas & Žilinskas (2023) Saulius Tautvaišas and Julius Žilinskas. Heteroscedastic bayesian optimization using generalized product of experts. Journal of Global Optimization, pp.  1–21, 2023.
  • Trabucco et al. (2021) Brandon Trabucco, Aviral Kumar, Xinyang Geng, and Sergey Levine. Design-bench: Benchmarks for data-driven offline model-based optimization, 2021. URL https://openreview.net/forum?id=cQzf26aA3vM.
  • Urain et al. (2021) Julen Urain, Anqi Li, Puze Liu, Carlo D’Eramo, and Jan Peters. Composable energy policies for reactive motion generation and reinforcement learning. arXiv preprint arXiv:2105.04962, 2021.
  • Venkataramanan et al. (2003) Sriram Venkataramanan, Atilla Dogan, and William Blake. Vortex effect modelling in aircraft formation flight. In AIAA atmospheric flight mechanics conference and exhibit, pp.  5385, 2003.
  • Wang et al. (2023) Zihao Wang, Lin Gui, Jeffrey Negrea, and Victor Veitch. Concept algebra for text-controlled vision models. arXiv preprint arXiv:2302.03693, 2023.
  • Weymouth (2015) Gabriel D Weymouth. Lily pad: Towards real-time interactive computational fluid dynamics. arXiv preprint arXiv:1510.06886, 2015.
  • Wu et al. (2022a) Tailin Wu, Takashi Maruyama, and Jure Leskovec. Learning to accelerate partial differential equations via latent global evolution. Advances in Neural Information Processing Systems, 35:2240–2253, 2022a.
  • Wu et al. (2022b) Tailin Wu, Megan Tjandrasuwita, Zhengxuan Wu, Xuelin Yang, Kevin Liu, Rok Sosic, and Jure Leskovec. Zeroc: A neuro-symbolic model for zero-shot concept recognition and acquisition at inference time. Advances in Neural Information Processing Systems, 35:9828–9840, 2022b.
  • Yang et al. (2023a) Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B Tenenbaum, and Pieter Abbeel. Probabilistic adaptation of text-to-video models. arXiv preprint arXiv:2306.01872, 2023a.
  • Yang et al. (2023b) Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Compositional diffusion-based continuous constraint solvers. arXiv preprint arXiv:2309.00966, 2023b.
  • Zhang et al. (2023) Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, et al. Artificial intelligence for science in quantum, atomistic, and continuum systems. arXiv preprint arXiv:2307.08423, 2023.
  • Zhao et al. (2022) Qingqing Zhao, David B. Lindell, and Gordon Wetzstein. Learning to solve PDE-constrained inverse problems with graph networks. In ICML, 2022.

Appendix A Additional details for compositional inverse design in time

This section provides additional details for Section 4.1 and Section 4.2. Both sections use the same training dataset, model architectures, and training settings.

Dataset. We use the Python packages Pymunk (Blomqvist, 2007) and Pygame (Shinners, 2000) to generate the trajectories of this N-body dataset. The simulation environment consists of 4 walls and several bodies. The walls form a $200\times 200$ square with elasticity 1.0 and friction 0.0. Each body is a ball (circle) of radius 20 with the same elasticity and friction coefficient as the walls. Each body is placed at a random position within the walls, and its initial velocity is drawn from the uniform distribution $v\sim U(-100,100)$. We performed 2,000 simulations for each of the 2-body, 4-body, and 8-body settings. Each simulation uses a time step of 1/60 s and runs for 1,000 steps in total. During these simulations, we record the two-dimensional position and velocity of each body at every time step, which yields 3 datasets of shape $[N_s, N_t, N_b, N_f]$, where $N_s$ is the number of simulations, $N_t$ the number of time steps, $N_b$ the number of bodies, and $N_f$ the number of features.
Each input sample has shape $[B, 1, N_b\times N_f]$, where $B$ is the batch size; for example, the shape is $[32, 1, 8]$ for 2 bodies conditioned on a single step. Before training, the data are normalized by dividing by 200, and the time resolution is set to four simulation time steps.
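The sampling and preprocessing described above can be sketched as follows. This is a minimal plain-Python illustration, not the authors' pipeline: `sample_initial_state` and `preprocess` are hypothetical helpers, the feature order `[x, y, vx, vy]`, sampling velocity per component, and restricting positions so each ball starts fully inside the walls are assumptions, overlaps between balls are not checked, and the Pymunk simulation itself is omitted.

```python
import random

BOX = 200.0    # side length of the square arena
RADIUS = 20.0  # ball radius
V_MAX = 100.0  # each velocity component ~ U(-V_MAX, V_MAX) (assumption)
STRIDE = 4     # one model time step = 4 simulation steps
SCALE = 200.0  # normalization constant


def sample_initial_state(n_bodies, rng):
    """Return one [x, y, vx, vy] list per body (overlaps between balls are not checked)."""
    state = []
    for _ in range(n_bodies):
        x = rng.uniform(RADIUS, BOX - RADIUS)  # keep the whole ball inside the walls
        y = rng.uniform(RADIUS, BOX - RADIUS)
        vx = rng.uniform(-V_MAX, V_MAX)
        vy = rng.uniform(-V_MAX, V_MAX)
        state.append([x, y, vx, vy])
    return state


def preprocess(trajectory, stride=STRIDE, scale=SCALE):
    """Map a [N_t][N_b][N_f] trajectory to [ceil(N_t / stride)][N_b * N_f].

    Keeps every `stride`-th step, divides all features by `scale`, and
    flattens bodies and features into one vector per kept step.
    """
    return [
        [value / scale for body in step for value in body]
        for step in trajectory[::stride]
    ]


rng = random.Random(0)
init = sample_initial_state(2, rng)
# A fake 1000-step trajectory that never moves, only to show the shapes.
fake = [init for _ in range(1000)]
data = preprocess(fake)
print(len(data), len(data[0]))  # 250 8
```

With 2 bodies and 4 features per body, each kept step becomes a vector of 8 values, matching the last dimension of the $[32, 1, 8]$ example.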

Model structure. The U-Net (Ronneberger et al., 2015) consists of three modules: a downsampling encoder, a middle module, and an upsampling decoder. The encoder has 4 layers, each comprising three residual modules and a downsampling convolution. The middle module contains 3 residual modules, and the decoder has 4 layers, each comprising 3 residual modules and an upsampling step. The residual modules mainly use one-dimensional convolutions and incorporate attention. The input shape of the model is $[\text{batch\_size}, \text{n\_steps}, \text{n\_features}]$, and the output has the same shape. The GNS (Sanchez-Gonzalez et al., 2020) model consists of three main components. First, it builds an undirected graph from the current state. Then, it encodes the nodes and edges of this graph and propagates information by message passing. Finally, it decodes the predicted acceleration and updates the state with semi-implicit Euler integration. In our GNS implementation, each body is a node with three main attributes: current speed, distance from the walls, and particle type. We use a standard k-d tree search to find adjacent bodies within a connection radius of 0.2, i.e., twice the body radius after normalization (20/200 = 0.1). The attribute of an edge is the displacement vector between the two connected bodies. More details are in Table 4.
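The GNS graph construction and state update can be sketched as below. This is an illustration under stated assumptions, not the implementation used here: it swaps the k-d tree for a brute-force $O(N^2)$ radius search (which returns the same neighbor sets), caps each node at the 6 nearest neighbors listed in Table 4, and takes the 3 edge features to be the displacement $(dx, dy)$ plus its length, which is our guess at the "Number of edge features 3" entry. `radius_graph` and `semi_implicit_euler` are hypothetical names.

```python
import math


def radius_graph(positions, radius=0.2, max_neighbors=6):
    """Return edges (receiver, sender, (dx, dy, dist)) for bodies within `radius`.

    Brute-force O(N^2) search; each node keeps at most `max_neighbors` nearest senders.
    """
    edges = []
    for i, (xi, yi) in enumerate(positions):
        candidates = []
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            dist = math.hypot(dx, dy)
            if dist <= radius:
                candidates.append((dist, j, dx, dy))
        candidates.sort()  # nearest first
        for dist, j, dx, dy in candidates[:max_neighbors]:
            edges.append((i, j, (dx, dy, dist)))
    return edges


def semi_implicit_euler(positions, velocities, accelerations, dt):
    """Update velocities with the predicted accelerations, then positions with the NEW velocities."""
    new_v = [(vx + ax * dt, vy + ay * dt)
             for (vx, vy), (ax, ay) in zip(velocities, accelerations)]
    new_p = [(x + vx * dt, y + vy * dt)
             for (x, y), (vx, vy) in zip(positions, new_v)]
    return new_p, new_v


pos = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.5)]
print(radius_graph(pos))  # only bodies 0 and 1 are connected, in both directions
```

Using the updated velocity in the position update is what distinguishes semi-implicit from explicit Euler; it is the usual choice for GNS-style simulators because it is more stable for the same step size.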

Training. We use the MSE (mean squared error) as the loss function. Our model is trained for approximately 60 hours on a single Tesla V100 GPU with a batch size of 32, using the Adam optimizer for 1 million iterations. The learning rate is fixed at $1\times 10^{-4}$ for the first 600,000 steps and is then multiplied by 0.5 every 40,000 steps for the remaining 400,000 iterations. More details are provided in Table 5.
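The learning-rate schedule can be written as a function of the iteration index. This sketch assumes the first decay happens 40,000 steps after the constant phase ends (StepLR-like behavior); the exact boundary convention is not stated. `learning_rate` is a hypothetical helper.

```python
def learning_rate(step, base_lr=1e-4, constant_steps=600_000,
                  decay_every=40_000, gamma=0.5):
    """Learning rate at iteration `step` (0-based).

    Constant for the first `constant_steps` iterations, then multiplied by
    `gamma` once every `decay_every` iterations (assumed boundary convention).
    """
    if step < constant_steps:
        return base_lr
    return base_lr * gamma ** ((step - constant_steps) // decay_every)


for step in (0, 600_000, 640_000, 999_999):
    print(step, learning_rate(step))
```

Under this convention the rate is halved 9 times by the last iteration, ending at $10^{-4}/2^9\approx 1.95\times 10^{-7}$. In PyTorch, the decay phase corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=40_000, gamma=0.5)` stepped once per iteration after the first 600,000 steps.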

Table 4: Hyperparameters of model architecture for N-body task.
Hyperparameter name | 23-steps | 1-step
Hyperparameters for U-Net architecture:
Channel Expansion Factor | (1, 2, 4, 8) | (1, 2, 1, 1)
Number of downsampling layers | 4 | 4
Number of upsampling layers | 4 | 4
Input channels | 8 | 8
Number of residual blocks for each layer | 3 | 3
Batch size | 32 | 32
Input shape | [32, 24, 8] | [32, 2, 8]
Output shape | [32, 24, 8] | [32, 2, 8]
Hyperparameters for GNS architecture:
Input steps | 1 | 1
Prediction steps | 23 | 1
Number of particle types | 1 | 1
Connection radius | 0.2 | 0.2
Maximum number of edges per node | 6 | 6
Number of node features | 8 | 8
Number of edge features | 3 | 3
Message propagation layers | 5 | 5
Latent size | 64 | 64
Output size | 46 | 2
Hyperparameters for the U-Net in our CinDM:
Diffusion Noise Schedule | cosine | cosine
Diffusion Step | 1000 | 1000
Channel Expansion Factor | (1, 2, 4, 8) | (1, 2, 1, 1)
Number of downsampling layers | 4 | 4
Number of upsampling layers | 4 | 4
Input channels | 8 | 8
Number of residual blocks for each layer | 3 | 3
Batch size | 32 | 32
Input shape | [32, 24, 8] | [32, 2, 8]
Output shape | [32, 24, 8] | [32, 2, 8]
Table 5: Hyperparameters of training for N-body task.
Hyperparameter name | 23-steps | 1-step
Hyperparameters for U-Net training:
Loss function | MSE | MSE
Number of examples for training dataset | $3\times 10^{5}$ | $3\times 10^{5}$
Total number of training steps | $1\times 10^{6}$ | $1\times 10^{6}$
Batch size | 32 | 32
Initial learning rate | $1\times 10^{-4}$ | $1\times 10^{-4}$
Number of training steps with a fixed learning rate | $6\times 10^{5}$ | $6\times 10^{5}$
Learning rate adjustment strategy | StepLR | StepLR
Optimizer | Adam | Adam
Number of steps for saving checkpoint | $1\times 10^{4}$ | $1\times 10^{4}$
Exponential Moving Average decay rate | 0.95 | 0.95
Hyperparameters for GNS training:
Loss function | MSE | MSE
Number of examples for training dataset | $3\times 10^{5}$ | $3\times 10^{5}$
Total number of training steps | $1\times 10^{6}$ | $1\times 10^{6}$
Batch size | 32 | 32
Initial learning rate | $1\times 10^{-4}$ | $1\times 10^{-4}$
Number of training steps with a fixed learning rate | $6\times 10^{5}$ | $6\times 10^{5}$
Learning rate adjustment strategy | StepLR | StepLR
Optimizer | Adam | Adam
Number of steps for saving checkpoint | $1\times 10^{4}$ | $1\times 10^{4}$
Exponential Moving Average decay rate | 0.95 | 0.95
Hyperparameters for our CinDM training:
Loss function | MSE | MSE
Number of examples for training dataset | $3\times 10^{5}$ | $3\times 10^{5}$
Total number of training steps | $1\times 10^{6}$ | $1\times 10^{6}$
Batch size | 32 | 32
Initial learning rate | $1\times 10^{-4}$ | $1\times 10^{-4}$
Number of training steps with a fixed learning rate | $6\times 10^{5}$ | $6\times 10^{5}$
Learning rate adjustment strategy | StepLR | StepLR
Optimizer | Adam | Adam
Number of steps for saving checkpoint | $1\times 10^{4}$ | $1\times 10^{4}$
Exponential Moving Average decay rate | 0.95 | 0.95

To perform inverse design, we mainly trained the following models:
  • U-Net, conditioned on 1 step and rolling out 23 steps;
  • U-Net (single step), conditioned on 1 step and rolling out only 1 step;
  • GNS, conditioned on 1 step and rolling out 23 steps;
  • GNS (single step), conditioned on 1 step and rolling out only 1 step;
  • the diffusion model.
In addition, to assess the efficacy of time composition, we trained a diffusion model that covers 44 steps directly and performs inverse design without time composition; the results and analysis are shown in Appendix C. Throughout training, we used the same optimizer, dataset, and number of training steps for all of these models.

Inverse design. The center point is defined as the target point, and our objective is to minimize the mean squared error (MSE) between the position at the last step of the trajectory and the target point. As baselines for CinDM, we use U-Net and GNS separately as forward models and combine each with CEM (Rubinstein & Kroese, 2004) or Backprop (Allen et al., 2022) for inverse design: the initial state $(x_0, y_0, v_{x0}, v_{y0})$ is the input, and the forward model rolls out the trajectories of all bodies. CEM does not require gradient information. It defines a parameterized Gaussian distribution, samples several conditions from it, and feeds them to the forward model for prediction. After computing the loss between each prediction and the target, the best-performing samples are selected to update the parameters of the Gaussian. After multiple iterations, conditions sampled from the optimized distribution yield trajectories with low loss. Backprop, in contrast, relies on gradient information: it computes the gradient of the loss with respect to the conditions and updates the conditions by gradient descent, ultimately producing conditions that lead to low loss.
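To make the two baselines concrete, the sketch below runs CEM and gradient-based design on a deliberately simple, hypothetical forward model: straight-line motion of a single body with no walls or collisions, in normalized coordinates with the target assumed at (0.5, 0.5). In the experiments the forward model is the trained U-Net or GNS and Backprop obtains gradients by automatic differentiation; here the gradient is written out analytically. Population size, elite count, step size, and iteration counts are illustrative assumptions.

```python
import random
import statistics

TARGET = (0.5, 0.5)  # assumed center of the normalized arena
HORIZON = 1.0        # hypothetical flight time of the toy model


def toy_forward(cond):
    """Hypothetical stand-in for the learned forward model: straight-line motion, no walls."""
    x0, y0, vx, vy = cond
    return x0 + vx * HORIZON, y0 + vy * HORIZON


def design_loss(cond):
    """Squared distance between the final position and the target."""
    x, y = toy_forward(cond)
    return (x - TARGET[0]) ** 2 + (y - TARGET[1]) ** 2


def cem(init_mean, n_iters=30, n_samples=64, n_elite=8, init_std=0.3, seed=0):
    """Cross-entropy method with a diagonal Gaussian over (x0, y0, vx0, vy0)."""
    rng = random.Random(seed)
    mean, std = list(init_mean), [init_std] * len(init_mean)
    for _ in range(n_iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        elites = sorted(samples, key=design_loss)[:n_elite]
        # Refit the Gaussian to the elites; the floor keeps it from collapsing to zero width.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [max(statistics.pstdev(col), 1e-3) for col in zip(*elites)]
    return mean


def backprop(init_cond, lr=0.1, n_steps=200):
    """Gradient descent on the initial condition with the analytic gradient of design_loss."""
    x0, y0, vx, vy = init_cond
    for _ in range(n_steps):
        ex = x0 + vx * HORIZON - TARGET[0]  # final-position error in x
        ey = y0 + vy * HORIZON - TARGET[1]  # final-position error in y
        # dL/dx0 = 2*ex and dL/dvx = 2*ex*HORIZON (same pattern for y).
        x0, vx = x0 - lr * 2 * ex, vx - lr * 2 * ex * HORIZON
        y0, vy = y0 - lr * 2 * ey, vy - lr * 2 * ey * HORIZON
    return [x0, y0, vx, vy]


start = [0.2, 0.8, 0.0, 0.0]
print(design_loss(start), design_loss(cem(start)), design_loss(backprop(start)))
```

Both methods drive the loss toward zero here because the toy objective is smooth and convex in the condition; with a learned forward model, gradient-based search can instead drift into regions where the surrogate is inaccurate, which is the adversarial behavior CinDM is meant to avoid.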

During training, a model predicts only a finite number of time steps from its conditioning states, whereas in real physical processes the system keeps evolving from the initial state for an arbitrary number of steps. To address this, we compose predictions over successive time intervals, so that a model trained on a limited number of steps can predict longer trajectories. For the multi-step forward models (U-Net or GNS), an intermediate time step of the previous prediction serves as the condition for the next prediction, and repeating this from a single initial condition forecasts additional time steps. For the single-step forward models, we roll out autoregressively, using the last step of the previous prediction to predict the next step.
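The chaining of fixed-length predictions can be sketched as follows. `rollout` is a hypothetical helper: with `keep=None` it appends each whole predicted chunk and conditions the next call on its last state (the single-step case when the model predicts one step), while `keep=k` keeps only the first k predicted steps and conditions the next call on the k-th, which is one way to read "an intermediate time step of the previous prediction". The constant-velocity `make_toy_forward` is a hypothetical stand-in for a trained U-Net or GNS.

```python
def rollout(forward, init_state, total_steps, keep=None):
    """Chain fixed-length predictions into a trajectory of `total_steps` states.

    forward(cond) returns the predicted states after `cond`. With keep=None the
    whole chunk is kept and the next call is conditioned on its last state; with
    keep=k only the first k predicted states are kept and the k-th becomes the
    next condition.
    """
    traj = [init_state]
    while len(traj) < total_steps:
        pred = forward(traj[-1])
        n_keep = len(pred) if keep is None else keep
        if n_keep < 1:
            raise ValueError("keep must be at least 1")
        traj.extend(pred[:n_keep])
    return traj[:total_steps]


def make_toy_forward(n_pred=23, dt=1.0):
    """Hypothetical constant-velocity model predicting n_pred future (x, v) states."""
    def forward(state):
        x, v = state
        return [(x + v * dt * k, v) for k in range(1, n_pred + 1)]
    return forward


traj = rollout(make_toy_forward(), (0.0, 1.0), total_steps=44)
print(len(traj), traj[-1])  # 44 (43.0, 1.0)
```

For a perfect model all three variants (23-step chunks, shorter kept chunks, single steps) give the same trajectory; with a learned model they differ in how errors accumulate, since each re-conditioning feeds a predicted state back in as input.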

Appendix B Full results for compositional inverse design of the N-body task

Here we provide the full statistical results, including 95% confidence intervals (over 500 instances), for the N-body experiments on compositional inverse design in time and over more objects. Specifically, Table 6 gives detailed results for Table 1 in Section 4.1, and Table 7 extends Table 2 in Section 4.2.
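The ± values in Tables 6 and 7 are half-widths of 95% confidence intervals over 500 instances. The text does not specify how they were computed; a common choice, assumed in the sketch below, is the normal approximation mean ± 1.96·s/√n with the sample standard deviation s. With n = 500, a Student-t quantile would change the half-width by well under 1%. `mean_ci95` is a hypothetical helper.

```python
import math
import statistics


def mean_ci95(values, z=1.96):
    """Return (mean, half-width) of a normal-approximation 95% confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(n)  # sample std / sqrt(n)
    return mean, half_width


m, h = mean_ci95([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"{m:.4f} ± {h:.4f}")  # 3.0000 ± 1.3859
```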

Table 6: Compositional Generalization Across Time. The confidence interval information is provided in addition to Table 1.
2-body 24 steps 2-body 34 steps 2-body 44 steps 2-body 54 steps
Method design obj MAE design obj MAE design obj MAE design obj MAE
CEM, GNS (1-step) 0.2622 ± 0.0090 0.13963 ± 0.00999 0.2204 ± 0.0072 0.15378 ± 0.01123 0.2701 ± 0.0079 0.21277 ± 0.01264 0.2773 ± 0.0070 0.21706 ± 0.01256
CEM, GNS 0.2699 ± 0.0081 0.12746 ± 0.00662 0.3142 ± 0.0064 0.14637 ± 0.00657 0.3056 ± 0.0060 0.18155 ± 0.00689 0.3124 ± 0.0062 0.20266 ± 0.00679
CEM, U-Net (1-step) 0.2364 ± 0.0068 0.07720 ± 0.00623 0.2391 ± 0.0081 0.09701 ± 0.00796 0.2744 ± 0.0073 0.11885 ± 0.00854 0.2729 ± 0.0074 0.12992 ± 0.00897
CEM, U-Net 0.1762 ± 0.0071 0.03597 ± 0.00395 0.1639 ± 0.0062 0.03094 ± 0.00342 0.1816 ± 0.0072 0.03900 ± 0.00451 0.1887 ± 0.0075 0.04350 ± 0.00487
Backprop, GNS (1-step) 0.1452 ± 0.0050 0.04339 ± 0.00285 0.1497 ± 0.0061 0.03806 ± 0.00304 0.1511 ± 0.0062 0.03621 ± 0.00322 0.1851 ± 0.0062 0.04104 ± 0.00285
Backprop, GNS 0.2407 ± 0.0067 0.09788 ± 0.00615 0.2678 ± 0.0072 0.11017 ± 0.00620 0.2762 ± 0.0071 0.12395 ± 0.00657 0.2952 ± 0.0073 0.13963 ± 0.00623
Backprop, U-Net (1-step) 0.2182 ± 0.0068 0.07554 ± 0.00466 0.2445 ± 0.0093 0.08278 ± 0.00613 0.2536 ± 0.0078 0.08487 ± 0.00611 0.2751 ± 0.0088 0.10599 ± 0.00709
Backprop, U-Net 0.1228 ± 0.0040 0.01974 ± 0.00223 0.1171 ± 0.0032 0.01236 ± 0.00104 0.1143 ± 0.0026 0.00970 ± 0.00076 0.1289 ± 0.0043 0.01067 ± 0.00090
CinDM (ours) 0.1160 ± 0.0019 0.01264 ± 0.00057 0.1288 ± 0.0030 0.00917 ± 0.00070 0.1447 ± 0.0040 0.00959 ± 0.00116 0.1650 ± 0.0045 0.01064 ± 0.00117
Table 7: Compositional Generalization Across Objects. The confidence interval information is provided in addition to Table 2.
4-body 24 steps 4-body 44 steps 8-body 24 steps 8-body 44 steps
Method design obj MAE design obj MAE design obj MAE design obj MAE
CEM, GNS (1-step) 0.3173 ± 0.0040 0.23293 ± 0.01007 0.3307 ± 0.0022 0.53521 ± 0.00987 0.3323 ± 0.0023 0.38632 ± 0.00737 0.3306 ± 0.0023 0.53839 ± 0.01001
CEM, GNS 0.3314 ± 0.0023 0.25325 ± 0.00369 0.3313 ± 0.0023 0.28375 ± 0.00336 0.3314 ± 0.0023 0.25325 ± 0.00369 0.3313 ± 0.0023 0.28375 ± 0.00336
Backprop, GNS (1-step) 0.2947 ± 0.0044 0.06008 ± 0.00437 0.2933 ± 0.0041 0.30416 ± 0.03387 0.3280 ± 0.0026 0.46541 ± 0.02768 0.3317 ± 0.0023 0.72814 ± 0.01783
Backprop, GNS 0.3221 ± 0.0043 0.09871 ± 0.00499 0.3195 ± 0.0042 0.15745 ± 0.00561 0.3251 ± 0.0021 0.15917 ± 0.00261 0.3299 ± 0.0022 0.21489 ± 0.00238
CinDM (ours) 0.2034 ± 0.0032 0.03928 ± 0.00161 0.2254 ± 0.0044 0.03163 ± 0.00251 0.3062 ± 0.0021 0.09241 ± 0.00210 0.3212 ± 0.0023 0.09249 ± 0.00276

Appendix C Additional baseline for time composition of the N-body task

Table 8: Compositional Generalization Across Time. Comparison to a baseline that directly diffuses 44 steps without time composition.
Methods #parameters(Million) 2-body 44 steps 4-body 44 steps
design_obj MAE design_obj MAE
Our method 20.76M 0.1326 ± 0.0087 0.00695 ± 0.00067 0.2281 ± 0.0145 0.03195 ± 0.00705
Directly diffuse 44 steps 44.92M 0.2779 ± 0.0197 0.00810 ± 0.00200 0.2986 ± 0.01481 0.05166 ± 0.01218

We also compare with a simple baseline that performs diffusion over 44 steps directly, without time composition, to verify the effectiveness of our time-compositional approach. This baseline uses the same architecture as CinDM but with 44 time steps instead of 24, and thus has more than twice as many parameters (44.92M vs. 20.76M). The results in Table 8 show that this simple baseline is outperformed by CinDM on both metrics. A possible reason is that a single model has difficulty capturing the dynamics across 44 time steps simultaneously because of long-range dependencies; in such cases, a 24-step diffusion model proves more suitable. Hence, for designs that involve a larger number of time steps, time composition is the more effective approach, with lower cost and better performance.

Appendix D Additional details for compositional inverse design of 2D airfoils

Figure 4: Diffusion model architecture of 2D inverse design.

D.1 Details for the main experiment

Figure 5: Example of Lily-Pad simulation.

Dataset. We use Lily-Pad (Weymouth, 2015) as our data generator (Fig. 5). We generate 30,000 bodies, consisting of ellipses and NACA airfoils, and perform a fluid simulation around each body. The bodies are sampled by randomizing location, thickness, and rotation within respective ranges. Each body is represented by 40 two-dimensional points along its boundary. The spatial resolution is $64\times 64$, and each cell holds the time-dependent pressure and the horizontal and vertical velocities. Each trajectory consists of 100 time steps. To generate training trajectories, we slide a time window over the 100 time steps. Each window contains the states of $T=6$ time steps with a stride of 4, so each original trajectory yields 25 training trajectories, giving 750,000 training samples in total.

Model architecture. We use a U-Net (Ronneberger et al., 2015) as the backbone for denoising from a random state sampled from a prior distribution. Excluding the mini-batch dimension, the input is a tensor of shape $(3T+3)\times 64\times 64$, which concatenates along the channel dimension the flow states (pressure and horizontal and vertical velocity) of $T$ time steps, the boundary mask, and the horizontal and vertical offsets; the current diffusion step $s$ is an additional input. The output tensor has the same shape as the input tensor (without $s$). The model architecture is illustrated in Fig. 4, and its hyperparameters are listed in Table 9.
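The channel count $3T+3$ follows from stacking three flow fields per time step plus three boundary channels; for $T=6$ this gives the 21 input channels in Table 9. The sketch below assembles such an input from nested lists. The per-time-step channel ordering is an assumption (the text only says the fields are concatenated along the channel dimension), and the diffusion step $s$ is assumed to enter the network separately, e.g. through an embedding, rather than as a channel. `blank_grid` and `assemble_input` are hypothetical helpers.

```python
def blank_grid(n=64, value=0.0):
    """An n x n grid of floats as nested lists."""
    return [[value] * n for _ in range(n)]


def assemble_input(flow_states, mask, offset_x, offset_y):
    """Stack T (pressure, u, v) triples and 3 boundary channels into 3T + 3 channels."""
    channels = []
    for pressure, u, v in flow_states:  # one triple per time step (assumed ordering)
        channels.extend([pressure, u, v])
    channels.extend([mask, offset_x, offset_y])
    return channels


T = 6
states = [(blank_grid(), blank_grid(), blank_grid()) for _ in range(T)]
x = assemble_input(states, blank_grid(), blank_grid(), blank_grid())
print(len(x), len(x[0]), len(x[0][0]))  # 21 64 64
```

Adding the batch dimension of 48 gives the $[48, 21, 64, 64]$ input shape listed in Table 9.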

Training. We use the MSE (mean squared error) between the model's prediction and the sampled Gaussian noise as the training loss. We use a batch size of 48 and train for 700,000 iterations, with an initial learning rate of $1\times 10^{-4}$. Training details are provided in Table 10.
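We assume the training loss above is the standard DDPM noise-prediction objective: for a clean sample $x_0$ and diffusion step $s$, draw $\epsilon\sim\mathcal{N}(0,I)$, form $x_s=\sqrt{\bar\alpha_s}\,x_0+\sqrt{1-\bar\alpha_s}\,\epsilon$, and regress the network output onto $\epsilon$. The sketch uses the cosine schedule with 1,000 diffusion steps listed for the N-body model in Table 4; whether the airfoil model uses the same schedule is not stated here. `predict_noise` is a hypothetical stand-in for the U-Net.

```python
import math
import random

N_DIFFUSION_STEPS = 1000


def cosine_alpha_bar(s, n_steps=N_DIFFUSION_STEPS, offset=0.008):
    """Cumulative signal level alpha_bar_s under the cosine noise schedule."""
    def f(t):
        return math.cos((t / n_steps + offset) / (1 + offset) * math.pi / 2) ** 2
    return f(s) / f(0)


def noise_prediction_loss(predict_noise, x0, s, rng):
    """Noise x0 to step s and return the MSE between predicted and true noise."""
    a = cosine_alpha_bar(s)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_s = [math.sqrt(a) * xi + math.sqrt(1 - a) * ei for xi, ei in zip(x0, eps)]
    eps_hat = predict_noise(x_s, s)
    return sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(x0)


rng = random.Random(0)
x0 = [0.1 * i for i in range(16)]
s = rng.randrange(1, N_DIFFUSION_STEPS + 1)  # diffusion step sampled uniformly
zero_model = lambda x_s, step: [0.0] * len(x_s)  # untrained stand-in for the U-Net
print(noise_prediction_loss(zero_model, x0, s, rng))
```

An untrained model that always outputs zero scores about 1 (the variance of the noise), while a perfect noise predictor scores 0.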

Evaluation of design results. At inference, we set $\lambda$ in Eq. 3 to 0.0002, which gave the best design results; more discussion on the selection of $\lambda$ is presented in Appendix I. For each method and each airfoil design task (one airfoil or two airfoils), we run 10 batches of design, each containing 20 examples. We then feed the designed boundaries into Lily-Pad and run the simulation. For a more accurate evaluation, we use a flow-field resolution of $128\times 128$ instead of the $64\times 64$ used to generate the training data. From the computed horizontal and vertical flow forces, we compute our two metrics: $-\text{lift}+\text{drag}$ and the lift-to-drag ratio. In each batch, we choose the best-designed boundary (or pair of boundaries in the two-airfoil scenario) and report the average of the two metrics over the 10 batches.
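The aggregation step of this evaluation can be sketched as below. The text does not say whether one best design per batch is used for both metrics; this sketch assumes each metric selects its own best design (lowest $-\text{lift}+\text{drag}$, highest lift-to-drag ratio). `summarize_designs` is a hypothetical helper that takes the simulated (lift, drag) of each design.

```python
import statistics


def summarize_designs(batches):
    """Average the per-batch best -lift + drag (lower is better) and lift/drag (higher is better).

    `batches` is a list of batches, each a list of (lift, drag) pairs from simulating
    the designs. Each metric picks its own best design per batch (assumption).
    """
    best_objective = [min(-lift + drag for lift, drag in batch) for batch in batches]
    best_ratio = [max(lift / drag for lift, drag in batch) for batch in batches]
    return statistics.fmean(best_objective), statistics.fmean(best_ratio)


batches = [[(1.0, 0.5), (2.0, 1.0)], [(0.5, 0.25), (3.0, 3.0)]]
print(summarize_designs(batches))  # (-0.625, 2.0)
```

In the setting above, `batches` would hold 10 batches of 20 designs each.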

Table 9: Hyperparameters used in 2D diffusion model architecture.
Number of downsampling blocks | 4
Number of upsampling blocks | 4
Input channels | 21
Number of residual blocks for each layer | 2
Batch size | 48
Input shape | [48, 21, 64, 64]
Output shape | [48, 21, 64, 64]
Table 10: Hyperparameters used in 2D diffusion model training.
Loss function | MSE
Number of examples for training dataset | $3\times 10^{6}$
Total number of training steps | $7\times 10^{5}$
Batch size | 48
Initial learning rate | $1\times 10^{-4}$
Number of training steps with a fixed learning rate | $6\times 10^{5}$
Learning rate adjustment strategy | StepLR
Optimizer | Adam
Number of saving checkpoint | 700
Exponential Moving Average decay rate | 0.995

D.2 Surrogate Model for Force Prediction

Model architecture. In the 2D compositional inverse design of multiple airfoils, we propose a neural surrogate model $g_{\varphi}$ that approximates the mapping from the state $U_t$ and boundary $\gamma$ to the lift and drag forces, so that the design objective $\mathcal{J}$ is differentiable with respect to the design variable $z=U_{[0,T]}\bigoplus\gamma$. For a given time step, the input of the model is a tensor of shape $4\times 64\times 64$ comprising pressure, boundary mask, and offsets (horizontal and vertical). The output is the predicted drag and lift forces, of dimension 2. The boundary mask marks the inside (+1) and outside (0) of a closed boundary. Offsets measure the signed deviation of the center of each cell of the $64\times 64$ grid from the boundary in the horizontal and vertical directions, respectively, where the deviation of a point is defined by its distance to the nearest point on the boundary. If two or more boundaries appear in a sample, the input mask (resp. offsets) is the sum of the masks (resp. offsets) of all the boundaries; since the input boundaries are assumed not to overlap, the summed mask and offsets remain valid. The architecture is the downsampling half of a U-Net (Ronneberger et al., 2015), which embeds the input features into a 512-dimensional representation, followed by a linear transformation that outputs the forces.
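The mask and offset channels can be computed from the boundary points as sketched below. Assumptions not stated in the text: the domain is the unit square with cell centers at $((j+0.5)/n, (i+0.5)/n)$, the nearest boundary point is taken among the polygon's sample points (e.g. the 40 points per body) rather than on its edges, and the sign convention is cell center minus boundary point. `point_in_polygon`, `boundary_features`, and `combine` are hypothetical helpers; `combine` implements the summation used for several non-overlapping boundaries.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside the closed polygon given by its vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def boundary_features(polygon, n=64):
    """Return (mask, offset_x, offset_y) grids for one boundary on an n x n grid over the unit square."""
    mask = [[0.0] * n for _ in range(n)]
    off_x = [[0.0] * n for _ in range(n)]
    off_y = [[0.0] * n for _ in range(n)]
    for i in range(n):          # row index -> y coordinate
        for j in range(n):      # column index -> x coordinate
            cx, cy = (j + 0.5) / n, (i + 0.5) / n
            # Nearest boundary sample point to this cell center.
            bx, by = min(polygon, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
            mask[i][j] = 1.0 if point_in_polygon(cx, cy, polygon) else 0.0
            off_x[i][j] = cx - bx  # signed horizontal deviation
            off_y[i][j] = cy - by  # signed vertical deviation
    return mask, off_x, off_y


def combine(feature_sets):
    """Sum the mask and offset grids of several non-overlapping boundaries channel by channel."""
    n = len(feature_sets[0][0])
    return tuple(
        [[sum(fs[c][i][j] for fs in feature_sets) for j in range(n)] for i in range(n)]
        for c in range(3)
    )


square = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
small = [(0.8, 0.8), (0.95, 0.8), (0.95, 0.95), (0.8, 0.95)]
mask, ox, oy = combine([boundary_features(square, n=8), boundary_features(small, n=8)])
print(sum(map(sum, mask)))  # 20.0
```

Together with the pressure field, these three grids form the $4\times 64\times 64$ surrogate input when `n=64`.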

Dataset. We use Lily-Pad (Weymouth, 2015) to generate simulation data with 1, 2, or 3 airfoil boundaries to train and evaluate the surrogate model. Boundaries are a mixture of ellipses and NACA airfoils. We generate 10,000 trajectories for the training dataset and 1,000 trajectories for the test dataset. Each trajectory consists of 100 time steps. We use pressure as features and lift and drag forces as labels. Thus we have 3 million training samples and 300 thousand testing samples in total.

Training. We train the surrogate model with the MSE (mean squared error) loss between the ground-truth and predicted forces, using the Adam optimizer (Kingma & Ba, 2014) with a batch size of 128 for 20 epochs. The learning rate starts at $1\times 10^{-4}$ and is multiplied by 0.1 every five epochs. The test error is 0.04, less than 5% of the average force in the training dataset.
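The learning-rate schedule above is a step decay. As a minimal plain-Python sketch (the helper name is hypothetical), the rate used in each of the 20 epochs is computed below. In PyTorch the same schedule would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)`; whether the authors used that class is not stated.

```python
def step_decay_lr(epoch, base_lr=1e-4, factor=0.1, every=5):
    """Learning rate for a 0-indexed epoch under step decay."""
    return base_lr * factor ** (epoch // every)

schedule = [step_decay_lr(e) for e in range(20)]  # one entry per training epoch
for start in range(0, 20, 5):
    print(f"epochs {start}-{start + 4}: lr = {schedule[start]:.0e}")
```

With 20 epochs the rate therefore ends at $10^{-7}$ in the last five epochs.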

Appendix E Visualization of N-body inverse design

Examples of N-body design results are provided in this section. Figure 6 shows the results of designing 2-body 54-time-step trajectories with the backpropagation algorithm and with CinDM. Figure 7 shows the results of designing 2-body 54-time-step trajectories with CEM and with CinDM. Figure 8 shows the results of designing 4-body 44-time-step trajectories with CEM, backpropagation, and CinDM.

Figure 6: 54-time-step trajectories of 2 bodies after inverse design with the backpropagation algorithm. Panels (a), (b), (c), and (d) show the trajectories obtained with GNS, GNS (single step), U-Net, and U-Net (single step) as the forward model, respectively, and panel (e) shows the result of CinDM. The legend follows Figure 2.
Figure 7: 54-time-step trajectories of 2 bodies after inverse design with the CEM algorithm. Panels (a), (b), (c), and (d) show the results obtained with GNS, GNS (single step), U-Net, and U-Net (single step) as the forward model, respectively, and panel (e) shows the result of CinDM. The legend follows Figure 2.
Figure 8: 44-time-step trajectories of 4 bodies after inverse design with CEM. Panels (a), (b), (c), and (d) show the trajectories obtained with GNS, GNS (single step), U-Net, and U-Net (single step) as the forward model, respectively, and panel (e) shows the result of CinDM. The legend follows Figure 2.

Appendix F Visualization results of 2D inverse design by CinDM

We show the compositional design results of our method in 2D airfoil generation in Figure 9.

Figure 9: Compositional design results of our method in 2D airfoil generation. Each row shows one example: heatmaps of the horizontal velocity, vertical velocity, and pressure at the initial time step, with the generated airfoil boundaries plotted on top.

Appendix G Visualization results of the 2D inverse design baseline

We show some 2D design results of a baseline model (FNO with CEM) in Figure 10.

Figure 10: Design results of FNO with CEM in 2D airfoil generation. Each row shows heatmaps of the optimized horizontal velocity, vertical velocity, and pressure at the initial time step, with the generated airfoil boundaries plotted on top.

Appendix H Comparison to additional works

Table 11: Comparison to NABL and cINN for the N-body time composition inverse design task.

| Method | 2-body, 24 steps: design obj | 2-body, 24 steps: MAE | 2-body, 34 steps: design obj | 2-body, 34 steps: MAE | 2-body, 44 steps: design obj | 2-body, 44 steps: MAE | 2-body, 54 steps: design obj | 2-body, 54 steps: MAE |
|---|---|---|---|---|---|---|---|---|
| NABL, U-Net (1-step) | 0.1174 | 0.01650 | 0.1425 | 0.01511 | 0.1788 | 0.01185 | 0.2606 | 0.02042 |
| cINN | 0.3235 | 0.11704 | 0.3085 | 0.18015 | 0.3478 | 0.18322 | 0.3372 | 0.19296 |
| CinDM (ours) | 0.1143 | 0.01202 | 0.1251 | 0.00763 | 0.1326 | 0.00695 | 0.1533 | 0.00870 |

In addition to the baselines in the main text, we evaluate two further baselines on both the N-body and airfoil design experiments: the neural adjoint method with a boundary loss function (NABL) and the conditional invertible neural network (cINN).

We implement NABL on top of the FNO and LE-PDE baselines in the airfoil design task and on top of U-Net in the time composition task, named “NABL, FNO”, “NABL, LE-PDE”, and “NABL, U-Net”, respectively. These NABL baselines additionally use a boundary loss defined by the mean value and the 95% significance radius of the training dataset. cINN does not apply to compositional design because the input dimension of the invertible neural network is fixed. Therefore, for the time composition task, we train 4 cINN models, one for each trajectory length: 24, 34, 44, and 54 time steps. These models differ only in input size. The input $x$ to cINN is a vector of size $2\times 4\times T$, where 2 is the number of objects, 4 is the number of features, and $T$ is the number of time steps. The condition $y$ is set to 0, the minimal distance to the target point. For 2D airfoil design, cINN takes as input the 2D coordinates of 40 boundary points, flattened into an 80-dimensional vector, since the invertibility constraint makes it hard for cINN to accept the image-like inputs used in the main experiments. We therefore evaluate cINN only on the single-airfoil design task. The condition $y$ is set to the minimal value of drag minus lift in the training trajectories. In both tasks, the random variable $z$ has dimension $\dim(x)-\dim(y)$; it is drawn from a Gaussian distribution and fed to the INN for inference. We also adjust hyperparameters, such as the hidden size and the number of reversible blocks, so that the number of parameters in cINN is close to ours for a fair comparison.
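The NABL boundary loss can be sketched as a penalty that is zero inside a box around the training mean and grows linearly outside it. This is a minimal NumPy sketch under assumptions: the “95% significance radius” is taken per dimension as the 95th percentile of $|x-\mu|$ over the training set, and the penalty is a summed ReLU of the excess, in the spirit of Ren et al. (2020). The exact form used in these experiments may differ, and the function names are hypothetical.

```python
import numpy as np

def fit_boundary(train_x, q=0.95):
    """Per-dimension mean and radius covering a fraction q of training deviations."""
    mu = train_x.mean(axis=0)
    radius = np.quantile(np.abs(train_x - mu), q, axis=0)
    return mu, radius

def boundary_loss(x, mu, radius):
    """Zero inside the box |x - mu| <= radius, linear penalty outside it."""
    excess = np.abs(x - mu) - radius
    return np.maximum(excess, 0.0).sum(axis=-1)

rng = np.random.default_rng(0)
train = rng.standard_normal((10_000, 8))  # stand-in for training designs
mu, radius = fit_boundary(train)

inside = mu.copy()       # at the training mean: no penalty
outside = mu + 10.0      # far outside the training support
print(boundary_loss(inside, mu, radius), boundary_loss(outside, mu, radius))
```

During NABL optimization, a term like this would be added to the surrogate-predicted design objective before back-propagating to the design, discouraging designs that drift outside the training distribution.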

The results of NABL and cINN are shown in Table 11 and Table 12. CinDM outperforms the new baselines in both experiments: it achieves the lowest design objective and MAE in the N-body task and the highest lift-to-drag ratio in the airfoil task. Compared with the original baselines without the boundary loss (those whose names contain “Backprop-” in Table 1 and Table 3), the NABL baselines show no improvement in the objective on out-of-distribution data in either task. These results show that our method generalizes to out-of-distribution settings, whereas both the original and the new baselines struggle to do so. CinDM also outperforms cINN by a large margin in both the time composition and airfoil design tasks. Beyond the quantitative results, we find that airfoil boundaries generated by cINN vary little in shape and are not oriented as desired, which could incur high drag in simulation. This may stem from the cINN architecture, which uses fully connected layers as building blocks and therefore lacks the inductive bias needed to capture spatio-temporal features. We believe cINN would need to be extended to convolutional networks for such high-resolution design problems, which appears challenging under the invertibility requirement. In summary, our method outperforms both NABL and cINN in both tasks. Furthermore, our method supports flexible compositional design: at inference time, a single trained model generates samples in a state space much larger than the one seen in training, a unique advantage of our method over these baselines.

Table 12: Comparison to NABL and cINN for the 2D airfoil inverse design task.

| Method | # parameters (Million) | 1 airfoil: design obj ↓ | 1 airfoil: lift-to-drag ratio ↑ | 2 airfoils: design obj ↓ | 2 airfoils: lift-to-drag ratio ↑ |
|---|---|---|---|---|---|
| NABL, FNO | 3.29 | 0.0323 | 1.3169 | 0.3071 | 0.9541 |
| NABL, LE-PDE | 3.13 | 0.1010 | 1.3104 | 0.0891 | 0.9860 |
| cINN | 3.07 | 1.1745 | 0.7556 | - | - |
| CinDM (ours) | 3.11 | 0.0797 | 2.177 | 0.1986 | 1.4216 |

Appendix I Performance sensitivity to hyperparameters, initialization, and sampling steps

This section evaluates how the hyperparameter $\lambda$, the initialization, and the number of sampling steps affect the performance of CinDM.

I.1 Influence of the hyperparameter $\lambda$

Table 13: Effect of $\lambda$ in N-body time composition inverse design.

| $\lambda$ | 2-body, 24 steps: design obj | 2-body, 24 steps: MAE | 2-body, 34 steps: design obj | 2-body, 34 steps: MAE | 2-body, 44 steps: design obj | 2-body, 44 steps: MAE | 2-body, 54 steps: design obj | 2-body, 54 steps: MAE |
|---|---|---|---|---|---|---|---|---|
| 0.0001 | 0.3032 ± 0.0243 | 0.00269 ± 0.00047 | 0.2954 ± 0.0212 | 0.00413 ± 0.00155 | 0.3091 ± 0.0223 | 0.00394 ± 0.00076 | 0.2996 ± 0.0201 | 0.01046 ± 0.00859 |
| 0.001 | 0.2531 ± 0.0185 | 0.00385 ± 0.00183 | 0.2937 ± 0.0213 | 0.00336 ± 0.00115 | 0.2797 ± 0.0190 | 0.00412 ± 0.00105 | 0.2927 ± 0.0219 | 0.00521 ± 0.00103 |
| 0.01 | 0.1200 ± 0.0069 | 0.00483 ± 0.00096 | 0.1535 ± 0.0135 | 0.00435 ± 0.00100 | 0.1624 ± 0.0137 | 0.00416 ± 0.00059 | 0.1734 ± 0.0154 | 0.00658 ± 0.00267 |
| 0.1 | 0.1201 ± 0.0046 | 0.01173 ± 0.00150 | 0.1340 ± 0.0107 | 0.00772 ± 0.00099 | 0.1379 ± 0.0088 | 0.00816 ± 0.00149 | 0.1662 ± 0.0180 | 0.01141 ± 0.00473 |
| 0.2 | 0.1283 ± 0.0141 | 0.01313 ± 0.00312 | 0.1392 ± 0.0119 | 0.00836 ± 0.00216 | 0.1529 ± 0.0130 | 0.01019 ± 0.00584 | 0.1513 ± 0.0131 | 0.00801 ± 0.00172 |
| 0.4 | 0.1172 ± 0.0084 | 0.01500 ± 0.00207 | 0.1385 ± 0.0145 | 0.00948 ± 0.00293 | 0.1402 ± 0.0113 | 0.00763 ± 0.00112 | 0.1663 ± 0.0126 | 0.00850 ± 0.00124 |
| 0.6 | 0.1259 ± 0.0100 | 0.01382 ± 0.00115 | 0.1326 ± 0.0126 | 0.01171 ± 0.00595 | 0.1592 ± 0.0151 | 0.01140 ± 0.00355 | 0.1670 ± 0.0177 | 0.00991 ± 0.00287 |
| 0.8 | 0.1217 ± 0.0073 | 0.01596 ± 0.00127 | 0.1385 ± 0.0120 | 0.01095 ± 0.00337 | 0.1573 ± 0.0116 | 0.00893 ± 0.00113 | 0.1715 ± 0.0181 | 0.01026 ± 0.00239 |
| 1 | 0.1330 ± 0.0063 | 0.01679 ± 0.00139 | 0.1428 ± 0.0112 | 0.01087 ± 0.00149 | 0.1634 ± 0.0119 | 0.00968 ± 0.00079 | 0.1789 ± 0.0164 | 0.01102 ± 0.00185 |
| 2 | 0.1513 ± 0.0079 | 0.02654 ± 0.00160 | 0.1795 ± 0.0129 | 0.01765 ± 0.00193 | 0.1779 ± 0.0121 | 0.01707 ± 0.00474 | 0.2113 ± 0.0161 | 0.01447 ± 0.00130 |
| 10 | 0.2821 ± 0.0197 | 0.21153 ± 0.01037 | 0.2210 ± 0.0149 | 0.09715 ± 0.00236 | 0.2273 ± 0.0133 | 0.07781 ± 0.00232 | 0.2269 ± 0.0175 | 0.06538 ± 0.00210 |

Table 14: Effect of $\lambda$ in 2D inverse design.

| $\lambda$ | design obj | lift-to-drag ratio |
|---|---|---|
| 0.05 | 0.7628 ± 0.1892 | 1.015 ± 0.2008 |
| 0.02 | 0.3849 ± 0.0632 | 1.0794 ± 0.1165 |
| 0.01 | 0.2292 ± 0.0408 | 1.286 ± 0.1402 |
| 0.005 | 0.2061 ± 0.0388 | 1.2378 ± 0.1414 |
| 0.002 | 0.217 ± 0.0427 | 1.2429 ± 0.1243 |
| 0.001 | 0.2277 ± 0.0451 | 1.2608 ± 0.1469 |
| 0.0005 | 0.2465 ± 0.0473 | 1.4102 ± 0.1771 |
| 0.0002 | 0.1986 ± 0.0431 | 1.4216 ± 0.1607 |
| 0.0001 | 0.271 ± 0.0577 | 1.1962 ± 0.1284 |

Figure 11: Design objective for different values of $\lambda$ in N-body time composition inverse design.
Figure 12: MAE for different values of $\lambda$ in N-body time composition inverse design.
Figure 13: Performance for different values of $\lambda$ in 2D airfoil inverse design.

To evaluate the influence of the hyperparameter $\lambda$ in Eq. 3, we run inference on both the N-body time composition task and the 2D airfoil design task over a wide range of $\lambda$. The results are shown in Table 13, Table 14, Fig 11, Fig 12, and Fig 13; Table 13 corresponds to Fig 11 and Fig 12, and Table 14 corresponds to Fig 13. Our method performs robustly and consistently across a wide range of $\lambda$ values. However, if $\lambda$ is set too small ($<0.0002$ in the 2D airfoil task, or $<0.01$ in the N-body task), the design results are subpar because little objective guidance is incorporated. On the other hand, if $\lambda$ is set too large ($>0.02$ in the 2D airfoil task, or $>1.0$ in the N-body task), sampling is more likely to enter low-likelihood regions, and physical consistency is compromised. In practice, $\lambda$ can be set between 0.01 and 1.0 for the N-body task and between 0.0002 and 0.02 for the 2D airfoil task. In this paper, we choose $\lambda$ by the best evaluation performance, namely 0.4 for the N-body task and 0.0002 for the 2D airfoil task.
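The trade-off controlled by $\lambda$ can be illustrated on a toy problem. This is a minimal NumPy sketch that assumes Eq. 3 combines the learned energy $E$ and the design objective $\mathcal{J}$ as $E(z) + \lambda \mathcal{J}(z)$, and that each sampling step moves $z$ along the negative gradient of this sum (optionally with Langevin-style noise). The quadratic energy and objective below are hypothetical stand-ins, not the trained diffusion model. For these quadratics the minimizer has the closed form $(a + \lambda b)/(1 + \lambda)$, so a small $\lambda$ leaves $z$ at the high-likelihood mode $a$ (little objective guidance), while a large $\lambda$ drags it toward the objective minimizer $b$, away from the high-likelihood region.

```python
import numpy as np

def guided_descent(z0, grad_energy, grad_objective, lam, step=0.1, n_steps=2000,
                   noise=0.0, seed=0):
    """Descend E(z) + lam * J(z); noise > 0 adds a Langevin-style perturbation."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(n_steps):
        grad = grad_energy(z) + lam * grad_objective(z)
        z = z - step * grad
        if noise > 0:
            z = z + noise * np.sqrt(2.0 * step) * rng.standard_normal(z.shape)
    return z

# Hypothetical toy stand-ins for the learned energy and the design objective.
a = np.array([0.0, 0.0])  # mode of the toy energy (high-likelihood region)
b = np.array([2.0, 2.0])  # minimizer of the toy design objective

def grad_E(z):
    return z - a  # gradient of E(z) = 0.5 * ||z - a||^2

def grad_J(z):
    return z - b  # gradient of J(z) = 0.5 * ||z - b||^2

for lam in (1e-4, 0.4, 1.0, 10.0):
    z = guided_descent(np.zeros(2), grad_E, grad_J, lam)
    print(f"lambda={lam:g}: z={z}, closed form={(a + lam * b) / (1 + lam)}")
```

In a real diffusion sampler the energy is learned and multi-modal, so the large-$\lambda$ regime can push samples into regions the model has never seen, which is the failure mode described above.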

I.2 Influence of initialization

Figure 14: “Re-simulation” error $r_B$ for different $B$ in N-body inverse design.
Figure 15: Design performance (lift-to-drag) for different $B$ in 2D airfoil inverse design.

Table 15: Influence of initialization. $r_B$ with respect to $B$ for the N-body inverse design task. Each number is an average over 10 batches.

| $B$ | 2-body, 24 steps | 2-body, 34 steps | 2-body, 44 steps | 2-body, 54 steps |
|---|---|---|---|---|
| 10 | 0.10122654 | 0.1022556 | 0.10542078 | 0.11227837 |
| 20 | 0.10051114 | 0.10122902 | 0.10261874 | 0.10554917 |
| 30 | 0.09950846 | 0.10106587 | 0.10220513 | 0.10408381 |
| 40 | 0.09928784 | 0.10066015 | 0.10173534 | 0.10409425 |
| 50 | 0.09794939 | 0.10023642 | 0.10168899 | 0.10462530 |
| 60 | 0.09876589 | 0.09997466 | 0.10105932 | 0.10257294 |
| 70 | 0.09858151 | 0.09979441 | 0.10124100 | 0.10179855 |
| 80 | 0.09809845 | 0.09972977 | 0.10060663 | 0.10203485 |
| 90 | 0.09808731 | 0.09941968 | 0.10108861 | 0.10120515 |
| 100 | 0.09734109 | 0.09912691 | 0.10056177 | 0.10135190 |

Table 16: Influence of initialization. Design performance (lift-to-drag ratio) with respect to $B$ for the 2D inverse design task. Each number is an average over 10 batches.

| $B$ | 1 airfoil | 2 airfoils |
|---|---|---|
| 10 | 1.4505 | 0.8246 |
| 20 | 2.2725 | 0.7178 |
| 30 | 2.2049 | 1.3862 |
| 40 | 2.6506 | 1.5781 |
| 50 | 2.1355 | 1.6055 |

To analyze the sensitivity of our approach to initialization, we follow a methodology similar to that of Ren et al. (2020). We consider the “re-simulation” error $r$ of a target objective $y$ as a function of the number of samplings $B$, where each sampling starts from a Gaussian initialization $z$. For each design $x$ among the $B$ design results for target $y$, we use the simulator to obtain the output $\hat{y}$ and compute the “re-simulation” error $L(\hat{y}, y)$. We then take the least error within the batch of $B$ design results. This process is repeated for several batches, and the mean least error $r_B$ is obtained by averaging over these batches.
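The procedure for $r_B$ can be sketched as a best-of-$B$ error averaged over batches. This is a minimal NumPy sketch with hypothetical toy components: in CinDM each design would come from a full diffusion sampling run started at Gaussian noise, whereas here a design is simply a 2D Gaussian draw and the stand-in “simulator” returns its distance to the origin, with target $y = 0$.

```python
import numpy as np

def mean_least_error(sample_design, simulate, loss, y_target, B, n_batches=10, seed=0):
    """r_B: mean over batches of the smallest re-simulation error among B designs."""
    rng = np.random.default_rng(seed)
    least = []
    for _ in range(n_batches):
        errors = [loss(simulate(sample_design(rng)), y_target) for _ in range(B)]
        least.append(min(errors))
    return float(np.mean(least))

# Hypothetical toy components: each "design" is a direct Gaussian draw standing in
# for a full sampling run, and the "simulator" returns its distance to the origin.
def sample_design(rng):
    return rng.standard_normal(2)

def simulate(x):
    return float(np.linalg.norm(x))

def loss(y_hat, y):
    return abs(y_hat - y)

for B in (10, 50, 100):
    print(B, mean_least_error(sample_design, simulate, loss, y_target=0.0, B=B))
```

Because $r_B$ keeps only the best design in each batch, it can only improve as $B$ grows; a curve that flattens early, as in Table 15, suggests that good designs are already found with few samplings.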

Table 15 and Fig 14 present the results for the N-body inverse design task. We consider values of $B$ ranging from 10 to 100, with $N=10$ batches. The target $y$ is set to 0, which represents the distance to a fixed target point. In the 24-step design, $r_B$ decreases only slightly as $B$ increases (from 0.1012 at $B=10$ to 0.0973 at $B=100$), indicating that the design space is well explored and most solutions can be retrieved even with a small number of samplings. This demonstrates the efficiency of our method in generating designs. Similar observations hold when time composition is performed for 34, 44, and 54 steps, indicating that our time composition approach captures long-range temporal physical dependencies and enables efficient generation in a larger design space.

In the 2D inverse design task, the target $y$ is slightly different: here we aim to minimize the model output (drag minus lift force). Hence, to evaluate sensitivity to initialization, we adopt a “re-simulation” performance metric, the lift/drag ratio, instead of the “re-simulation” error used in the N-body task. For each $B$, we take the highest lift/drag ratio among the simulation results of a batch of $B$ designed boundaries (or boundary pairs for the 2-airfoil design). Invalid design results, such as overlapping airfoil pairs in the 2-airfoil design, are removed from the $B$ results before computing the maximal lift/drag ratio. The reported numbers are averages over $N=10$ batches for each $B$.
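The aggregation for the airfoil task differs from the N-body case in two ways: it maximizes a ratio instead of minimizing an error, and it first discards invalid designs. A minimal NumPy sketch with hypothetical inputs follows; how a batch with no valid design is handled is not stated in the text, so skipping such batches is an assumption.

```python
import numpy as np

def mean_best_ratio(ratios, valid):
    """Average over batches of the best valid lift-to-drag ratio.

    ratios, valid: arrays of shape (n_batches, B). Invalid designs (e.g.
    overlapping airfoil pairs) are dropped before taking the maximum.
    Batches with no valid design are skipped (an assumption).
    """
    best = [r[v].max() for r, v in zip(ratios, valid) if v.any()]
    return float(np.mean(best))

# Hypothetical simulated lift-to-drag ratios for 2 batches of B = 3 designs.
ratios = np.array([[1.2, 3.0, 0.8],
                   [0.5, 1.1, 2.0]])
valid = np.array([[True, False, True],   # 3.0 comes from an invalid (overlapping) pair
                  [True, True, True]])
print(mean_best_ratio(ratios, valid))  # (1.2 + 2.0) / 2 = 1.6
```

Filtering before the maximum matters: without it, the invalid 3.0 in the first batch would inflate the reported ratio.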

Table 16 and Fig 15 present the results for the 2D airfoil design task. In the 1-airfoil column, the lift/drag ratio is relatively low for $B=10$, indicating that the design space is not sufficiently explored due to its high dimensionality ($64\times 64\times 3$ in our boundary mask and offsets representation). For $B\geq 20$, the lift/drag performance remains roughly steady. In the 2-airfoil column, the lift/drag ratio roughly increases with $B$. We attribute this to the higher-dimensional and more complex design space compared to the single-airfoil task: stringent constraints on boundary pairs, such as non-overlapping, create complex infeasible regions in the design space, and random initialization may land in these regions, producing invalid design results. The increase in lift/drag ratio slows down for $B\geq 30$, indicating that most solutions have been explored. Although the training data contain only a single airfoil boundary, which lies in a lower-dimensional and simpler design space, our model generalizes well and efficiently generates designs for this challenging two-airfoil compositional design problem.

I.3 Influence of the number of sampling steps in inference

Figure 16: Design objective for different numbers of sampling steps in N-body inverse design.
Figure 17: MAE for different numbers of sampling steps in N-body inverse design.

Fig 16 and Fig 17 show the outcomes of inverse design by CinDM for different numbers of sampling steps. As the number of sampling steps increases, the design objective gradually decreases, while the MAE fluctuates within a small range and occasionally rises. This can be explained as follows. As the number of sampling steps increases, the design objective is applied at more steps of the diffusion process, so the designs align more closely with the objective and the design objective decreases. The occasional rise in MAE arises because, with few sampling steps, the initial velocities of some designed samples are very small, so the diffused trajectories are concentrated within a narrow range; both the true and the diffused trajectories are then highly concentrated, which yields a small MAE. This sensitivity analysis suggests that, by choosing the number of sampling steps appropriately, CinDM can produce designs that meet the design objective while respecting physical constraints.

Appendix J Broader impacts and limitations

Our method, CinDM, extends the scope of design exploration and enables efficient design and control of complex systems. It could have broad implications across scientific and engineering fields. In materials science, using the diffusion model for inverse design could facilitate the customization of material microstructures and properties. In biomedicine, it could enable the structural design of drug molecules and the optimization of production processes. In the aerospace sector, integrating the diffusion model with inverse design could lead to more diverse shapes and structures, improving design efficiency and quality.

CinDM inherits the advantages of diffusion models, allowing us to generate more diverse and sophisticated design samples. However, several limitations remain. In terms of design quality and exploration space, we need to strike a balance between different objectives to avoid getting stuck in local optima, especially for complex, nonlinear real-world systems. We also need to ensure that the designed samples adhere to complex multi-scale physical constraints. Furthermore, achieving interpretability of samples designed by deep learning models remains challenging for inverse design applications. From a cost perspective, training diffusion models requires large datasets and intensive computational resources, and the computational complexity also limits design speed.

Moving forward, we intend to incorporate more physical prior knowledge into the model, leverage multi-modal data for training, employ more efficient sampling methods to improve sampling efficiency, improve interpretability, and generalize the model to multiple scales.