Search | arXiv e-print repository

Towards Enhanced Controllability of Diffusion Models

Authors: Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale

Abstract: Denoising Diffusion models have shown remarkable capabilities in generating realistic, high-quality and diverse images. However, the extent of controllability during generation is underexplored. Inspired by techniques based on GAN latent space for image manipulation, we train a diffusion model conditioned on two latent codes, a spatial content mask and a flattened style embedding. We rely on the inductive bias of the progressive denoising process of diffusion models to encode pose/layout information in the spatial structure mask and semantic/style information in the style code. We propose two generic sampling techniques for improving controllability. We extend composable diffusion models to allow for some dependence between conditional inputs, to improve the quality of generations while also providing control over the amount of guidance from each latent code and their joint distribution. We also propose timestep dependent weight scheduling for content and style latents to further improve the translations. We observe better controllability compared to existing methods and show that without explicit training objectives, diffusion models can be used for effective image manipulation and image translation.

Submitted 15 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: 28 pages, 28 figures

arXiv:2302.11710 [pdf, other]

Controlled and Conditional Text to Image Generation with Diffusion Prior

Authors: Pranav Aggarwal, Hareesh Ravi, Naveen Marri, Sachin Kelkar, Fengbin Chen, Vinh Khuc, Midhun Harikumar, Ritiz Tambi, Sudharshan Reddy Kakumanu, Purvak Lapsiya, Alvin Ghouas, Sarah Saber, Malavika Ramprasad, Baldo Faieta, Ajinkya Kale

Abstract: Denoising Diffusion models have shown remarkable performance in generating diverse, high quality images from text. Numerous techniques have been proposed on top of or in alignment with models like Stable Diffusion and Imagen that generate images directly from text. A lesser explored approach is DALLE-2's two step process comprising a Diffusion Prior that generates a CLIP image embedding from text and a Diffusion Decoder that generates an image from a CLIP image embedding. We explore the capabilities of the Diffusion Prior and the advantages of an intermediate CLIP representation. We observe that Diffusion Prior can be used in a memory and compute efficient way to constrain the generation to a specific domain without altering the larger Diffusion Decoder. Moreover, we show that the Diffusion Prior can be trained with additional conditional information such as color histogram to further control the generation. We show quantitatively and qualitatively that the proposed approaches perform better than prompt engineering for domain specific generation and existing baselines for color conditioned generation. We believe that our observations and results will instigate further research into the diffusion prior and uncover more of its capabilities.

Submitted 1 August, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.07979 [pdf, other]

PRedItOR: Text Guided Image Editing with Diffusion Prior

Authors: Hareesh Ravi, Sachin Kelkar, Midhun Harikumar, Ajinkya Kale

Abstract: Diffusion models have shown remarkable capabilities in generating high quality and creative images conditioned on text. An interesting application of such models is structure preserving text guided image editing. Existing approaches rely on text conditioned diffusion models such as Stable Diffusion or Imagen and require compute intensive optimization of text embeddings or fine-tuning the model weights for text guided image editing. We explore text guided image editing with a Hybrid Diffusion Model (HDM) architecture similar to DALLE-2. Our architecture consists of a diffusion prior model that generates CLIP image embedding conditioned on a text prompt and a custom Latent Diffusion Model trained to generate images conditioned on CLIP image embedding. We discover that the diffusion prior model can be used to perform text guided conceptual edits on the CLIP image embedding space without any finetuning or optimization. We combine this with structure preserving edits on the image decoder using existing approaches such as reverse DDIM to perform text guided image editing. Our approach, PRedItOR does not require additional inputs, fine-tuning, optimization or objectives and shows on par or better results than baselines qualitatively and quantitatively. We provide further analysis and understanding of the diffusion prior model and believe this opens up new possibilities in diffusion models research.

Submitted 20 March, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2109.11047 [pdf, other]

Cross-Modal Coherence for Text-to-Image Retrieval

Authors: Malihe Alikhani, Fangda Han, Hareesh Ravi, Mubbasir Kapadia, Vladimir Pavlovic, Matthew Stone

Abstract: Common image-text joint understanding techniques presume that images and the associated text can universally be characterized by a single implicit model. However, co-occurring images and text can be related in qualitatively different ways, and explicitly modeling it could improve the performance of current joint understanding models. In this paper, we train a Cross-Modal Coherence Model for the text-to-image retrieval task. Our analysis shows that models trained with image-text coherence relations can retrieve images originally paired with target text more often than coherence-agnostic models. We also show via human evaluation that images retrieved by the proposed coherence-aware model are preferred over a coherence-agnostic baseline by a huge margin. Our findings provide insights into the ways that different modalities communicate and the role of coherence relations in capturing commonsense inferences in text and imagery.

Submitted 15 April, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

Comments: This paper is published in AAAI-2022

arXiv:2010.04366 [pdf, other]

GitEvolve: Predicting the Evolution of GitHub Repositories

Authors: Honglu Zhou, Hareesh Ravi, Carlos M. Muniz, Vahid Azizi, Linda Ness, Gerard de Melo, Mubbasir Kapadia

Abstract: Software development is becoming increasingly open and collaborative with the advent of platforms such as GitHub. Given its crucial role, there is a need to better understand and model the dynamics of GitHub as a social platform. Previous work has mostly considered the dynamics of traditional social networking sites like Twitter and Facebook. We propose GitEvolve, a system to predict the evolution of GitHub repositories and the different ways by which users interact with them. To this end, we develop an end-to-end multi-task sequential deep neural network that given some seed events, simultaneously predicts which user-group is next going to interact with a given repository, what the type of the interaction is, and when it happens. To facilitate learning, we use graph based representation learning to encode relationship between repositories. We map users to groups by modelling common interests to better predict popularity and to generalize to unseen users during inference. We introduce an artificial event type to better model varying levels of activity of repositories in the dataset. The proposed multi-task architecture is generic and can be extended to model information diffusion in other social networks. In a series of experiments, we demonstrate the effectiveness of the proposed model, using multiple metrics and baselines. Qualitative analysis of the model's ability to predict popularity and forecast trends proves its applicability.

Submitted 9 October, 2020; originally announced October 2020.

arXiv:1706.04403 [pdf, ps, other]

Finding the number density of atomic vapor by studying its absorption profile

Authors: Harish Ravi, Mangesh Bhattarai, Vasant Natarajan

Abstract: We demonstrate a technique for obtaining the density of atomic vapor, by doing a fit of the resonant absorption spectrum to a density-matrix model. In order to demonstrate the usefulness of the technique, we apply it to absorption in the ${\rm D_2}$ line of a Cs vapor cell at room temperature. The lineshape of the spectrum is asymmetric due to the role of open transitions. This asymmetry is explained in the model using transit-time relaxation as the atoms traverse the laser beam. We also obtain the latent heat of evaporation by studying the number density as a function of temperature close to room temperature.

Submitted 14 June, 2017; originally announced June 2017.

Comments: 9 pages, 6 figures; Accepted

arXiv:1510.06058 [pdf, ps, other]

doi 10.1209/0295-5075/117/63002

Polarization dependent tuning of the Hanle effect in the ground state of Cs

Authors: Harish Ravi, Mangesh Bhattarai, Vineet Bharti, Vasant Natarajan

Abstract: We demonstrate that the Hanle effect can be tuned between magnetically induced absorption (MIA) and magnetically induced transmission (MIT) simply by changing the polarization of the input laser beam. The experiments are done using closed hyperfine transitions of the $ \rm D_2 $ line of ${\rm ^{133}Cs}$ ---$ F_g = 3 \rightarrow F_e = 2 $ and $ F_g =4 \rightarrow F_e = 5 $. The former shows a transformation from MIT to MIA, while the latter shows the opposite behavior. A qualitative explanation based on optical pumping and coherences among the magnetic sublevels of the ground state is borne out by a detailed density-matrix calculation. To increase the coherence time, the experiments are done in a Cs vapor cell with paraffin coating on the walls. The observed linewidth is extremely narrow ($\sim 0.1$ mG) compared to previous work in this area, making this a promising technique for all kinds of precision measurements.

Submitted 29 June, 2017; v1 submitted 13 October, 2015; originally announced October 2015.

Comments: 17 pages, 10 figures

Journal ref: Europhysics Letters, volume 117, article 63002 (7 pages) (2017)

arXiv:1510.03683 [pdf, ps, other]

Measuring the linewidth of a stabilized diode laser

Authors: Lal Muanzuala, Harish Ravi, Karthik Sylvan, Vasant Natarajan

Abstract: We demonstrate a straight-forward technique to measure the linewidth of a grating-stabilized diode laser system---known as an external cavity diode laser (ECDL)---by beating the output of two independent ECDLs in a Michelson interferometer, and then taking the Fourier transform of the beat signal. The measured linewidth is the sum of the linewidths of the two laser systems. Assuming that the two are equal, we find that the linewidth of each ECDL measured over a time period of 2 $\mu$s is about 0.3 MHz. This narrow linewidth shows the advantage of using such systems for high-resolution spectroscopy and other experiments in atomic physics.

Submitted 3 September, 2015; originally announced October 2015.

Comments: 4 pages, 4 figures

Journal ref: Current Science, Vol. 109, No. 4, 25 August 2015, pp 765--767

arXiv:1501.01624 [pdf, ps, other]

Permanent EDM measurement in Cs using nonlinear magneto-optic rotation

Authors: Harish Ravi, Mangesh Bhattarai, Abhilash Y D, Ummal Momeen, Vasant Natarajan

Abstract: We use the technique of chopped nonlinear magneto-optic rotation (NMOR) in a room temperature $^{133}$Cs vapor cell to measure the permanent electric dipole moment (EDM) in the atom. The cell has paraffin coating on the walls to increase the relaxation time. The signature of the EDM is a shift in the Larmor precession frequency which is correlated with the application of an E field. We analyze errors in the technique, and show that the main source of systematic error is the appearance of a longitudinal B field when the E field is applied. This error can be eliminated by doing measurements on the two ground hyperfine levels. Using an E field of 2.6 kV/cm, we place an upper limit on the electron EDM of $ 2.9 \times 10^{-22} $ e-cm ($95 \%$ confidence). This limit can be increased by 7 orders-of-magnitude---and brought below the current best experimental value---with easily implementable improvements to the technique.

Submitted 7 November, 2016; v1 submitted 7 January, 2015; originally announced January 2015.

Comments: 11 pages, 3 figures

Journal ref: Asian Journal of Physics, volume 25, issue 9, no page numbers, year 2016

arXiv:1308.2265 [pdf, other]

Measurement of the electronic thermal conductance channels and heat capacity of graphene at low temperature

Authors: K. C. Fong, Emma Wollman, Harish Ravi, Wei Chen, Aash Clerk, M. D. Shaw, H. G. LeDuc, K. C. Schwab

Abstract: The ability to transport energy is a fundamental property of the two-dimensional Dirac fermions in graphene. Electronic thermal transport in this system is relatively unexplored and is expected to show unique fundamental properties and to play an important role in future applications of graphene, including opto-electronics, plasmonics, and ultra-sensitive bolometry. Here we present measurements of bipolar, electron-diffusion and electron-phonon thermal conductances, and infer the electronic specific heat, with a minimum value of 10 $k_{\rm{B}}$ ($10^{-22}$ JK$^{-1}$) per square micron. We test the validity of the Wiedemann-Franz law and find the Lorenz number equals $1.32\times(\pi^2/3)(k_{\rm{B}}/e)^2$. The electron-phonon thermal conductance has a temperature power law $T^2$ at high doping levels, and the coupling parameter is consistent with recent theory, indicating its enhancement by impurity scattering. We demonstrate control of the thermal conductance by electrical gating and by suppressing the diffusion channel using superconducting electrodes, which sets the stage for future graphene-based single microwave photon detection.

Submitted 9 August, 2013; originally announced August 2013.

Showing 1–10 of 10 results for author: Ravi, H