-
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Authors:
Rushikesh Zawar,
Shaurya Dewan,
Andrew F. Luo,
Margaret M. Henderson,
Michael J. Tarr,
Leila Wehbe
Abstract:
Understanding the semantics of visual scenes is a fundamental challenge in Computer Vision. A key aspect of this challenge is that objects sharing similar semantic meanings or functions can exhibit striking visual differences, making accurate identification and categorization difficult. Recent advancements in text-to-image frameworks have led to models that implicitly capture natural scene statistics. These frameworks account for the visual variability of objects, as well as complex object co-occurrences and sources of noise such as diverse lighting conditions. By leveraging large-scale datasets and cross-attention conditioning, these models generate detailed and contextually rich scene representations. This capability opens new avenues for improving object recognition and scene understanding in varied and challenging environments. Our work presents StableSemantics, a dataset comprising 224 thousand human-curated prompts, processed natural language captions, over 2 million synthetic images, and 10 million attention maps corresponding to individual noun chunks. We explicitly leverage human-generated prompts that correspond to visually interesting stable diffusion generations, provide 10 generations per phrase, and extract cross-attention maps for each image. We explore the semantic distribution of generated images, examine the distribution of objects within images, and benchmark captioning and open vocabulary segmentation methods on our data. To the best of our knowledge, we are the first to release a diffusion dataset with semantic attributions. We expect our proposed dataset to catalyze advances in visual semantic understanding and provide a foundation for developing more sophisticated and effective visual models. Website: https://stablesemantics.github.io/StableSemantics
Submitted 19 June, 2024;
originally announced June 2024.
-
DiffusionPID: Interpreting Diffusion via Partial Information Decomposition
Authors:
Shaurya Dewan,
Rushikesh Zawar,
Prakanshul Saxena,
Yingshan Chang,
Andrew Luo,
Yonatan Bisk
Abstract:
Text-to-image diffusion models have made significant progress in generating naturalistic images from textual inputs, and demonstrate the capacity to learn and represent complex visual-semantic relationships. While these diffusion models have achieved remarkable success, the underlying mechanisms driving their performance are not yet fully accounted for, with many unanswered questions surrounding what they learn, how they represent visual-semantic relationships, and why they sometimes fail to generalize. Our work presents Diffusion Partial Information Decomposition (DiffusionPID), a novel technique that applies information-theoretic principles to decompose the input text prompt into its elementary components, enabling a detailed examination of how individual tokens and their interactions shape the generated image. We introduce a formal approach to analyze the uniqueness, redundancy, and synergy terms by applying PID to the denoising model at both the image and pixel level. This approach enables us to characterize how individual tokens and their interactions affect the model output. We first present a fine-grained analysis of characteristics utilized by the model to uniquely localize specific concepts; we then apply our approach in bias analysis and show it can recover gender and ethnicity biases. Finally, we use our method to visually characterize word ambiguity and similarity from the model's perspective and illustrate the efficacy of our method for prompt intervention. Our results show that PID is a potent tool for evaluating and diagnosing text-to-image diffusion models.
Submitted 12 June, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Curiosity & Entropy Driven Unsupervised RL in Multiple Environments
Authors:
Shaurya Dewan,
Anisha Jain,
Zoe LaLena,
Lifan Yu
Abstract:
The authors of 'Unsupervised Reinforcement Learning in Multiple environments' propose a method, alpha-MEPOL, to tackle unsupervised RL across multiple environments. They pre-train a task-agnostic exploration policy using interactions from an entire environment class and then fine-tune this policy for various tasks using supervision. We expanded upon this work, with the goal of improving performance. We primarily propose and experiment with five new modifications to the original work: sampling trajectories using an entropy-based probability distribution, dynamic alpha, higher KL Divergence threshold, curiosity-driven exploration, and alpha-percentile sampling on curiosity. Dynamic alpha and higher KL-Divergence threshold both provided a significant improvement over the baseline from the earlier work. PDF-sampling failed to provide any improvement due to it being approximately equivalent to the baseline method when the sample space is small. In high-dimensional environments, the addition of curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more. However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent. Overall, some of our experiments did boost performance over the baseline and there are a few directions that seem promising for further research.
Submitted 8 January, 2024;
originally announced January 2024.
-
Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields
Authors:
Rohith Agaram,
Shaurya Dewan,
Rahul Sajnani,
Adrien Poulenard,
Madhava Krishna,
Srinath Sridhar
Abstract:
Coordinate-based implicit neural networks, or neural fields, have emerged as useful representations of shape and appearance in 3D computer vision. Despite advances, however, it remains challenging to build neural fields for categories of objects without datasets like ShapeNet that provide "canonicalized" object instances that are consistently aligned for their 3D position and orientation (pose). We present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). CaFi-Net directly learns from continuous and noisy radiance fields using a Siamese network architecture that is designed to extract equivariant field features for category-level canonicalization. During inference, our method takes pre-trained neural radiance fields of novel object instances at arbitrary 3D pose and estimates a canonical field with consistent 3D pose across the entire category. Extensive experiments on a new dataset of 1300 NeRF models across 13 object categories show that our method matches or exceeds the performance of 3D point cloud-based methods.
Submitted 17 May, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
GC Insights: Space sector careers resources in the UK need a greater diversity of roles
Authors:
Martin O. Archer,
Cara L. Water,
Shafiat Dewan,
Simon Foster,
Antonio Portas
Abstract:
Educational research highlights that improved careers education is needed to increase participation in science, technology, engineering, and mathematics (STEM). Current UK careers resources concerning the space sector, however, are found to perhaps not best reflect the diversity of roles present and may in fact perpetuate misconceptions about the usefulness of science. We, therefore, compile a more diverse set of space-related jobs, which will be used in the development of a new space careers resource.
Submitted 29 April, 2022;
originally announced April 2022.
-
High-Temperature Photocurrent Mechanism of β-Ga2O3 Based MSM Solar-Blind Photodetectors
Authors:
B. R. Tak,
Manjari Garg,
Sheetal Dewan,
Carlos G. Torres-Castanedo,
Kuang-Hui Li,
Vinay Gupta,
Xiaohang Li,
R. Singh
Abstract:
High-temperature operation of metal-semiconductor-metal (MSM) UV photodetectors fabricated on pulsed laser deposited β-Ga2O3 thin films has been investigated. These photodetectors were operated up to 250 °C temperature under 255 nm illumination. The photo current to dark current (PDCR) ratio of about 7100 was observed at room temperature (RT) while it had a value of 2.3 at 250 °C at 10 V applied bias. A decline in photocurrent was observed from RT to 150 °C and then it increased with temperature up to 250 °C. The suppression of the blue band was also observed from 150 °C temperature which indicated that self-trapped holes in Ga2O3 became unstable. Temperature-dependent rise and decay times of carriers were analyzed to understand the photocurrent mechanism and persistence photocurrent at high temperatures. Coupled electron-phonon interaction with holes was found to influence the photoresponse in the devices. The obtained results are encouraging and significant for high-temperature applications of β-Ga2O3 MSM deep UV photodetectors.
Submitted 18 December, 2018;
originally announced December 2018.
-
Chiral standing waves and its trapping force on chiral particles
Authors:
Tianhang Zhang,
M. R. C. Mahdy,
Shadman Sajid Dewan,
Md. Nayem Hossain,
Hamim Mahmud Rivy,
Nabila Masud,
Ziaur Rahman Jony
Abstract:
Up to now, in the literature of optical manipulation, optical force due to chirality usually coexists with the non-chiral force and the chiral force usually takes a very small portion of the total force. In this work, we investigate a case where the optical force exerted on an object is purely due to the chirality while there is zero force on a non-chiral object. We find that a trapping force arises on chiral particles when they are placed in a field consisting of two orthogonally polarized counter-propagating plane waves. We have revealed the underlying physics of this force by modeling the particle as a chiral dipole and analytically studying the optical force. We find that, besides chirality, the trapping force is also closely related to the dual electric-magnetic symmetry of the field and the dual asymmetry of the material. We also demonstrate that the proposed idea is not restricted to dipolar chiral objects only. Chiral Mie objects can also be trapped based on the technique proposed in this article. Notably, such chiral trapping forces have been found robust by varying several parameters throughout the investigation. This trapping force may find applications in identifying an object's chirality and the selective trapping of chiral objects.
Submitted 5 November, 2018;
originally announced November 2018.