-
Why do LLaVA Vision-Language Models Reply to Images in English?
Authors:
Musashi Hinck,
Carolin Holtermann,
Matthew Lyle Olson,
Florian Schneider,
Sungduk Yu,
Anahita Bhiwandiwalla,
Anne Lauscher,
Shaoyen Tseng,
Vasudev Lal
Abstract:
We uncover a surprising multilingual bias occurring in a popular class of multimodal vision-language models (VLMs). Including an image in the query to a LLaVA-style VLM significantly increases the likelihood of the model returning an English response, regardless of the language of the query. This paper investigates the causes of this loss with a two-pronged approach that combines extensive ablation of the design space with a mechanistic analysis of the models' internal representations of image and text inputs. Both approaches indicate that the issue stems from the language modelling component of the LLaVA model. Statistically, we find that switching the language backbone for a bilingual language model has the strongest effect on reducing this error. Mechanistically, we provide compelling evidence that visual inputs are not mapped to a similar space as text ones, and that intervening on intermediary attention layers can reduce this bias. Our findings provide important insights to researchers and engineers seeking to understand the crossover between multimodal and multilingual spaces, and contribute to the goal of developing capable and inclusive VLMs for non-English contexts.
Submitted 2 July, 2024;
originally announced July 2024.
-
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
Authors:
Xin Su,
Man Luo,
Kris W Pan,
Tien Pei Chou,
Vasudev Lal,
Phillip Howard
Abstract:
Synthetic data generation has gained significant attention recently for its utility in training large vision and language models. However, the application of synthetic data to the training of multimodal context-augmented generation systems has been relatively unexplored. This gap in existing work is important because existing vision and language models (VLMs) are not trained specifically for context-augmented generation. Resources for adapting such models are therefore crucial for enabling their use in retrieval-augmented generation (RAG) settings, where a retriever is used to gather relevant information that is then subsequently provided to a generative model via context augmentation. To address this challenging problem, we generate SK-VQA: a large synthetic multimodal dataset containing over 2 million question-answer pairs which require external knowledge to determine the final answer. Our dataset is both larger and significantly more diverse than existing resources of its kind, possessing over 11x more unique questions and containing images from a greater variety of sources than previously-proposed datasets. Through extensive experiments, we demonstrate that our synthetic dataset can not only serve as a challenging benchmark, but is also highly effective for adapting existing generative multimodal models for context-augmented generation.
Submitted 27 June, 2024;
originally announced June 2024.
-
L-MAGIC: Language Model Assisted Generation of Images with Coherence
Authors:
Zhipeng Cai,
Matthias Mueller,
Reiner Birkl,
Diana Wofk,
Shao-Yen Tseng,
JunDa Cheng,
Gabriela Ben-Melech Stan,
Vasudev Lal,
Michael Paulitsch
Abstract:
In the current era of generative AI breakthroughs, generating panoramic scenes from a single input image remains a key challenge. Most existing methods use diffusion-based iterative or simultaneous multi-view inpainting. However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for each view. We propose L-MAGIC, a novel method leveraging large language models for guidance while diffusing multiple coherent views of 360 degree panoramic scenes. L-MAGIC harnesses pre-trained diffusion and language models without fine-tuning, ensuring zero-shot performance. The output quality is further enhanced by super-resolution and multi-view fusion techniques. Extensive experiments demonstrate that the resulting panoramic scenes feature better scene layouts and perspective view rendering quality compared to related works, with >70% preference in human evaluations. Combined with conditional diffusion models, L-MAGIC can accept various input modalities, including but not limited to text, depth maps, sketches, and colored scripts. Applying depth estimation further enables 3D point cloud generation and dynamic scene exploration with fluid camera motion. Code is available at https://github.com/IntelLabs/MMPano. The video presentation is available at https://youtu.be/XDMNEzH4-Ec?list=PLG9Zyvu7iBa0-a7ccNLO8LjcVRAoMn57s.
Submitted 3 June, 2024;
originally announced June 2024.
-
X-ray Cool Core Remnants Heated by Strong Radio AGN Feedback
Authors:
Wenhao Liu,
Ming Sun,
G. Mark Voit,
Dharam Vir Lal,
Paul Nulsen,
Massimo Gaspari,
Craig Sarazin,
Steven Ehlert,
Xianzhong Zheng
Abstract:
Strong AGN heating provides an alternative means for the disruption of cluster cool cores (CCs) to cluster mergers. In this work we present a systematic Chandra study of a sample of 108 nearby ($z<0.1$) galaxy clusters, to investigate the effect of AGN heating on CCs. About 40% of clusters with small offsets between the BCG and the X-ray centre ($\le50$ kpc) have small CCs. For comparison, 14 of 17 clusters with large offsets have small CCs, which suggests that mergers or sloshing can be efficient in reducing the CC size. Relaxed, small CC clusters generally have weak radio AGNs ($P_{1.4\rm GHz}<10^{23}$ W Hz$^{-1}$), and they show a lack of systems hosting a radio AGN with intermediate radio power ($2\times10^{23}<P_{1.4\rm GHz}<2\times10^{24}$ W Hz$^{-1}$). We found that the strongest circumnuclear ($<1$ kpc) X-ray emission only exists in clusters with strong radio AGN. The duty cycle of relaxed, small CC clusters is less than half of that for large CC clusters. It suggests that the radio activity of BCGs is affected by the properties of the surrounding gas beyond the central $\sim10$ kpc, and strong radio AGNs in small X-ray CCs fade more rapidly than those embedded in large X-ray CCs. A scenario is also presented for the transition of large CCs and coronae due to radio AGN feedback. We also present a detailed analysis of galaxy cluster 3C 129.1 as an example of a CC remnant possibly disrupted by radio AGN.
Submitted 15 May, 2024;
originally announced May 2024.
-
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
Authors:
Gabriela Ben Melech Stan,
Estelle Aflalo,
Raanan Yehezkel Rohekar,
Anahita Bhiwandiwalla,
Shao-Yen Tseng,
Matthew Lyle Olson,
Yaniv Gurwicz,
Chenfei Wu,
Nan Duan,
Vasudev Lal
Abstract:
In the rapidly evolving landscape of artificial intelligence, multi-modal large language models are emerging as a significant area of interest. These models, which combine various forms of data input, are becoming increasingly popular. However, understanding their internal mechanisms remains a complex task. Numerous advancements have been made in the field of explainability tools and mechanisms, yet there is still much to explore. In this work, we present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models. Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer, and assess the efficacy of the language model in grounding its output in the image. With our application, a user can systematically investigate the model and uncover system limitations, paving the way for enhancements in system capabilities. Finally, we present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
Submitted 24 June, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Authors:
Musashi Hinck,
Matthew L. Olson,
David Cobbley,
Shao-Yen Tseng,
Vasudev Lal
Abstract:
We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs). Of particular interest is the 2B parameter Gemma model, which provides opportunities to construct capable small-scale MMFMs. In line with findings from other papers in this space, we test the effect of ablating three design features: pretraining the connector, utilizing a more powerful image backbone, and increasing the size of the language backbone. The resulting models, which we call LLaVA-Gemma, exhibit moderate performance on an array of evaluations, but fail to improve past the current comparably sized SOTA models. Closer analysis of performance shows mixed effects; skipping pretraining tends to reduce performance, larger vision models sometimes improve performance, and increasing language model size has inconsistent effects. We publicly release training recipes, code and weights for the LLaVA-Gemma models.
Submitted 10 June, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Authors:
Agneet Chatterjee,
Gabriela Ben Melech Stan,
Estelle Aflalo,
Sayak Paul,
Dhruba Ghosh,
Tejas Gokhale,
Ludwig Schmidt,
Hannaneh Hajishirzi,
Vasudev Lal,
Chitta Baral,
Yezhou Yang
Abstract:
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt. In this paper, we offer a comprehensive investigation of this limitation, while also developing datasets and methods that achieve state-of-the-art performance. First, we find that current vision-language datasets do not represent spatial relationships well enough; to alleviate this bottleneck, we create SPRIGHT, the first spatially-focused, large scale dataset, by re-captioning 6 million images from 4 widely used vision datasets. Through a 3-fold evaluation and analysis pipeline, we find that SPRIGHT largely improves upon existing datasets in capturing spatial relationships. To demonstrate its efficacy, we leverage only ~0.25% of SPRIGHT and achieve a 22% improvement in generating spatially accurate images while also improving the FID and CMMD scores. Secondly, we find that training on images containing a large number of objects results in substantial improvements in spatial consistency. Notably, we attain state-of-the-art on T2I-CompBench with a spatial score of 0.2133, by fine-tuning on <500 images. Finally, through a set of controlled experiments and ablations, we document multiple findings that we believe will enhance the understanding of factors that affect spatial consistency in text-to-image models. We publicly release our dataset and model to foster further research in this area.
Submitted 1 April, 2024;
originally announced April 2024.
-
Spectroscopy of the $5s5p$ $ ^3 P_0 \rightarrow 5s5d$ $ ^3 D_1 $ transition of strontium using laser cooled atoms
Authors:
Kushal Patel,
Palki Gakkhar,
Korak Biswas,
S Sagar Maurya,
Pranab Dutta,
Vishal Lal,
Brajesh Mani,
Umakant D Rapol
Abstract:
This article presents spectroscopy results of the $5s5p{\;^3}P_0 \rightarrow 5s5d{\;^3}D_1$ transition in all isotopes of laser cooled Sr atoms and the utility of this transition for repumping applications. By employing the $5s5p{\;^{3} P_{0}} \rightarrow 5s5d{\;^3}D_1 $ (483 nm) transition in combination with the excitation of the $5s5p{\;^3}P_2 \rightarrow 5s6s{\;^3}S_1$ (707 nm) transition, we observe a significant increase ($\sim$ 13 fold) in the steady state number of atoms in the magneto-optic trap (MOT). This enhancement is attributed to the efficient repumping of Sr atoms that have decayed into the dark $5s5p{\;^3}P_2$ state by returning them to the ground state $5s^2{\;^1}S_0$ without any loss into the other states. The absolute transition frequencies were measured with an absolute accuracy of 30 MHz. To support our measurements, we performed Fock-space relativistic coupled-cluster calculations of the relevant parameters in Sr. To further increase the accuracy of the calculated properties, corrections from the Breit, QED and perturbative triples were also included. The calculated branching ratio for the repumping state confirms the significantly increased population in the ${^3}P_1$ state, thereby leading to an increase in the population of trapped atoms due to the enhanced repumping. Our calculated hyperfine-splitting energies are in excellent agreement with the measured values. Moreover, our calculated isotope shifts in the transition frequencies are in good agreement with our measured values.
Submitted 16 December, 2023;
originally announced December 2023.
-
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Authors:
Phillip Howard,
Avinash Madasu,
Tiep Le,
Gustavo Lujan Moreno,
Anahita Bhiwandiwalla,
Vasudev Lal
Abstract:
While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for various combinations of social attributes. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intersectional social biases at scale. Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender). Through our over-generate-then-filter methodology, we produce SocialCounterfactuals, a high-quality dataset containing 171k image-text pairs for probing intersectional biases related to gender, race, and physical characteristics. We conduct extensive experiments to demonstrate the usefulness of our generated dataset for probing and mitigating intersectional social biases in state-of-the-art VLMs.
Submitted 9 April, 2024; v1 submitted 30 November, 2023;
originally announced December 2023.
-
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Authors:
Shachar Rosenman,
Vasudev Lal,
Phillip Howard
Abstract:
Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models. Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers. This approach enables higher-quality text-to-image generations and provides user control over stylistic features via constraint set specification. We demonstrate the utility of our framework by creating an interactive application for prompt enhancement and image generation using Stable Diffusion. Additionally, we conduct experiments utilizing a large dataset of human-engineered prompts for text-to-image generation and show that our approach automatically produces enhanced prompts that result in superior image quality. We make our code and a screencast video demo of NeuroPrompts publicly available.
Submitted 5 April, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
LDM3D-VR: Latent Diffusion Model for 3D VR
Authors:
Gabriela Ben Melech Stan,
Diana Wofk,
Estelle Aflalo,
Shao-Yen Tseng,
Zhipeng Cai,
Michael Paulitsch,
Vasudev Lal
Abstract:
Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods.
Submitted 6 November, 2023;
originally announced November 2023.
-
Radio-continuum spectra of ram pressure stripped galaxies in the Coma Cluster
Authors:
I. D. Roberts,
R. J. van Weeren,
D. V. Lal,
M. Sun,
H. Chen,
A. Ignesti,
M. Brüggen,
N. Lyskova,
T. Venturi,
M. Yagi
Abstract:
$Aims:$ We used the nearby Coma Cluster as a laboratory in order to probe the impact of ram pressure on star formation as well as to constrain the characteristic timescales and velocities for the stripping of the non-thermal ISM. $Methods:$ We used high-resolution ($6.5'' \approx 3\,\mathrm{kpc}$), multi-frequency ($144\,\mathrm{MHz} - 1.5\,\mathrm{GHz}$) radio continuum imaging of the Coma Cluster to resolve the low-frequency radio spectrum across the discs and tails of 25 ram pressure stripped galaxies. With resolved spectral index maps across these galaxy discs, we constrained the impact of ram pressure perturbations on galaxy star formation. We measured multi-frequency flux-density profiles along each of the ram pressure stripped tails in our sample. We then fit the resulting radio continuum spectra with a simple synchrotron aging model. $Results:$ We showed that ram pressure stripped tails in Coma have steep ($-2 \lesssim \alpha \lesssim -1$) spectral indices. The discs of galaxies undergoing ram pressure stripping have integrated spectral indices within the expected range for shock acceleration from supernovae ($-0.8 \lesssim \alpha \lesssim -0.5$), though there is a tail towards flatter values. In a resolved sense, there are gradients in spectral index across the discs of ram pressure stripped galaxies in Coma. These gradients are aligned with the direction of the observed radio tails, with the flattest spectral indices being found on the `leading half'. From best-fit break frequencies we estimated the projected plasma velocities along the tail to be on the order of hundreds of kilometers per second, with the precise magnitude depending on the assumed magnetic field strength.
Submitted 31 October, 2023;
originally announced October 2023.
-
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Authors:
Avinash Madasu,
Anahita Bhiwandiwalla,
Vasudev Lal
Abstract:
Foundational multimodal models pre-trained on large scale image-text pairs or video-text pairs or both have shown strong generalization abilities on downstream tasks. However, unlike image-text models, pretraining video-text models is not always feasible due to the difficulty in collecting large-scale clean and aligned data, and exponential computational costs involved in the pretraining phase. Therefore, the pertinent question to ask is: Can image-text models be adapted to video tasks and is there any benefit to using these models over pretraining directly on videos? In this work, we focus on this question by proposing a detailed study on the generalization abilities of image-text models when evaluated on video understanding tasks in a zero-shot setting. We investigate 9 foundational image-text models on a diverse set of video tasks that include video action recognition (video AR), video retrieval (video RT), video question answering (video QA), video multiple choice (video MC) and video captioning (video CP). Our experiments show that image-text models exhibit impressive performance on video AR, video RT and video MC. Furthermore, they perform moderately on video captioning and poorly on video QA. These findings shed light on the benefits of adapting foundational image-text models to an array of video tasks while avoiding the costly pretraining step.
Submitted 24 November, 2023; v1 submitted 7 October, 2023;
originally announced October 2023.
-
Probing Intersectional Biases in Vision-Language Models with Counterfactual Examples
Authors:
Phillip Howard,
Avinash Madasu,
Tiep Le,
Gustavo Lujan Moreno,
Vasudev Lal
Abstract:
While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for various combinations of social attributes from existing datasets. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intersectional social biases at scale. Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender). We conduct extensive experiments using our generated dataset which reveal the intersectional social biases present in state-of-the-art VLMs.
Submitted 4 October, 2023;
originally announced October 2023.
-
COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs
Authors:
Tiep Le,
Vasudev Lal,
Phillip Howard
Abstract:
Counterfactual examples have proven to be valuable in the field of natural language processing (NLP) for both evaluating and improving the robustness of language models to spurious correlations in datasets. Despite their demonstrated utility for NLP, multimodal counterfactual examples have been relatively unexplored due to the difficulty of creating paired image-text data with minimal counterfactual changes. To address this challenge, we introduce a scalable framework for automatic generation of counterfactual examples using text-to-image diffusion models. We use our framework to create COCO-Counterfactuals, a multimodal counterfactual dataset of paired image and text captions based on the MS-COCO dataset. We validate the quality of COCO-Counterfactuals through human evaluations and show that existing multimodal models are challenged by our counterfactual image-text pairs. Additionally, we demonstrate the usefulness of COCO-Counterfactuals for improving out-of-domain generalization of multimodal vision-language models via training data augmentation.
Submitted 31 October, 2023; v1 submitted 22 September, 2023;
originally announced September 2023.
-
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Authors:
Avinash Madasu,
Vasudev Lal
Abstract:
Video retrieval (VR) involves retrieving the ground truth video from the video database given a text caption or vice-versa. The two important components of compositionality, objects & attributes and actions, are joined using correct syntax to form a proper text query. These components (objects & attributes, actions and syntax) each play an important role in helping to distinguish among videos and retrieve the correct ground truth video. However, it is unclear what effect these components have on video retrieval performance. We therefore conduct a systematic study to evaluate the compositional and syntactic understanding of video retrieval models on standard benchmarks such as MSRVTT, MSVD and DIDEMO. The study is performed on two categories of video retrieval models: (i) those pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (e.g., Frozen-in-Time, Violet, MCQ) and (ii) those that adapt pre-trained image-text representations like CLIP for video retrieval (e.g., CLIP4Clip, XCLIP, CLIP2Video). Our experiments reveal that actions and syntax play a minor role compared to objects & attributes in video understanding. Moreover, video retrieval models that use pre-trained image-text representations (CLIP) have better syntactic and compositional understanding as compared to models pre-trained on video-text data. The code is available at https://github.com/IntelLabs/multimodal_cognitive_ai/tree/main/ICSVR
Submitted 10 June, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Authors:
Xiao Xu,
Bei Li,
Chenfei Wu,
Shao-Yen Tseng,
Anahita Bhiwandiwalla,
Shachar Rosenman,
Vasudev Lal,
Wanxiang Che,
Nan Duan
Abstract:
Two-Tower Vision-Language (VL) models have shown promising improvements on various downstream VL tasks. Although the most advanced work improves performance by building bridges between encoders, it suffers from ineffective layer-by-layer utilization of uni-modal representations and cannot flexibly exploit different levels of uni-modal semantic knowledge. In this work, we propose ManagerTower, a novel VL model architecture that gathers and combines the insights of pre-trained uni-modal experts at different levels. The managers introduced in each cross-modal layer can adaptively aggregate uni-modal semantic knowledge to facilitate more comprehensive cross-modal alignment and fusion. ManagerTower outperforms previous strong baselines both with and without Vision-Language Pre-training (VLP). With only 4M VLP data, ManagerTower achieves superior performances on various downstream VL tasks, especially 79.15% accuracy on VQAv2 Test-Std, 86.56% IR@1 and 95.64% TR@1 on Flickr30K. Code and checkpoints are available at https://github.com/LooperXX/ManagerTower.
Submitted 31 May, 2023;
originally announced June 2023.
-
Brain encoding models based on multimodal transformers can transfer across language and vision
Authors:
Jerry Tang,
Meng Du,
Vy A. Vo,
Vasudev Lal,
Alexander G. Huth
Abstract:
Encoding models have been used to assess how the human brain represents concepts in language and vision. While language and vision rely on similar concept representations, current encoding models are typically trained and tested on brain responses to each modality in isolation. Recent advances in multimodal pretraining have produced transformers that can extract aligned representations of concepts in language and vision. In this work, we used representations from multimodal transformers to train encoding models that can transfer across fMRI responses to stories and movies. We found that encoding models trained on brain responses to one modality can successfully predict brain responses to the other modality, particularly in cortical regions that represent conceptual meaning. Further analysis of these encoding models revealed shared semantic dimensions that underlie concept representations in language and vision. Comparing encoding models trained using representations from multimodal and unimodal transformers, we found that multimodal transformers learn more aligned representations of concepts in language and vision. Our results demonstrate how multimodal transformers can provide insights into the brain's capacity for multimodal processing.
Submitted 20 May, 2023;
originally announced May 2023.
-
LDM3D: Latent Diffusion Model for 3D
Authors:
Gabriela Ben Melech Stan,
Diana Wofk,
Scottie Fox,
Alex Redden,
Will Saxton,
Jean Yu,
Estelle Aflalo,
Shao-Yen Tseng,
Fabio Nonato,
Matthias Muller,
Vasudev Lal
Abstract:
This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts. The LDM3D model is fine-tuned on a dataset of tuples containing an RGB image, depth map and caption, and validated through extensive experiments. We also develop an application called DepthFusion, which uses the generated RGB images and depth maps to create immersive and interactive 360-degree-view experiences using TouchDesigner. This technology has the potential to transform a wide range of industries, from entertainment and gaming to architecture and design. Overall, this paper presents a significant contribution to the field of generative AI and computer vision, and showcases the potential of LDM3D and DepthFusion to revolutionize content creation and digital experiences. A short video summarizing the approach can be found at https://t.ly/tdi2.
Submitted 21 May, 2023; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Spectral age distribution for radio-loud active galaxies in the XMM-LSS field
Authors:
Siddhant Pinjarkar,
Martin J. Hardcastle,
Jeremy J. Harwood,
Dharam V. Lal,
Peter W. Hatfield,
Matt J. Jarvis,
Zara Randriamanakoto,
Imogen H. Whittam
Abstract:
Jets of energetic particles, as seen in FR type-I and FR type-II sources, ejected from the center of Radio-Loud AGN affect the sources' surrounding intracluster medium/intergalactic medium. Placing constraints on the age of such sources is important in order to measure the jet powers and determine the effects on feedback. To evaluate the age of these sources using spectral age models, we require high-resolution multi-wavelength data. The new sensitive and high-resolution MIGHTEE survey of the XMM-LSS field along with data from the Low Frequency Array (LOFAR) and the Giant Metrewave Radio Telescope (GMRT) provide data taken at different frequencies with similar resolution, which enables us to determine the spectral age distribution for radio-loud AGN in the survey field. In this study we present a sample of 28 radio galaxies with their best fitting spectral age distribution analyzed using the Jaffe-Perola (JP) model on a pixel-by-pixel basis. Fits are generally good and objects in our sample show maximum ages within the range of 2.8 Myr to 115 Myr with a median of 8.71 Myr. High-resolution maps over a range of frequencies are required to observe detailed age distributions for small sources and high-sensitivity maps will be needed in order to observe fainter extended emission. We do not observe any correlation between the total physical size of the sources and their age and we speculate both dynamical models and the approach to spectral age analysis may need some modification to account for our observations.
Submitted 12 May, 2023;
originally announced May 2023.
-
NeuroComparatives: Neuro-Symbolic Distillation of Comparative Knowledge
Authors:
Phillip Howard,
Junlin Wang,
Vasudev Lal,
Gadi Singer,
Yejin Choi,
Swabha Swayamdipta
Abstract:
Comparative knowledge (e.g., steel is stronger and heavier than styrofoam) is an essential component of our world knowledge, yet understudied in prior literature. In this paper, we harvest the dramatic improvements in knowledge capabilities of language models into a large-scale comparative knowledge base. While the ease of acquisition of such comparative knowledge is much higher from extreme-scale models like GPT-4, compared to their considerably smaller and weaker counterparts such as GPT-2, not even the most powerful models are exempt from making errors. We thus ask: to what extent are models at different scales able to generate valid and diverse comparative knowledge?
We introduce NeuroComparatives, a novel framework for comparative knowledge distillation overgenerated from language models such as GPT-variants and LLaMA, followed by stringent filtering of the generated knowledge. Our framework acquires comparative knowledge between everyday objects, producing a corpus of up to 8.8M comparisons over 1.74M entity pairs - 10X larger and 30% more diverse than existing resources. Moreover, human evaluations show that NeuroComparatives outperform existing resources in terms of validity (up to 32% absolute improvement). Our acquired NeuroComparatives leads to performance improvements on five downstream tasks. We find that neuro-symbolic manipulation of smaller models offers complementary benefits to the currently dominant practice of prompting extreme-scale language models for knowledge distillation.
Submitted 5 April, 2024; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Thrill-K Architecture: Towards a Solution to the Problem of Knowledge Based Understanding
Authors:
Gadi Singer,
Joscha Bach,
Tetiana Grinberg,
Nagib Hakim,
Phillip Howard,
Vasudev Lal,
Zev Rivlin
Abstract:
While end-to-end learning systems are rapidly gaining capabilities and popularity, the increasing computational demands for deploying such systems, along with a lack of flexibility, adaptability, explainability, reasoning and verification capabilities, require new types of architectures. Here we introduce a classification of hybrid systems which, based on an analysis of human knowledge and intelligence, combines neural learning with various types of knowledge and knowledge sources. We present the Thrill-K architecture as a prototypical solution for integrating instantaneous knowledge, standby knowledge and external knowledge sources in a framework capable of inference, learning and intelligent control.
Submitted 28 February, 2023;
originally announced March 2023.
-
Is Multimodal Vision Supervision Beneficial to Language?
Authors:
Avinash Madasu,
Vasudev Lal
Abstract:
Vision (image and video) - Language (VL) pre-training is the recent popular paradigm that achieved state-of-the-art results on multi-modal tasks like image-retrieval, video-retrieval, visual question answering etc. These models are trained in an unsupervised way and greatly benefit from the complementary modality supervision. In this paper, we explore if the language representations trained using vision supervision perform better than vanilla language representations on Natural Language Understanding and commonsense reasoning benchmarks. We experiment with a diverse set of image-text models such as ALBEF, BLIP, METER and video-text models like ALPRO, Frozen-in-Time (FiT), VIOLET. We compare the performance of language representations of stand-alone text encoders of these models to the language representations of text encoders learnt through vision supervision. Our experiments suggest that vanilla language representations show superior performance on most of the tasks. These results shed light on the current drawbacks of the vision-language models.
Submitted 14 April, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation
Authors:
Phillip Howard,
Gadi Singer,
Vasudev Lal,
Yejin Choi,
Swabha Swayamdipta
Abstract:
While counterfactual data augmentation offers a promising step towards robust generalization in natural language processing, producing a set of counterfactuals that offer valuable inductive bias for models remains a challenge. Most existing approaches for producing counterfactuals, manual or automated, rely on small perturbations via minimal edits, resulting in simplistic changes. We introduce NeuroCounterfactuals, designed as loose counterfactuals, allowing for larger edits which result in naturalistic generations containing linguistic diversity, while still bearing similarity to the original document. Our novel generative approach bridges the benefits of constrained decoding, with those of language model adaptation for sentiment steering. Training data augmentation with our generations results in both in-domain and out-of-domain improvements for sentiment classification, outperforming even manually curated counterfactuals, under select settings. We further present detailed analyses to show the advantages of NeuroCounterfactuals over approaches involving simple, minimal edits.
Submitted 22 October, 2022;
originally announced October 2022.
-
Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs
Authors:
Phillip Howard,
Arden Ma,
Vasudev Lal,
Ana Paula Simoes,
Daniel Korat,
Oren Pereg,
Moshe Wasserblat,
Gadi Singer
Abstract:
The extraction of aspect terms is a critical step in fine-grained sentiment analysis of text. Existing approaches for this task have yielded impressive results when the training and testing data are from the same domain. However, these methods show a drastic decrease in performance when applied to cross-domain settings where the domain of the testing data differs from that of the training data. To address this lack of extensibility and robustness, we propose a novel approach for automatically constructing domain-specific knowledge graphs that contain information relevant to the identification of aspect terms. We introduce a methodology for injecting information from these knowledge graphs into Transformer models, including two alternative mechanisms for knowledge insertion: via query enrichment and via manipulation of attention patterns. We demonstrate state-of-the-art performance on benchmark datasets for cross-domain aspect term extraction using our approach and investigate how the amount of external knowledge available to the Transformer impacts model performance.
Submitted 18 October, 2022;
originally announced October 2022.
-
MuMUR : Multilingual Multimodal Universal Retrieval
Authors:
Avinash Madasu,
Estelle Aflalo,
Gabriela Ben Melech Stan,
Shachar Rosenman,
Shao-Yen Tseng,
Gedas Bertasius,
Vasudev Lal
Abstract:
Multi-modal retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models requires additional labelled data, which is a huge manual effort. In this paper, we propose a framework, MuMUR, that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval. We first use state-of-the-art machine translation models to construct pseudo ground-truth multilingual visual-text pairs. We then use this data to learn a joint vision-text representation where English and non-English text queries are represented in a common embedding space based on pretrained multilingual models. We evaluate our proposed approach on a diverse set of retrieval datasets: five video retrieval datasets (MSRVTT, MSVD, DiDeMo, Charades and MSRVTT multilingual) and two image retrieval datasets (Flickr30k and Multi30k). Experimental results demonstrate that our approach achieves state-of-the-art results on all video retrieval datasets, outperforming previous models. Additionally, our framework MuMUR significantly outperforms other methods on the multilingual video retrieval dataset. We also observe that MuMUR exhibits strong performance on image retrieval. This demonstrates the universal ability of MuMUR to perform retrieval across all visual inputs (image and video) and text inputs (monolingual and multilingual).
Submitted 19 September, 2023; v1 submitted 24 August, 2022;
originally announced August 2022.
-
High-resolution, High-sensitivity, Low-frequency uGMRT View of Coma Cluster of Galaxies
Authors:
D. V. Lal,
N. Lyskova,
C. Zhang,
T. Venturi,
W. R. Forman,
C. Jones,
E. M. Churazov,
R. J. van Weeren,
A. Bonafede,
N. A. Miller,
I. D. Roberts,
A. M. Bykov,
L. Di Mascolo,
M. Brüggen,
G. Brunetti
Abstract:
We present high-resolution, high-sensitivity upgraded Giant Metrewave Radio Telescope observations of the Coma cluster (A1656) at 250-500 MHz and 550-850 MHz. At 250-500 MHz, 135 sources have extensions $>$ 0.45 arcmin (with peak-to-local-noise ratio $> 4$). Of these, 24 sources are associated with Coma-member galaxies. In addition, we supplement this sample of 24 galaxies with 20 ram pressure stripped galaxies from Chen et al. (2020, eight are included in the original extended radio source sample) and an additional five are detected and extended. We present radio morphologies, radio spectra, spectral index maps, and equipartition properties for these two samples. In general, we find the equipartition properties lie within a narrow range (e.g., $P_{\rm min}$ = 1-3 dynes cm$^{-2}$). Only NGC 4874, one of the two brightest central Coma cluster galaxies, has a central energy density and pressure about five times higher and a radio source age about 50% lower than that of the other Coma galaxies. We find a diffuse tail of radio emission trailing the dominant galaxy of the merging NGC 4839 group that coincides with the "slingshot" tail seen in X-rays. The southwestern radio relic, B1253$+$275, has a large extent $\approx$ 32$^\prime$ $\times$ 10$^\prime$ ($\simeq$ 1.08 $\times$ 0.34 Mpc$^2$). For NGC 4789, whose long radio tails merge into the relic and may be a source of its relativistic seed electrons, we find a transverse radio spectral gradient, a steepening from southwest to northeast across the width of the radio source. Finally, radio morphologies of the extended and RPS samples suggest that these galaxies are on their first infall into Coma on (predominantly) radial orbits.
Submitted 13 July, 2022;
originally announced July 2022.
-
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Authors:
Xiao Xu,
Chenfei Wu,
Shachar Rosenman,
Vasudev Lal,
Wanxiang Che,
Nan Duan
Abstract:
Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years. Current VL models either use lightweight uni-modal encoders and learn to extract, align and fuse both modalities simultaneously in a deep cross-modal encoder, or feed the last-layer uni-modal representations from the deep pre-trained uni-modal encoders into the top cross-modal encoder. Both approaches potentially restrict vision-language representation learning and limit model performance. In this paper, we propose BridgeTower, which introduces multiple bridge layers that build a connection between the top layers of uni-modal encoders and each layer of the cross-modal encoder. This enables effective bottom-up cross-modal alignment and fusion between visual and textual representations of different semantic levels of pre-trained uni-modal encoders in the cross-modal encoder. Pre-trained with only 4M images, BridgeTower achieves state-of-the-art performance on various downstream vision-language tasks. In particular, on the VQAv2 test-std set, BridgeTower achieves an accuracy of 78.73%, outperforming the previous state-of-the-art model METER by 1.09% with the same pre-training data and almost negligible additional parameters and computational costs. Notably, when further scaling the model, BridgeTower achieves an accuracy of 81.15%, surpassing models that are pre-trained on orders-of-magnitude larger datasets. Code and checkpoints are available at https://github.com/microsoft/BridgeTower.
Submitted 26 March, 2024; v1 submitted 17 June, 2022;
originally announced June 2022.
-
A multiwavelength study of the W33 Main ultracompact HII region
Authors:
Sarwar Khan,
Jagadheep D. Pandian,
Dharam V. Lal,
Michael R. Rugel,
Andreas Brunthaler,
Karl M. Menten,
F. Wyrowski,
S-N. X. Medina,
S. A. Dzib,
H. Nguyen
Abstract:
The dynamics of ionized gas around the W33 Main ultracompact HII region is studied using observations of hydrogen radio recombination lines and a detailed multiwavelength characterization of the massive star-forming region W33 Main is performed. We used the Giant Metrewave Radio Telescope (GMRT) to observe the H167$\alpha$ recombination line at 1.4 GHz at an angular resolution of 10 arcsec, and Karl G. Jansky Very Large Array (VLA) data acquired in the GLOSTAR survey to study the dynamics of ionized gas. We also observed the radio continuum at 1.4 GHz and 610 MHz with the GMRT and used GLOSTAR 4$-$8 GHz continuum data to characterize the nature of the radio emission. In addition, archival data from submillimeter to near-infrared wavelengths were used to study the dust emission and identify YSOs in the W33 Main star-forming region. The radio recombination lines were detected at good signal to noise in the GLOSTAR data, while the H167$\alpha$ radio recombination line was marginally detected with the GMRT. The spectral index of radio emission in the region determined from GMRT and GLOSTAR shows the emission to be thermal in the entire region. Along with W33 Main, an arc-shaped diffuse continuum source, G12.81$-$0.22, was detected with the GMRT data. The GLOSTAR recombination line data reveal a velocity gradient across W33 Main and G12.81$-$0.22. The electron temperature is found to be 6343 K and 4843 K in W33 Main and G12.81$-$0.22, respectively. The physical properties of the W33 Main molecular clump were derived by modeling the dust emission using data from the ATLASGAL and Hi-GAL surveys and they are consistent with the region being a relatively evolved site of massive star formation. The gas dynamics and physical properties of G12.81$-$0.22 are consistent with the HII region being in an evolved phase and its expansion on account of the pressure difference is slowing down.
Submitted 31 May, 2022;
originally announced May 2022.
-
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
Authors:
Estelle Aflalo,
Meng Du,
Shao-Yen Tseng,
Yongfei Liu,
Chenfei Wu,
Nan Duan,
Vasudev Lal
Abstract:
Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems. However, although visualization and interpretability tools have become available for NLP models, internal mechanisms of vision and multimodal transformers remain largely opaque. With the success of these transformers, it is increasingly critical to understand their inner workings, as unraveling these black-boxes will lead to more capable and trustworthy models. To contribute to this quest, we propose VL-InterpreT, which provides novel interactive visualizations for interpreting the attentions and hidden representations in multimodal transformers. VL-InterpreT is a task agnostic and integrated tool that (1) tracks a variety of statistics in attention heads throughout all layers for both vision and language components, (2) visualizes cross-modal and intra-modal attentions through easily readable heatmaps, and (3) plots the hidden representations of vision and language tokens as they pass through the transformer layers. In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretraining vision-language multimodal transformer-based model, in the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. Furthermore, we also present a few interesting findings about multimodal transformer behaviors that were learned through our tool.
Submitted 22 August, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Chandra view of Abell 407: the central compact group of galaxies and the interaction between the radio AGN and the ICM
Authors:
Chao Geng,
Chong Ge,
Dharam V. Lal,
Ming Sun,
Li Ji,
Haiguang Xu,
Wenhao Liu,
Martin Hardcastle,
William Forman,
Ralph Kraft,
Christine Jones
Abstract:
Abell 407 (A407) is a unique galaxy cluster hosting a central compact group of nine galaxies (named as 'Zwicky's Nonet'; G1 - G9 in this work) within a 30 kpc radius region. The cluster core also hosts a luminous radio active galactic nucleus (AGN), 4C 35.06 with helically twisted jets extending over 200 kpc. With a 44 ks Chandra observation of A407, we characterize the X-ray properties of its intracluster medium (ICM) and central galaxies. The mean X-ray temperature of A407 is 2.7 keV and the $M_{200}$ is $1.9 \times 10^{14} {M_{\odot}}$. We suggest that A407 has a weak cool core at $r < 60$ kpc scales and at its very center, $< 1$-2 kpc radius, a small galaxy corona associated with the strong radio AGN. We also conclude that the AGN 4C 35.06 host galaxy is most likely G3. We suggest that the central group of galaxies is undergoing a `slow merge' procedure. The range of the merging time-scale is $0.3\sim2.3$ Gyr and the stellar mass of the future brightest cluster galaxy (BCG) will be $7.4\times10^{11} M_{\odot}$. We find that the regions which overlap with the radio jets have higher temperature and metallicity. This is consistent with AGN feedback activity. The central entropy is higher than that for other clusters, which may be due to the AGN feedback and/or merging activity. With all these facts, we suggest that A407 is a unique and rare system in the local universe that could help us to understand the formation of a massive BCG.
Submitted 9 February, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
The unusually weak and exceptionally steep radio relic in Abell 2108
Authors:
Gerrit Schellenberger,
Simona Giacintucci,
Lorenzo Lovisari,
Ewan O'Sullivan,
Jan Vrtilek,
Laurence P. David,
Jean-Baptiste Melin,
Dharam Vir Lal,
Stefano Ettori,
Konstantinos Kolokythas,
Mauro Sereno,
Somak Raychaudhury
Abstract:
Mergers between galaxy clusters often drive shocks into the intracluster medium (ICM), the effects of which are sometimes visible via temperature and density jumps in the X-ray, and via radio emission from relativistic particles energized by the shock's passage. Abell 2108 was selected as a likely merger system through comparing the X-ray luminosity to the Planck Sunyaev-Zeldovich signal, where this cluster appeared highly X-ray underluminous. Follow-up observations confirmed it to be a merging low mass cluster featuring two distinct subclusters, both with a highly disturbed X-ray morphology. Giant Metrewave Radio Telescope (GMRT) data covering 120-750 MHz show an extended radio feature resembling a radio relic, near the location of a temperature discontinuity in the X-rays. We measure a Mach number from the X-ray temperature jump. Several characteristics of radio relics are found in Abell 2108, making this cluster one of the few low mass mergers likely hosting a radio relic. The radio spectrum is exceptionally steep, and the radio power is very weak ($P_{1.4\,{\rm GHz}} = 10^{22}$ W Hz$^{-1}$). To account for the shock/relic offset, we propose a scenario in which the shock created the relic by re-accelerating a cloud of pre-existing relativistic electrons and then moved away, leaving behind a fading relic. The electron aging timescale derived from the high-frequency steepening in the relic spectrum is consistent with the shock travel time to the observed X-ray discontinuity. However, the lower flux in GMRT band 4 data causing the steepening could be due to instrumental limitations, and deeper radio data are needed to constrain the spectral slope of the relic at high frequencies.
Submitted 17 November, 2021;
originally announced November 2021.
-
Gas dynamics in the star forming region G18.148$-$0.283: Is it a manifestation of two colliding molecular clouds?
Authors:
Jyotirmoy Dey,
Jagadheep D. Pandian,
Dharam Vir Lal
Abstract:
We report the results obtained from a multi-wavelength study of the HII region, G18.148$-$0.283, using the upgraded Giant Metre-wave Radio Telescope (uGMRT) at 1350 MHz along with other archival data. In addition to the radio continuum emission, we have detected the H169$α$ and H170$α$ radio recombination lines towards G18.148$-$0.283 using a correlator bandwidth of 100 MHz. The moment-1 map of the ionized gas reveals a velocity gradient of approximately 10 km s$^{-1}$ across the radio continuum peaks. The $^{12}$CO ($J$=3$-$2) molecular line data from the COHRS survey also shows the presence of two velocity components that are very close to the velocities detected in the ionized gas. The spectrum and position-velocity diagram from CO emission reveal molecular gas at an intermediate velocity range bridging the velocity components. We see mid-infrared absorption and far-infrared emission establishing the presence of a filamentary infrared dark cloud, the extent of which includes the targeted HII region. The magnetic field inferred from dust polarization is perpendicular to the filament within the HII region. We have also identified two O9 stars and 30 young stellar objects towards the target using data from the 2MASS, UKIDSS, and GLIMPSE surveys. Cumulatively, this suggests that the region is the site of a cloud-cloud collision that has triggered massive star formation and subsequent formation of an HII region.
Submitted 15 November, 2021;
originally announced November 2021.
-
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Authors:
Yongfei Liu,
Chenfei Wu,
Shao-yen Tseng,
Vasudev Lal,
Xuming He,
Nan Duan
Abstract:
Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performance on a broad range of vision-language tasks after finetuning. Previous mainstream VLP approaches typically adopt a two-step strategy relying on external object detectors to encode images in a multi-modal Transformer framework, which suffers from a restrictive object concept space, limited image context and inefficient computation. In this paper, we propose an object-aware end-to-end VLP framework, which directly feeds image grid features from CNNs into the Transformer and learns the multi-modal representations jointly. More importantly, we propose to perform object knowledge distillation to facilitate learning cross-modal alignment at different semantic levels. To achieve that, we design two novel pretext tasks by taking object features and their semantic labels from external detectors as supervision: 1) an object-guided masked vision modeling task, which focuses on enforcing object-aware representation learning in the multi-modal Transformer; and 2) a phrase-region alignment task, which aims to improve cross-modal alignment by utilizing the similarities between noun phrases and object labels in the linguistic space. Extensive experiments on a wide range of vision-language tasks demonstrate the efficacy of our proposed framework, and we achieve competitive or superior performance compared with existing pretraining strategies.
Submitted 7 August, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.
-
The Discovery of a Remnant Radio Galaxy in A2065 Using GMRT
Authors:
Dharam V. Lal
Abstract:
The upgraded Giant Metrewave Radio Telescope (GMRT) has been used to map the cluster A2065 at z = 0.0726. We report the discovery of a remnant radio galaxy in the peripheral cluster region. The spatially resolved radio emission from the remnant radio galaxy shows an elongated, bar-shaped structure, whose size is $\approx$ 52$^{\prime\prime}$ $\times$ 110$^{\prime\prime}$ ($\simeq$ 72 $\times$ 152 kpc$^2$). Our study with the multiwavelength GMRT data and Chandra data shows that across the remnant radio galaxy there is a hint of a surface-brightness edge in the hot X-ray gas. We detect a tentative flattening of the radio spectral index, as the old plasma at the near end of the surface-brightness edge is reinvigorated by the passage of a possible shock front and shows the expected change in radio emission characteristics. We suggest that the remnant radio galaxy has been seeded by the lobes of the active galactic nucleus (AGN) hosted by the source WISEA J152228.01$+$274141.3, demonstrating the connection between AGNs and remnant radio sources. Although the number of known remnant radio sources is beginning to increase, we emphasize the need for better data to understand the physics and nature of these poorly understood sources.
Submitted 5 July, 2021;
originally announced July 2021.
-
A new look at old friends. I. Imaging classical radio galaxies with uGMRT and MeerKAT
Authors:
B. Fanaroff,
D. V. Lal,
T. Venturi,
O. Smirnov,
M. Bondi,
K. Thorat,
L. Bester,
G. Jozsa,
D. Kleiner,
F. Loi,
S. Makhathini,
S. V. White
Abstract:
We have undertaken a systematic study of FRI and FRII radio galaxies with the upgraded Giant Metrewave Radio Telescope (uGMRT) and MeerKAT. The main goal is to explore whether the unprecedented few $μ$Jy sensitivity reached in the range 550-1712 MHz at the resolution of $\sim4^{\prime\prime}-7^{\prime\prime}$ reveals new features in the radio emission which might need us to revise our current classification scheme for classical radio galaxies. In this paper we present the results for the first set of four radio galaxies, i.e. 4C 12.02, 4C 12.03, CGCG 044-046 and CGCG 021-063. The sources have been selected from the 4C sample with well-defined criteria, and have been imaged with the uGMRT in the range 550-850 MHz (band 4) and with the MeerKAT in the range 856-1712 MHz (L-band). Full resolution images are presented for all sources in the sample, together with MeerKAT in-band spectral images. Additionally, the uGMRT-MeerKAT spectral image and MeerKAT L-band polarisation structure are provided for CGCG 044-046. Our images contain a wealth of morphological details, such as filamentary structure in the emission from the lobes, radio emission beyond the hot-spots in three sources, and misalignments. We briefly discuss the overall properties of CGCG 044-046 in the light of the local environment as well, and show possible restarted activity in 4C 12.03 which needs to be confirmed. We conclude that at least for the sources presented here, the classical FRI/FRII morphological classification still holds with the current improved imaging capabilities, but the richness in details also suggests caution in the systematic morphological classification carried out with automatic procedures in surveys with poorer sensitivity and angular resolution.
Submitted 29 May, 2021; v1 submitted 25 May, 2021;
originally announced May 2021.
-
NGC 4869 in the Coma cluster: twist, wrap, overlap and bend
Authors:
Dharam V. Lal
Abstract:
The upgraded Giant Metrewave Radio Telescope (GMRT) has been used to image the head-tail radio galaxy NGC 4869 in the Coma cluster with an angular resolution of 6.26 arcsec at 250-500 MHz and 2.18 arcsec at the 1050-1450 MHz bands. The archival legacy GMRT data have also been used to image the source with angular resolutions from 4.9 to 21.8 arcsec at 610 MHz, 325 MHz, 240 MHz, and 150 MHz. We find that the ~200 kpc scale radio morphology consists of five distinct regions with the clear presence of a pinch at ~1.4 arcmin (= 38.8 kpc) and a ridge at ~3.4 arcmin (= 94.2 kpc) from the head. The sharp bend by ~70 deg at ~3.5 arcmin (= 97 kpc) from the head is possibly due to projection effects. The radio spectra show progressive spectral steepening as a function of distance from the head and there is possibly re-acceleration of the synchrotron electrons and perhaps also magnetic field re-generation in the 6-208 arcsec (= 2.8-96.1 kpc) region of the jet. We report a steep spectrum sheath layer enveloping a flat spectrum spine, hinting at a transverse velocity structure with a fast-moving spine surrounded by a slow-moving sheath layer. We also derive the lifetimes of the radiating electrons and equipartition parameters. A plausible explanation for the characteristic feature, a ridge of emission perpendicular to the direction of the tail, is the flaring of a straight, collimated radio jet as it crosses a surface brightness edge due to Kelvin-Helmholtz instabilities.
Submitted 15 September, 2020;
originally announced September 2020.
-
Upgraded GMRT observations of the Coma cluster of galaxies: The observations
Authors:
Dharam V. Lal
Abstract:
We have used the upgraded Giant Metrewave Radio Telescope to map the Coma cluster of galaxies in the 250-500 MHz and 1050-1450 MHz bands. These observations, at resolutions of 6.26 arcsec and 2.18 arcsec, respectively, allow detailed radio structures to be determined for all detected radio sources, which show both discrete pointlike and extended morphologies. We present images of a subset comprising the 32 brightest (flux density >= 30 mJy) and dominant sources, and several sources show discrete pointlike radio morphologies. We find steepening of the spectra consistent with synchrotron cooling in the majority of sources; the median spectral index is -0.78, suggesting that ~59% of sources have steep spectra. The nature and the statistical properties of the radio sources in the Coma cluster will be discussed in subsequent papers.
Submitted 15 September, 2020;
originally announced September 2020.
-
VLBI20-30: a scientific roadmap for the next decade -- The future of the European VLBI Network
Authors:
Tiziana Venturi,
Zsolt Paragi,
Michael Lindqvist,
Anna Bartkiewicz,
Rob Beswick,
Tamara Bogdanović,
Walter Brisken,
Patrick Charlot,
Francisco Colomer,
John Conway,
Sándor Frey,
José Carlos Guirado,
Leonid Gurvits,
Huib van Langevelde,
Andrei Lobanov,
John McKean,
Raffaella Morganti,
Tom Muxlow,
Miguel Pérez-Torres,
Kazi Rygl,
Robert Schulz,
Arpad Szomoru,
Pablo de Vicente,
Tao An,
Guillem Anglada
, et al. (55 additional authors not shown)
Abstract:
This white paper describes the science case for Very Long Baseline Interferometry (VLBI) and provides suggestions towards upgrade paths for the European VLBI Network (EVN). The EVN is a distributed long-baseline radio interferometric array, that operates at the very forefront of astronomical research. Recent results, together with the new science possibilities outlined in this vision document, demonstrate the EVN's potential to generate new and exciting results that will transform our view of the cosmos. Together with e-MERLIN, the EVN provides a range of baseline lengths that permit unique studies of faint radio sources to be made over a wide range of spatial scales.
The science cases are reviewed in six chapters that cover the following broad areas: cosmology, galaxy formation and evolution, innermost regions of active galactic nuclei, explosive phenomena and transients, stars and stellar masers in the Milky Way, celestial reference frames and space applications. The document concludes by identifying the synergies with other radio, as well as multi-band/multi-messenger, instruments, and provides recommendations for future improvements. The appendices briefly describe other radio VLBI arrays, the technological framework for EVN developments, and a selection of spectral lines of astrophysical interest below 100 GHz. The document includes a glossary for non-specialists, and a list of acronyms at the end.
Submitted 5 July, 2020;
originally announced July 2020.
-
The discovery of radio halos in the Frontier Fields clusters Abell S1063 and Abell 370
Authors:
C. Xie,
R. J. van Weeren,
L. Lovisari,
F. Andrade-Santos,
A. Botteon,
M. Brüggen,
E. Bulbul,
E. Churazov,
T. E. Clarke,
W. R. Forman,
H. T. Intema,
C. Jones,
R. P. Kraft,
D. V. Lal,
T. Mroczkowski,
A. Zitrin
Abstract:
Massive merging galaxy clusters often host diffuse Mpc-scale radio synchrotron emission. This emission originates from relativistic electrons in the ionized intracluster medium (ICM). An important question is how these synchrotron emitting relativistic electrons are accelerated. Our aim is to search for diffuse emission in the Frontier Fields clusters Abell S1063 and Abell 370 and characterize its properties. While these clusters are very massive and well studied at some other wavelengths, no diffuse emission has been reported for these clusters so far. We obtained 325 MHz Giant Metrewave Radio Telescope (GMRT) and 1--4 GHz Jansky Very Large Array (VLA) observations of Abell S1063 and Abell 370. We complement these data with Chandra and XMM-Newton X-ray observations. In our sensitive images, we discover radio halos in both clusters. In Abell S1063, a giant radio halo is found with a size of $\sim 1.2$ Mpc. The integrated spectral index between 325 MHz and 1.5 GHz is $-0.94\pm0.08$ and it steepens to $-1.77 \pm 0.20$ between 1.5 and 3.0 GHz. This spectral steepening provides support for the turbulent re-acceleration model for radio halo formation. Abell 370 hosts a faint radio halo mostly centred on the southern part of this binary merging cluster, with a size of $\sim 500-700$ kpc. The spectral index between 325 MHz and 1.5 GHz is $-1.10\pm0.09$. Both radio halos follow the known scaling relation between the cluster mass proxy $Y_{500}$ and radio power, consistent with the idea that they are related to ongoing cluster merger events.
Submitted 14 January, 2020;
originally announced January 2020.
-
Chandra Observations of the Spectacular A3411-12 Merger Event
Authors:
Felipe Andrade-Santos,
Reinout J. van Weeren,
Gabriella Di Gennaro,
David Wittman,
Dongsu Ryu,
Dharam Vir Lal,
Vinicius M. Placco,
Kevin Fogarty,
M. James Jee,
Andra Stroe,
David Sobral,
William R. Forman,
Christine Jones,
Ralph P. Kraft,
Stephen S. Murray,
Marcus Brüggen,
Hyesung Kang,
Rafael Santucci,
Nathan Golovich,
William Dawson
Abstract:
We present deep Chandra observations of A3411-12, a remarkable merging cluster that hosts the most compelling evidence for electron re-acceleration at cluster shocks to date. Using the $Y_X-M$ scaling relation, we find $r_{500} \sim 1.3$ Mpc, $M_{500} = (7.1 \pm 0.7) \times 10^{14} \ M_{\rm{\odot}}$, $kT=6.5\pm 0.1$ keV, and a gas mass of $M_{\rm g,500} = (9.7 \pm 0.1) \times 10^{13} M_\odot$. The gas mass fraction within $r_{500}$ is $f_{\rm g} = 0.14 \pm 0.01$. We compute the shock strength using density jumps to conclude that the Mach number of the merging subcluster is small ($M \leq 1.15_{-0.09}^{+0.14}$). We also present pseudo-density, projected temperature, pseudo-pressure, and pseudo-entropy maps. Based on the pseudo-entropy map we conclude that the cluster is undergoing a mild merger, consistent with the small Mach number. On the other hand, radio relics extend over Mpc scale in the A3411-12 system, which strongly suggests that a population of energetic electrons already existed over extended regions of the cluster.
Submitted 16 October, 2019;
originally announced October 2019.
-
GMRT Low-frequency Imaging of an Extended Sample of X-shaped Radio Galaxies
Authors:
Dharam V. Lal,
Biny Sebastian,
C. C. Cheung,
A. Pramesh Rao
Abstract:
We present a low-frequency imaging study of an extended sample of X-shaped radio sources using the Giant Metrewave Radio Telescope (GMRT) at two frequencies (610 and 240 MHz). The sources were drawn from a Very Large Array FIRST-selected sample and extend an initial GMRT study, at the same frequencies, of 12 X-shaped radio galaxies predominantly from the 3CR catalog (Lal & Rao 2007). Both the intensity maps and spectral index maps of the 16 newly observed sources are presented. With the combined sample of 28 X-shaped radio sources, we found no systematic differences in the spectral properties of the higher surface brightness, active lobes versus the lower surface brightness, off-axis emission. The properties of the combined sample are discussed, including the possible role of a twin active galactic nuclei model in the formation of such objects.
Submitted 27 March, 2019;
originally announced March 2019.
-
Non-thermal emission from massive star forming regions: A possible SNR candidate G351.7-1.2?
Authors:
V. S. Veena,
S. Vig,
B. Sebastian,
D. V. Lal,
A. Tej,
S. K. Ghosh
Abstract:
We present low-frequency wide-band observations (300-500 MHz) of the star forming complex G351.7-1.2 using the upgraded Giant Metrewave Radio Telescope (uGMRT), India. Combining these with optical, infrared and submillimeter data, we analyse the large-scale diffuse radio emission associated with the region, which exhibits a broken shell morphology. The spectral index of the emission in the shell is -0.8, indicating non-thermal emission. H-alpha emission that mimics the morphology of the radio shell on a smaller scale is also detected here. Based on the non-thermal emission from the radio shell and the presence of its optical counterpart, we classify G351.7-1.2 as a candidate SNR. A gamma-ray source detected by Fermi LAT (1FGLJ1729.1-3641c) is located towards the south-west of the radio shell and could have a possible origin in the interaction between high velocity particles from the SNR and the ambient molecular cloud.
Submitted 5 November, 2018;
originally announced November 2018.
-
AGN Feedback in Galaxy Group 3C 88: Cavities, Shock and Jet Reorientation
Authors:
Wenhao Liu,
Ming Sun,
Paul Nulsen,
Tracy Clarke,
Craig Sarazin,
William Forman,
Massimo Gaspari,
Simona Giacintucci,
Dharam Vir Lal,
Tim Edge
Abstract:
We present results from the deep Chandra observation (105 ksec), together with new Giant Metrewave Radio Telescope and Very Large Array data of the AGN outburst in the radio-loud galaxy group 3C 88. The system shows a prominent X-ray cavity on the eastern side with a diameter of $\sim$50 kpc at $\sim28$ kpc from the nucleus. The total enthalpy of the cavity is $3.8\times10^{58}$ erg and the average power required to inflate the X-ray bubble is $\sim2.0\times10^{43}$ erg s$^{-1}$. From surface brightness profiles we detect a shock with a Mach number of $M=1.4\pm0.2$, consistent with the value obtained from the temperature jump. The shock energy is estimated to be $1.9\times10^{59}$ erg. The size and total enthalpy of the cavity in 3C 88, as well as the shock energy, are the largest known in galaxy groups. The eastern X-ray cavity is not aligned with the radio jet axis. This misalignment, combined with the radio morphology, strongly suggests jet reorientation in the last tens of millions of years. The bright rim and arm features surrounding the cavity show metallicity enhancement, suggesting they originated as high metallicity gas from the group center, lifted by the rising X-ray bubbles. Our Chandra study of 3C 88 also reveals that galaxy groups with powerful radio AGN can have high cavity power, although deep X-ray observations are typically required to confirm the cavities in galaxy groups.
Submitted 11 February, 2019; v1 submitted 12 June, 2018;
originally announced June 2018.
-
Giant Metrewave Radio Telescope Observations of Head-Tail Radio Galaxies
Authors:
Biny Sebastian,
Dharam V. Lal,
A. Pramesh Rao
Abstract:
We present results from a study of seven large known head-tail radio galaxies based on observations using the Giant Metrewave Radio Telescope at 240 and 610 MHz. These observations are used to study the radio morphologies and distribution of the spectral indices across the sources. The overall morphology of the radio tails of these sources is suggestive of random motions of the optical host around the cluster potential. The presence of multiple bends and wiggles in several head-tail sources is possibly due to precessing radio jets. We find steepening of the spectral index along the radio tails. The prevailing equipartition magnetic field also decreases along the radio tails of these sources. These steepening trends are attributed to the synchrotron aging of plasma toward the ends of the tails. The dynamical ages of these sample sources have been estimated to be ~100 Myr, which is a factor of six more than the age estimates from the radiative losses due to synchrotron cooling.
Submitted 21 September, 2017;
originally announced September 2017.
-
NuSTAR + XMM-Newton monitoring of the neutron star transient AX J1745.6-2901
Authors:
G. Ponti,
S. Bianchi,
T. Munos-Darias,
K. Mori,
K. De,
A. Rau,
B. De Marco,
C. Hailey,
J. Tomsick,
K. K. Madsen,
M. Clavel,
F. Rahoui,
D. V. Lal,
S. Roy,
D. Stern
Abstract:
AX J1745.6-2901 is a high-inclination (eclipsing) transient neutron star (NS) Low Mass X-ray Binary (LMXB) showcasing intense ionised Fe K absorption. We present here the analysis of 11 XMM-Newton and 15 NuSTAR new data-sets (obtained between 2013-2016), therefore tripling the number of observations of AX J1745.6-2901 in outburst. Thanks to simultaneous XMM-Newton and NuSTAR spectra, we greatly improve on the fitting of the X-ray continuum. During the soft state the emission can be described by a disk black body ($kT\sim1.1-1.2$ keV and inner disc radius $r_{DBB}\sim14$ km), plus hot ($kT\sim2.2-3.0$ keV) black body radiation with a small emitting radius ($r_{BB}\sim0.5-0.8$ km) likely associated with the boundary layer or NS surface, plus a faint Comptonisation component. Imprinted on the spectra are clear absorption features created by both neutral and ionised matter. Additionally, positive residuals suggestive of an emission Fe K$α$ disc line and consistent with relativistic ionised reflection are present during the soft state, while such residuals are not significant during the hard state. The hard state spectra are characterised by a hard ($Γ\sim1.9-2.1$) power law, showing no evidence for a high energy cut off ($kT_e>60-140$ keV) and implying a small optical depth ($τ<1.6$). The new observations confirm the previously witnessed trend of exhibiting strong Fe K absorption in the soft state, that significantly weakens during the hard state. Optical (GROND) and radio (GMRT) observations suggest for AX J1745.6-2901 a standard broad band SED as typically observed in accreting neutron stars.
Submitted 19 September, 2017;
originally announced September 2017.
-
A candidate sub-parsec binary black hole in the Seyfert galaxy NGC 7674
Authors:
Preeti Kharb,
Dharam Vir Lal,
David Merritt
Abstract:
The existence of binary supermassive black holes (SBHs) is predicted by models of hierarchical galaxy formation. To date, only a single binary SBH has been imaged, at a projected separation of 7.3 parsecs. Here we report the detection of a candidate dual SBH with projected separation of 0.35 pc in the gas-rich interacting spiral galaxy NGC 7674 (Mrk 533). This peculiar Seyfert galaxy possesses a $\sim$0.7 kpc Z-shaped radio jet; the leading model for the formation of such sources postulates the presence of an uncoalesced binary SBH created during the infall of a satellite galaxy. Using very long baseline interferometry (VLBI), we imaged the central region of Mrk 533 at radio frequencies of 2, 5, 8 and 15 GHz. Two, possibly inverted-spectrum radio cores were detected at 15 GHz only; the 8-15 GHz spectral indices of the two cores are $\ge-0.33$ and $\ge-0.38$ ($\pm 30\%$), consistent with accreting SBHs. We derive a jet speed $\sim0.28c$ from multi-epoch parsec-scale data of the hotspot region, and a source age $\ge8.2\times10^3$ yrs.
Submitted 19 September, 2017;
originally announced September 2017.
-
The two-component giant radio halo in the galaxy cluster Abell 2142
Authors:
T. Venturi,
M. Rossetti,
G. Brunetti,
D. Farnsworth,
F. Gastaldello,
S. Giacintucci,
D. V. Lal,
L. Rudnick,
T. W. Shimwell,
D. Eckert,
S. Molendi,
M. Owers
Abstract:
We report on a spectral study at radio frequencies of the giant radio halo in A2142 (z=0.0909), which we performed to explore its nature and origin. A2142 is not a major merger and the presence of a giant radio halo is somewhat surprising. We performed deep radio observations with the GMRT at 608 MHz, 322 MHz, and 234 MHz and with the VLA in the 1-2 GHz band. We obtained high-quality images at all frequencies in a wide range of resolutions. The radio halo is well detected at all frequencies and extends out to the most distant cold front in A2142. We studied the spectral index in two regions: the central part of the halo and a second region in the direction of the most distant south-eastern cold front, selected to follow the bright part of the halo and X-ray emission. We complemented our observations with a preliminary LOFAR image at 118 MHz and with the re-analysis of archival VLA data at 1.4 GHz. The two components of the radio halo show different observational properties. The central brightest part has higher surface brightness and a spectrum whose steepness is similar to those of the known radio halos, i.e. $α^{\rm 1.78~GHz}_{\rm 118~MHz}=1.33\pm 0.08$. The ridge, which fades into the larger scale emission, is broader in size and has considerably lower surface brightness and a moderately steeper spectrum, i.e. $α^{\rm 1.78~GHz}_{\rm 118~MHz}\sim 1.5$. We propose that the brightest part of the radio halo is powered by the central sloshing in A2142, similar to what has been suggested for mini-halos, or by secondary electrons generated by hadronic collisions in the ICM. On the other hand, the steeper ridge may probe particle re-acceleration by turbulence generated either by stirring the gas and magnetic fields on a larger scale or by less energetic mechanisms, such as continuous infall of galaxy groups or an off-axis merger.
Submitted 20 March, 2017;
originally announced March 2017.
-
The Case for Electron Re-Acceleration at Galaxy Cluster Shocks
Authors:
Reinout J. van Weeren,
Felipe Andrade-Santos,
William A. Dawson,
Nathan Golovich,
Dharam V. Lal,
Hyesung Kang,
Dongsu Ryu,
Marcus Brüggen,
Georgiana A. Ogrean,
William R. Forman,
Christine Jones,
Vinicius M. Placco,
Rafael M. Santucci,
David Wittman,
M. James Jee,
Ralph P. Kraft,
David Sobral,
Andra Stroe,
Kevin Fogarty
Abstract:
On the largest scales, the Universe consists of voids and filaments making up the cosmic web. Galaxy clusters are located at the knots in this web, at the intersection of filaments. Clusters grow through accretion from these large-scale filaments and by mergers with other clusters and groups. In a growing number of galaxy clusters, elongated Mpc-size radio sources have been found, so-called radio relics. These relics are thought to trace relativistic electrons in the intracluster plasma accelerated by low-Mach number collisionless shocks generated by cluster-cluster merger events. A long-standing problem is how low-Mach number shocks can accelerate electrons efficiently enough to explain the observed radio relics. Here we report on the discovery of a direct connection between a radio relic and a radio galaxy in the merging galaxy cluster Abell 3411-3412. This discovery indicates that fossil relativistic electrons from active galactic nuclei are re-accelerated at cluster shocks. It also implies that radio galaxies play an important role in governing the non-thermal component of the intracluster medium in merging clusters.
Submitted 9 January, 2017; v1 submitted 5 January, 2017;
originally announced January 2017.
-
Clusters of galaxies and the cosmic web with SKA
Authors:
Ruta Kale,
K. S. Dwarakanath,
Dharam Vir Lal,
Joydeep Bagchi,
Surajit Paul,
Siddharth Malu,
Abhirup Datta,
Viral Parekh,
Prateek Sharma,
Mamta Pandey-Pommier
Abstract:
The intra-cluster and inter-galactic media (ICM, IGM) that pervade the large scale structure of the Universe are known to be magnetised at sub-micro Gauss to micro Gauss levels and to contain cosmic rays (CRs). The acceleration of CRs and their evolution along with that of magnetic fields in these media is still not well understood. Diffuse radio sources of synchrotron origin associated with the ICM such as radio halos, relics and mini-halos are direct probes of the underlying mechanisms of CR acceleration. Observations with radio telescopes such as the GMRT, the VLA and the WSRT (0.15 - 2 GHz) have revealed scaling relations between the thermal and non-thermal properties of clusters and favour the role of shocks in the formation of radio relics and of turbulent re-acceleration in the formation of radio halos and mini-halos. Due to the limitations of current radio telescopes, wide-band studies and exploration of low mass and supercluster-scale systems are difficult. The Square Kilometer Array (SKA) is a next generation radio telescope that will operate in the frequency range of 0.05 - 20 GHz with unprecedented sensitivities and resolutions. The expected detection limits of SKA will reveal a few hundred to a thousand new radio halos, relics and mini-halos providing the first large and comprehensive samples for their study. The wide frequency coverage along with sensitivity to extended structures will be able to constrain the CR acceleration mechanisms. The higher frequency (> 5 GHz) observations will be able to use the Sunyaev-Zel'dovich effect to probe the ICM pressure in addition to the tracers such as lobes of head-tail radio sources. The SKA also opens prospects to detect the "off-state" radio emission from the ICM predicted by the hadronic models and the turbulent re-acceleration models. [abridged]
Submitted 26 October, 2016;
originally announced October 2016.