Search | arXiv e-print repository

Estimating the Hallucination Rate of Generative AI

Authors: Andrew Jesson, Nicolas Beltran-Velez, Quentin Chu, Sweta Karlekar, Jannik Kossen, Yarin Gal, John P. Cunningham, David Blei

Abstract: This work is about estimating the hallucination rate for in-context learning (ICL) with Generative AI. In ICL, a conditional generative model (CGM) is prompted with a dataset and asked to make a prediction based on that dataset. The Bayesian interpretation of ICL assumes that the CGM is calculating a posterior predictive distribution over an unknown Bayesian model of a latent parameter and data. With this perspective, we define a \textit{hallucination} as a generated prediction that has low-probability under the true latent parameter. We develop a new method that takes an ICL problem -- that is, a CGM, a dataset, and a prediction question -- and estimates the probability that a CGM will generate a hallucination. Our method only requires generating queries and responses from the model and evaluating its response log probability. We empirically evaluate our method on synthetic regression and natural language ICL tasks using large language models.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05227 [pdf, other]

Mixed-Curvature Decision Trees and Random Forests

Authors: Philippe Chlenski, Quentin Chu, Itsik Pe'er

Abstract: We extend decision tree and random forest algorithms to mixed-curvature product spaces. Such spaces, defined as Cartesian products of Euclidean, hyperspherical, and hyperbolic manifolds, can often embed points from pairwise distances with much lower distortion than in single manifolds. To date, all classifiers for product spaces fit a single linear decision boundary, and no regressor has been described. Our method overcomes these limitations by enabling simple, expressive classification and regression in product manifolds. We demonstrate the superior accuracy of our tool compared to Euclidean methods operating in the ambient space for component manifolds covering a wide range of curvatures, as well as on a selection of product manifolds.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.06699 [pdf]

ChatSOS: Vector Database Augmented Generative Question Answering Assistant in Safety Engineering

Authors: Haiyang Tang, Dongping Chen, Qingzhao Chu

Abstract: With the rapid advancement of natural language processing technologies, generative artificial intelligence techniques, represented by large language models (LLMs), are gaining increasing prominence and demonstrating significant potential for applications in safety engineering. However, fundamental LLMs face constraints such as limited training data coverage and unreliable responses. This study develops a vector database from 117 explosion accident reports in China spanning 2013 to 2023, employing techniques such as corpus segmenting and vector embedding. By utilizing the vector database, which outperforms the relational database in information retrieval quality, we provide LLMs with richer, more relevant knowledge. Comparative analysis of LLMs demonstrates that ChatSOS significantly enhances reliability, accuracy, and comprehensiveness, improves adaptability and clarification of responses. These results illustrate the effectiveness of supplementing LLMs with an external database, highlighting their potential to handle professional queries in safety engineering and laying a foundation for broader applications.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.00513 [pdf, other]

Transformer based Pluralistic Image Completion with Reduced Information Loss

Authors: Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu

Abstract: Transformer based methods have achieved great success in image inpainting recently. However, we find that these solutions regard each pixel as a token, thus suffering from an information loss issue from two aspects: 1) They downsample the input image into much lower resolutions for efficiency consideration. 2) They quantize $256^3$ RGB values to a small number (such as 512) of quantized color values. The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer. To mitigate these issues, we propose a new transformer based framework called "PUT". Specifically, to avoid input downsampling while maintaining computation efficiency, we design a patch-based auto-encoder P-VQVAE. The encoder converts the masked image into non-overlapped patch tokens and the decoder recovers the masked regions from the inpainted tokens while keeping the unmasked regions unchanged. To eliminate the information loss caused by input quantization, an Un-quantized Transformer is applied. It directly takes features from the P-VQVAE encoder as input without any quantization and only regards the quantized tokens as prediction targets. Furthermore, to make the inpainting process more controllable, we introduce semantic and structural conditions as extra guidance. Extensive experiments show that our method greatly outperforms existing transformer based methods on image fidelity and achieves much higher diversity and better fidelity than state-of-the-art pluralistic inpainting methods on complex large-scale datasets (e.g., ImageNet). Codes are available at https://github.com/liuqk3/PUT.

Submitted 14 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted by TPAMI (2024). arXiv admin note: text overlap with arXiv:2205.05076

arXiv:2403.18405 [pdf, other]

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval

Authors: Shengjie Ma, Chong Chen, Qi Chu, Jiaxin Mao

Abstract: Collecting relevant judgments for legal case retrieval is a challenging and time-consuming task. Accurately judging the relevance between two legal cases requires a considerable effort to read the lengthy text and a high level of domain expertise to extract Legal Facts and make juridical judgments. With the advent of advanced large language models, some recent studies have suggested that it is promising to use LLMs for relevance judgment. Nonetheless, the method of employing a general large language model for reliable relevance judgments in legal case retrieval is yet to be thoroughly explored. To fill this research gap, we devise a novel few-shot workflow tailored to the relevant judgment of legal cases. The proposed workflow breaks down the annotation process into a series of stages, imitating the process employed by human annotators and enabling a flexible integration of expert reasoning to enhance the accuracy of relevance judgments. By comparing the relevance judgments of LLMs and human experts, we empirically show that we can obtain reliable relevance judgments with the proposed workflow. Furthermore, we demonstrate the capacity to augment existing legal case retrieval models through the synthesis of data generated by the large language model.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.02148 [pdf, other]

MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

Authors: Tianxiang Chen, Zi Ye, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Nenghai Yu, Jieping Ye

Abstract: Recently, infrared small target detection (ISTD) has made significant progress, thanks to the development of basic models. Specifically, the models combining CNNs with transformers can successfully extract both local and global features. However, the disadvantage of the transformer is also inherited, i.e., the quadratic computational complexity to sequence length. Inspired by the recent basic model with linear complexity for long-distance modeling, Mamba, we explore the potential of this state space model for ISTD task in terms of effectiveness and efficiency in the paper. However, directly applying Mamba achieves suboptimal performances due to the insufficient harnessing of local features, which are imperative for detecting small targets. Instead, we tailor a nested structure, Mamba-in-Mamba (MiM-ISTD), for efficient ISTD. It consists of Outer and Inner Mamba blocks to adeptly capture both global and local features. Specifically, we treat the local patches as "visual sentences" and use the Outer Mamba to explore the global information. We then decompose each visual sentence into sub-patches as "visual words" and use the Inner Mamba to further explore the local information among words in the visual sentence with negligible computational costs. By aggregating the visual word and visual sentence features, our MiM-ISTD can effectively explore both global and local information. Experiments on NUAA-SIRST and IRSTD-1k show the superior accuracy and efficiency of our method. Specifically, MiM-ISTD is $8 \times$ faster than the SOTA method and reduces GPU memory usage by 62.2$\%$ when testing on $2048 \times 2048$ images, overcoming the computation and memory constraints on high-resolution infrared images.

Submitted 24 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: The first Mamba-based model for infrared small target detection

arXiv:2402.02327 [pdf, other]

Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

Authors: Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu

Abstract: How to effectively interact audio with vision has garnered considerable interest within the multi-modality research field. Recently, a novel audio-visual segmentation (AVS) task has been proposed, aiming to segment the sounding objects in video frames under the guidance of audio cues. However, most existing AVS methods are hindered by a modality imbalance where the visual features tend to dominate those of the audio modality, due to a unidirectional and insufficient integration of audio cues. This imbalance skews the feature representation towards the visual aspect, impeding the learning of joint audio-visual representations and potentially causing segmentation inaccuracies. To address this issue, we propose AVSAC. Our approach features a Bidirectional Audio-Visual Decoder (BAVD) with integrated bidirectional bridges, enhancing audio cues and fostering continuous interplay between audio and visual modalities. This bidirectional interaction narrows the modality imbalance, facilitating more effective learning of integrated audio-visual representations. Additionally, we present a strategy for audio-visual frame-wise synchrony as fine-grained guidance of BAVD. This strategy enhances the share of auditory components in visual features, contributing to a more balanced audio-visual representation learning. Extensive experiments show that our method attains new benchmarks in AVS performance.

Submitted 6 February, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.02046 [pdf, other]

TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

Authors: Tianxiang Chen, Zhentao Tan, Qi Chu, Yue Wu, Bin Liu, Nenghai Yu

Abstract: Infrared small target detection (ISTD) is critical to national security and has been extensively applied in military areas. ISTD aims to segment small target pixels from background. Most ISTD networks focus on designing feature extraction blocks or feature fusion modules, but rarely describe the ISTD process from the feature map evolution perspective. In the ISTD process, the network attention gradually shifts towards target areas. We abstract this process as the directional movement of feature map pixels to target areas through convolution, pooling and interactions with surrounding pixels, which can be analogous to the movement of thermal particles constrained by surrounding variables and particles. In light of this analogy, we propose Thermal Conduction-Inspired Transformer (TCI-Former) based on the theoretical principles of thermal conduction. According to thermal conduction differential equation in heat dynamics, we derive the pixel movement differential equation (PMDE) in the image domain and further develop two modules: Thermal Conduction-Inspired Attention (TCIA) and Thermal Conduction Boundary Module (TCBM). TCIA incorporates finite difference method with PMDE to reach a numerical approximation so that target body features can be extracted. To further remove errors in boundary areas, TCBM is designed and supervised by boundary masks to refine target body features with fine boundary details. Experiments on IRSTD-1k and NUAA-SIRST demonstrate the superiority of our method.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2401.12737 [pdf]

doi 10.1515/nanoph-2023-0754

Controlling thermal emission with metasurfaces and its applications

Authors: Qiongqiong Chu, Fan Zhong, Xiaohe Shang, Ye Zhang, Shining Zhu, Hui Liu

Abstract: Thermal emission caused by the thermal motion of the charged particles is commonly broadband, un-polarized, and incoherent, like a melting pot of electromagnetic waves, which makes it unsuitable for infrared applications in many cases requiring specific thermal emission properties. Metasurfaces, characterized by two-dimensional subwavelength artificial nanostructures, have been extensively investigated for their flexibility in tuning optical properties, which provide an ideal platform for shaping thermal emission. Recently, remarkable progress was achieved not only in tuning thermal emission in multiple degrees of freedom, such as wavelength, polarization, radiation angle, coherence, and so on but also in applications of compact and integrated optical devices. Here, we review the recent advances in the regulation of thermal emission through metasurfaces and corresponding infrared applications, such as infrared sensing, radiative cooling, and thermophotovoltaic devices.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 28 pages, 10 figures

Journal ref: Nanophotonics

arXiv:2312.08629 [pdf]

ChatSOS: LLM-based knowledge Q&A system for safety engineering

Authors: Haiyang Tang, Zhenyi Liu, Dongping Chen, Qingzhao Chu

Abstract: Recent advancements in large language models (LLMs) have notably propelled natural language processing (NLP) capabilities, demonstrating significant potential in safety engineering applications. Despite these advancements, LLMs face constraints in processing specialized tasks, attributed to factors such as corpus size, input processing limitations, and privacy concerns. Obtaining useful information from reliable sources in a limited time is crucial for LLM. Addressing this, our study introduces an LLM-based Q&A system for safety engineering, enhancing the comprehension and response accuracy of the model. We employed prompt engineering to incorporate external knowledge databases, thus enriching the LLM with up-to-date and reliable information. The system analyzes historical incident reports through statistical methods, utilizes vector embedding to construct a vector database, and offers an efficient similarity-based search functionality. Our findings indicate that the integration of external knowledge significantly augments the capabilities of LLM for in-depth problem analysis and autonomous task assignment. It effectively summarizes accident reports and provides pertinent recommendations. This integration approach not only expands LLM applications in safety engineering but also sets a precedent for future developments towards automation and intelligent systems.

Submitted 13 December, 2023; originally announced December 2023.

Comments: in Chinese language

arXiv:2312.02520 [pdf, other]

Towards More Unified In-context Visual Understanding

Authors: Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

Abstract: The rapid advancement of large language models (LLMs) has accelerated the emergence of in-context learning (ICL) as a cutting-edge approach in the natural language processing domain. Recently, ICL has been employed in visual understanding tasks, such as semantic segmentation and image captioning, yielding promising results. However, existing visual ICL framework can not enable producing content across multiple modalities, which limits their potential usage scenarios. To address this issue, we present a new ICL framework for visual understanding with multi-modal output enabled. First, we quantize and embed both text and visual prompt into a unified representational space, structured as interleaved in-context sequences. Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them, facilitating in-context learning. Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines. Overall, our research takes a further step toward unified multimodal in-context learning.

Submitted 16 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR 2024

arXiv:2311.06718 [pdf]

Sustainable Development Goal (SDG) 8: New Zealand Prospects while Yield Curve Inverts in Central Bank Digital Currency (CBDC) Era

Authors: Qionghua Chu

Abstract: In the inverted yield curve environment, I intend to assess the feasibility of fulfilling Sustainable Development Goal (SDG) 8, decent work and economic growth, of the United Nations by 2030 in New Zealand. Central Bank Digital Currency (CBDC) issuance supports SDG 8, based on the Cobb-Douglas production function, the growth accounting relation, and the Theory of Aggregate Demand. Bright prospects exist for New Zealand.

Submitted 8 April, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

arXiv:2311.06716 [pdf]

Sustainable Development Goals (SDGs): New Zealand Outlook with Central Bank Digital Currency and SDG 8 Realization on the Horizon

Authors: Qionghua Chu

Abstract: Central Bank Digital Currency (CBDC) may assist New Zealand accomplish SDG 8. I aim to evaluate if SDGs could be achieved together because of mutual interactions between SDG 8 and other SDGs. The SDGs are categorized by their shared qualities to affect and effect SDG 8. Also, additional SDGs may help each other achieve. Considering the CBDC as a fundamental stimulus to achieving decent work and economic growth, detailed study and analysis of mutual interactions suggests that SDG 8 and other SDGs can be achieved.

Submitted 4 December, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

arXiv:2311.06566 [pdf]

doi 10.1364/OE.504375

Indirect measurement of infrared absorption spectrum through thermal emission of meta-cavity array

Authors: Qiongqiong Chu, Fengyuan Zhang, Ye Zhang, Shining Zhu, Hui Liu

Abstract: Controlling thermal emission is essential for various infrared spectroscopy applications. Metasurfaces can be utilized to control multiple degrees of freedom of thermal emission, enabling the compact thermal emission materials and devices. Infrared spectroscopy such as FTIR (Fourier transform infrared spectroscopy), usually requires external infrared radiation source and complex spectroscopic devices for absorption spectrum measurement, which hinders the implementation of integrated compact and portable measurement equipment. Measuring absorption spectrum through the thermal emission of pixelated thermal emitter array can facilitate the integration and miniaturization of measurement setup, which is highly demanded for on-chip spectroscopy applications. Here, we experimentally demonstrate an integrated technology that allows for indirect measurement of the absorption spectrum through the thermal emission of meta-cavity array. This indirect measurement method opens a new avenue for compact infrared spectroscopy analysis.

Submitted 11 November, 2023; originally announced November 2023.

Comments: 14 pages, 9 figures

Journal ref: Optics Express 31, 39832 (2023)

arXiv:2310.15624 [pdf, other]

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

Authors: Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

Abstract: Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth by using the perspective projection between object's physical size and 2D projection in the image plane, which can introduce mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and reflected into the projected depth. It leads to unreliable depth inferences and also impairs training stability. To tackle this problem, we propose a novel Geometry Uncertainty Propagation Network (GUPNet++) by modeling geometry projection in a probabilistic manner. This ensures depth predictions are well-bounded and associated with a reasonable uncertainty. The significance of introducing such geometric uncertainty is two-fold: (1). It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning. (2). It can be derived to a highly reliable confidence to indicate the quality of the 3D detection result, enabling more reliable detection inference. Experiments show that the proposed approach not only obtains (state-of-the-art) SOTA performance in image-based monocular 3D detection but also demonstrates superiority in efficacy with a simplified framework.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 18 pages, 9 figures

arXiv:2309.16668 [pdf, other]

doi 10.1145/3658237

RealFill: Reference-Driven Generation for Authentic Image Completion

Authors: Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein

Abstract: Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic, since they are unaware of the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. Project page: https://realfill.github.io

Submitted 14 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: SIGGRAPH 2024 (Journal Track). Project page: https://realfill.github.io

arXiv:2309.12657 [pdf, other]

Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding

Authors: Jiazhen Wang, Bin Liu, Changtao Miao, Zhiwei Zhao, Wanyi Zhuang, Qi Chu, Nenghai Yu

Abstract: AI-synthesized text and images have gained significant attention, particularly due to the widespread dissemination of multi-modal manipulations on the internet, which has resulted in numerous negative impacts on society. Existing methods for multi-modal manipulation detection and grounding primarily focus on fusing vision-language features to make predictions, while overlooking the importance of modality-specific features, leading to sub-optimal results. In this paper, we construct a simple and novel transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. To achieve this, we introduce visual/language pre-trained encoders and dual-branch cross-attention (DCA) to extract and fuse modality-unique features. Furthermore, we design decoupled fine-grained classifiers (DFC) to enhance modality-specific feature mining and mitigate modality competition. Moreover, we propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality using learnable queries, thereby improving the discovery of forged details. Extensive experiments on the $\rm DGM^4$ dataset demonstrate the superior performance of our proposed model compared to state-of-the-art approaches.

Submitted 13 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Camera-ready version and supplementary material

arXiv:2308.13666 [pdf, other]

A Joint Fermi-GBM and Swift-BAT Analysis of Gravitational-Wave Candidates from the Third Gravitational-wave Observing Run

Authors: C. Fletcher, J. Wood, R. Hamburg, P. Veres, C. M. Hui, E. Bissaldi, M. S. Briggs, E. Burns, W. H. Cleveland, M. M. Giles, A. Goldstein, B. A. Hristov, D. Kocevski, S. Lesage, B. Mailyan, C. Malacaria, S. Poolakkil, A. von Kienlin, C. A. Wilson-Hodge, The Fermi Gamma-ray Burst Monitor Team, M. Crnogorčević, J. DeLaunay, A. Tohuvavohu, R. Caputo, S. B. Cenko, et al. (1674 additional authors not shown)

Abstract: We present Fermi Gamma-ray Burst Monitor (Fermi-GBM) and Swift Burst Alert Telescope (Swift-BAT) searches for gamma-ray/X-ray counterparts to gravitational wave (GW) candidate events identified during the third observing run of the Advanced LIGO and Advanced Virgo detectors. Using Fermi-GBM on-board triggers and sub-threshold gamma-ray burst (GRB) candidates found in the Fermi-GBM ground analyses, the Targeted Search and the Untargeted Search, we investigate whether there are any coincident GRBs associated with the GWs. We also search the Swift-BAT rate data around the GW times to determine whether a GRB counterpart is present. No counterparts are found. Using both the Fermi-GBM Targeted Search and the Swift-BAT search, we calculate flux upper limits and present joint upper limits on the gamma-ray luminosity of each GW. Given these limits, we constrain theoretical models for the emission of gamma-rays from binary black hole mergers.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2307.02756 [pdf, other]

doi 10.1093/mnras/stac3418

On the detection of the electromagnetic counterparts from lensed gravitational wave events by binary neutron star mergers

Authors: Hao Ma, Youjun Lu, Xiao Guo, Siqi Zhang, Qingbo Chu

Abstract: Future ground-based gravitational wave (GW) detectors, i.e., Einstein telescope (ET) and Cosmic Explorer (CE), are expected to detect a significant number of lensed binary neutron star (BNS) mergers, which may provide a unique tool to probe cosmology. In this paper, we investigate the detectability of the optical/infrared electromagnetic (EM) counterparts (kilonovae/afterglows) from these lensed BNS mergers by future GW detectors and EM telescopes using simple kilonova, afterglow, and lens models. ET and CE are expected to detect $\sim5.32^{+26.1}_{-5.10}$ and $67.3^{+332}_{-64.7}$ lensed BNS mergers per year. We find that the EM counterparts associated with all these mergers will be detectable by an all sky-survey in the H-band with the limiting magnitude $m_{\textrm{lim}}\gtrsim27$, while the detectable fraction is $\lesssim0.4\%$ in the g-/z-band if with $m_{\textrm{lim}}\lesssim24$. Generally it is more efficient to search the lensed EM counterparts by adopting the infrared bands than the optical/UV bands with the same $m_{\textrm{lim}}$. Future telescopes like Vera C. Rubin Observatory, China Space Station Telescope, and Euclid can hardly detect the EM counterparts of even one lensed BNS merger. Roman Space Telescope (RST) and James Webb Space Telescope (JWST) have the capability to detect about a few or more such events per year. Moreover, the time delays and separations between the lensed image pairs are typically in the ranges from minutes to months and from $0.1$ to $1$\,arcsec, suggesting that both the GW and EM images of most lensed BNS mergers can be well resolved by not only CE/ET in the time domain but also RST/JWST spatially.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 16 pages, 10 figures, MNRAS in press

arXiv:2306.10900 [pdf, other]

MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

Authors: Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang

Abstract: Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans. While recent works have achieved impressive results in generating motion directly from textual action descriptions, they often support only a single modality of the control signal, which limits their application in the real digital human industry. This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals, e.g., text and single-frame poses, for generating consecutive human motions by treating multimodal signals as special input tokens in large language models (LLMs). Specifically, we first quantize multimodal control signals into discrete codes and then formulate them in a unified prompt instruction to ask the LLMs to generate the motion answer. Our MotionGPT demonstrates a unified human motion generation model with multimodal control signals by tuning a mere 0.4% of LLM parameters. To the best of our knowledge, MotionGPT is the first method to generate human motion by multimodal control signals, which we hope can shed light on this new direction. Visit our webpage at https://qiqiapink.github.io/MotionGPT/.

Submitted 18 March, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 18 pages, 8 figures, accepted by AAAI 2024

arXiv:2306.09615 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095302

EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors

Authors: Yaqi Zhang, Yan Lu, Bin Liu, Zhiwei Zhao, Qi Chu, Nenghai Yu

Abstract: Transformer is popular in recent 3D human pose estimation, which utilizes long-term modeling to lift 2D keypoints into the 3D space. However, current transformer-based methods do not fully exploit the prior knowledge of the human skeleton provided by the kinematic structure. In this paper, we propose a novel transformer-based model EvoPose to introduce the human body prior knowledge for 3D human pose estimation effectively. Specifically, a Structural Priors Representation (SPR) module represents human priors as structural features carrying rich body patterns, e.g. joint relationships. The structural features are interacted with 2D pose sequences and help the model to achieve more informative spatiotemporal features. Moreover, a Recursive Refinement (RR) module is applied to refine the 3D pose outputs by utilizing estimated results and further injects human priors simultaneously. Extensive experiments demonstrate the effectiveness of EvoPose which achieves a new state of the art on two most popular benchmarks, Human3.6M and MPI-INF-3DHP.

Submitted 16 June, 2023; originally announced June 2023.

Comments: 5 pages, 2 figures, 4 tables, published in the proceedings of IEEE ICASSP 2023

arXiv:2306.09008 [pdf, other]

Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal

Authors: Zhentao Tan, Yue Wu, Qiankun Liu, Qi Chu, Le Lu, Jieping Ye, Nenghai Yu

Abstract: Image restoration under adverse weather conditions (e.g., rain, snow and haze) is a fundamental computer vision problem and has important indications for various downstream applications. Different from early methods that are specially designed for specific type of weather, most recent works tend to remove various adverse weather effects simultaneously through either spatial feature representation learning or semantic information embedding. Inspired by the various successful applications of large-scale pre-trained models (e.g., CLIP), in this paper, we explore the potential benefits of them for this task through both spatial feature representation learning and semantic information embedding aspects: 1) for spatial feature representation learning, we design a Spatially-Adaptive Residual (\textbf{SAR}) Encoder to extract degraded areas adaptively. To facilitate its training, we propose a Soft Residual Distillation (\textbf{CLIP-SRD}) strategy to transfer the spatial knowledge from CLIP between clean and adverse weather images; 2) for semantic information embedding, we propose a CLIP Weather Prior (\textbf{CWP}) embedding module to make the network handle different weather conditions adaptively. This module integrates the sample specific weather prior extracted by CLIP image encoder together with the distribution specific information learned by a set of parameters, and embeds them through a cross attention mechanism. Extensive experiments demonstrate that our proposed method can achieve state-of-the-art performance under different and challenging adverse weather conditions. Code will be made available.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.05390 [pdf, other]

HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

Authors: Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

Abstract: This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity. We analyze existing image restoration datasets from five different perspectives, including data scale, resolution, compression rates, texture details, and semantic coverage. However, we find that all of these datasets are deficient in some aspects. In contrast, HQ-50K considers all of these five aspects during the data curation process and meets all requirements. We also present a new Degradation-Aware Mixture of Expert (DAMoE) model, which enables a single model to handle multiple corruption types and unknown levels. Our extensive experiments demonstrate that HQ-50K consistently improves the performance on various image restoration tasks, such as super-resolution, denoising, dejpeg, and deraining. Furthermore, our proposed DAMoE, trained on our HQ-50K, outperforms existing state-of-the-art unified models designed for multiple restoration tasks and levels. The dataset and code are available at \url{https://github.com/littleYaang/HQ-50K}.

Submitted 8 June, 2023; originally announced June 2023.

Comments: Dataset and code will be available at https://github.com/littleYaang/HQ-50K

arXiv:2305.10794 [pdf, other]

Multi-spectral Class Center Network for Face Manipulation Detection and Localization

Authors: Changtao Miao, Qi Chu, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Yue Wu, Bin Liu, Honggang Hu, Nenghai Yu

Abstract: As Deepfake contents continue to proliferate on the internet, advancing face manipulation forensics has become a pressing issue. To combat this emerging threat, previous methods mainly focus on studying how to distinguish authentic and manipulated face images. Despite impressive, image-level classification lacks explainability and is limited to some specific application scenarios. Existing forgery localization methods suffer from imprecise and inconsistent pixel-level annotations. To alleviate these problems, this paper first re-constructs the FaceForensics++ dataset by introducing pixel-level annotations, then builds an extensive benchmark for localizing tampered regions. Next, a novel Multi-Spectral Class Center Network (MSCCNet) is proposed for face manipulation detection and localization. Specifically, inspired by the power of frequency-related forgery traces, we design Multi-Spectral Class Center (MSCC) module to learn more generalizable and semantic-agnostic features. Based on the features of different frequency bands, the MSCC module collects multispectral class centers and computes pixel-to-class relations. Applying multi-spectral class-level representations suppresses the semantic information of the visual concepts, which is insensitive to manipulations. Furthermore, we propose a Multi-level Features Aggregation (MFA) module to employ more low-level forgery artifacts and structure textures. Experimental results quantitatively and qualitatively indicate the effectiveness and superiority of the proposed MSCCNet on comprehensive localization benchmarks. We expect this work to inspire more studies on pixel-level face manipulation localization. The annotations and codes are available.

Submitted 19 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Email Address: [email protected]

arXiv:2305.09131 [pdf]

doi 10.1515/nanoph-2022-0781

Multiple symmetry protected BIC lines in two dimensional synthetic parameter space

Authors: Fengyuan Zhang, Qiongqiong Chu, Qiang Wang, Shining Zhu, Hui Liu

Abstract: Bound states in the continuum (BICs) have attracted significant interest in recent years due to their unique optical properties, such as infinite quality factor and wave localization. In order to improve the optical performance of BICs based devices, more degrees of freedom are required to tune BICs in high-dimension parameter space for practical applications. To effectively tune more BICs, we form a 2D synthetic parameter space based on a nanohole metasurface array. Multiple symmetry protected BIC modes with high Q factors can be achieved at high-order symmetry point. Through manipulating asymmetry parameters, BIC lines formed by a series of BIC modes can be found in the 2D synthetic parameter space. Moreover, the electric field distributions are investigated to demonstrate the generation and evolution of BICs. By measuring the absorption spectra, the tuning of multiple BICs with synthetic asymmetry parameters is experimentally explored, which agrees well with theoretical results. Therefore, our design can provide new insight for a variety of on-chip applications, such as non-linear devices, integrated nanolasing array and high-resolution sensors for infrared molecular detection.

Submitted 15 May, 2023; originally announced May 2023.

Journal ref: Nanophotonics February 6, 2023

arXiv:2305.06145 [pdf, other]

Clothes-Invariant Feature Learning by Causal Intervention for Clothes-Changing Person Re-identification

Authors: Xulin Li, Yan Lu, Bin Liu, Yuenan Hou, Yating Liu, Qi Chu, Wanli Ouyang, Nenghai Yu

Abstract: Clothes-invariant feature extraction is critical to the clothes-changing person re-identification (CC-ReID). It can provide discriminative identity features and eliminate the negative effects caused by the confounder--clothing changes. But we argue that there exists a strong spurious correlation between clothes and human identity, that restricts the common likelihood-based ReID method P(Y|X) to extract clothes-irrelevant features. In this paper, we propose a new Causal Clothes-Invariant Learning (CCIL) method to achieve clothes-invariant feature learning by modeling causal intervention P(Y|do(X)). This new causality-based model is inherently invariant to the confounder in the causal view, which can achieve the clothes-invariant features and avoid the barrier faced by the likelihood-based methods. Extensive experiments on three CC-ReID benchmarks, including PRCC, LTCC, and VC-Clothes, demonstrate the effectiveness of our approach, which achieves a new state of the art.

Submitted 10 May, 2023; originally announced May 2023.

arXiv:2304.08393 [pdf, other]

Search for gravitational-lensing signatures in the full third observing run of the LIGO-Virgo network

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1670 additional authors not shown)

Abstract: Gravitational lensing by massive objects along the line of sight to the source causes distortions of gravitational wave-signals; such distortions may reveal information about fundamental physics, cosmology and astrophysics. In this work, we have extended the search for lensing signatures to all binary black hole events from the third observing run of the LIGO--Virgo network. We search for repeated signals from strong lensing by 1) performing targeted searches for subthreshold signals, 2) calculating the degree of overlap amongst the intrinsic parameters and sky location of pairs of signals, 3) comparing the similarities of the spectrograms amongst pairs of signals, and 4) performing dual-signal Bayesian analysis that takes into account selection effects and astrophysical knowledge. We also search for distortions to the gravitational waveform caused by 1) frequency-independent phase shifts in strongly lensed images, and 2) frequency-dependent modulation of the amplitude and phase due to point masses. None of these searches yields significant evidence for lensing. Finally, we use the non-detection of gravitational-wave lensing to constrain the lensing rate based on the latest merger-rate estimates and the fraction of dark matter composed of compact objects.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 28 pages, 11 figures

Report number: LIGO-P2200031

arXiv:2304.05779 [pdf, other]

doi 10.1093/mnras/stad1028

The luminosity functions of kilonovae from binary neutron star mergers under different equation of states

Authors: Chunyang Zhao, Youjun Lu, Qingbo Chu, Wen Zhao

Abstract: Kilonovae produced by mergers of binary neutron stars (BNSs) are important transient events to be detected by time domain surveys with the alerts from the ground-based gravitational wave detectors. The observational properties of these kilonovae depend on the physical processes involved in the merging processes and the equation of state (EOS) of neutron stars (NSs). In this paper, we investigate the dependence of kilonova luminosities on the parameters of BNS mergers, and estimate the distribution functions of kilonova peak luminosities (KLFs) at the u, g, r, i, y, and z bands as well as its dependence on the NS EOS, by adopting a comprehensive semi-analytical model for kilonovae (calibrated by the observations of GW170817), a population synthesis model for the cosmic BNSs, and the ejecta properties of BNS mergers predicted by numerical simulations. We find that the kilonova light curves depend on both the BNS properties and the NS EOS, and the KLFs at the considered bands are bimodal with the bright components mostly contributed by BNS mergers with total mass $\lesssim 3.2M_\odot$/$2.8M_\odot$ and fainter components mostly contributed by BNS mergers with total mass $\gtrsim 3.2M_\odot$/$2.8M_\odot$ by assuming a stiff/soft (DD2/SLy) EOS. The emission of the kilonovae in the KLF bright components is mostly due to the radiation from the wind ejecta by the remnant discs of BNS mergers, while the emission of the kilonovae in the KLF faint components is mostly due to the radiation from the dynamical ejecta by the BNS mergers.

Submitted 13 April, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: 28 pages, 16 figures, to appear in MNRAS

arXiv:2303.09522 [pdf, other]

P+: Extended Textual Conditioning in Text-to-Image Generation

Authors: Andrey Voynov, Qinghao Chu, Daniel Cohen-Or, Kfir Aberman

Abstract: We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$. This space consists of multiple textual conditions, derived from per-layer prompts, each corresponding to a layer of the denoising U-net of the diffusion model. We show that the extended space provides greater disentangling and control over image synthesis. We further introduce Extended Textual Inversion (XTI), where the images are inverted into $P+$, and represented by per-layer tokens. We show that XTI is more expressive and precise, and converges faster than the original Textual Inversion (TI) space. The extended inversion method does not involve any noticeable trade-off between reconstruction and editability and induces more regular inversions. We conduct a series of extensive experiments to analyze and understand the properties of the new space, and to showcase the effectiveness of our method for personalizing text-to-image models. Furthermore, we utilize the unique properties of this space to achieve previously unattainable results in object-style mixing using text-to-image models. Project page: https://prompt-plus.github.io

Submitted 15 July, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

arXiv:2301.04265 [pdf, other]

Adversarial Alignment for Source Free Object Detection

Authors: Qiaosong Chu, Shuyan Li, Guangyi Chen, Kai Li, Xiu Li

Abstract: Source-free object detection (SFOD) aims to transfer a detector pre-trained on a label-rich source domain to an unlabeled target domain without seeing source data. While most existing SFOD methods generate pseudo labels via a source-pretrained model to guide training, these pseudo labels usually contain high noises due to heavy domain discrepancy. In order to obtain better pseudo supervisions, we divide the target domain into source-similar and source-dissimilar parts and align them in the feature space by adversarial learning. Specifically, we design a detection variance-based criterion to divide the target domain. This criterion is motivated by a finding that larger detection variances denote higher recall and larger similarity to the source domain. Then we incorporate an adversarial module into a mean teacher framework to drive the feature spaces of these two subsets indistinguishable. Extensive experiments on multiple cross-domain object detection datasets demonstrate that our proposed method consistently outperforms the compared SFOD methods.

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2212.03863 [pdf, other]

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

Authors: Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

Abstract: Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts the segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous works utilize object instances either from human-annotated instance segmentation datasets or rendered from 3D object models, and both approaches are too expensive to scale up to obtain good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images or zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable. To make such success happen, we design a data acquisition and processing framework, dubbed "X-Paste", upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves +2.6 box AP and +2.1 mask AP gains on all classes and even more significant gains with +6.8 box AP, +6.5 mask AP on long-tail classes. Our code and models are available at https://github.com/yoctta/XPaste.

Submitted 31 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: ICML 2023, code is available at https://github.com/yoctta/XPaste

arXiv:2212.01477 [pdf, other]

doi 10.1093/mnras/stad3120

Search for subsolar-mass black hole binaries in the second part of Advanced LIGO's and Advanced Virgo's third observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1680 additional authors not shown)

Abstract: We describe a search for gravitational waves from compact binaries with at least one component with mass 0.2 $M_\odot$ -- $1.0 M_\odot$ and mass ratio $q \geq 0.1$ in Advanced LIGO and Advanced Virgo data collected between 1 November 2019, 15:00 UTC and 27 March 2020, 17:00 UTC. No signals were detected. The most significant candidate has a false alarm rate of 0.2 $\mathrm{yr}^{-1}$. We estimate the sensitivity of our search over the entirety of Advanced LIGO's and Advanced Virgo's third observing run, and present the most stringent limits to date on the merger rate of binary black holes with at least one subsolar-mass component. We use the upper limits to constrain two fiducial scenarios that could produce subsolar-mass black holes: primordial black holes (PBH) and a model of dissipative dark matter. The PBH model uses recent prescriptions for the merger rate of PBH binaries that include a rate suppression factor to effectively account for PBH early binary disruptions. If the PBHs are monochromatically distributed, we can exclude a dark matter fraction in PBHs $f_\mathrm{PBH} \gtrsim 0.6$ (at 90% confidence) in the probed subsolar-mass range. However, if we allow for broad PBH mass distributions we are unable to rule out $f_\mathrm{PBH} = 1$. For the dissipative model, where the dark matter has chemistry that allows a small fraction to cool and collapse into black holes, we find an upper bound $f_{\mathrm{DBH}} < 10^{-5}$ on the fraction of atomic dark matter collapsed into black holes.

Submitted 26 January, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: https://dcc.ligo.org/P2200139

arXiv:2210.12752 [pdf, other]

UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection

Authors: Wanyi Zhuang, Qi Chu, Zhentao Tan, Qiankun Liu, Haojie Yuan, Changtao Miao, Zixiang Luo, Nenghai Yu

Abstract: Intra-frame inconsistency has been proved to be effective for the generalization of face forgery detection. However, learning to focus on these inconsistencies requires extra pixel-level forged location annotations. Acquiring such annotations is non-trivial. Some existing methods generate large-scale synthesized data with location annotations, which is only composed of real images and cannot capture the properties of forgery regions. Others generate forgery location labels by subtracting paired real and fake images, yet such paired data is difficult to collect and the generated label is usually discontinuous. To overcome these limitations, we propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT, which only makes use of video-level labels and can learn inconsistency-aware features without pixel-level annotations. Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the vision Transformer suitable for consistency representation learning. Based on the vision Transformer, we propose two key components: Unsupervised Patch Consistency Learning (UPCL) and Progressive Consistency Weighted Assemble (PCWA). UPCL is designed for learning the consistency-related representation with progressively optimized pseudo annotations. PCWA enhances the final classification embedding with previous patch embeddings optimized by UPCL to further improve the detection performance. Extensive experiments demonstrate the effectiveness of the proposed method.

Submitted 23 October, 2022; originally announced October 2022.

Comments: accepted by ECCV 2022 (oral)

arXiv:2210.10931 [pdf, other]

Search for gravitational-wave transients associated with magnetar bursts in Advanced LIGO and Advanced Virgo data from the third observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Allocca, P. A. Altin, et al. (1645 additional authors not shown)

Abstract: Gravitational waves are expected to be produced from neutron star oscillations associated with magnetar giant flares and short bursts. We present the results of a search for short-duration (milliseconds to seconds) and long-duration ($\sim$ 100 s) transient gravitational waves from 13 magnetar short bursts observed during Advanced LIGO, Advanced Virgo and KAGRA's third observation run. These 13 bursts come from two magnetars, SGR 1935$+$2154 and Swift J1818.0$-$1607. We also include three other electromagnetic burst events detected by Fermi GBM which were identified as likely coming from one or more magnetars, but they have no association with a known magnetar. No magnetar giant flares were detected during the analysis period. We find no evidence of gravitational waves associated with any of these 16 bursts. We place upper bounds on the root-sum-square of the integrated gravitational-wave strain that reach $2.2 \times 10^{-23}$ $/\sqrt{\text{Hz}}$ at 100 Hz for the short-duration search and $8.7 \times 10^{-23}$ $/\sqrt{\text{Hz}}$ at $450$ Hz for the long-duration search, given a detection efficiency of 50%. For a ringdown signal at 1590 Hz targeted by the short-duration search the limit is set to $1.8 \times 10^{-22}$ $/\sqrt{\text{Hz}}$. Using the estimated distance to each magnetar, we derive upper bounds on the emitted gravitational-wave energy of $3.2 \times 10^{43}$ erg ($7.3 \times 10^{43}$ erg) for SGR 1935$+$2154 and $8.2 \times 10^{42}$ erg ($2.8 \times 10^{43}$ erg) for Swift J1818.0$-$1607, for the short-duration (long-duration) search. Assuming isotropic emission of electromagnetic radiation of the burst fluences, we constrain the ratio of gravitational-wave energy to electromagnetic energy for bursts from SGR 1935$+$2154 with available fluence information. The lowest of these ratios is $3 \times 10^3$.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 30 pages with appendices, 5 figures, 10 tables

Report number: LIGO-P2100387

arXiv:2209.02863 [pdf]

doi 10.3847/2041-8213/aca1b0

Model-based cross-correlation search for gravitational waves from the low-mass X-ray binary Scorpius X-1 in LIGO O3 data

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1670 additional authors not shown)

Abstract: We present the results of a model-based search for continuous gravitational waves from the low-mass X-ray binary Scorpius X-1 using LIGO detector data from the third observing run of Advanced LIGO, Advanced Virgo and KAGRA. This is a semicoherent search which uses details of the signal model to coherently combine data separated by less than a specified coherence time, which can be adjusted to balance sensitivity with computing cost. The search covered a range of gravitational-wave frequencies from 25 Hz to 1600 Hz, as well as ranges in orbital speed, frequency and phase determined from observational constraints. No significant detection candidates were found, and upper limits were set as a function of frequency. The most stringent limits, between 100 Hz and 200 Hz, correspond to an amplitude $h_0$ of about $10^{-25}$ when marginalized isotropically over the unknown inclination angle of the neutron star's rotation axis, or less than $4 \times 10^{-26}$ assuming the optimal orientation. The sensitivity of this search is now probing amplitudes predicted by models of torque balance equilibrium. For the usual conservative model assuming accretion at the surface of the neutron star, our isotropically-marginalized upper limits are close to the predicted amplitude from about 70 Hz to 100 Hz; the limits assuming the neutron star spin is aligned with the most likely orbital angular momentum are below the conservative torque balance predictions from 40 Hz to 200 Hz. Assuming a broader range of accretion models, our direct limits on gravitational-wave amplitude delve into the relevant parameter space over a wide range of frequencies, to 500 Hz or more.

Submitted 2 January, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: 19 pages, Open Access Journal PDF

Report number: LIGO-P2100110-v13

Journal ref: The Astrophysical Journal Letters, 941, L30 (2022)

arXiv:2208.00967 [pdf, other]

Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification

Authors: Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu

Abstract: Graph-based models have recently achieved great success in person re-identification tasks. They first compute the graph topology structure (affinities) among different people and then pass information across them to obtain stronger features. However, we find that existing graph-based methods for the visible-infrared person re-identification task (VI-ReID) generalize poorly because of two issues: 1) the train-test modality balance gap, which is a property of the VI-ReID task. The amounts of data from the two modalities are balanced in the training stage but extremely unbalanced at inference, causing the low generalization of graph-based VI-ReID methods. 2) A sub-optimal topology structure caused by the end-to-end learning of the graph module. Our analysis shows that well-trained input features weaken the learning of the graph topology, so that it does not generalize well enough at inference. In this paper, we propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems. Specifically, a Homogeneous and Heterogeneous Feature Transfer (H2FT) is designed to reduce the train-test modality balance gap by two independent types of well-designed graph modules and an unbalanced scenario simulation. In addition, a Counterfactual Relation Intervention (CRI) is proposed to utilize counterfactual intervention and causal effect tools to highlight the role of the topology structure in the whole training process, which makes the graph topology structure more reliable. Extensive experiments on standard VI-ReID benchmarks demonstrate that CIFT outperforms the state-of-the-art methods under various settings.

Submitted 14 November, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

arXiv:2207.03776 [pdf, other]

Towards Intrinsic Common Discriminative Features Learning for Face Forgery Detection using Adversarial Learning

Authors: Wanyi Zhuang, Qi Chu, Haojie Yuan, Changtao Miao, Bin Liu, Nenghai Yu

Abstract: Existing face forgery detection methods usually treat face forgery detection as a binary classification problem and adopt deep convolutional neural networks to learn discriminative features. The ideal discriminative features should be related only to the real/fake labels of facial images. However, we observe that the features learned by vanilla classification networks are correlated with unnecessary properties, such as forgery methods and facial identities. This phenomenon limits forgery detection performance, especially generalization ability. Motivated by this, we propose a novel method which utilizes adversarial learning to eliminate the negative effect of different forgery methods and facial identities, which helps the classification network learn intrinsic common discriminative features for face forgery detection. To leverage data lacking ground-truth labels of facial identities, we design a special identity discriminator based on similarity information derived from an off-the-shelf face recognition model. With the help of adversarial learning, our face forgery detection model learns to extract common discriminative features by eliminating the effect of forgery methods and facial identities. Extensive experiments demonstrate the effectiveness of the proposed method under both intra-dataset and cross-dataset evaluation settings.

Submitted 8 July, 2022; originally announced July 2022.

arXiv:2205.05076 [pdf, other]

Reduce Information Loss in Transformers for Pluralistic Image Inpainting

Authors: Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu

Abstract: Transformers have achieved great success in pluralistic image inpainting recently. However, we find that existing transformer-based solutions regard each pixel as a token and thus suffer from information loss in two respects: 1) They downsample the input image to much lower resolutions for efficiency, incurring information loss and extra misalignment at the boundaries of masked regions. 2) They quantize $256^3$ RGB pixels to a small number (such as 512) of quantized pixels. The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer. Although an extra CNN is used to upsample and refine the low-resolution results, it is difficult to recover the lost information. To keep as much input information as possible, we propose a new transformer-based framework, "PUT". Specifically, to avoid input downsampling while maintaining computational efficiency, we design a patch-based auto-encoder, P-VQVAE, where the encoder converts the masked image into non-overlapping patch tokens and the decoder recovers the masked regions from inpainted tokens while keeping the unmasked regions unchanged. To eliminate the information loss caused by quantization, an Un-Quantized Transformer (UQ-Transformer) is applied, which directly takes the features from the P-VQVAE encoder as input without quantization and regards the quantized tokens only as prediction targets. Extensive experiments show that PUT greatly outperforms state-of-the-art methods on image fidelity, especially for large masked regions and complex large-scale datasets. Code is available at https://github.com/liuqk3/PUT

Submitted 15 May, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: CVPR 2022, code is available at https://github.com/liuqk3/PUT

arXiv:2204.04523 [pdf, other]

doi 10.1103/PhysRevD.106.042003

Search for continuous gravitational wave emission from the Milky Way center in O3 LIGO--Virgo data

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Allocca, P. A. Altin, et al. (1645 additional authors not shown)

Abstract: We present a directed search for continuous gravitational wave (CW) signals emitted by spinning neutron stars located in the inner parsecs of the Galactic Center (GC). Compelling evidence for the presence of a numerous population of neutron stars has been reported in the literature, turning this region into a very interesting place to look for CWs. In this search, data from the full O3 LIGO--Virgo run in the detector frequency band $[10,2000]\rm~Hz$ have been used. No significant detection was found and 95$\%$ confidence level upper limits on the signal strain amplitude were computed, over the full search band, with the deepest limit of about $7.6\times 10^{-26}$ at $\simeq 142\rm~Hz$. These results are significantly more constraining than those reported in previous searches. We use these limits to put constraints on the fiducial neutron star ellipticity and r-mode amplitude. These limits can be also translated into constraints in the black hole mass -- boson mass plane for a hypothetical population of boson clouds around spinning black holes located in the GC.

Submitted 9 April, 2022; originally announced April 2022.

Comments: 25 pages, 5 figures

arXiv:2203.12038 [pdf, other]

Search for Gravitational Waves Associated with Fast Radio Bursts Detected by CHIME/FRB During the LIGO--Virgo Observing Run O3a

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, the CHIME/FRB Collaboration, R. Abbott, T. D. Abbott, F. Acernese, K. Ackley, C. Adams, N. Adhikari, R. X. Adhikari, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, A. Allocca, et al. (1633 additional authors not shown)

Abstract: We search for gravitational-wave transients associated with fast radio bursts (FRBs) detected by the Canadian Hydrogen Intensity Mapping Experiment Fast Radio Burst Project (CHIME/FRB), during the first part of the third observing run of Advanced LIGO and Advanced Virgo (1 April 2019 15:00 UTC-1 Oct 2019 15:00 UTC). Triggers from 22 FRBs were analyzed with a search that targets compact binary coalescences with at least one neutron star component. A targeted search for generic gravitational-wave transients was conducted on 40 FRBs. We find no significant evidence for a gravitational-wave association in either search. Given the large uncertainties in the distances of the FRBs inferred from the dispersion measures in our sample, however, this does not conclusively exclude any progenitor models that include emission of a gravitational wave of the types searched for from any of these FRB events. We report $90\%$ confidence lower bounds on the distance to each FRB for a range of gravitational-wave progenitor models. By combining the inferred maximum distance information for each FRB with the sensitivity of the gravitational-wave searches, we set upper limits on the energy emitted through gravitational waves for a range of emission scenarios. We find values of order $10^{51}$-$10^{57}$ erg for a range of different emission models with central gravitational wave frequencies in the range 70-3560 Hz. Finally, we also found no significant coincident detection of gravitational waves with the repeater, FRB 20200120E, which is the closest known extragalactic FRB.

Submitted 22 March, 2022; originally announced March 2022.

Comments: 35 pages, 6 figures, 8 tables

Report number: P2100124

arXiv:2203.01270 [pdf, other]

doi 10.1093/ptep/ptac073

First joint observation by the underground gravitational-wave detector, KAGRA, with GEO600

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Allocca, P. A. Altin, et al. (1647 additional authors not shown)

Abstract: We report the results of the first joint observation of the KAGRA detector with GEO600. KAGRA is a cryogenic and underground gravitational-wave detector consisting of a laser interferometer with three-kilometer arms, and located in Kamioka, Gifu, Japan. GEO600 is a British--German laser interferometer with 600 m arms, and located near Hannover, Germany. GEO600 and KAGRA performed a joint observing run from April 7 to 20, 2020. We present the results of the joint analysis of the GEO--KAGRA data for transient gravitational-wave signals, including the coalescence of neutron-star binaries and generic unmodeled transients. We also perform dedicated searches for binary coalescence signals and generic transients associated with gamma-ray burst events observed during the joint run. No gravitational-wave events were identified. We evaluate the minimum detectable amplitude for various types of transient signals and the spacetime volume for which the network is sensitive to binary neutron-star coalescences. We also place lower limits on the distances to the gamma-ray bursts analysed based on the non-detection of an associated gravitational-wave signal for several signal models, including binary coalescences. These analyses demonstrate the feasibility and utility of KAGRA as a member of the global gravitational-wave detector network.

Submitted 19 August, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: Matches with published version

Report number: LIGO-P2100286

Journal ref: Progress of Theoretical and Experimental Physics, Volume 2022, Issue 6, 063F01 (2022)

arXiv:2201.10104 [pdf, other]

doi 10.1103/PhysRevD.106.062002

Search for gravitational waves from Scorpius X-1 with a hidden Markov model in O3 LIGO data

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Allocca, P. A. Altin, et al. (1647 additional authors not shown)

Abstract: Results are presented for a semi-coherent search for continuous gravitational waves from the low-mass X-ray binary Scorpius X-1, using a hidden Markov model (HMM) to allow for spin wandering. This search improves on previous HMM-based searches of Laser Interferometer Gravitational-wave Observatory (LIGO) data by including the orbital period in the search template grid, and by analyzing data from the latest (third) observing run (O3). In the frequency range searched, from 60 to 500 Hz, we find no evidence of gravitational radiation. This is the most sensitive search for Scorpius X-1 using a HMM to date. For the most sensitive sub-band, starting at $256.06$ Hz, we report an upper limit on gravitational wave strain (at $95 \%$ confidence) of $h_{0}^{95\%}=6.16\times10^{-26}$, assuming the orbital inclination angle takes its electromagnetically restricted value $\iota=44^{\circ}$. The upper limits on gravitational wave strain reported here are on average a factor of $\sim 3$ lower than in the O2 HMM search. This is the first Scorpius X-1 HMM search with upper limits that reach below the indirect torque-balance limit for certain sub-bands, assuming $\iota=44^{\circ}$.

Submitted 25 January, 2022; originally announced January 2022.

Comments: 23 pages, 5 figures

Report number: LIGO-P2100405

arXiv:2201.01297 [pdf, other]

Online Multi-Object Tracking with Unsupervised Re-Identification Learning and Occlusion Estimation

Authors: Qiankun Liu, Dongdong Chen, Qi Chu, Lu Yuan, Bin Liu, Lei Zhang, Nenghai Yu

Abstract: Occlusion between different objects is a typical challenge in Multi-Object Tracking (MOT), which often leads to inferior tracking results because detected objects go missing. The common practice in multi-object tracking is re-identifying the missed objects after their reappearance. Though tracking performance can be boosted by re-identification, identity annotations are required to train the model. In addition, such re-identification still cannot track highly occluded objects when they are missed by the detector. In this paper, we focus on online multi-object tracking and design two novel modules, the unsupervised re-identification learning module and the occlusion estimation module, to handle these problems. Specifically, the proposed unsupervised re-identification learning module neither requires any (pseudo) identity information nor suffers from the scalability issue. The proposed occlusion estimation module tries to predict the locations where occlusions happen, which are used to estimate the positions of objects missed by the detector. Our study shows that, when applied to state-of-the-art MOT methods, the proposed unsupervised re-identification learning is comparable to supervised re-identification learning, and the tracking performance is further improved by the proposed occlusion estimation module.

Submitted 4 January, 2022; originally announced January 2022.

Comments: To Appear at Neurocomputing 2022

arXiv:2201.00697 [pdf, other]

doi 10.1103/PhysRevD.106.102008

All-sky search for continuous gravitational waves from isolated neutron stars using Advanced LIGO and Advanced Virgo O3 data

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Allocca, P. A. Altin, et al. (1645 additional authors not shown)

Abstract: We present results of an all-sky search for continuous gravitational waves which can be produced by spinning neutron stars with an asymmetry around their rotation axis, using data from the third observing run of the Advanced LIGO and Advanced Virgo detectors. Four different analysis methods are used to search in a gravitational-wave frequency band from 10 to 2048 Hz and a first frequency derivative from $-10^{-8}$ to $10^{-9}$ Hz/s. No statistically-significant periodic gravitational-wave signal is observed by any of the four searches. As a result, upper limits on the gravitational-wave strain amplitude $h_0$ are calculated. The best upper limits are obtained in the frequency range of 100 to 200 Hz and they are ${\sim}1.1\times10^{-25}$ at 95\% confidence-level. The minimum upper limit of $1.10\times10^{-25}$ is achieved at a frequency 111.5 Hz. We also place constraints on the rates and abundances of nearby planetary- and asteroid-mass primordial black holes that could give rise to continuous gravitational-wave signals.

Submitted 3 January, 2022; originally announced January 2022.

Comments: 23 main text pages, 17 figures

Report number: LIGO-P2100367

arXiv:2112.10990 [pdf, other]

doi 10.3847/1538-4357/ac6ad0

Narrowband searches for continuous and long-duration transient gravitational waves from known pulsars in the LIGO-Virgo third observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, T. D. Abbott, F. Acernese, K. Ackley, C. Adams, N. Adhikari, R. X. Adhikari, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, A. Allocca, P. A. Altin, A. Amato, et al. (1636 additional authors not shown)

Abstract: Isolated neutron stars that are asymmetric with respect to their spin axis are possible sources of detectable continuous gravitational waves. This paper presents a fully-coherent search for such signals from eighteen pulsars in data from LIGO and Virgo's third observing run (O3). For known pulsars, efficient and sensitive matched-filter searches can be carried out if one assumes the gravitational radiation is phase-locked to the electromagnetic emission. In the search presented here, we relax this assumption and allow the frequency and frequency time-derivative of the gravitational waves to vary in a small range around those inferred from electromagnetic observations. We find no evidence for continuous gravitational waves, and set upper limits on the strain amplitude for each target. These limits are more constraining for seven of the targets than the spin-down limit defined by ascribing all rotational energy loss to gravitational radiation. In an additional search we look in O3 data for long-duration (hours-months) transient gravitational waves in the aftermath of pulsar glitches for six targets with a total of nine glitches. We report two marginal outliers from this search, but find no clear evidence for such emission either. The resulting duration-dependent strain upper limits do not surpass indirect energy constraints for any of these targets.

Submitted 27 June, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

Comments: 37 pages, 9 figures, submitted to ApJ

Report number: LIGO-P2100267

Journal ref: ApJ, 932, 133 (2022)

arXiv:2112.06861 [pdf, other]

Tests of General Relativity with GWTC-3

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, P. F. de Alarcón, S. Albanesi, R. A. Alfaidi, A. Allocca, et al. (1657 additional authors not shown)

Abstract: The ever-increasing number of detections of gravitational waves (GWs) from compact binaries by the Advanced LIGO and Advanced Virgo detectors allows us to perform ever-more sensitive tests of general relativity (GR) in the dynamical and strong-field regime of gravity. We perform a suite of tests of GR using the compact binary signals observed during the second half of the third observing run of those detectors. We restrict our analysis to the 15 confident signals that have false alarm rates $\leq 10^{-3}\, {\rm yr}^{-1}$. In addition to signals consistent with binary black hole (BH) mergers, the new events include GW200115_042309, a signal consistent with a neutron star--BH merger. We find the residual power, after subtracting the best fit waveform from the data for each event, to be consistent with the detector noise. Additionally, we find all the post-Newtonian deformation coefficients to be consistent with the predictions from GR, with an improvement by a factor of ~2 in the -1PN parameter. We also find that the spin-induced quadrupole moments of the binary BH constituents are consistent with those of Kerr BHs in GR. We find no evidence for dispersion of GWs, non-GR modes of polarization, or post-merger echoes in the events that were analyzed. We update the bound on the mass of the graviton, at 90% credibility, to $m_g \leq 1.27 \times 10^{-23} \mathrm{eV}/c^2$. The final mass and final spin as inferred from the pre-merger and post-merger parts of the waveform are consistent with each other. The studies of the properties of the remnant BHs, including deviations of the quasi-normal mode frequencies and damping times, show consistency with the predictions of GR. In addition to considering signals individually, we also combine results from the catalog of GW signals to calculate more precise population constraints. We find no evidence in support of physics beyond GR.

Submitted 13 December, 2021; originally announced December 2021.

Report number: LIGO-P2100275

arXiv:2112.03597 [pdf, other]

doi 10.3847/2041-8213/ac5687

Early Warnings of Binary Neutron Star Coalescence using the SPIIR Search

Authors: Manoj Kovalam, Md Anwarul Kaium Patwary, Anala K Sreekumar, Linqing Wen, Fiona H Panther, Qi Chu

Abstract: Gravitational waves from binary neutron star mergers can be used as alerts to enable prompt follow-up observations. In particular, capturing prompt electromagnetic and astroparticle emissions from the moment of a binary merger presents unique constraints on the time scale and sky localization for online gravitational wave detection. Here we present the expected performance of the SPIIR online detection pipeline that is designed for this purpose in the upcoming international LIGO-Virgo's 4th Science Run (O4). Using simulated Gaussian data for the two LIGO observatories with expected O4 sensitivity, we demonstrate that there is a non-negligible opportunity to deliver pre-merger warnings at least 10 s before the final plunge. These alerts are expected to be issued at a nominal rate of one binary neutron star coalescence per year and localized within a median searched area of 300 $\mathrm{deg}^2$. We envision such a detection to be extremely useful for follow-up observatories with a large field of view such as the Murchison Widefield Array radio facility at Western Australia.

Submitted 22 February, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

arXiv:2111.15507 [pdf, other]

doi 10.1103/PhysRevD.105.102001

All-sky search for gravitational wave emission from scalar boson clouds around spinning black holes in LIGO O3 data

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Allocca, P. A. Altin, et al. (1647 additional authors not shown)

Abstract: This paper describes the first all-sky search for long-duration, quasi-monochromatic gravitational-wave signals emitted by ultralight scalar boson clouds around spinning black holes using data from the third observing run of Advanced LIGO. We analyze the frequency range from 20~Hz to 610~Hz, over a small frequency derivative range around zero, and use multiple frequency resolutions to be robust towards possible signal frequency wanderings. Outliers from this search are followed up using two different methods, one more suitable for nearly monochromatic signals, and the other more robust towards frequency fluctuations. We do not find any evidence for such signals and set upper limits on the signal strain amplitude, the most stringent being $\approx10^{-25}$ at around 130~Hz. We interpret these upper limits as both an "exclusion region" in the boson mass/black hole mass plane and the maximum detectable distance for a given boson mass, based on an assumption of the age of the black hole/boson cloud system.

Submitted 9 May, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

Comments: 28 pages, 16 figures

Report number: P2100343

Journal ref: Phys. Rev. D 105, 102001, 2022

arXiv:2111.15116 [pdf, other]

doi 10.1103/PhysRevD.105.082005

Search of the Early O3 LIGO Data for Continuous Gravitational Waves from the Cassiopeia A and Vela Jr. Supernova Remnants

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, R. Abbott, T. D. Abbott, F. Acernese, K. Ackley, C. Adams, N. Adhikari, R. X. Adhikari, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, S. Albanesi, A. Allocca, P. A. Altin, A. Amato, C. Anand, S. Anand, et al. (1389 additional authors not shown)

Abstract: We present directed searches for continuous gravitational waves from the neutron stars in the Cassiopeia A (Cas A) and Vela Jr. supernova remnants. We carry out the searches in the LIGO data from the first six months of the third Advanced LIGO and Virgo observing run, using the Weave semi-coherent method, which sums matched-filter detection-statistic values over many time segments spanning the observation period. No gravitational wave signal is detected in the search band of 20--976 Hz for assumed source ages greater than 300 years for Cas A and greater than 700 years for Vela Jr. Estimates from simulated continuous wave signals indicate we achieve the most sensitive results to date across the explored parameter space volume, probing to strain magnitudes as low as ~$6.3\times10^{-26}$ for Cas A and ~$5.6\times10^{-26}$ for Vela Jr. at frequencies near 166 Hz at 95% efficiency.

Submitted 22 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: 24 pages, 8 figures. To appear in Physical Review D

Report number: LIGO-P2100298-v8

arXiv:2111.13106 [pdf, other]

doi 10.3847/1538-4357/ac6acf

Searches for Gravitational Waves from Known Pulsars at Two Harmonics in the Second and Third LIGO-Virgo Observing Runs

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Allocca, P. A. Altin, et al. (1672 additional authors not shown)

Abstract: We present a targeted search for continuous gravitational waves (GWs) from 236 pulsars using data from the third observing run of LIGO and Virgo (O3) combined with data from the second observing run (O2). Searches were for emission from the $l=m=2$ mass quadrupole mode with a frequency at only twice the pulsar rotation frequency (single harmonic) and the $l=2, m=1,2$ modes with a frequency of both once and twice the rotation frequency (dual harmonic). No evidence of GWs was found so we present 95\% credible upper limits on the strain amplitudes $h_0$ for the single harmonic search along with limits on the pulsars' mass quadrupole moments $Q_{22}$ and ellipticities $\varepsilon$. Of the pulsars studied, 23 have strain amplitudes that are lower than the limits calculated from their electromagnetically measured spin-down rates. These pulsars include the millisecond pulsars J0437\textminus4715 and J0711\textminus6830 which have spin-down ratios of 0.87 and 0.57 respectively. For nine pulsars, their spin-down limits have been surpassed for the first time. For the Crab and Vela pulsars our limits are factors of $\sim 100$ and $\sim 20$ more constraining than their spin-down limits, respectively. For the dual harmonic searches, new limits are placed on the strain amplitudes $C_{21}$ and $C_{22}$. For 23 pulsars we also present limits on the emission amplitude assuming dipole radiation as predicted by Brans-Dicke theory.

Submitted 20 July, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

Comments: 37 pages

Report number: LIGO-P2100049

Showing 1–50 of 227 results for author: Chu, Q