Hallucination of Multimodal Large Language Models: A Survey

Authors: Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou

Abstract: This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns regarding their reliability in real-world applications. This problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies. We review recent advances in identifying, evaluating, and mitigating these hallucinations, offering a detailed overview of the underlying causes, evaluation benchmarks, metrics, and strategies developed to address this issue. Additionally, we analyze the current challenges and limitations, formulating open questions that delineate potential pathways for future research. By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field. Through our thorough and in-depth review, we contribute to the ongoing dialogue on enhancing the robustness and reliability of MLLMs, providing valuable insights and resources for researchers and practitioners alike.
Resources are available at: https://github.com/showlab/Awesome-MLLM-Hallucination.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 140 references

arXiv:2404.18077 [pdf, other]

Generative AI for Low-Carbon Artificial Intelligence of Things

Authors: Jinbo Wen, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Hongyang Du, Yang Zhang, Zhu Han

Abstract: By integrating Artificial Intelligence (AI) with the Internet of Things (IoT), Artificial Intelligence of Things (AIoT) has revolutionized many fields. However, AIoT is facing the challenges of energy consumption and carbon emissions due to the continuous advancement of mobile technology. Fortunately, Generative AI (GAI) holds immense potential to reduce carbon emissions of AIoT due to its excellent reasoning and generation capabilities. In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT. Specifically, we first study the main impacts that cause carbon emissions in AIoT, and then introduce GAI techniques and their relations to carbon emissions. We then explore the application prospects of GAI in low-carbon AIoT, focusing on how GAI can reduce carbon emissions of network components. Subsequently, we propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules to generate more accurate and reliable optimization problems. Furthermore, we utilize Generative Diffusion Models (GDMs) to identify optimal strategies for carbon emission reduction. Simulation results demonstrate the effectiveness of the proposed framework. Finally, we insightfully provide open research directions for low-carbon AIoT.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.16258 [pdf, ps, other]

Central charges in local mirror symmetry via hypergeometric duality

Authors: Zengrui Han

Abstract: We apply the better-behaved GKZ hypergeometric systems to study toric Calabi-Yau Deligne-Mumford stacks and their Hori-Vafa mirrors given by affine hypersurfaces in algebraic tori. We show the equality between A-brane and B-brane central charges, in terms of period integrals and hypergeometric series respectively. This settles a conjecture of Hosono, which could also be considered as a generalization of the Gamma conjecture for local mirror symmetry.

Submitted 14 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 32 pages. Comparison with related results and references added; typos fixed; exposition improved

MSC Class: 14M25; 14J32; 33C99

arXiv:2404.14778 [pdf, other]

Channel Estimation for Optical Intelligent Reflecting Surface-Assisted VLC System: A Joint Space-Time Sampling Approach

Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

Abstract: Optical intelligent reflecting surface (OIRS) has attracted increasing attention due to its capability of overcoming signal blockages in visible light communication (VLC), an emerging technology for the next-generation advanced transceivers. However, current works on OIRS predominantly assume known channel state information (CSI), which is essential to practical OIRS configuration. To bridge such a gap, this paper proposes a new and customized channel estimation protocol for OIRSs under the alignment-based channel model. Specifically, we first unveil OIRS spatial and temporal coherence characteristics and derive the coherence distance and the coherence time in closed form. Next, to achieve fast beam alignment over different coherence times, we propose to dynamically tune the rotational angles of the OIRS reflecting elements following a geometric optics-based non-uniform codebook. Given the above beam alignment, we propose an efficient joint space-time sampling-based algorithm to estimate the OIRS channel. In particular, we divide the OIRS into multiple subarrays based on the coherence distance and sequentially estimate their associated CSI, followed by a space-time interpolation to retrieve full CSI for other non-aligned transceiver antennas. Numerical results validate our theoretical analyses and demonstrate the efficacy of our proposed OIRS channel estimation scheme as compared to other benchmark schemes.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14706 [pdf, other]

Channel Estimation for Optical IRS-Assisted VLC System via Spatial Coherence

Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

Abstract: Optical intelligent reflecting surface (OIRS) has been considered a promising technology for visible light communication (VLC) by constructing visual line-of-sight propagation paths to address the signal blockage issue. However, the existing works on OIRSs are mostly based on perfect channel state information (CSI), whose acquisition appears to be challenging due to the passive nature of the OIRS. To tackle this challenge, this paper proposes a customized channel estimation algorithm for OIRSs. Specifically, we first unveil the OIRS spatial coherence characteristics and derive the coherence distance in closed form. Based on this property, a spatial sampling-based algorithm is proposed to estimate the OIRS-reflected channel, by dividing the OIRS into multiple subarrays based on the coherence distance and sequentially estimating their associated CSI, followed by an interpolation to retrieve the full CSI. Simulation results validate the derived OIRS spatial coherence and demonstrate the efficacy of the proposed OIRS channel estimation algorithm.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14140 [pdf, other]

Generative Artificial Intelligence Assisted Wireless Sensing: Human Flow Detection in Practical Communication Environments

Authors: Jiacheng Wang, Hongyang Du, Dusit Niyato, Zehui Xiong, Jiawen Kang, Bo Ai, Zhu Han, Dong In Kim

Abstract: Groundbreaking applications such as ChatGPT have heightened research interest in generative artificial intelligence (GAI). Essentially, GAI excels not only in content generation but also in signal processing, offering support for wireless sensing. Hence, we introduce a novel GAI-assisted human flow detection system (G-HFD). Rigorously, G-HFD first uses channel state information (CSI) to estimate the velocity and acceleration of propagation path length change of the human-induced reflection (HIR). Then, given the strong inference ability of the diffusion model, we propose a unified weighted conditional diffusion model (UW-CDM) to denoise the estimation results, enabling the detection of the number of targets. Next, we use the CSI obtained by a uniform linear array with wavelength spacing to estimate the HIR's time of flight and direction of arrival (DoA). In this process, UW-CDM solves the problem of ambiguous DoA spectrum, ensuring accurate DoA estimation. Finally, through clustering, G-HFD determines the number of subflows and the number of targets in each subflow, i.e., the subflow size. The evaluation based on practical downlink communication signals shows G-HFD's accuracy of subflow size detection can reach 91%. This validates its effectiveness and underscores the significant potential of GAI in the context of wireless sensing.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14131 [pdf, ps, other]

Possible signatures of higher dimension in thin accretion disk around brane world black hole

Authors: Ailin Liu, Tong-Yu He, Ming Liu, Zhan-Wen Han, Rong-Jia Yang

Abstract: We probe deeply into the characteristics of a thin accretion disk surrounding a black hole within the brane world paradigm. We investigate how model parameters affect the physical properties of the disk. Our findings indicate that as the tidal charge parameter inherited from the higher dimension increases, the energy flux, the radiation temperature, the spectral cutoff frequency, the spectral luminosity, and the conversion efficiency of the disk all increase, but the radius of the innermost stable circular orbit decreases. Compared to cases of the Kerr and Schwarzschild black holes, the disk is hotter and more luminous for a positive tidal charge parameter, while it is cooler and less luminous for a negative tidal charge parameter, which suggests the potential for probing possible signatures of higher dimension.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 16 pages, 16 figures

arXiv:2404.13816 [pdf, other]

Neural Radiance Field in Autonomous Driving: A Survey

Authors: Lei He, Leheng Li, Wenchao Sun, Zeyu Han, Yichen Liu, Sifa Zheng, Jianqiang Wang, Keqiang Li

Abstract: Neural Radiance Field (NeRF) has garnered significant attention from both academia and industry due to its intrinsic advantages, particularly its implicit representation and novel view synthesis capabilities. With the rapid advancements in deep learning, a multitude of methods have emerged to explore the potential applications of NeRF in the domain of Autonomous Driving (AD). However, a conspicuous void is apparent within the current literature. To bridge this gap, this paper conducts a comprehensive survey of NeRF's applications in the context of AD. Our survey is structured to categorize NeRF's applications in Autonomous Driving (AD), specifically encompassing perception, 3D reconstruction, simultaneous localization and mapping (SLAM), and simulation. We delve into in-depth analysis and summarize the findings for each application category, and conclude by providing insights and discussions on future directions in this field. We hope this paper serves as a comprehensive reference for researchers in this domain. To the best of our knowledge, this is the first survey specifically focused on the applications of NeRF in the Autonomous Driving domain.

Submitted 26 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13689 [pdf, other]

Stability of the Abstract Thermoelastic System with Singularity

Authors: Chenxi Deng, Zhong-Jie Han, Zhaobin Kuang, Qiong Zhang

Abstract: In this paper, we analyze an abstract thermoelastic system, where the heat conduction follows the Cattaneo law. Zero becomes a spectrum point of the system operator when the coupling and thermal damping parameters of the system satisfy specific conditions. We obtain the decay rates of solutions to the system with or without the inertial term. Furthermore, the decay rate of the system without inertial terms is shown to be optimal.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 15 pages

MSC Class: 35Q74; 74F05

arXiv:2404.12666 [pdf, other]

A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues

Authors: Zibo Wang, Haichao Ji, Yifei Zhu, Dan Wang, Zhu Han

Abstract: The escalating influx of data generated by networked edge devices, coupled with the growing awareness of data privacy, has promoted a transformative shift in computing paradigms from centralized data processing to privacy-preserved distributed data processing. Federated analytics (FA) is an emerging technique to support collaborative data analytics among diverse data owners without centralizing the raw data. Despite the wide applications of FA in industry and academia, a comprehensive examination of existing research efforts in FA has been notably absent. This survey aims to bridge this gap by first providing an overview of FA, elucidating key concepts, and discussing its relationship with similar concepts. We then conduct a thorough examination of FA, including its taxonomy, key challenges, and enabling techniques. Diverse FA applications, including statistical metrics, set computation, frequency-related applications, database query operations, model-based applications, FL-assisting FA tasks, and other wireless network applications are then carefully reviewed. We complete the survey with several open research issues and future directions. This survey intends to provide a holistic understanding of the emerging FA techniques and foster the continued evolution of privacy-preserving distributed data processing in the emerging networked society.

Submitted 19 April, 2024; originally announced April 2024.

Comments: This survey has been submitted to IEEE Communications Surveys & Tutorials

arXiv:2404.12154 [pdf, other]

StyleBooth: Image Style Editing with Multimodal Instruction

Authors: Zhen Han, Chaojie Mao, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang

Abstract: Given an original image, image editing aims to generate an image that aligns with the provided instruction. The challenges are to accept multimodal inputs as instructions and a scarcity of high-quality training data, including crucial triplets of source/target image pairs and multimodal (text and image) instructions. In this paper, we focus on image style editing and present StyleBooth, a method that proposes a comprehensive framework for image editing and a feasible strategy for building a high-quality style editing dataset. We integrate encoded textual instruction and image exemplar as a unified condition for the diffusion model, enabling the editing of the original image following multimodal instructions. Furthermore, by iterative style-destyle tuning and editing and usability filtering, the StyleBooth dataset provides content-consistent stylized/plain image pairs in various categories of styles. To show the flexibility of StyleBooth, we conduct experiments on diverse tasks, such as text-based style editing, exemplar-based style editing and compositional style editing. The results demonstrate that the quality and variety of training data significantly enhance the ability to preserve content and improve the overall quality of generated images in editing tasks. Project page can be found at https://ali-vilab.github.io/stylebooth-page/.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11950 [pdf, other]

Pair density waves in the strong-coupling two-dimensional Holstein-Hubbard model: a variational Monte Carlo study

Authors: Jiucai Wang, Wen Sun, Hao-Xin Wang, Zhaoyu Han, Steven A. Kivelson, Hong Yao

Abstract: A robust theory of the mechanism of pair density wave (PDW) superconductivity (i.e. where Cooper pairs have nonzero center of mass momentum) remains elusive. Here we explore the triangular lattice $t$-$J$-$V$ model, a low-energy effective theory derived from the strong-coupling limit of the Holstein-Hubbard model, by large-scale variational Monte Carlo simulations. When the electron density is sufficiently low, the favored ground state is an s-wave PDW, consistent with results obtained from previous studies in this limit. Additionally, a PDW ground state with nematic d-wave pairing emerges in an intermediate range of electron densities and phonon frequencies. For these s-wave and d-wave PDWs arising in states with spontaneous breaking of time-reversal and inversion symmetries, PDW formation derives from valley-polarization and intra-pocket pairing.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 4.5 pages, 4 figures, 2 tables

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions.
They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.09699 [pdf, other]

Generative AI for Game Theory-based Mobile Networking

Authors: Long He, Geng Sun, Dusit Niyato, Hongyang Du, Fang Mei, Jiawen Kang, Mérouane Debbah, Zhu Han

Abstract: With the continuous advancement of network technology, various emerging complex networking optimization problems have opened up a wide range of applications utilizing game theory. However, since game theory is a mathematical framework, game theory-based solutions often require the experience and knowledge of human experts. Recently, the remarkable advantages exhibited by generative artificial intelligence (GAI) have gained widespread attention. In this article, we propose a novel GAI-enabled game theory solution that applies the powerful reasoning and generation capabilities of GAI to the design and optimization of mobile networking. Specifically, we first outline the game theory and key technologies of GAI, and then explore the advantages of combining GAI with game theory. Then, we briefly review the advantages and limitations of existing research and demonstrate the potential application values of GAI applied to game theory in mobile networking. Subsequently, we develop a game theory framework enabled by large language models (LLMs) to realize this combination, and demonstrate the effectiveness of the proposed framework through a case study in secured UAV networks. Finally, we provide several directions for future extensions.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09079 [pdf, ps, other]

Compactness results for a Dirichlet energy of nonlocal gradient with applications

Authors: Zhaolong Han, Tadele Mengesha, Xiaochuan Tian

Abstract: We prove two compactness results for function spaces with finite Dirichlet energy of half-space nonlocal gradients. In each of these results, we provide sufficient conditions on a sequence of kernel functions that guarantee the asymptotic compact embedding of the associated nonlocal function spaces into the class of square-integrable functions. Moreover, we will demonstrate that the sequence of nonlocal function spaces converges in an appropriate sense to a limiting function space. As an application, we prove uniform Poincaré-type inequalities for sequences of half-space gradient operators. We also apply the compactness result to demonstrate the convergence of appropriately parameterized nonlocal heterogeneous anisotropic diffusion problems. We will construct asymptotically compatible schemes for these types of problems. Another application concerns the convergence and robust discretization of a nonlocal optimal control problem.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.08160 [pdf, other]

A Survey on Security of Ultra/Hyper Reliable Low Latency Communication: Recent Advancements, Challenges, and Future Directions

Authors: Annapurna Pradhan, Susmita Das, Md. Jalil Piran, Zhu Han

Abstract: Ultra-reliable low latency communication (URLLC) is an innovative service offered by fifth-generation (5G) wireless systems. URLLC enables various mission-critical applications by facilitating reliable and low-latency signal transmission to support extreme Quality of Service (QoS) requirements. Apart from reliability and latency, ensuring secure data transmission for URLLC has been a prominent issue for researchers in recent years. Using finite blocklength signals to achieve the stringent reliability and latency criteria in URLLC eliminates the possibility of using conventional complex cryptographic security enhancement techniques based on encoding and decoding of secret keys. Thus, the development of lightweight security mechanisms is of paramount importance for URLLC. Recently, Physical-Layer Security (PLS) techniques have emerged as a powerful alternative to the complex cryptography-based security approaches for facilitating secure URLLC by exploiting the randomness of the wireless channel. Therefore, in this survey, we present a comprehensive and in-depth review of the state-of-the-art PLS enhancements utilized to unleash secure URLLC while analyzing the impact of various system design parameters on its performance.
Moreover, the survey incorporates a detailed overview of the recent advancements in ensuring secure URLLC using PLS in various mission-critical applications, and 5G URLLC enabling technologies like non-orthogonal multiple access (NOMA), multi-antenna systems, cooperative communication using unmanned aerial vehicles (UAV), and intelligent reflective surfaces (IRS). Apart from this, we briefly discuss the role of advanced Machine Learning (ML) techniques in designing robust and intelligent PLS schemes for URLLC service.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07477 [pdf, ps, other]

Integrated Sensing and Communication Under DISCO Physical-Layer Jamming Attacks

Authors: Huan Huang, Hongliang Zhang, Weidong Mei, Jun Li, Yi Cai, A. Lee Swindlehurst, Zhu Han

Abstract: Integrated sensing and communication (ISAC) systems traditionally presuppose that sensing and communication (S&C) channels remain approximately constant during their coherence time. However, a "DISCO" reconfigurable intelligent surface (DRIS), i.e., an illegitimate RIS with random, time-varying reflection properties that acts like a "disco ball," introduces a paradigm shift that enables active channel aging more rapidly during the channel coherence time. In this letter, we investigate the impact of DISCO jamming attacks launched by a DRIS-based fully-passive jammer (FPJ) on an ISAC system. Specifically, an ISAC problem formulation and a corresponding waveform optimization are presented in which the ISAC waveform design considers the trade-off between the S&C performance and is formulated as a Pareto optimization problem. Moreover, a theoretical analysis is conducted to quantify the impact of DISCO jamming attacks. Numerical results are presented to evaluate the S&C performance under DISCO jamming attacks and to validate the derived theoretical analysis.

Submitted 11 April, 2024; originally announced April 2024.

Comments: This paper has been submitted for possible publication. The code of the DISCO RIS is available on GitHub (https://github.com/huanhuan1799/Disco-Intelligent-Reflecting-Surfaces-Active-Channel-Aging-for-Fully-Passive-Jamming-Attacks)

arXiv:2404.06851 [pdf, other]

UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

Authors: Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, Zhizhong Han

Abstract: Diffusion models have shown remarkable results for image generation, editing and inpainting. Recent works explore diffusion models for 3D shape generation with neural implicit functions, i.e., signed distance function and occupancy function. However, they are limited to shapes with closed surfaces, which prevents them from generating diverse 3D real-world contents containing open surfaces. In this work, we present UDiFF, a 3D diffusion model for unsigned distance fields (UDFs) which is capable to generate textured 3D shapes with open surfaces from text conditions or unconditionally. Our key idea is to generate UDFs in spatial-frequency domain with an optimal wavelet transformation, which produces a compact representation space for UDF generation. Specifically, instead of selecting an appropriate wavelet transformation which requires expensive manual efforts and still leads to large information loss, we propose a data-driven approach to learn the optimal wavelet transformation for UDFs. We evaluate UDiFF to show our advantages by numerical and visual comparisons with the latest methods on widely used benchmarks. Page: https://weiqi-zhang.github.io/UDiFF.

Submitted 10 April, 2024; originally announced April 2024.

Comments: To appear at CVPR2024. Project page: https://weiqi-zhang.github.io/UDiFF

arXiv:2404.06765 [pdf, other]

Harnessing the Power of AI-Generated Content for Semantic Communication

Authors: Yiru Wang, Wanting Yang, Zehui Xiong, Yu** Zhao, Tony Q. S. Quek, Zhu Han

Abstract: Semantic Communication (SemCom) is envisaged as the next-generation paradigm to address challenges stemming from the conflicts between the increasing volume of transmission data and the scarcity of spectrum resources. However, existing SemCom systems face drawbacks, such as low explainability, modality rigidity, and inadequate reconstruction functionality. Recognizing the transformative capabilities of AI-generated content (AIGC) technologies in content generation, this paper explores a pioneering approach by integrating them into SemCom to address the aforementioned challenges. We employ a three-layer model to illustrate the proposed AIGC-assisted SemCom (AIGC-SCM) architecture, emphasizing its clear deviation from existing SemCom. Grounded in this model, we investigate various AIGC technologies with the potential to augment SemCom's performance. In alignment with SemCom's goal of conveying semantic meanings, we also introduce the new evaluation methods for our AIGC-SCM system. Subsequently, we explore communication scenarios where our proposed AIGC-SCM can realize its potential. For practical implementation, we construct a detailed integration workflow and conduct a case study in a virtual reality image transmission scenario. The results demonstrate our ability to maintain a high degree of alignment between the reconstructed content and the original source information, while substantially minimizing the data volume required for transmission. These findings pave the way for further enhancements in communication efficiency and the improvement of Quality of Service. At last, we present future directions for AIGC-SCM studies.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06148 [pdf, other]

The radius variations of accreting main sequence stars and mass transfer instability

Authors: Zi-Qi Zhao, Zhen-Wei Li, Lin Xiao, Hong-Wei Ge, Zhan-Wen Han

Abstract: Many previous works studied the dynamical timescale mass transfer stability criteria based on the donor response with neglecting the stellar structure of the accretor. In this letter, we investigate the radial response of accretors with mass accumulation and its effect on the binary mass transfer stability. We perform a series of detailed stellar evolution simulations with different types of accretors and obtain the radial variations of stars accreting at different rates. Since the time within which the donor loses half of the original mass has a correlation with the donor mass, we approximately obtain the mean mass transfer rate as a function of mass ratio. Assuming that the common envelope (CE) phase occurs if the accretor radius exceeds the outer Roche lobe radius, we obtain the critical mass ratio of dynamically unstable mass transfer. We find the critical mass ratios for donors filling their Roche lobes at the Main Sequence (MS) and Hertzsprung Gap (HG) stages are smaller than that derived from the radial response of the donor in the traditional way. Our results may suggest that the binary is easier to enter into the CE phase for a donor star at the MS or HG stage than previously believed.

Submitted 12 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: 7 pages, 9 figures, accepted for publication in MNRAS

arXiv:2404.06046 [pdf, other]

Nuclear charge radii of germanium isotopes around $N$ = 40

Authors: S. J. Wang, A. Kanellakopoulos, X. F. Yang, S. W. Bai, J. Billowes, M. L. Bissell, K. Blaum, B. Cheal, C. S. Devlin, R. F. Garcia Ruiz, J. Z. Han, H. Heylen, S. Kaufmann, K. Konig, A. Koszorus, S. Lechner, S. Malbrunot-Ettenauer, W. Nazarewicz, R. Neugart, G. Neyens, W. Nortershauser, T. Ratajczyk, P. -G. Reinhard, L. V. Rodrıguez, S. Sels , et al. (4 additional authors not shown)

Abstract: Collinear laser spectroscopy measurements were performed on $^{68-74}$Ge isotopes ($Z = 32$) at ISOLDE-CERN, by probing the $4s^2 4p^2 \, ^3\!P_1 \rightarrow 4s^2 4p 5s \, ^3\!P_1^o$ atomic transition (269~nm) of germanium. Nuclear charge radii are determined via the measured isotope shifts, revealing a larger local variation than the neighboring isotopic chains. Nuclear density functional theory with the Fayans functionals Fy($Δr$,HFB) and Fy(IVP), and the SV-min Skyrme describes the experimental data for the differential charge radii $δ\langle r^{2} \rangle$ and charge radii $R_{\rm c}$ within the theoretical uncertainties. The observed large variation in the charge radii of germanium isotopes is better accounted for by theoretical models incorporating ground state quadrupole correlations. This suggests that the polarization effects due to pairing and deformation contribute to the observed large odd-even staggering in the charge radii of the Ge isotopic chain.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures

arXiv:2404.05050 [pdf, other]

Co-design Accessible Public Robots: Insights from People with Mobility Disability, Robotic Practitioners and Their Collaborations

Authors: Howard Ziyu Han, Franklin Mingzhe Li, Alesandra Baca Vazquez, Daragh Byrne, Nikolas Martelaro, Sarah E Fox

Abstract: Sidewalk robots are increasingly common across the globe. Yet, their operation on public paths poses challenges for people with mobility disabilities (PwMD) who face barriers to accessibility, such as insufficient curb cuts. We interviewed 15 PwMD to understand how they perceive sidewalk robots. Findings indicated that PwMD feel they have to compete for space on the sidewalk when robots are introduced. We next interviewed eight robotics practitioners to learn about their attitudes towards accessibility. Practitioners described how issues often stem from robotic companies addressing accessibility only after problems arise. Both interview groups underscored the importance of integrating accessibility from the outset. Building on this finding, we held four co-design workshops with PwMD and practitioners in pairs. These convenings brought to bear accessibility needs around robots operating in public spaces and in the public interest. Our study aims to set the stage for a more inclusive future around public service robots.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04835 [pdf, other]

A born ultramassive white dwarf-hot subdwarf super-Chandrasekhar candidate

Authors: Changqing Luo, Jiao Li, Chuanjie Zheng, Dongdong Liu, Zhenwei Li, Yang** Luo, Peter Nemeth, Bo Zhang, Bo Wang, Song Wang, Yu Bai, Qingzheng Li, Pei Wang, Zhanwen Han, Jifeng Liu, Yang Huang, Xuefei Chen, Chao Liu

Abstract: Although supernovae is a well-known endpoint of an accreting white dwarf, alternative theoretical possibilities has been discussing broadly, such as the accretion-induced collapse (AIC) event as the endpoint of oxygen-neon (ONe) white dwarfs, either accreting up to or merging to excess the Chandrasekhar limit (the maximum mass of a stable white dwarf). AIC is an important channel to form neutron stars, especially for those unusual systems, which are hardly produced by core-collapse supernovae. However, the observational evidences for this theoretical predicted event and its progenitor are all very limited. In all of the known progenitors, white dwarfs increase in mass by accretion. Here, we report the discovery of an intriguing binary system Lan 11, consisted of a stripped core-helium-burning hot subdwarf and an unseen compact object of 1.08 to 1.35 $M_{\odot}$. Our binary population synthesis calculations, along with the absence of detection from the deep radio observations of the Five-hundred-meter Aperture Spherical Radio Telescope, strongly suggest that the latter is an ONe white dwarf. The total mass of this binary is 1.67 to 1.92 $M_{\odot}$, significantly excessing the Chandrasekhar limit. The reproduction of its evolutionary history indicates that the unique system has undergone two phases of common envelope ejections, implying a born nature of this massive ONe white dwarf rather than an accretion growth from its companion. These results, together with short orbital period of this binary (3.65 hours), suggest that this system will merge in 500-540 Myr, largely triggering an AIC event, although the possibility of type Ia supernova cannot be fully ruled out. This finding greatly provides valuable constraints on our understanding of stellar endpoints, whatever leading to an AIC or a supernova.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 25 pages, 14 figures

arXiv:2404.04830 [pdf]

Magneto-Induced Topological Phase Transition in Inverted InAs/GaSb Bilayers

Authors: Zhongdong Han, Tingxin Li, Long Zhang, Rui-Rui Du

Abstract: We report a magneto-induced topological phase transition in inverted InAs/GaSb bilayers from a quantum spin Hall insulator to a normal insulator. We utilize a dual-gated Corbino device in which the degree of band inversion, or equivalently the electron and hole densities, can be continuously tuned. We observe a topological phase transition around the magnetic field where a band crossing occurs, that is accompanied by a bulk-gap closure characterized by a bulk conductance peak (BCP). In another set of experiments, we study the transition under a tilted magnetic field (tilt angle $θ$). We observe the characteristic magneto-conductance around BCP as a function of $θ$, which dramatically depends on the density of the bilayers. In a relatively deep-inversion (hence a higher density) regime, where the electron-hole hybridization dominates the excitonic interaction, the BCP grows with $θ$. On the contrary, in a shallowly-inverted (a lower density) regime, where the excitonic interaction dominates the hybridization, the BCP is suppressed indicating a smooth crossover without a gap closure. This suggests the existence of a low-density, correlated insulator with spontaneous symmetry breaking near the critical point. Our highly controllable electron-hole system offers an ideal platform to study interacting topological states as proposed by recent theories.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 15+15 pages, 4+9 figures

arXiv:2404.03866 [pdf, other]

Derivative Spectroscopy and its Application at Detecting the Weak Emission/Absorption Lines

Authors: Lihuan Yu, Jiangdan Li, **liang Wang, Jiajia Li, Jiao Li, Qiang Xi, Zhanwen Han

Abstract: The development of spectroscopic survey telescopes like Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), Apache Point Observatory Galactic Evolution Experiment and Sloan Digital Sky Survey has opened up unprecedented opportunities for stellar classification. Specific types of stars, such as early-type emission-line stars and those with stellar winds, can be distinguished by the profiles of their spectral lines. In this paper, we introduce a method based on derivative spectroscopy (DS) designed to detect signals within complex backgrounds and provide a preliminary estimation of curve profiles. This method exhibits a unique advantage in identifying weak signals and unusual spectral line profiles when compared to other popular line detection methods. We validated our approach using synthesis spectra, demonstrating that DS can detect emission signals three times fainter than Gaussian fitting. Furthermore, we applied our method to 579,680 co-added spectra from LAMOST Medium-Resolution Spectroscopic Survey, identifying 16,629 spectra with emission peaks around the Hα line from 10,963 stars. These spectra were classified into three distinct morphological groups, resulting in nine subclasses as follows. (1) Emission peak above the pseudo-continuum line (single peak, double peaks, emission peak situated within an absorption line, P Cygni profile, Inverse P Cygni profile); (2) Emission peak below the pseudo-continuum line (sharp emission peak, double absorption peaks, emission peak shifted to one side of the absorption line); (3) Emission peak between the pseudo-continuum line.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.03411 [pdf, ps, other]

Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?

Authors: Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, **dong Gu

Abstract: Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates the performance reproduction and fair comparison. Besides, there is a lack of comprehensive evaluation of closed-source state-of-the-art (SOTA) models, especially MLLMs, such as GPT-4V. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary models and open-source models. We then conduct a deep analysis of the evaluated results and find that (1) GPT4 and GPT-4V demonstrate better robustness against jailbreak attacks compared to open-source LLMs and MLLMs. (2) Llama2 and Qwen-VL-Chat are more robust compared to other open-source models. (3) The transferability of visual jailbreak methods is relatively limited compared to textual jailbreak methods. The dataset and code can be found here https://anonymous.4open.science/r/red_teaming_gpt4-C1CE/README.md .

Submitted 4 April, 2024; originally announced April 2024.

Comments: technical report

arXiv:2404.02163 [pdf, other]

FastqZip: An Improved Reference-Based Genome Sequence Lossy Compression Framework

Authors: Yuanjian Liu, Huihao Luo, Zhijun Han, Yao Hu, Yehui Yang, Kyle Chard, Sheng Di, Ian Foster, Jiesheng Wu

Abstract: Storing and archiving data produced by next-generation sequencing (NGS) is a huge burden for research institutions. Reference-based compression algorithms are effective in dealing with these data. Our work focuses on compressing FASTQ format files with an improved reference-based compression algorithm to achieve a higher compression ratio than other state-of-the-art algorithms. We propose FastqZip, which uses a new method mapping the sequence to reference for compression, allows reads-reordering and lossy quality scores, and the BSC or ZPAQ algorithm to perform final lossless compression for a higher compression ratio and relatively fast speed. Our method ensures the sequence can be losslessly reconstructed while allowing lossless or lossy compression for the quality scores. We reordered the reads to get a higher compression ratio. We evaluate our algorithms on five datasets and show that FastqZip can outperform the SOTA algorithm Genozip by around 10% in terms of compression ratio while having an acceptable slowdown.

Submitted 22 February, 2024; originally announced April 2024.

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seong** Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01158 [pdf, other]

Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community

Authors: Casey Kennington, Malihe Alikhani, Heather Pon-Barry, Katherine Atwell, Yonatan Bisk, Daniel Fried, Felix Gervits, Zhao Han, Mert Inan, Michael Johnston, Raj Korpan, Diane Litman, Matthew Marge, Cynthia Matuszek, Ross Mead, Shiwali Mohan, Raymond Mooney, Natalie Parde, Jivko Sinapov, Angela Stewart, Matthew Stone, Stefanie Tellex, Tom Williams

Abstract: The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces, but speech interfaces and not just with computers, but with all machines including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals, the first focused on education, the second on benchmarks, and the third on the modeling of language when it comes to spoken interaction with robots. The three proposals should act as white papers for any researcher to take and build upon.

Submitted 1 April, 2024; originally announced April 2024.

Comments: NSF Report on the "Dialogue with Robots" Workshop held in Pittsburgh, PA, April 2023

arXiv:2404.00371 [pdf, other]

doi 10.1109/TMC.2024.3383038

From Learning to Analytics: Improving Model Efficacy with Goal-Directed Client Selection

Authors: **gwen Tong, Zhenzhen Chen, Liqun Fu, Jun Zhang, Zhu Han

Abstract: Federated learning (FL) is an appealing paradigm for learning a global model among distributed clients while preserving data privacy. Driven by the demand for high-quality user experiences, evaluating the well-trained global model after the FL process is crucial. In this paper, we propose a closed-loop model analytics framework that allows for effective evaluation of the trained global model using clients' local data. To address the challenges posed by system and data heterogeneities in the FL process, we study a goal-directed client selection problem based on the model analytics framework by selecting a subset of clients for the model training. This problem is formulated as a stochastic multi-armed bandit (SMAB) problem. We first put forth a quick initial upper confidence bound (Quick-Init UCB) algorithm to solve this SMAB problem under the federated analytics (FA) framework. Then, we further propose a belief propagation-based UCB (BP-UCB) algorithm under the democratized analytics (DA) framework. Moreover, we derive two regret upper bounds for the proposed algorithms, which increase logarithmically over the time horizon. The numerical results demonstrate that the proposed algorithms achieve nearly optimal performance, with a gap of less than 1.44% and 3.12% under the FA and DA frameworks, respectively.

Submitted 30 March, 2024; originally announced April 2024.

Comments: This work was partly presented at IEEE ICC 2022

MSC Class: 14J60 ACM Class: I.2.7

arXiv:2404.00323 [pdf, other]

CLIP-driven Outliers Synthesis for few-shot OOD detection

Authors: Hao Sun, Rundong He, Zhongyi Han, Zhicong Lin, Yongshun Gong, Yilong Yin

Abstract: Few-shot OOD detection focuses on recognizing out-of-distribution (OOD) images that belong to classes unseen during training, with the use of only a small number of labeled in-distribution (ID) images. Up to now, a mainstream strategy is based on large-scale vision-language models, such as CLIP. However, these methods overlook a crucial issue: the lack of reliable OOD supervision information, which can lead to biased boundaries between in-distribution (ID) and OOD. To tackle this problem, we propose CLIP-driven Outliers Synthesis (CLIP-OS). Firstly, CLIP-OS enhances patch-level features' perception by newly proposed patch uniform convolution, and adaptively obtains the proportion of ID-relevant information by employing CLIP-surgery-discrepancy, thus achieving separation between ID-relevant and ID-irrelevant. Next, CLIP-OS synthesizes reliable OOD data by mixing up ID-relevant features from different classes to provide OOD supervision information. Afterward, CLIP-OS leverages synthetic OOD samples by unknown-aware prompt learning to enhance the separability of ID and OOD. Extensive experiments across multiple benchmarks demonstrate that CLIP-OS achieves superior few-shot OOD detection capability.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures

arXiv:2403.19534 [pdf, other]

Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance

Authors: Yulin Pan, Chaojie Mao, Zeyinzi Jiang, Zhen Han, **gfeng Zhang

Abstract: Prior studies have made significant progress in image inpainting guided by either text or subject image. However, the research on editing with their combined guidance is still in the early stages. To tackle this challenge, we present LAR-Gen, a novel approach for image inpainting that enables seamless inpainting of masked scene images, incorporating both the textual prompts and specified subjects. Our approach adopts a coarse-to-fine manner to ensure subject identity preservation and local semantic coherence. The process involves (i) Locate: concatenating the noise with masked scene image to achieve precise regional editing, (ii) Assign: employing decoupled cross-attention mechanism to accommodate multi-modal guidance, and (iii) Refine: using a novel RefineNet to supplement subject details. Additionally, to address the issue of scarce training data, we introduce a novel data construction pipeline. This pipeline extracts substantial pairs of data consisting of local text prompts and corresponding visual instances from a vast image dataset, leveraging publicly available large models. Extensive experiments and varied application scenarios demonstrate the superiority of LAR-Gen in terms of both identity preservation and text semantic consistency. Project page can be found at https://ali-vilab.github.io/largen-page/.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 22 pages, 14 figures

arXiv:2403.19432 [pdf, other]

Uncovering Misattributed Suicide Causes through Annotation Inconsistency Detection in Death Investigation Notes

Authors: Song Wang, Yiliang Zhou, Ziqiang Han, Cui Tao, Yunyu Xiao, Ying Ding, Joydeep Ghosh, Yifan Peng

Abstract: Data accuracy is essential for scientific research and policy development. The National Violent Death Reporting System (NVDRS) data is widely used for discovering the patterns and causes of death. Recent studies suggested the annotation inconsistencies within the NVDRS and the potential impact on erroneous suicide-cause attributions. We present an empirical Natural Language Processing (NLP) approach to detect annotation inconsistencies and adopt a cross-validation-like paradigm to identify problematic instances. We analyzed 267,804 suicide death incidents between 2003 and 2020 from the NVDRS. Our results showed that incorporating the target state's data into training the suicide-crisis classifier brought an increase of 5.4% to the F-1 score on the target state's test set and a decrease of 1.1% on other states' test set. To conclude, we demonstrated the annotation inconsistencies in NVDRS's death investigation notes, identified problematic instances, evaluated the effectiveness of correcting problematic instances, and eventually proposed an NLP improvement solution.

Submitted 29 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: 19 pages, 6 figures

arXiv:2403.17326 [pdf]

Unveiling the origin of unconventional moire ferroelectricity

Authors: Ruirui Niu, Zhuoxian Li, Xiangyan Han, Qianling Liu, Zhuangzhuang Qu, Zhiyu Wang, Chunrui Han, Kenji Watanabe, Takashi Taniguchi, Kaihui Liu, Jinhai Mao, Wu Shi, Bo Peng, Zheng Vitto Han, Zizhao Gan, Jianming Lu

Abstract: Interfacial ferroelectricity emerges in heterostructures consisting of nonpolar van der Waals (vdW) layers, greatly expanding the scope of two dimensional ferroelectrics. In particular, the unconventional moire ferroelectricity observed in bilayer graphene/boron nitride (BN) heterostructures, exhibits promising functionalities with topological current, superconductivity and synaptic responses. However, the debate about its mechanism - correlation driven charge transfer between two graphene layers - limits device reproducibility and hence large-scale production. Here by designing a single-layer graphene encapsulated by lattice-mismatched WSe2, we identify the ferroelectricity as stemming from - instead of graphene moire bands - the particular BN, where interfacial sliding ferroelectricity must play a role. With similar structures, multilayer twisted MoS2 is found to reproduce the ferroelectricity. The key is a conductive moire ferroelectric, where the screened gate and the pinned domain wall together result in unchanged electronic states, i.e. anomalous screening. The intimate connection to interfacial sliding ferroelectricity thus provides advantages of diverse choices of constituent materials and robust polarization switching while preserving the unique anomalous screening, paving the way to reproducible and reliable memory-based devices in artificial intelligence.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16003 [pdf, other]

Diverse Representation Embedding for Lifelong Person Re-Identification

Authors: Shiben Liu, Huijie Fan, Qiang Wang, Xiai Chen, Zhi Han, Yandong Tang

Abstract: Lifelong Person Re-Identification (LReID) aims to continuously learn from successive data streams, matching individuals across multiple cameras. The key challenge for LReID is how to effectively preserve old knowledge while incrementally learning new information, which is caused by task-level domain gaps and limited old task datasets. Existing methods based on CNN backbone are insufficient to explore the representation of each instance from different perspectives, limiting model performance on limited old task datasets and new task datasets. Unlike these methods, we propose a Diverse Representations Embedding (DRE) framework that first explores a pure transformer for LReID. The proposed DRE preserves old knowledge while adapting to new information based on instance-level and task-level layout. Concretely, an Adaptive Constraint Module (ACM) is proposed to implement integration and push away operations between multiple overlapping representations generated by transformer-based backbone, obtaining rich and discriminative representations for each instance to improve adaptive ability of LReID. Based on the processed diverse representations, we propose Knowledge Update (KU) and Knowledge Preservation (KP) strategies at the task-level layout by introducing the adjustment model and the learner model. KU strategy enhances the adaptive learning ability of learner models for new information under the adjustment model prior, and KP strategy preserves old knowledge operated by representation-level alignment and logit-level supervision in limited old task datasets while guaranteeing the adaptive learning information capacity of the LReID model. Compared to state-of-the-art methods, our method achieves significantly improved performance in holistic, large-scale, and occluded datasets.

Submitted 2 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: 11 pages, 7 tables, 3 figures

arXiv:2403.14608 [pdf, other]

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Authors: Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang

Abstract: Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. In particular, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, especially on hardware platforms constrained by computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adapting large models to various downstream tasks. Specifically, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to the algorithmic perspective, we overview various real-world system designs to investigate the implementation costs associated with different PEFT algorithms. This survey serves as an indispensable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed insights into recent advancements and practical applications.

Submitted 29 April, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 24 pages, 12 figures

arXiv:2403.12771 [pdf, other]

TYC 3340-2437-1: A Quadruple System with A Massive Star

Authors: Jiao Li, Chao Liu, Changqing Luo, Bo Zhang, Jiang-Dan Li, Jia-Dong Li, Zhan-Wen Han, Xue-Fei Chen, Lu-Qian Wang, Min Fang, Li-Feng Xing, Xi-Liang Zhang, Chichuan Jin

Abstract: Hierarchical massive quadruple systems are ideal laboratories for examining the theories of star formation, dynamical evolution, and stellar evolution. The successive mergers of hierarchical quadruple systems might explain the mass gap between neutron stars and black holes. Looking for light curves of O-type binaries identified by LAMOST, we find a (2+2) quadruple system: TYC 3340-2437-1, located in the stellar bow-shock nebula (SBN). It has a probability of over 99.99% of being a quadruple system, derived from the surface density of the vicinity stars. Its inner orbital periods are 3.390602(89) days and 2.4378(16) days, respectively, and the total mass is about (11.47 + 5.79) + (5.2 + 2.02) = 24.48 $M_{\odot}$. The line-of-sight inclinations of the inner binaries, B$_1$ and B$_2$, are 55.94 and 78.2 degrees, respectively, indicating that they are not co-planar. Based on observations spanning 34 months and the significance of the astrometric excess noise ($D>2$) in Gaia DR3 data, we guess that its outer orbital period might be a few years. If it were true, the quadruple system might form through the disk fragmentation mechanism with an outer eccentricity greater than zero. This eccentricity could be the cause of both the arc-like feature of the SBN and the noncoplanarity of the inner orbit. The outer orbital period and outer eccentricity could be determined with the release of future epoch astrometric data of Gaia.
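The total mass quoted in the abstract can be checked by summing each inner binary first. The short worked equation below is only an arithmetic check; the labels $M_{\mathrm{B}_1}$ and $M_{\mathrm{B}_2}$ are introduced here for illustration, and assigning the first parenthesized pair to B$_1$ and the second to B$_2$ is an assumption based on the order in which the abstract lists them.

```latex
\begin{align*}
M_{\mathrm{B}_1} &= 11.47 + 5.79 = 17.26\,M_{\odot} \\
M_{\mathrm{B}_2} &= 5.2 + 2.02 = 7.22\,M_{\odot} \\
M_{\mathrm{tot}} &= M_{\mathrm{B}_1} + M_{\mathrm{B}_2} = 17.26 + 7.22 = 24.48\,M_{\odot}
\end{align*}
```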

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12361 [pdf]

Multi-State, Ultra-thin, BEOL-Compatible AlScN Ferroelectric Diodes

Authors: Kwan-Ho Kim, Zirun Han, Yinuo Zhang, Pariasadat Musavigharavi, Jeffrey Zheng, Dhiren K. Pradhan, Eric A. Stach, Roy H. Olsson III, Deep Jariwala

Abstract: The growth in data generation necessitates efficient data processing technologies to address the von Neumann bottleneck in conventional computer architecture. Memory-driven computing, which integrates non-volatile memory (NVM) devices in a 3D stack, is gaining attention, with CMOS back-end-of-line (BEOL) compatible ferroelectric (FE) diodes being ideal due to their two-terminal design and inherently selector-free nature, facilitating high-density crossbar arrays. Here, we demonstrate BEOL-compatible, high-performance FE-diodes scaled to 5, 10, and 20 nm FE Al0.72Sc0.28N/Al0.64Sc0.36N films. Through interlayer (IL) engineering, we show substantial improvements in the ON/OFF ratios (>166 times) and rectification ratios (>176 times) in these scaled devices. The superlative characteristics also enable 5-bit multi-state operation with a stable retention. We also experimentally and theoretically demonstrate the counterintuitive result that the inclusion of an IL can lead to a decrease in the ferroelectric switching voltage of the device. An in-depth analysis into the device transport mechanisms is performed, and our compact model aligns seamlessly with the experimental results. Our results suggest the possibility of using scaled AlxSc1-xN FE-diodes for high performance, low-power, embedded NVM.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11071 [pdf, other]

Wavenumber Domain Sparse Channel Estimation in Holographic MIMO

Authors: Xufeng Guo, Yuanbin Chen, Ying Wang, Zhaocheng Wang, Zhu Han

Abstract: In this paper, we investigate the sparse channel estimation in holographic multiple-input multiple-output (HMIMO) systems. The conventional angular-domain representation fails to capture the continuous angular power spectrum characterized by the spatially-stationary electromagnetic random field, thus leading to the ambiguous detection of the significant angular power, which is referred to as the power leakage. To tackle this challenge, the HMIMO channel is represented in the wavenumber domain for exploring its cluster-dominated sparsity. Specifically, a finite set of Fourier harmonics acts as a series of sampling probes to encapsulate the integral of the power spectrum over specific angular regions. This technique effectively eliminates power leakage resulting from power mismatches induced by the use of discrete angular-domain probes. Next, the channel estimation problem is recast as a sparse recovery of the significant angular power spectrum over the continuous integration region. We then propose an accompanying graph-cut-based swap expansion (GCSE) algorithm to extract beneficial sparsity inherent in HMIMO channels. Numerical results demonstrate that this wavenumber-domain-based GCSE approach achieves robust performance with rapid convergence.

Submitted 16 March, 2024; originally announced March 2024.

Comments: This paper has been accepted in 2024 ICC

arXiv:2403.08931 [pdf, ps, other]

Unleashing the True Power of Age-of-Information: Service Aggregation in Connected and Autonomous Vehicles

Authors: Anik Mallik, Dawei Chen, Kyungtae Han, Jiang Xie, Zhu Han

Abstract: Connected and autonomous vehicles (CAVs) rely heavily upon time-sensitive information update services to ensure the safety of people and assets, and satisfactory entertainment applications. Therefore, the freshness of information is a crucial performance metric for CAV services. However, information from roadside sensors and nearby vehicles can get delayed in transmission due to the high mobility of vehicles. Our research shows that a CAV's relative distance and speed play an essential role in determining the Age-of-Information (AoI). With an increase in AoI, incremental service aggregation issues are observed with out-of-sequence information updates, which hampers the performance of low-latency applications in CAVs. In this paper, we propose a novel AoI-based service aggregation method for CAVs, which can process the information updates according to their update cycles. First, the AoI for sensors and vehicles is modeled, and a predictive AoI system is designed. Then, to reduce the overall service aggregation time and computational load, intervals are used for periodic AoI prediction, and information sources are clustered based on the AoI value. Finally, the system aggregates services for CAV applications using the predicted AoI. We evaluate the system performance based on data sequencing success rate (DSSR) and overall system latency. Lastly, we compare the performance of our proposed system with three other state-of-the-art methods. The evaluation and comparison results show that our proposed predictive AoI-based service aggregation system maintains satisfactory latency and DSSR for CAV applications and outperforms other existing methods.
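To make the AoI terminology in this abstract concrete, the following minimal Python sketch computes the age of a source's latest update (current time minus the time the update was generated, the standard AoI definition) and groups sources into fixed-width AoI bins. It is not the authors' method: the `Update` type, the function names, the source labels, and the fixed `bin_width` binning rule are all hypothetical stand-ins for the paper's predictive model and clustering step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One information update from a roadside sensor or vehicle (hypothetical type)."""
    source: str
    generated_at: float  # time the sample was taken, in seconds
    received_at: float   # time it arrived at the aggregator, in seconds


def age_of_information(latest: Update, now: float) -> float:
    """AoI at time `now`: elapsed time since the freshest received update was generated."""
    return now - latest.generated_at


def cluster_by_aoi(ages: dict[str, float], bin_width: float) -> dict[int, list[str]]:
    """Group sources whose AoI falls into the same fixed-width bin.

    Fixed-width binning is an illustrative assumption, not the paper's clustering rule.
    """
    clusters: dict[int, list[str]] = {}
    for source, age in sorted(ages.items()):
        clusters.setdefault(int(age // bin_width), []).append(source)
    return clusters


if __name__ == "__main__":
    update = Update(source="rsu", generated_at=10.0, received_at=10.2)
    print(age_of_information(update, now=10.5))
    print(cluster_by_aoi({"rsu": 0.12, "car_a": 0.48, "car_b": 0.41}, bin_width=0.25))
```

Sources in the same bin could then be aggregated together, so that updates with similar freshness are processed in one pass.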

Submitted 13 March, 2024; originally announced March 2024.

Comments: 6 pages, 8 figures, to appear in the Proceedings of IEEE International Conference on Communications (IEEE ICC, 9-13 June 2024, Denver, CO, USA)

arXiv:2403.07252 [pdf, ps, other]

Serre functors and complete torsion pairs

Authors: Zhe Han, ** He

Abstract: Given a torsion pair $(\mathcal{T},\mathcal{F})$ in an abelian category $\mathcal{A}$, there is a t-structure $(\mathcal{U}_\mathcal{T},\mathcal{V}_\mathcal{T})$ determined by $\mathcal{T}$ on the derived category $D^b(\mathcal{A})$. The existence of a derived equivalence between the heart $\mathcal{B}$ of the t-structure and $\mathcal{A}$ which naturally extends the embedding $\mathcal{B}\to D^b(\mathcal{A})$ is determined by the completeness of the torsion pair [6]. When $\mathcal{A}$ is the module category of a finite-dimensional hereditary algebra and $\mathcal{U}_\mathcal{T}$ is closed under the Serre functor, there exists a triangle equivalence $D^b(\mathcal{B})\to D^b(\mathcal{A})$ [21]. In this case, we give a straightforward proof of the fact that the torsion pair $(\mathcal{T},\mathcal{F})$ is complete if and only if $\mathcal{U}_\mathcal{T}$ is closed under the Serre functor.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2403.06927 [pdf]

Effective multiband synthetic four-wave mixing by cascading quadratic processes

Authors: Li Chen, Zheng Ge, Su-Jian Niu, Yin-Hai Li, Zhao-Qi-Zhi Han, Yue-Wei Song, Wu-Zhen Li, Ren-Hui Chen, Ming-Yuan Gao, Meng-Yu Xie, Zhi-Yuan Zhou, Bao-Sen Shi

Abstract: Four wave mixing (FWM) is an important way to generate supercontinuum and frequency combs in the mid-infrared band. Here, we obtain simultaneous synthetic FWM in the visible and mid-infrared bands by cascading quadratic nonlinear processes in a periodically poled lithium niobate crystal (PPLN), which has a 110 dB (at 3000 nm) higher conversion efficiency than the FWM directly generated by third-order susceptibilities in bulk PPLN crystals. A general model of this process is developed that is in full agreement with the experimental verifications. The frequency difference between the new frequency components can be freely tuned by changing the frequency difference of the dual pump lasers. Furthermore, by increasing the conversion bandwidth and efficiency of the cascaded processes, it is feasible to generate frequency combs in three bands (visible, near-infrared and mid-infrared) simultaneously through high-order cascaded processes. This work opens up a new avenue toward freely tunable multiband frequency comb generation with multi-octave frequency spanning, which will have significant applications in fields such as mid-infrared gas sensing, lidar and precision spectroscopy.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06388 [pdf, other]

A Zero Trust Framework for Realization and Defense Against Generative AI Attacks in Power Grid

Authors: Md. Shirajum Munir, Sravanthi Proddatoori, Manjushree Muralidhara, Walid Saad, Zhu Han, Sachin Shetty

Abstract: Understanding the potential of generative AI (GenAI)-based attacks on the power grid is a fundamental challenge that must be addressed in order to protect the power grid by realizing and validating risk in new attack vectors. In this paper, a novel zero trust framework for a power grid supply chain (PGSC) is proposed. This framework facilitates early detection of potential GenAI-driven attack vectors (e.g., replay and protocol-type attacks), assessment of tail risk-based stability measures, and mitigation of such threats. First, a new zero trust system model of PGSC is designed and formulated as a zero-trust problem that seeks to guarantee a stable PGSC by realizing and defending against GenAI-driven cyber attacks. Second, a domain-specific generative adversarial network (GAN)-based attack generation mechanism is developed to create a new vulnerability cyberspace for further understanding that threat. Third, tail-based risk realization metrics are developed and implemented for quantifying the extreme risk of a potential attack while leveraging a trust measurement approach for continuous validation. Fourth, an ensemble learning-based bootstrap aggregation scheme is devised to detect the attacks that are generating synthetic identities with convincing user and distributed energy resources device profiles. Experimental results show the efficacy of the proposed zero trust framework that achieves an accuracy of 95.7% on attack vector generation, a risk measure of 9.61% for a 95% stable PGSC, and a 99% confidence in defense against GenAI-driven attack.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.05826 [pdf, other]

Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks

Authors: Minrui Xu, Dusit Niyato, Hongliang Zhang, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han

Abstract: Edge intelligence in space-air-ground integrated networks (SAGINs) can enable worldwide network coverage beyond geographical limitations for users to access ubiquitous and low-latency intelligence services. Facing global coverage and complex environments in SAGINs, edge intelligence can provision approximate large language model (LLM) agents for users via edge servers at ground base stations (BSs) or cloud data centers relayed by satellites. As LLMs with billions of parameters are pre-trained on vast datasets, LLM agents have few-shot learning capabilities, e.g., chain-of-thought (CoT) prompting for complex tasks, which raises a new trade-off between resource consumption and performance in SAGINs. In this paper, we propose a joint caching and inference framework for edge intelligence to provision sustainable and ubiquitous LLM agents in SAGINs. We introduce "cached model-as-a-resource" for offering LLMs with limited context windows and propose a novel optimization framework, i.e., joint model caching and inference, to utilize cached model resources for provisioning LLM agent services along with communication, computing, and storage resources. We design "age of thought" (AoT) considering the CoT prompting of LLMs, and propose a least AoT cached model replacement algorithm for optimizing the provisioning cost. We propose a deep Q-network-based modified second-bid (DQMSB) auction to incentivize network operators, which can enhance allocation efficiency by 23% while guaranteeing strategy-proofness and freedom from adverse selection.

Submitted 31 May, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.05793 [pdf, ps, other]

Performance Bounds for Passive Sensing in Asynchronous ISAC Systems -- Appendices

Authors: Jingbo Zhao, Zhaoming Lu, J. Andrew Zhang, Weicai Li, Yifeng Xiong, Zijun Han, Xiangming Wen, Tao Gu

Abstract: This document contains the appendices for our paper titled "Performance Bounds for Passive Sensing in Asynchronous ISAC Systems." The appendices include rigorous derivations of key formulas, detailed proofs of the theorems and propositions introduced in the paper, and details of the algorithm tested in the numerical simulation for validation. These appendices aim to support and elaborate on the findings and methodologies presented in the main text. All external references to equations, theorems, and so forth, are directed towards the corresponding elements within the main paper.

Submitted 29 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 5 pages

arXiv:2403.05567 [pdf, other]

A Unified Framework for Underwater Metaverse with Optical Perception

Authors: Jingyang Cao, Mu Zhou, Jiacheng Wang, Guangyuan Liu, Dusit Niyato, Shiwen Mao, Zhu Han, Jiawen Kang

Abstract: With the advancement of AI technology and increasing attention to deep-sea exploration, the underwater Metaverse is gradually emerging. This paper explores the concept of underwater Metaverse, emerging virtual reality systems and services aimed at simulating and enhancing virtual experience of marine environments. First, we discuss potential applications of underwater Metaverse in underwater scientific research and marine conservation. Next, we present the architecture and supporting technologies of the underwater Metaverse, including high-resolution underwater imaging technologies and image processing technologies for rendering a realistic virtual world. Based on this, we present a use case for building a realistic underwater virtual world using underwater quantum imaging-generated artificial intelligence (QI-GAI) technology. The results demonstrate the effectiveness of the underwater Metaverse framework in simulating complex underwater environments, thus validating its potential in providing high-quality, interactive underwater virtual experiences. Finally, the paper examines the future development directions of underwater Metaverse, and provides new perspectives for marine science and conservation.

Submitted 20 February, 2024; originally announced March 2024.

arXiv:2403.02977 [pdf, other]

Fast Iterative Region Inflation for Computing Large 2-D/3-D Convex Regions of Obstacle-Free Space

Authors: Qianhao Wang, Zhepei Wang, Mingyang Wang, Jialin Ji, Zhichao Han, Tianyue Wu, Rui Jin, Yuman Gao, Chao Xu, Fei Gao

Abstract: Convex polytopes have compact representations and exhibit convexity, which makes them suitable for abstracting obstacle-free spaces from various environments. Existing methods for generating convex polytopes always struggle to strike a balance between two requirements: producing high-quality polytopes and efficiency. Moreover, another crucial requirement, that convex polytopes accurately contain certain seed point sets such as a robot or a front-end path, arises in various tasks; we refer to it as manageability. In this paper, we show that we can achieve generation of high-quality convex polytopes while ensuring both efficiency and manageability simultaneously, by introducing Fast Iterative Regional Inflation (FIRI). FIRI consists of two iteratively executed submodules: Restrictive Inflation (RsI) and computation of the Maximum Volume Inscribed Ellipsoid (MVIE) of the convex polytope. By explicitly incorporating constraints that include the seed point set, RsI guarantees manageability. Meanwhile, the iterative monotonic optimization of MVIE, which serves as a lower bound of the volume of the convex polytope, ensures high-quality results of FIRI. In terms of efficiency, we design methods tailored to the low-dimensional and multi-constrained nature of both modules, resulting in orders of magnitude improvement compared to generic solvers. Notably, for 2-D MVIE, we present a novel analytical algorithm that achieves linear-time complexity for the first time, further enhancing the efficiency of FIRI in the 2-D scenario. Extensive benchmarks conducted against state-of-the-art methods validate the superior performance of FIRI in terms of quality, manageability, and efficiency. Furthermore, various real-world applications showcase the generality and practicality of FIRI. The high-performance code of FIRI will be open-sourced for the reference of the community.

Submitted 6 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.18404 [pdf]

Polarization entanglement by two simultaneous backward phase-matching processes in a single crystal

Authors: Ming-Yuan Gao, Yin-Hai Li, Zhao-Qi-Zhi Han, Qiang Zhou, Guang-Can Guo, Zhi-Yuan Zhou, Bao-Sen Shi

Abstract: Entanglement enables many promising applications in quantum technology. Devising new generation methods and harnessing entanglement are prerequisites for practical applications. Here we realize a distinct polarization-entangled source by simultaneously achieving type-0 and type-I backward quasi-phase matching (BQPM) through spontaneous parametric down-conversion in a single bulk crystal, which is different from all previous entangled-source configurations. Pumping the crystal with a single polarized beam generates a non-maximally polarization-entangled state, which can be further projected to a maximal Bell state with a pair of Brewster windows. Hong-Ou-Mandel interference experiments are done on polarization-degenerate photon pairs for both type-0 and type-I BQPM processes for the first time. The emitted photons in both processes have a bandwidth as narrow as 15.7 GHz. The high quality of this source is characterized by various methods. The rather simple configuration, narrow bandwidth, and high entanglement quality make the source very promising for many quantum information tasks.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.17401 [pdf]

Quantum entanglement enabled ellipsometer for phase retardance measurement

Authors: Meng-Yu Xie, Su-Jian Niu, Yin-Hai Li, Zheng Ge, Ming-Yuan Gao, Zhao-Qi-Zhi Han, Ren-Hui Chen, Zhi-Yuan Zhou, Bao-Sen Shi

Abstract: An ellipsometer is a vital precision tool used for measuring optical parameters with wide applications in many fields, including accurate measurements in film thickness, optical constants, structural profiles, etc. However, the precise measurement of photosensitive materials meets huge obstacles because of the excessive input photons, therefore the requirement of enhancing detection accuracy under low incident light intensity is an essential topic in the precision measurement. In this work, by combining a polarization-entangled photon source with a classical transmission-type ellipsometer, the quantum ellipsometer with the PSA (Polarizer-Sample-Analyzer) and the Senarmount method is constructed firstly to measure the phase retardation of the birefringent materials. The experimental results show that the accuracy can reach to nanometer scale at extremely low input intensity, and the stability are within 1% for all specimens tested with a compensator involved. Our work paves the way for precision measurement at low incident light intensity, with potential applications in measuring photosensitive materials, active-biological samples and other remote monitoring scenarios.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 13 pages, 5 figures. This work has been submitted for possible publication

arXiv:2402.14899 [pdf, other]

Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images

Authors: Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu

Abstract: Recently, Multimodal LLMs (MLLMs) have shown a great ability to understand images. However, like traditional vision models, they are still vulnerable to adversarial images. Meanwhile, Chain-of-Thought (CoT) reasoning has been widely explored on MLLMs, which not only improves model's performance, but also enhances model's explainability by giving intermediate reasoning steps. Nevertheless, there is still a lack of study regarding MLLMs' adversarial robustness with CoT and an understanding of what the rationale looks like when MLLMs infer wrong answers with adversarial images. Our research evaluates the adversarial robustness of MLLMs when employing CoT reasoning, finding that CoT marginally improves adversarial robustness against existing attack methods. Moreover, we introduce a novel stop-reasoning attack technique that effectively bypasses the CoT-induced robustness enhancements. Finally, we demonstrate the alterations in CoT reasoning when MLLMs confront adversarial images, shedding light on their reasoning process under adversarial attacks.

Submitted 18 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Showing 51–100 of 1,406 results for author: Han, Z