Search | arXiv e-print repository

Penetrative AI: Making LLMs Comprehend the Physical World

Authors: Huatao Xu, Liying Han, Qirui Yang, Mo Li, Mani Srivastava

Abstract: Recent developments in Large Language Models (LLMs) have demonstrated their remarkable capabilities across a range of tasks. Questions, however, persist about the nature of LLMs and their potential to integrate common-sense human knowledge when performing tasks involving information about the real physical world. This paper delves into these questions by exploring how LLMs can be extended to interact with and reason about the physical world through IoT sensors and actuators, a concept that we term "Penetrative AI". The paper explores such an extension at two levels of LLMs' ability to penetrate into the physical world via the processing of sensory signals. Our preliminary findings indicate that LLMs, with ChatGPT being the representative example in our exploration, have considerable and unique proficiency in employing the embedded world knowledge for interpreting IoT sensor data and reasoning over them about tasks in the physical realm. This not only opens up new applications for LLMs beyond traditional text-based tasks, but also enables new ways of incorporating human knowledge in cyber-physical systems.

Submitted 12 June, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: ACL Findings 2024

arXiv:2310.09568 [pdf, other]

Wafer-scale Computing: Advancements, Challenges, and Future Perspectives

Authors: Yang Hu, Xinhan Lin, Huizheng Wang, Zhen He, Xingmao Yu, Jiahao Zhang, Qize Yang, Zheng Xu, Sihan Guan, Jiahao Fang, Haoran Shang, Xinru Tang, Xu Dai, Shaojun Wei, Shouyi Yin

Abstract: Nowadays, artificial intelligence (AI) technology with large models plays an increasingly important role in both academia and industry. It also brings a rapidly increasing demand for the computing power of the hardware. As the computing demand for AI continues to grow, the growth of hardware computing power has failed to keep up. This has become a significant factor restricting the development of AI. The augmentation of hardware computing power is mainly propelled by the escalation of transistor density and chip area. However, the former is impeded by the end of Moore's Law and Dennard scaling, and the latter is significantly restricted by the challenge of disrupting the legacy fabrication equipment and process. In recent years, advanced packaging technologies that have gradually matured are increasingly used to implement bigger chips that integrate multiple chiplets, while still providing interconnections with chip-level density and bandwidth. Compared to conventional high-performance computing paradigms such as multi-accelerator and datacenter-scale computing, Wafer-scale Computing shows remarkable advantages in communication bandwidth, integration density, and programmability potential. Not surprisingly, disruptive Wafer-scale Computing also brings unprecedented design challenges for hardware architecture, design-system-technology co-optimization, power and cooling systems, and compiler tool chain. At present, there are no comprehensive surveys summarizing the current state and design insights of Wafer-scale Computing. This paper aims to take the first step to help academia and industry review existing wafer-scale chips and essential technologies in a one-stop manner, so that people can conveniently grasp the basic knowledge and key points, understand the achievements and shortcomings of existing research, and contribute to this promising research direction.

Submitted 14 October, 2023; originally announced October 2023.

ACM Class: B.7.0; C.1

arXiv:2310.08459 [pdf, other]

A Survey of Heterogeneous Transfer Learning

Authors: Runxue Bao, Yiming Sun, Yuhe Gao, Jindong Wang, Qiang Yang, Haifeng Chen, Zhi-Hong Mao, Ye Ye

Abstract: The application of transfer learning, an approach utilizing knowledge from a source domain to enhance model performance in a target domain, has seen a tremendous rise in recent years, underpinning many real-world scenarios. The key to its success lies in the shared common knowledge between the domains, a prerequisite in most transfer learning methodologies. These methods typically presuppose identical feature spaces and label spaces in both domains, known as homogeneous transfer learning, which, however, is not always a practical assumption. Oftentimes, the source and target domains vary in feature spaces, data distributions, and label spaces, making it challenging or costly to secure source domain data with identical feature and label spaces as the target domain. Arbitrary elimination of these differences is not always feasible or optimal. Thus, heterogeneous transfer learning, acknowledging and dealing with such disparities, has emerged as a promising approach for a variety of tasks. Despite the existence of a survey in 2017 on this topic, the fast-paced advances post-2017 necessitate an updated, in-depth review. We therefore present a comprehensive survey of recent developments in heterogeneous transfer learning methods, offering a systematic guide for future research. Our paper reviews methodologies for diverse learning scenarios, discusses the limitations of current studies, and covers various application contexts, including Natural Language Processing, Computer Vision, Multimodality, and Biomedicine, to foster a deeper understanding and spur future research.

Submitted 15 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.07744 [pdf, other]

Terrain-adaptive Central Pattern Generators with Reinforcement Learning for Hexapod Locomotion

Authors: Qiyue Yang, Yue Gao, Shaoyuan Li

Abstract: Inspired by biological motion generation, central pattern generators (CPGs) are frequently employed in legged robot locomotion control to produce natural gait patterns with low-dimensional control signals. However, their limited adaptability and stability over complex terrains hinder their application. To address this issue, this paper proposes a terrain-adaptive locomotion control method that incorporates a deep reinforcement learning (DRL) framework into the CPG, where the CPG model is responsible for the generation of synchronized signals, providing the basic locomotion gait, while DRL is integrated to enhance the adaptability of the robot towards uneven terrains by adjusting the parameters of the CPG mapping functions. The experiments conducted on the hexapod robot in the Isaac Gym simulation environment demonstrated the superiority of the proposed method in terrain adaptability, convergence rate and reward design complexity.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.07602 [pdf, other]

Dual Radar: A Multi-modal Dataset with Dual 4D Radar for Autonomous Driving

Authors: Xinyu Zhang, Li Wang, Jian Chen, Cheng Fang, Lei Yang, Ziying Song, Guangqi Yang, Yichen Wang, Xiaofei Zhang, Jun Li, Zhiwei Li, Qingshan Yang, Zhenlin Zhang, Shuzhi Sam Ge

Abstract: Radar has stronger adaptability in adverse scenarios for autonomous driving environmental perception compared to widely adopted cameras and LiDARs. Compared with commonly used 3D radars, the latest 4D radars have precise vertical resolution and higher point cloud density, making them highly promising sensors for autonomous driving in complex environmental perception. However, due to the much higher noise than LiDAR, manufacturers choose different filtering strategies, resulting in an inverse ratio between noise level and point cloud density. There is still a lack of comparative analysis on which method is beneficial for deep learning-based perception algorithms in autonomous driving. One of the main reasons is that current datasets only adopt one type of 4D radar, making it difficult to compare different 4D radars in the same scene. Therefore, in this paper, we introduce a novel large-scale multi-modal dataset featuring, for the first time, two types of 4D radars captured simultaneously. This dataset enables further research into effective 4D radar perception algorithms. Our dataset consists of 151 consecutive series, most of which last 20 seconds and contain 10,007 meticulously synchronized and annotated frames. Moreover, our dataset captures a variety of challenging driving scenarios, including many road conditions, weather conditions, nighttime and daytime with different lighting intensities and periods. Our dataset annotates consecutive frames, which can be applied to 3D object detection and tracking, and also supports the study of multi-modal tasks. We experimentally validate our dataset, providing valuable results for studying different types of 4D radars. This dataset is released on https://github.com/adept-thu/Dual-Radar.

Submitted 9 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.06333 [pdf, ps, other]

Learning bounded-degree polytrees with known skeleton

Authors: Davin Choo, Joy Qiping Yang, Arnab Bhattacharyya, Clément L. Canonne

Abstract: We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters is nearly tight.

Submitted 21 January, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Fixed some typos. Added some discussions. Accepted to ALT 2024

arXiv:2310.05143 [pdf, other]

ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning

Authors: Wang Lu, Hao Yu, Jindong Wang, Damien Teney, Haohan Wang, Yiqiang Chen, Qiang Yang, Xing Xie, Xiangyang Ji

Abstract: When personalized federated learning (FL) meets large foundation models, new challenges arise from various limitations in resources. In addition to typical limitations such as data, computation, and communication costs, access to the models is also often limited. This paper endeavors to solve both the challenges of limited resources and personalization, i.e., distribution shifts between clients. To do so, we propose a method named ZOOPFL that uses Zeroth-Order Optimization for Personalized Federated Learning. ZOOPFL avoids direct interference with the foundation models and instead learns to adapt its inputs through zeroth-order optimization. In addition, we employ simple yet effective linear projections to remap its predictions for personalization. To reduce the computation costs and enhance personalization, we propose input surgery to incorporate an auto-encoder with low-dimensional and client-specific embeddings. We provide theoretical support for ZOOPFL to analyze its convergence. Extensive empirical experiments on computer vision and natural language processing tasks using popular foundation models demonstrate its effectiveness for FL on black-box foundation models.

Submitted 8 October, 2023; originally announced October 2023.

Comments: Technical report; 26 pages; code will be available at https://github.com/microsoft/PersonalizedFL
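
As a rough illustration of what "zeroth-order optimization" means in the abstract above, the following minimal sketch minimizes a function using only its output values, never its analytic gradient. It is not the ZOOPFL method: `black_box_loss`, `estimate_gradient`, `minimize`, the toy quadratic target, the step size, and the learning rate are all hypothetical choices made for this example, and the coordinate-wise central-difference estimator is just one common zeroth-order scheme.

```python
def black_box_loss(inputs):
    """Hypothetical stand-in for a model that can only be queried, not differentiated."""
    target = (1.0, -2.0, 0.5)
    return sum((x - t) ** 2 for x, t in zip(inputs, target))


def estimate_gradient(loss, point, step=1e-3):
    """Coordinate-wise central differences: two loss queries per coordinate."""
    gradient = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += step
        backward[i] -= step
        gradient.append((loss(forward) - loss(backward)) / (2 * step))
    return gradient


def minimize(loss, start, learning_rate=0.1, iterations=100):
    """Plain gradient descent driven by the estimated (not analytic) gradient."""
    point = list(start)
    for _ in range(iterations):
        gradient = estimate_gradient(loss, point)
        point = [x - learning_rate * g for x, g in zip(point, gradient)]
    return point


solution = minimize(black_box_loss, [0.0, 0.0, 0.0])
print([round(x, 3) for x in solution])
```

Each iteration costs two loss queries per input coordinate, which hints at why keeping the optimized inputs low-dimensional matters when every query is a call to a large model.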

arXiv:2310.03049 [pdf, other]

QuATON: Quantization Aware Training of Optical Neurons

Authors: Hasindu Kariyawasam, Ramith Hettiarachchi, Quansan Yang, Alex Matlock, Takahiro Nambara, Hiroyuki Kusaka, Yuichiro Kunai, Peter T C So, Edward S Boyden, Dushan Wadduwage

Abstract: Optical processors, built with "optical neurons", can efficiently perform high-dimensional linear operations at the speed of light. Thus they are a promising avenue to accelerate large-scale linear computations. With the current advances in micro-fabrication, such optical processors can now be 3D fabricated, but with a limited precision. This limitation translates to quantization of learnable parameters in optical neurons, and should be handled during the design of the optical processor in order to avoid a model mismatch. Specifically, optical neurons should be trained or designed within the physical constraints at a predefined quantized precision level. To address this critical issue, we propose a physics-informed quantization-aware training framework. Our approach accounts for physical constraints during the training process, leading to robust designs. We demonstrate that our approach can design state-of-the-art optical processors using diffractive networks for multiple physics-based tasks despite quantized learnable parameters. We thus lay the foundation upon which improved optical processors may be 3D fabricated in the future.

Submitted 21 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

arXiv:2310.02922 [pdf, other]

doi 10.1007/s11128-023-03859-9

Public verifiable measurement-only blind quantum computation based on entanglement witnesses

Authors: Wen-Jie Liu, Zi-Xian Li, Wen-Bo Li, Qi Yang

Abstract: Recently, Sato et al. proposed a public verifiable blind quantum computation (BQC) protocol by inserting a third-party arbiter. However, it is not truly public verifiable in a sense, because the arbiter is determined in advance and participates in the whole process. In this paper, a public verifiable protocol for measurement-only BQC is proposed. The fidelity between arbitrary states and the graph states of 2-colorable graphs is estimated by measuring the entanglement witnesses of the graph states, so as to verify the correctness of the prepared graph states. Compared with the previous protocol, our protocol is public verifiable in the true sense by allowing other random clients to execute the public verification. It is also more efficient: the number of local measurements is $O(n^3 \log n)$ and the number of graph-state copies is $O(n^2 \log n)$.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 17 pages, 5 figures

Journal ref: Quantum Information Processing, 2023. 22(3): p. 137

arXiv:2310.01320 [pdf, other]

Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation

Authors: Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang

Abstract: Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state. After integrating ReCon with different LLMs, extensive experiment results from the Avalon game indicate its efficacy in aiding LLMs to discern and maneuver around deceptive information without extra fine-tuning and data. Finally, we offer a possible explanation for the efficacy of ReCon and explore the current limitations of LLMs in terms of safety, reasoning, speaking style, and format, potentially furnishing insights for subsequent research.

Submitted 24 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: 40 pages

arXiv:2310.00907 [pdf, other]

The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice

Authors: Fernando Delgado, Stephen Yang, Michael Madaio, Qian Yang

Abstract: Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the "participatory turn" in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2310.00890 [pdf]

doi 10.1002/adma.202313742

Femtosecond electron diffraction reveals local disorder and local anharmonicity in thermoelectric SnSe

Authors: Jingjun Li, Yingpeng Qi, Qing Yang, Luye Yue, Changyuan Yao, Zijing Chen, Sheng Meng, Dao Xiang, Jianming Cao

Abstract: The microscopic arrangement of atoms and molecules is the determining factor in how materials behave and perform. Beyond the long-range periodicity, the local disorder with local structures deviating from the average lattice structure plays a vital role in determining the physical properties of the phonon, electron and spin subsystems in crystalline functional materials. Experimentally characterizing the 3D atomic configuration of such local disorder and correlating it with the advanced functions remain a big challenge. Time-domain evolution of the local disorder, either static or dynamical, is lost due to the characterization at equilibrium state with conventional probing techniques. With the combination of femtosecond electron diffraction, structure factor calculation and TDDFT-MD simulation, we exclusively identify the static local disorder and the local anharmonicity of it in thermoelectric SnSe. The ultrafast structural dynamics in time domain reveal a dominant static off-symmetry displacement of Sn (~0.4 angstrom) and the anharmonicity of this local disorder induces an ultrafast atomic displacement within 100 fs after photoexcitation. The microscopic picture of the local anharmonicity indicates a direct and first signature of the THz Einstein oscillators in real space. Therefore, a glass-like thermal transport channel with the local disorder, the Einstein oscillators and the local anharmonicity, updates the fundamental insight into the long-debated ultralow thermal conductivity in SnSe. The local disorder over one to a few unit cells is pervasive and indispensable in thermoelectric materials, multiferroic materials and correlated electronic materials. Our method of revealing the 3D local disorder and the local correlated interactions by ultrafast structural dynamics will inspire broad interest in construction of the structure-property relationship in material science.

Submitted 2 October, 2023; originally announced October 2023.

Report number: 2313742

Journal ref: Adv. Mater. 2313742 (2024)

arXiv:2310.00730 [pdf]

EventLFM: Event Camera integrated Fourier Light Field Microscopy for Ultrafast 3D imaging

Authors: Ruipeng Guo, Qianwan Yang, Andrew S. Chang, Guorong Hu, Joseph Greene, Christopher V. Gabel, Sixian You, Lei Tian

Abstract: Ultrafast 3D imaging is indispensable for visualizing complex and dynamic biological processes. Conventional scanning-based techniques necessitate an inherent trade-off between acquisition speed and space-bandwidth product (SBP). Emerging single-shot 3D wide-field techniques offer a promising alternative but are bottlenecked by the synchronous readout constraints of conventional CMOS systems, thus restricting data throughput to maintain high SBP at limited frame rates. To address this, we introduce EventLFM, a straightforward and cost-effective system that overcomes these challenges by integrating an event camera with Fourier light field microscopy (LFM), a state-of-the-art single-shot 3D wide-field imaging technique. The event camera operates on a novel asynchronous readout architecture, thereby bypassing the frame rate limitations inherent to conventional CMOS systems. We further develop a simple and robust event-driven LFM reconstruction algorithm that can reliably reconstruct 3D dynamics from the unique spatiotemporal measurements captured by EventLFM. Experimental results demonstrate that EventLFM can robustly reconstruct fast-moving and rapidly blinking 3D fluorescent samples at kHz frame rates. Furthermore, we highlight EventLFM's capability for imaging of blinking neuronal signals in scattering mouse brain tissues and 3D tracking of GFP-labeled neurons in freely moving C. elegans. We believe that the combined ultrafast speed and large 3D SBP offered by EventLFM may open up new possibilities across many biomedical applications.

Submitted 3 April, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2310.00593 [pdf, other]

Nonlinear Multi-Carrier System with Signal Clipping: Measurement, Analysis, and Optimization

Authors: Yuyang Du, Liang Hao, Yiming Lei, Qun Yang, Shiqi Xu

Abstract: Signal clipping is a classic technique for reducing peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It has been widely applied in consumer electronic devices owing to its low complexity and high efficiency. Although clipping reduces the nonlinear distortion caused by power amplifiers (PAs), it induces additional clipping distortion. Optimizing the joint system performance with consideration of both PA nonlinearity and clipping distortion remains an open problem due to the complex PA modeling. In this paper, we analyze the PA nonlinearity through the Bessel-Fourier PA (BFPA) model and simplify its power expression using inter-modulation product (IMP) analysis. We derive expressions of the receiver signal-to-noise ratio (SNR) and system symbol error rate (SER) for the nonlinear clipped OFDM system. With the derivations, we investigate the optimal system setting to achieve the SER lower bound in a practical OFDM system that considers both PA nonlinearity and clipping distortion. The methods and results presented in this paper can serve as a useful reference for the system-level optimization of clipped OFDM systems with nonlinear PA.

Submitted 16 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.
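
To make the PAPR-reduction idea in the abstract above concrete, here is a small sketch that builds one OFDM symbol, measures its peak-to-average power ratio, and applies amplitude clipping. It only illustrates textbook clipping and does not reproduce the paper's BFPA model or SER analysis. The function names, the 64-subcarrier QPSK symbol, the fixed random seed, and the clipping threshold of 1.2 times the RMS amplitude are all assumptions chosen for the example.

```python
import cmath
import math
import random


def ofdm_symbol(num_subcarriers, rng):
    """Time-domain OFDM symbol from random QPSK subcarriers (naive inverse DFT)."""
    n = num_subcarriers
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
            for _ in range(n)]
    return [sum(qpsk[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            / math.sqrt(n)
            for t in range(n)]


def papr_db(samples):
    """Peak-to-average power ratio in decibels."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def clip(samples, threshold):
    """Limit each sample's magnitude to the threshold while keeping its phase."""
    return [s if abs(s) <= threshold else threshold * s / abs(s) for s in samples]


rng = random.Random(0)  # fixed seed so the run is repeatable
symbol = ofdm_symbol(64, rng)
rms = math.sqrt(sum(abs(s) ** 2 for s in symbol) / len(symbol))
threshold = 1.2 * rms  # illustrative clipping level, not taken from the paper
clipped = clip(symbol, threshold)

print(f"PAPR before clipping: {papr_db(symbol):.2f} dB")
print(f"PAPR after clipping:  {papr_db(clipped):.2f} dB")
```

Clipping lowers the peak at the price of distorting every sample that exceeded the threshold, which is the clipping distortion the abstract weighs against PA nonlinearity.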

arXiv:2310.00199 [pdf, other]

DeformUX-Net: Exploring a 3D Foundation Backbone for Medical Image Segmentation with Depthwise Deformable Convolution

Authors: Ho Hin Lee, Quan Liu, Qi Yang, Xin Yu, Shunxing Bao, Yuankai Huo, Bennett A. Landman

Abstract: The application of 3D ViTs to medical image segmentation has seen remarkable strides, somewhat overshadowing the budding advancements in Convolutional Neural Network (CNN)-based models. Large kernel depthwise convolution has emerged as a promising technique, showcasing capabilities akin to hierarchical transformers and facilitating an expansive effective receptive field (ERF) vital for dense predictions. Despite this, existing core operators, ranging from global-local attention to large kernel convolution, exhibit inherent trade-offs and limitations (e.g., global-local range trade-off, aggregating attentional features). We hypothesize that deformable convolution can be an exploratory alternative to combine all advantages from the previous operators, providing long-range dependency, adaptive spatial aggregation and computational efficiency as a foundation backbone. In this work, we introduce 3D DeformUX-Net, a pioneering volumetric CNN model that adeptly navigates the shortcomings traditionally associated with ViTs and large kernel convolution. Specifically, we revisit volumetric deformable convolution in a depth-wise setting to adapt long-range dependency with computational efficiency. Inspired by the concepts of structural re-parameterization for convolution kernel weights, we further generate the deformable tri-planar offsets by adapting a parallel branch (starting from $1\times1\times1$ convolution), providing adaptive spatial aggregation across all channels. Our empirical evaluations reveal that the 3D DeformUX-Net consistently outperforms existing state-of-the-art ViTs and large kernel convolution models across four challenging public datasets, spanning various scales from organs (KiTS: 0.680 to 0.720, MSD Pancreas: 0.676 to 0.717, AMOS: 0.871 to 0.902) to vessels (e.g., MSD hepatic vessels: 0.635 to 0.671) in mean Dice.

Submitted 3 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: 14 pages, the source code with our pre-trained model is available at https://github.com/MASILab/deform-uxnet

arXiv:2310.00015 [pdf, other]

Semantic Communication with Probability Graph: A Joint Communication and Computation Design

Authors: Zhouxiang Zhao, Zhaohui Yang, Quoc-Viet Pham, Qianqian Yang, Zhaoyang Zhang

Abstract: In this paper, we present a probability graph-based semantic information compression system for scenarios where the base station (BS) and the user share common background knowledge. We employ probability graphs to represent the shared knowledge between the communicating parties. During the transmission of specific text data, the BS first extracts semantic information from the text, which is represented by a knowledge graph. Subsequently, the BS omits certain relational information based on the shared probability graph to reduce the data size. Upon receiving the compressed semantic data, the user can automatically restore missing information using the shared probability graph and predefined rules. This approach brings additional computational resource consumption while effectively reducing communication resource consumption. Considering the limitations of wireless resources, we address the problem of joint communication and computation resource allocation design, aiming at minimizing the total communication and computation energy consumption of the network while adhering to latency, transmit power, and semantic constraints. Simulation results demonstrate the effectiveness of the proposed system.
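The omit-and-restore idea described in the abstract can be illustrated with a deliberately simplified sketch. It assumes the shared probability graph reduces to "the most frequent relation for each (head, tail) pair" and that the only predefined rule is "fill in that relation". The paper's actual graph construction and rules are not given in the abstract, and all names below are hypothetical.

```python
from collections import Counter, defaultdict


def build_probability_graph(corpus_triples):
    """Map each (head, tail) pair to its most frequent relation in shared background data."""
    counts = defaultdict(Counter)
    for head, relation, tail in corpus_triples:
        counts[(head, tail)][relation] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def compress(triples, graph):
    """Sender side: drop a relation (send None) when the shared graph would predict it."""
    return [
        (h, None, t) if graph.get((h, t)) == r else (h, r, t)
        for h, r, t in triples
    ]


def restore(compressed, graph):
    """Receiver side: fill every omitted relation from the same shared graph."""
    return [
        (h, graph[(h, t)], t) if r is None else (h, r, t)
        for h, r, t in compressed
    ]
```

Because both sides hold the same graph, `restore(compress(msg, graph), graph)` returns `msg` unchanged. The saving comes from the omitted relations, and the cost is the extra lookup work at each end, which mirrors the communication/computation trade-off the abstract mentions.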

Submitted 5 October, 2023; v1 submitted 16 September, 2023; originally announced October 2023.

arXiv:2309.17293 [pdf, other]

doi 10.1007/s10773-023-05382-0

Quantum Privacy-preserving Two-party Circle Intersection Protocol Based on Phase-encoded Query

Authors: Zi-Xian Li, Qi Yang, Bao Feng, Wen-Jie Liu

Abstract: Privacy-preserving geometric intersection (PGI) is an important issue in Secure multiparty computation (SMC). The existing quantum PGI protocols are mainly based on grid coding, which requires a lot of computational complexity. The phase-encoded query method which has been used in some Quantum SMC protocols is suitable to solve the decision problem, but it needs to apply high dimensional Oracle operators. In this paper, we use the principle of phase-encoded query to solve an important PGI problem, namely privacy-preserving two-party circle intersection. We study the implementation of Oracle operator in detail, and achieve polynomial computational complexity by decomposing it into quantum arithmetic operations. Performance analysis shows that our protocol is correct and efficient, and can protect the privacy of all participants against internal and external attacks.

Submitted 29 September, 2023; originally announced September 2023.

Comments: 16 pages, 2 figures

Journal ref: International Journal of Theoretical Physics,2023.62(7):p.138

arXiv:2309.16713 [pdf, other]

UAV-assisted Semantic Communication with Hybrid Action Reinforcement Learning

Authors: Peiyuan Si, Jun Zhao, Kwok-Yan Lam, Qing Yang

Abstract: In this paper, we aim to explore the use of uplink semantic communications with the assistance of UAV in order to improve data collection efficiency for metaverse users in remote areas. To reduce the time for uplink data collection while balancing the trade-off between reconstruction quality and computational energy cost, we propose a hybrid action reinforcement learning (RL) framework to make decisions on semantic model scale, channel allocation, transmission power, and UAV trajectory. The variables are classified into discrete type and continuous type, which are optimized by two different RL agents to generate the combined action. Simulation results indicate that the proposed hybrid action reinforcement learning framework can effectively improve the efficiency of uplink semantic data collection under different parameter settings and outperforms the benchmark scenarios.

Submitted 1 December, 2023; v1 submitted 18 August, 2023; originally announced September 2023.

Comments: This paper appears in IEEE Global Communications Conference (GLOBECOM) 2023

arXiv:2309.16622 [pdf, other]

doi 10.1016/j.physletb.2024.138601

Results on Elastic Cross Sections in Proton-Proton Collisions at $\sqrt{s} = 510$ GeV with the STAR Detector at RHIC

Authors: STAR Collaboration, M. I. Abdulhamid, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, E. C. Aschenauer, S. Aslam, J. Atchison, V. Bairathi, J. G. Ball Cap, K. Barish, R. Bellwied, P. Bhagat, A. Bhasin, S. Bhatta, S. R. Bhosale, J. Bielcik, J. Bielcikova, J. D. Brandenburg, C. Broodo, X. Z. Cai , et al. (343 additional authors not shown)

Abstract: We report results on an elastic cross section measurement in proton-proton collisions at a center-of-mass energy $\sqrt{s}=510$ GeV, obtained with the Roman Pot setup of the STAR experiment at the Relativistic Heavy Ion Collider (RHIC). The elastic differential cross section is measured in the four-momentum transfer squared range $0.23 \leq -t \leq 0.67$ GeV$^2$. We find that a constant slope $B$ does not fit the data in the aforementioned $t$ range, and we obtain a much better fit using a second-order polynomial for $B(t)$. The $t$ dependence of $B$ is determined using six subintervals of $t$ in the STAR measured $t$ range, and is in good agreement with the phenomenological models. The measured elastic differential cross section $\mathrm{d}σ/\mathrm{dt}$ agrees well with the results obtained at $\sqrt{s} = 546$ GeV for proton--antiproton collisions by the UA4 experiment. We also determine that the integrated elastic cross section within the STAR $t$-range is $σ^\mathrm{fid}_\mathrm{el} = 462.1 \pm 0.9 (\mathrm{stat.}) \pm 1.1 (\mathrm {syst.}) \pm 11.6 (\mathrm {scale})$~$μ\mathrm{b}$.

Submitted 6 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: 9 pages, 9 figures. Version as published in Physics Letters B. HEPDATA: https://www.hepdata.net/record/144920

Journal ref: Physics Letters B, Volume 852, May 2024, 138601

arXiv:2309.16441 [pdf, other]

On symbology and differential equations of Feynman integrals from Schubert analysis

Authors: Song He, Xuhang Jiang, Jiahao Liu, Qinglin Yang

Abstract: We take the first step in generalizing the so-called "Schubert analysis", originally proposed in twistor space for four-dimensional kinematics, to the study of symbol letters and more detailed information on canonical differential equations for Feynman integral families in general dimensions with general masses. The basic idea is to work in embedding space and compute possible cross-ratios built from (Lorentz products of) maximal cut solutions for all integrals in the family. We demonstrate the power of the method using the most general one-loop integrals, as well as various two-loop planar integral families (such as sunrise, double-triangle and double-box) in general dimensions. Not only can we obtain all symbol letters as cross-ratios from maximal-cut solutions, but we also reproduce entries in the canonical differential equations satisfied by a basis of dlog integrals.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 51 pages, many figures

arXiv:2309.16071 [pdf, other]

Influence Pathway Discovery on Social Media

Authors: Xinyi Liu, Ruijie Wang, Dachun Sun, Jinning Li, Christina Youn, You Lyu, Jianyuan Zhan, Dayou Wu, Xinhe Xu, Mingjun Liu, Xinshuo Lei, Zhihao Xu, Yutong Zhang, Zehao Li, Qikai Yang, Tarek Abdelzaher

Abstract: This paper addresses influence pathway discovery, a key emerging problem in today's online media. We propose a discovery algorithm that leverages recently published work on unsupervised interpretable ideological embedding, a mapping of ideological beliefs (done in a self-supervised fashion) into interpretable low-dimensional spaces. Computing the ideological embedding at scale allows one to analyze correlations between the ideological positions of leaders, influencers, news portals, or population segments, deriving potential influence pathways. The work is motivated by the importance of social media as the preeminent means for global interactions and collaborations on today's Internet, as well as their frequent (mis-)use to wield influence that targets social beliefs and attitudes of selected populations. Tools that enable the understanding and mapping of influence propagation through population segments on social media are therefore increasingly important. In this paper, influence is measured by the perceived ideological shift over time that is correlated with influencers' activity. Correlated shifts in ideological embeddings indicate changes, such as swings/switching (among competing ideologies), polarization (depletion of neutral ideological positions), escalation/radicalization (shifts to more extreme versions of the ideology), or unification/cooldown (shifts towards more neutral stances).
Case-studies are presented to explore selected influence pathways (i) in a recent French election, (ii) during political discussions in the Philippines, and (iii) for some Russian messaging during the Russia/Ukraine conflict.

Submitted 27 September, 2023; originally announced September 2023.

Comments: This paper is accepted by IEEE CIC as an invited vision paper

arXiv:2309.15675 [pdf, other]

SJTU-TMQA: A quality assessment database for static mesh with texture map

Authors: Bingyang Cui, Qi Yang, Kaifa Yang, Yiling Xu, Xiaozhong Xu, Shan Liu

Abstract: In recent years, static meshes with texture maps have become one of the most prevalent digital representations of 3D shapes in various applications, such as animation, gaming, medical imaging, and cultural heritage applications. However, little research has been done on the quality assessment of textured meshes, which hinders the development of quality-oriented applications, such as mesh compression and enhancement. In this paper, we create a large-scale textured mesh quality assessment database, namely SJTU-TMQA, which includes 21 reference meshes and 945 distorted samples. The meshes are rendered into processed video sequences, and subjective experiments are then conducted to obtain mean opinion scores (MOS). The diversity of content and accuracy of MOS have been shown to validate its heterogeneity and reliability. The impact of various types of distortion on human perception is demonstrated. 13 state-of-the-art objective metrics are evaluated on SJTU-TMQA. The results report the highest correlation of around 0.6, indicating the need for more effective objective metrics. The SJTU-TMQA is available at https://ccccby.github.io
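To make the evaluation step concrete, the following sketch shows one common way to turn raw subjective ratings into MOS values and to correlate an objective metric against them. The abstract does not say which correlation coefficient was used, so Pearson linear correlation is assumed here. The function names are hypothetical and no outlier screening of raters is modeled.

```python
from math import sqrt
from statistics import mean


def mean_opinion_scores(ratings_per_sample):
    """MOS of each distorted sample: the plain average of its raters' scores."""
    return [mean(ratings) for ratings in ratings_per_sample]


def pearson(xs, ys):
    """Pearson linear correlation between metric outputs and MOS.

    Raises ZeroDivisionError if either input is constant (zero variance).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

A metric whose outputs track MOS perfectly would score 1.0 under this measure, so a best value of around 0.6, as the abstract reports, leaves considerable room for improvement.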

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.14997 [pdf, other]

IAIFNet: An Illumination-Aware Infrared and Visible Image Fusion Network

Authors: Qiao Yang, Yu Zhang, Zijing Zhao, Jian Zhang, Shunli Zhang

Abstract: Infrared and visible image fusion (IVIF) is used to generate fusion images with comprehensive features of both images, which is beneficial for downstream vision tasks. However, current methods rarely consider the illumination condition in low-light environments, and the targets in the fused images are often not prominent. To address the above issues, we propose an Illumination-Aware Infrared and Visible Image Fusion Network, named IAIFNet. In our framework, an illumination enhancement network first estimates the incident illumination maps of input images. Afterwards, with the help of the proposed adaptive differential fusion module (ADFM) and salient target aware module (STAM), an image fusion network effectively integrates the salient features of the illumination-enhanced infrared and visible images into a fusion image of high visual quality. Extensive experimental results verify that our method outperforms five state-of-the-art methods of fusing infrared and visible images.

Submitted 26 May, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted by IEEE Signal Processing Letters

arXiv:2309.14745 [pdf, other]

SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion

Authors: Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao Wang, Junzhe Chen

Abstract: Most existing learning-based infrared and visible image fusion (IVIF) methods exhibit massive redundant information in the fusion images, i.e., yielding edge-blurring effects or content that object detectors cannot recognize. To alleviate these issues, we propose a semantic structure-preserving approach for IVIF, namely SSPFusion. First, we design a Structural Feature Extractor (SFE) to extract the structural features of infrared and visible images. Then, we introduce a multi-scale Structure-Preserving Fusion (SPF) module to fuse the structural features of infrared and visible images, while maintaining the consistency of semantic structures between the fusion and source images. Owing to these two effective modules, our method is able to generate high-quality fusion images from pairs of infrared and visible images, which can boost the performance of downstream computer-vision tasks. Experimental results on three benchmarks demonstrate that our method outperforms eight state-of-the-art image fusion methods in terms of both qualitative and quantitative evaluations. The code for our method, along with additional comparison results, will be made available at: https://github.com/QiaoYang-CV/SSPFUSION.

Submitted 26 December, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.14220 [pdf, other]

doi 10.1103/PhysRevD.109.012004

Longitudinal and transverse spin transfer to $Λ$ and $\overlineΛ$ hyperons in polarized $p$+$p$ collisions at $\sqrt{s} = 200$ GeV

Authors: STAR Collaboration, M. I. Abdulhamid, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, D. M. Anderson, E. C. Aschenauer, S. Aslam, J. Atchison, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, R. Bellwied, P. Bhagat, A. Bhasin, S. Bhatta, J. Bielcik, J. Bielcikova, J. D. Brandenburg, X. Z. Cai , et al. (357 additional authors not shown)

Abstract: The longitudinal and transverse spin transfers to $Λ$ ($\overlineΛ$) hyperons in polarized proton-proton collisions are expected to be sensitive to the helicity and transversity distributions, respectively, of (anti-)strange quarks in the proton, and to the corresponding polarized fragmentation functions. We report improved measurements of the longitudinal spin transfer coefficient, $D_{LL}$, and the transverse spin transfer coefficient, $D_{TT}$, to $Λ$ and $\overlineΛ$ in polarized proton-proton collisions at $\sqrt{s}$ = 200 GeV by the STAR experiment at RHIC. The data set includes longitudinally polarized proton-proton collisions with an integrated luminosity of 52 pb$^{-1}$, and transversely polarized proton-proton collisions with a similar integrated luminosity. Both data sets have about twice the statistics of previous results and cover a kinematic range of $|η_{Λ(\overlineΛ)}|$ $<$ 1.2 and transverse momentum $p_{T,{Λ(\overlineΛ)}}$ up to 8 GeV/$c$. We also report the first measurements of the hyperon spin transfer coefficients $D_{LL}$ and $D_{TT}$ as a function of the fractional jet momentum $z$ carried by the hyperon, which can provide more direct constraints on the polarized fragmentation functions.

Submitted 7 December, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.13930 [pdf, other]

SAMN: A Sample Attention Memory Network Combining SVM and NN in One Architecture

Authors: Qiaoling Yang, Linkai Luo, Haoyu Zhang, Hong Peng, Ziyang Chen

Abstract: Support vector machine (SVM) and neural networks (NN) have strong complementarity. SVM focuses on the inner operation among samples while NN focuses on the operation among the features within samples. Thus, it is promising and attractive to combine SVM and NN, as it may provide a more powerful function than SVM or NN alone. However, current work on combining them lacks true integration. To address this, we propose a sample attention memory network (SAMN) that effectively combines SVM and NN by incorporating a sample attention module, class prototypes, and a memory block into NN. SVM can be viewed as a sample attention machine. It allows us to add a sample attention module to NN to implement the main function of SVM. Class prototypes are representatives of all classes, which can be viewed as alternatives to support vectors. The memory block is used for the storage and update of class prototypes. Class prototypes and memory block effectively reduce the computational cost of sample attention and make SAMN suitable for multi-classification tasks. Extensive experiments show that SAMN achieves better classification performance than single SVM or single NN with similar parameter sizes, as well as the previous best model for combining SVM and NN. The sample attention mechanism is a flexible module that can be easily deepened and incorporated into neural networks that require it.

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.12610 [pdf, other]

doi 10.1103/PhysRevC.109.044914

Reaction plane correlated triangular flow in Au+Au collisions at $\sqrt{s_{NN}}=3$ GeV

Authors: STAR Collaboration, M. I. Abdulhamid, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, E. C. Aschenauer, S. Aslam, J. Atchison, V. Bairathi, J. G. Ball Cap, K. Barish, R. Bellwied, P. Bhagat, A. Bhasin, S. Bhatta, S. R. Bhosale, J. Bielcik, J. Bielcikova, J. D. Brandenburg, C. Broodo, X. Z. Cai , et al. (341 additional authors not shown)

Abstract: We measure triangular flow relative to the reaction plane at 3 GeV center-of-mass energy in Au+Au collisions at the BNL Relativistic Heavy Ion Collider. A significant $v_3$ signal for protons is observed, which increases for higher rapidity, higher transverse momentum, and more peripheral collisions. The triangular flow is essentially rapidity-odd with a slope at mid-rapidity, $dv_3/dy|_{(y=0)}$, opposite in sign compared to the slope for directed flow. No significant $v_3$ signal is observed for charged pions and kaons. Comparisons with models suggest that a mean field potential is required to describe these results, and that the triangular shape of the participant nucleons is the result of stopping and nuclear geometry.

Submitted 19 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 12 pages, 14 figures

Journal ref: Phys. Rev. C 109, 044914 (2024)

arXiv:2309.11454 [pdf, other]

NeighViz: Towards Better Understanding of Neighborhood Effects on Social Groups with Spatial Data

Authors: Yue Yu, Yifang Wang, Qisen Yang, Di Weng, Yongjun Zhang, Xiaogang Wu, Yingcai Wu, Huamin Qu

Abstract: Understanding how local environments influence individual behaviors, such as voting patterns or suicidal tendencies, is crucial in social science to reveal and reduce spatial disparities and promote social well-being. With the increasing availability of large-scale individual-level census data, new analytical opportunities arise for social scientists to explore human behaviors (e.g., political engagement) among social groups at a fine-grained level. However, traditional statistical methods mostly focus on global, aggregated spatial correlations, which are limited in understanding and comparing the impact of local environments (e.g., neighborhoods) on human behaviors among social groups. In this study, we introduce a new analytical framework for analyzing multi-variate neighborhood effects between social groups. We then propose NeighViz, an interactive visual analytics system that helps social scientists explore, understand, and verify the influence of neighborhood effects on human behaviors. Finally, we use a case study to illustrate the effectiveness and usability of our system.

Submitted 20 September, 2023; originally announced September 2023.

Comments: Symposium on Visualization in Data Science (VDS) at IEEE VIS 2023

arXiv:2309.09392 [pdf, other]

Deep conditional generative models for longitudinal single-slice abdominal computed tomography harmonization

Authors: Xin Yu, Qi Yang, Yucheng Tang, Riqiang Gao, Shunxing Bao, Leon Y. Cai, Ho Hin Lee, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

Abstract: Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leads to different organs/tissues being captured. To address this issue, we propose C-SliceGen, which takes an arbitrary axial slice in the abdominal region as a condition and generates a pre-defined vertebral level slice by estimating structural changes in the latent space. Our experiments on 2608 volumetric CT data from two in-house datasets and 50 subjects from the 2015 Multi-Atlas Abdomen Labeling Challenge (BTCV) dataset demonstrate that our model can generate high-quality images that are realistic and similar. We further evaluate our method's capability to harmonize longitudinal positional variation on 1033 subjects from the Baltimore Longitudinal Study of Aging (BLSA) dataset, which contains longitudinal single abdominal slices, and confirm that our method can harmonize the slice positional variance in terms of visceral fat area. This approach provides a promising direction for mapping slices from different vertebral levels to a target slice and reducing positional variance for single-slice longitudinal analysis. The source code is available at: https://github.com/MASILab/C-SliceGen.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.06857 [pdf, other]

Listening for the Axion Echo with the 21 CentiMeter Array

Authors: Ariel Arza, Quan Guo, Lei Wu, Qiaoli Yang, Xiaolong Yang, Qiang Yuan, Bin Zhu

Abstract: The axion is a hypothetical elementary particle that could solve the long-standing strong CP problem in particle physics and the dark matter mystery in the cosmos. Due to the stimulation of the ambient photons, the axion dark matter decay into photons is significantly enhanced so that its echo signal could be detected by terrestrial telescopes. As a pathfinder, we study the expected sensitivity of searching for the axion dark matter in the mass range between $0.41$ and $1.6μ\text{eV}$ with the 21 CentiMeter Array (21CMA). We aim to cover the whole 21CMA frequency range in two years by using a 1MW emitter. We find that the resulting sensitivity on the axion-photon coupling could surpass other existing limits by about one order of magnitude.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 7 pages, 4 figures

arXiv:2309.06787 [pdf, other]

DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation

Authors: Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang

Abstract: In the text-to-speech (TTS) task, the latent diffusion model has excellent fidelity and generalization, but its expensive resource consumption and slow inference speed have always been challenging. This paper proposes Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (DCTTS). The following contributions are made by DCTTS: 1) The TTS diffusion model based on discrete space significantly lowers the computational consumption of the diffusion model and improves sampling speed; 2) The contrastive learning method based on discrete space is used to enhance the alignment connection between speech and text and improve sampling quality; and 3) It uses an efficient text encoder to simplify the model's parameters and increase computational efficiency. The experimental results demonstrate that the approach proposed in this paper has outstanding speech synthesis quality and sampling speed while significantly reducing the resource consumption of the diffusion model. The synthesized samples are available at https://github.com/lawtherWu/DCTTS.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 5 pages, submitted to ICASSP

arXiv:2309.06258 [pdf, other]

Work Statistics and Adiabatic Assumption in Nonequilibrium Many-Body Theory

Authors: Yi Zuo, Qinghong Yang, Bang-Gui Liu, Dong E Liu

Abstract: Keldysh field theory, based on adiabatic assumptions, serves as a widely used framework for addressing nonequilibrium many-body systems. Nonetheless, the validity of such adiabatic assumptions when addressing interacting Gibbs states remains a topic of contention. We use the knowledge of work statistics developed in nonequilibrium thermodynamics to study this problem. Consequently, we deduce a universal theorem delineating the characteristics of evolutions that transition an initial Gibbs state to another. Based on this theorem, we analytically ascertain that adiabatic evolutions fail to transition a non-interacting Gibbs state to its interacting counterpart. However, this adiabatic approach remains a superior approximation relative to its non-adiabatic counterpart. Numerics verifying our theory and predictions are also provided. Furthermore, our findings render insights into the preparation of Gibbs states within the domain of quantum computation.

Submitted 21 September, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 5+7 pages, 2 figures. Supplementary information containing calculation details is presented. Expressions have been polished

arXiv:2309.05325 [pdf]

Superfolded configuration induced low thermal conductivity in two-dimensional carbon allotropes revealed via machine learning force constant potential

Authors: Linfeng Yu, Kexin Dong, Qi Yang, Yi Zhang, Xiong Zheng, Huimin Wang, Zhenzhen Qin, Guangzhao Qin

Abstract: Understanding the fundamental link between structure and functionalization is crucial for the design and optimization of functional materials, since different structural configurations could trigger materials to demonstrate diverse physical, chemical, and electronic properties. However, the correlation between crystal structure and thermal conductivity (κ) remains enigmatic. In this study, taking two-dimensional (2D) carbon allotropes as study cases, we utilize the phonon Boltzmann transport equation (BTE) along with a machine learning force constant potential to thoroughly explore the complex folding structure of pure sp2 hybridized carbon materials from the perspective of crystal structure, mode-level phonon resolved thermal transport, and atomic interactions, with the goal of identifying the underlying relationship between 2D geometry and κ. We propose two potential structure evolution mechanisms for targeted thermal transport properties: in-plane and out-of-plane folding evolutions, which are generally applicable to 2D carbon allotropes. It is revealed that the folded structure produces strong symmetry breaking, and simultaneously produces exceptionally strongly suppressed phonon group velocities, strong phonon-phonon scattering, and weak phonon hydrodynamics, which ultimately lead to low κ.
The insight into the folded effect of atomic structures on thermal transport deepens our understanding of the relationship between structure and functionalization, which offers straightforward guidance for designing novel nanomaterials with targeted κ, as well as propelling developments in materials science and engineering.

Submitted 28 January, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.04982 [pdf, other]

Efficient Link Prediction in Continuous-Time Dynamic Networks using Optimal Transmission and Metropolis Hastings Sampling

Authors: Ruizhi Zhang, Wei Wei, Qiming Yang, Zhenyu Shi, Xiangnan Feng, Zhiming Zheng

Abstract: Efficient link prediction in continuous-time dynamic networks is a challenging problem that has attracted much research attention in recent years. A widely used approach to dynamic network link prediction is to extract the local structure of the target link through temporal random walk on the network and learn node features using a coding model. However, this approach often assumes that candidate temporal neighbors follow certain types of distributions, which may be inappropriate for real-world networks, thereby incurring information loss. To address this limitation, we propose a framework in continuous-time dynamic networks based on Optimal Transmission (OT) and Metropolis Hastings (MH) sampling (COM). Specifically, we use optimal transmission theory to calculate the Wasserstein distance between the current node and the time-valid candidate neighbors to minimize information loss in node information propagation. Additionally, we employ the MH algorithm to obtain higher-order structural relationships in the vicinity of the target link, as it is a Markov Chain Monte Carlo method and can flexibly simulate target distributions with complex patterns. We demonstrate the effectiveness of our proposed method through experiments on eight datasets from different fields.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 11 pages, 7 figures
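
Note on the entry above: the abstract names Metropolis-Hastings (MH) as the Markov Chain Monte Carlo method used to sample around the target link. As a generic illustration only, and not the COM framework's own sampler, a minimal random-walk MH sketch for a one-dimensional target could look like the following. The function names, the Gaussian proposal, the step size, and the standard-normal target are all assumptions chosen for the example.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1D unnormalized log-density.

    Hypothetical helper for illustration; the proposal is a symmetric
    Gaussian, so the acceptance ratio reduces to target(new) / target(old).
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, exp(log_p_new - log_p)).
        if log_p_new >= log_p or rng.random() < math.exp(log_p_new - log_p):
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples


def log_standard_normal(x):
    """Log-density of N(0, 1) up to an additive constant (assumed target)."""
    return -0.5 * x * x


if __name__ == "__main__":
    chain = metropolis_hastings(log_standard_normal, x0=0.0, n_samples=20000, seed=1)
    mean = sum(chain) / len(chain)
    var = sum((v - mean) ** 2 for v in chain) / len(chain)
    print(f"mean={mean:.3f} var={var:.3f}")
```

Only an unnormalized density is needed, which is why MH suits targets with complex shapes; in a graph setting the state and proposal would be nodes and neighbor moves rather than real numbers.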

arXiv:2309.04109 [pdf, other]

From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models

Authors: Changming Xiao, Qi Yang, Feng Zhou, Changshui Zhang

Abstract: Diffusion models have revolutionized the field of text-to-image generation recently. The unique way of fusing text and image information contributes to their remarkable capability of generating highly text-related images. From another perspective, these generative models imply clues about the precise correlation between words and pixels. In this work, a simple but effective method is proposed to utilize the attention mechanism in the denoising network of text-to-image diffusion models. Without re-training or inference-time optimization, the semantic grounding of phrases can be attained directly. We evaluate our method on Pascal VOC 2012 and Microsoft COCO 2014 under the weakly-supervised semantic segmentation setting, and our method achieves superior performance to prior methods. In addition, the acquired word-pixel correlation is found to be generalizable for the learned text embedding of customized generation methods, requiring only a few modifications. To validate our discovery, we introduce a new practical task called "personalized referring image segmentation" with a new dataset. Experiments in various situations demonstrate the advantages of our method compared to strong baselines on this task. In summary, our work reveals a novel way to extract the rich multi-modal knowledge hidden in diffusion models for segmentation.

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2309.04071 [pdf, other]

doi 10.1117/12.3009084

Enhancing Hierarchical Transformers for Whole Brain Segmentation with Intracranial Measurements Integration

Authors: Xin Yu, Yucheng Tang, Qi Yang, Ho Hin Lee, Shunxing Bao, Yuankai Huo, Bennett A. Landman

Abstract: Whole brain segmentation with magnetic resonance imaging (MRI) enables the non-invasive measurement of brain regions, including total intracranial volume (TICV) and posterior fossa volume (PFV). Enhancing the existing whole brain segmentation methodology to incorporate intracranial measurements offers a heightened level of comprehensiveness in the analysis of brain structures. Despite its potential, the task of generalizing deep learning techniques for intracranial measurements faces data availability constraints due to limited manually annotated atlases encompassing whole brain and TICV/PFV labels. In this paper, we enhance the hierarchical transformer UNesT for whole brain segmentation to segment the whole brain with 133 classes and TICV/PFV simultaneously. To address the problem of data scarcity, the model is first pretrained on 4859 T1-weighted (T1w) 3D volumes sourced from 8 different sites. These volumes are processed through a multi-atlas segmentation pipeline for label generation, while TICV/PFV labels are unavailable. Subsequently, the model is finetuned with 45 T1w 3D volumes from the Open Access Series of Imaging Studies (OASIS), where both 133 whole brain classes and TICV/PFV labels are available. We evaluate our method with Dice similarity coefficients (DSC). We show that our model is able to conduct precise TICV/PFV estimation while maintaining the 132 brain regions performance at a comparable level. Code and trained model are available at: https://github.com/MASILab/UNesT/tree/main/wholebrainSeg.

Submitted 10 April, 2024; v1 submitted 7 September, 2023; originally announced September 2023.
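
Note on the entry above: the abstract evaluates segmentation with the Dice similarity coefficient (DSC), defined for two binary masks A and B as 2|A ∩ B| / (|A| + |B|). A minimal sketch of that formula on flattened binary masks is given below; it is not taken from the UNesT code base, and the function name and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flattened binary masks.

    Hypothetical helper for illustration. Both inputs are equal-length
    sequences of 0/1 (or falsy/truthy) labels.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_coefficient(prediction, reference))  # 2*1 / (2+1)
```

For a multi-class segmentation such as the 133-class setting, a score of this kind would typically be computed per class on the one-vs-rest mask and then averaged.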

arXiv:2309.01458 [pdf, other]

Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

Authors: Qisen Yang, Huanqian Wang, Mukun Tong, Wenjie Shi, Gao Huang, Shiji Song

Abstract: The black-box nature of deep reinforcement learning (RL) hinders its use in real-world applications. Therefore, interpreting and explaining RL agents have been active research topics in recent years. Existing methods for post-hoc explanations usually adopt the action matching principle to enable an easy understanding of vision-based RL agents. In this paper, it is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents. It may lead to irrelevant or misplaced feature attribution when different DNNs' outputs lead to the same rewards or different rewards result from the same outputs. Therefore, we propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents as well. To ensure reward consistency during interpretable feature discovery, a novel framework (RL interpreting RL, denoted as RL-in-RL) is proposed to solve the gradient disconnection from actions to rewards. We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment. The results show that our method manages to keep reward (or return) consistency and achieves high-quality feature attribution. Further, a series of analytical experiments validate our assumption of the action matching principle's limitations.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2309.01448 [pdf, other]

Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

Authors: Qisen Yang, Shenzhi Wang, Qihang Zhang, Gao Huang, Shiji Song

Abstract: Offline reinforcement learning (RL) optimizes the policy on a previously collected dataset without any interactions with the environment, yet usually suffers from the distributional shift problem. To mitigate this issue, a typical solution is to impose a policy constraint on a policy improvement objective. However, existing methods generally adopt a "one-size-fits-all" practice, i.e., keeping only a single improvement-constraint balance for all the samples in a mini-batch or even the entire offline dataset. In this work, we argue that different samples should be treated with different policy constraint intensities. Based on this idea, a novel plug-in approach named Guided Offline RL (GORL) is proposed. GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and policy constraint for every sample. We theoretically prove that the guidance provided by our method is rational and near-optimal. Extensive experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2309.00575 [pdf]

Critical roles of edge turbulent transport in the formation of high-field-side high-density front and density limit disruption in J-TEXT tokamak

Authors: Peng Shi, Yuhan Wang, Li Gao, Hongjuan Sun, Qinghu Yang, Xin Xu, Chengshuo Shen, Yanqiu Chen, Qinlin Tao, Zhipeng Chen, Haosheng Wu, Lu Wang, Zhongyong Chen, Nengchao Wang, Zhoujun Yang, Jingchun Li, Yonghua Ding, Yuan Pan, J-TEXT team

Abstract: This article presents an in-depth study of the sequence of events leading to density limit disruption in J-TEXT tokamak plasmas, with an emphasis on boundary turbulent transport and the high-field-side high-density (HFSHD) front. These phenomena were extensively investigated by using Langmuir probe and polarimeter-interferometer diagnostics.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.15774 [pdf]

Comparing Spatial Navigation and Human Environment Interaction in Virtual Reality vs. Identical Real Environments across the Adult Lifespan

Authors: Saleh Kalantari, Bill Tong Xu, Armin Mostafavi, Anne Seoyoung Lee, Qi Yang

Abstract: Virtual reality (VR) is increasingly being used as a research platform for investigating human responses to environmental variables. While VR provides tremendous advantages in terms of variable isolation and manipulation, and ease of data collection, some researchers have expressed concerns about the ecological validity of VR-based findings. In the current study we replicated a real-world, multi-level educational facility in VR, and compared data collected in the VR and real-world environments as participants (n=36) completed identical wayfinding tasks. We found significant differences in all of the measures used, including distance covered, number of mistakes made, time for task completion, spatial memory, extent of backtracking, observation of directional signs, perceived uncertainty levels, perceived cognitive workload, and perceived task difficulty. We also analyzed potential age-related effects to look for heightened VR/real response discrepancies among older adult participants (>55 years) compared to younger adults. This analysis yielded no significant effects of age. Finally, we examined the spatial distribution of self-reported wayfinding uncertainty across the building floorplan, finding that areas in which uncertainty was most pronounced were similar between the real-world and VR settings. Thus, participants appeared to be responding to the same environmental features in the real and VR conditions, but the extent of these responses was significantly different.
Overall, the findings suggest that when VR is used to contrast varying environmental design conditions, the resulting data should be interpreted cautiously and should not be generalized into real-world conclusions without further validation.

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.13250 [pdf, other]

TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling

Authors: Shimin Zhang, Qu Yang, Chenxiang Ma, Jibin Wu, Haizhou Li, Kay Chen Tan

Abstract: The identification of sensory cues associated with potential opportunities and dangers is frequently complicated by unrelated events that separate useful cues by long delays. As a result, it remains a challenging task for state-of-the-art spiking neural networks (SNNs) to establish long-term temporal dependency between distant cues. To address this challenge, we propose a novel biologically inspired Two-Compartment Leaky Integrate-and-Fire spiking neuron model, dubbed TC-LIF. The proposed model incorporates carefully designed somatic and dendritic compartments that are tailored to facilitate learning long-term temporal dependencies. Furthermore, a theoretical analysis is provided to validate the effectiveness of TC-LIF in propagating error gradients over an extended temporal duration. Our experimental results, on a diverse range of temporal classification tasks, demonstrate superior temporal classification capability, rapid training convergence, and high energy efficiency of the proposed TC-LIF model. Therefore, this work opens up a myriad of opportunities for solving challenging temporal processing tasks on emerging neuromorphic computing systems. Our code is publicly available at https://github.com/ZhangShimin1/TC-LIF.

Submitted 17 February, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2307.07231

arXiv:2308.12712 [pdf, other]

doi 10.1145/3581783.3612105

Ground-to-Aerial Person Search: Benchmark Dataset and Approach

Authors: Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang

Abstract: In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities appearing in both UAV and ground surveillance cameras. To our knowledge, this is the first dataset for cross-platform intelligent surveillance applications, where the UAVs could work as a powerful complement to the ground surveillance cameras. To more realistically simulate actual cross-platform Ground-to-Aerial surveillance scenarios, the surveillance cameras are fixed about 2 meters above the ground, while the UAVs capture videos of persons at different locations, with a variety of view-angles, flight attitudes and flight modes. Therefore, the dataset has the following unique characteristics: 1) drastic view-angle changes between query and gallery person images from cross-platform cameras; 2) diverse resolutions, poses and views of the person images under 9 rich real-world scenarios. On the basis of the G2APS benchmark dataset, we present a detailed analysis of current two-step and end-to-end person search methods, and further propose a simple yet effective knowledge distillation scheme on the head of the ReID network, which achieves state-of-the-art performance on both G2APS and the previous two public person search datasets, i.e., PRW and CUHK-SYSU. The dataset and source code are available at https://github.com/yqc123456/HKD_for_person_search.

Submitted 24 August, 2023; originally announced August 2023.

Comments: Accepted by ACM MM 2023

ACM Class: I.5.4; I.4.8

arXiv:2308.11841 [pdf, other]

A Survey for Federated Learning Evaluations: Goals and Measures

Authors: Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, Qiang Yang

Abstract: Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data. However, evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in the existing studies and then explore the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms in terms of their utility, efficiency, and security. Finally, we discuss several challenges and future research directions for FL evaluation.

Submitted 23 March, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.11761 [pdf, other]

KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases

Authors: Xintao Wang, Qianwen Yang, Yongting Qiu, Jiaqing Liang, Qianyu He, Zhouhong Gu, Yanghua Xiao, Wei Wang

Abstract: Large language models (LLMs) have demonstrated impressive impact in the field of natural language processing, but they still struggle with several issues, such as completeness, timeliness, faithfulness and adaptability. While recent efforts have focused on connecting LLMs with external knowledge sources, the integration of knowledge bases (KBs) remains understudied and faces several challenges. In this paper, we introduce KnowledGPT, a comprehensive framework to bridge LLMs with various knowledge bases, facilitating both the retrieval and storage of knowledge. The retrieval process employs program-of-thought prompting, which generates search language for KBs in code format with pre-defined functions for KB operations. Besides retrieval, KnowledGPT offers the capability to store knowledge in a personalized KB, catering to individual user demands. With extensive experiments, we show that by integrating LLMs with KBs, KnowledGPT properly answers a broader range of questions requiring world knowledge compared with vanilla LLMs, utilizing both knowledge existing in widely-known KBs and knowledge extracted into personalized KBs.

Submitted 17 August, 2023; originally announced August 2023.

Comments: 21 pages, 10 figures

arXiv:2308.11376 [pdf, other]

Boundary-RL: Reinforcement Learning for Weakly-Supervised Prostate Segmentation in TRUS Images

Authors: Weixi Yi, Vasilis Stavrinides, Zachary M. C. Baum, Qianye Yang, Dean C. Barratt, Matthew J. Clarkson, Yipeng Hu, Shaheer U. Saeed

Abstract: We propose Boundary-RL, a novel weakly supervised segmentation method that utilises only patch-level labels for training. We envision the segmentation as a boundary detection problem, rather than a pixel-level classification as in previous works. This outlook on segmentation may allow for boundary delineation under challenging scenarios such as where noise artefacts may be present within the region-of-interest (ROI) boundaries, where traditional pixel-level classification-based weakly supervised methods may not be able to effectively segment the ROI. Particularly of interest, ultrasound images, where intensity values represent acoustic impedance differences between boundaries, may also benefit from the boundary delineation approach. Our method uses reinforcement learning to train a controller function to localise boundaries of ROIs using a reward derived from a pre-trained boundary-presence classifier. The classifier indicates when an object boundary is encountered within a patch, as the controller modifies the patch location in a sequential Markov decision process. The classifier itself is trained using only binary patch-level labels of object presence, which are the only labels used during training of the entire boundary delineation framework, and serves as a weak signal to inform the boundary delineation. The use of a controller function ensures that a sliding window over the entire image is not necessary. It also prevents possible false-positive or -negative cases by minimising the number of patches passed to the boundary-presence classifier.
We evaluate our proposed approach for a clinically relevant task of prostate gland segmentation on trans-rectal ultrasound images. We show improved performance compared to other tested weakly supervised methods that use the same labels, e.g., multiple instance learning.

Submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted to MICCAI Workshop MLMI 2023 (14th International Conference on Machine Learning in Medical Imaging)

arXiv:2308.10863 [pdf, other]

doi 10.1103/PhysRevB.109.085139

Emergence of charge density wave and superconducting phase transitions through Lorentz-invariant interactions in the Haldane-Hubbard model

Authors: Qiao Yang, Yu-Biao Wu, Lin Zhuang, Ji-Min Zhao, Wu-Ming Liu

Abstract: We derive Lorentz-invariant four-fermion interactions, including Nambu-Jona-Lasinio type and superconducting type, which are widely studied in high-energy physics, from the honeycomb lattice Hamiltonian with Hubbard interaction. We investigate the phase transitions induced by these two interactions and consider the effects of the chemical potential and magnetic flux (Haldane mass term) on these phase transitions. We find that the charge-density-wave and superconductivity generated by the attractive interactions are mainly controlled by the chemical potential, while the magnetic flux delimits the domain of phase transition. Our analysis underscores the influence of the initial topological state on the phase transitions, a facet largely overlooked in prior studies. We present experimental protocols using cold atoms to verify our theoretical results.

Submitted 1 June, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

Journal ref: Physical Review B 109, 085139 (2024)

arXiv:2308.09678 [pdf, other]

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

Authors: Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie

Abstract: Existing 3D human pose estimators face challenges in adapting to new datasets due to the lack of 2D-3D pose pairs in training sets. To overcome this issue, we propose the Multi-Hypothesis Pose Synthesis Domain Adaptation (PoSynDA) framework to bridge this data disparity gap in the target domain. Typically, PoSynDA uses a diffusion-inspired structure to simulate the 3D pose distribution in the target domain. By incorporating a multi-hypothesis network, PoSynDA generates diverse pose hypotheses and aligns them with the target domain. To do this, it first utilizes target-specific source augmentation to obtain the target domain distribution data from the source domain by decoupling the scale and position parameters. The process is then further refined through the teacher-student paradigm and low-rank adaptation. With extensive comparison on benchmarks such as Human3.6M and MPI-INF-3DHP, PoSynDA demonstrates competitive performance, even comparable to the target-trained MixSTE model. This work paves the way for the practical application of 3D human pose estimation in unseen domains. The code is available at https://github.com/hbing-l/PoSynDA.

Submitted 16 October, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted to ACM Multimedia 2023; 10 pages, 4 figures, 8 tables; the code is at https://github.com/hbing-l/PoSynDA

arXiv:2308.08730 [pdf, other]

Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration

Authors: Liyan Wang, Qinyu Yang, Cong Wang, Wei Wang, Jinshan Pan, Zhixun Su

Abstract: Recent years have witnessed the remarkable performance of diffusion models in various vision tasks. However, for image restoration that aims to recover clear images with sharper details from given degraded observations, diffusion-based methods may fail to recover promising results due to inaccurate noise estimation. Moreover, simply constraining noises cannot effectively learn complex degradation information, which subsequently hinders the model capacity. To solve the above problems, we propose a coarse-to-fine diffusion Transformer (C2F-DFT) for image restoration. Specifically, our C2F-DFT contains diffusion self-attention (DFSA) and diffusion feed-forward network (DFN) within a new coarse-to-fine training scheme. The DFSA and DFN respectively capture the long-range diffusion dependencies and learn hierarchy diffusion representation to facilitate better restoration. In the coarse training stage, our C2F-DFT estimates noises and then generates the final clean image by a sampling algorithm. To further improve the restoration quality, we propose a simple yet effective fine training scheme. It first exploits the coarse-trained diffusion model with fixed steps to generate restoration results, which are then constrained with the corresponding ground-truth ones to optimize the models and remedy the unsatisfactory results affected by inaccurate noise estimation.
Extensive experiments show that C2F-DFT significantly outperforms the diffusion-based restoration method IR-SDE and achieves competitive performance compared with Transformer-based state-of-the-art methods on 3 tasks, including image deraining, image deblurring, and real image denoising. Code is available at https://github.com/wlydlut/C2F-DFT.

Submitted 8 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 13 pages, 10 figures

arXiv:2308.08477 [pdf, other]

Detecting Quadratically Coupled Ultra-light Dark Matter with Stimulated Annihilation

Authors: Yuanlin Gong, Xin Liu, Lei Wu, Qiaoli Yang, Bin Zhu

Abstract: Ultra-light Dark Matter (ULDM) is one of the most promising DM candidates. Due to the Bose enhancement, we find the annihilation rate of the ULDM in the presence of background photon radiation can be greatly enhanced and produce a distinctive reflected electromagnetic wave with an angular frequency equal to the ULDM mass. We propose to utilize such stimulated annihilation to probe the ULDM with the electromagnetic quadratic coupling by emitting a radio beam into space. With a 50 MW emitter, we forecast the sensitivity of quadratic coupling in different local halo models for low-frequency radio telescopes, such as LOFAR, UTR-2 and ngLOBO.

Submitted 12 February, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 7+3 pages, 2 figures; discussions and references added, and two equivalent formalisms given. Version accepted by PRD

arXiv:2308.08223 [pdf]

doi 10.1016/j.jcrysgro.2023.127398

Growth of millimeter-sized high-quality CuFeSe$_2$ single crystals by the molten salt method and study of their semiconducting behavior

Authors: Mingwei Ma, Binbin Ruan, Menghu Zhou, Yadong Gu, Qingxin Dong, Qingsong Yang, Qiaoyu Wang, Lewei Chen, Yunqing Shi, Junkun Yi, Genfu Chen, Zhian Ren

Abstract: A eutectic AlCl$_3$/KCl molten salt method in a horizontal configuration was employed to grow millimeter-sized and compositionally homogeneous CuFeSe$_2$ single crystals due to the continuous growth process in a temperature-gradient-induced solution convection. The typical as-grown CuFeSe$_2$ single crystals in cubic forms are nearly 1.6$\times$1.2$\times$1.0 mm$^3$ in size. The chemical composition and homogeneity of the crystals were examined by both inductively coupled plasma atomic emission spectroscopy and energy dispersive spectrometer, with Cu:Fe:Se = 0.96:1.00:1.99 consistent with the stoichiometric composition of CuFeSe$_2$. The magnetic measurements suggest a ferrimagnetic or weak ferromagnetic transition below T$_C$ = 146 K, and the resistivity reveals a semiconducting behavior and an abrupt increase below T$_C$.

Submitted 16 August, 2023; originally announced August 2023.

Journal ref: Journal of Crystal Growth (2023)

Showing 201–250 of 1,410 results for author: Yang, Q