-
Penetrative AI: Making LLMs Comprehend the Physical World
Authors:
Huatao Xu,
Liying Han,
Qirui Yang,
Mo Li,
Mani Srivastava
Abstract:
Recent developments in Large Language Models (LLMs) have demonstrated their remarkable capabilities across a range of tasks. Questions, however, persist about the nature of LLMs and their potential to integrate common-sense human knowledge when performing tasks involving information about the real physical world. This paper delves into these questions by exploring how LLMs can be extended to interact with and reason about the physical world through IoT sensors and actuators, a concept that we term "Penetrative AI". The paper explores such an extension at two levels of LLMs' ability to penetrate into the physical world via the processing of sensory signals. Our preliminary findings indicate that LLMs, with ChatGPT being the representative example in our exploration, have considerable and unique proficiency in employing the embedded world knowledge for interpreting IoT sensor data and reasoning over them about tasks in the physical realm. This not only opens up new applications for LLMs beyond traditional text-based tasks, but also enables new ways of incorporating human knowledge in cyber-physical systems.
Submitted 12 June, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Wafer-scale Computing: Advancements, Challenges, and Future Perspectives
Authors:
Yang Hu,
Xinhan Lin,
Huizheng Wang,
Zhen He,
Xingmao Yu,
Jiahao Zhang,
Qize Yang,
Zheng Xu,
Sihan Guan,
Jiahao Fang,
Haoran Shang,
Xinru Tang,
Xu Dai,
Shaojun Wei,
Shouyi Yin
Abstract:
Nowadays, artificial intelligence (AI) technology with large models plays an increasingly important role in both academia and industry. It also brings a rapidly increasing demand for the computing power of the hardware. As the computing demand for AI continues to grow, the growth of hardware computing power has failed to keep up. This has become a significant factor restricting the development of AI. The augmentation of hardware computing power is mainly propelled by the escalation of transistor density and chip area. However, the former is impeded by the end of Moore's Law and Dennard scaling, and the latter is significantly restricted by the challenge of disrupting the legacy fabrication equipment and process.
In recent years, advanced packaging technologies that have gradually matured are increasingly used to implement bigger chips that integrate multiple chiplets, while still providing interconnections with chip-level density and bandwidth. Compared to conventional high-performance computing paradigms such as multi-accelerator and datacenter-scale computing, Wafer-scale Computing shows remarkable advantages in communication bandwidth, integration density, and programmability potential. Not surprisingly, disruptive Wafer-scale Computing also brings unprecedented design challenges for hardware architecture, design-system-technology co-optimization, power and cooling systems, and the compiler toolchain. At present, there are no comprehensive surveys summarizing the current state and design insights of Wafer-scale Computing. This paper aims to take the first step to help academia and industry review existing wafer-scale chips and essential technologies in a one-stop manner, so that readers can conveniently grasp the basic knowledge and key points, understand the achievements and shortcomings of existing research, and contribute to this promising research direction.
Submitted 14 October, 2023;
originally announced October 2023.
-
A Survey of Heterogeneous Transfer Learning
Authors:
Runxue Bao,
Yiming Sun,
Yuhe Gao,
Jindong Wang,
Qiang Yang,
Haifeng Chen,
Zhi-Hong Mao,
Ye Ye
Abstract:
The application of transfer learning, an approach utilizing knowledge from a source domain to enhance model performance in a target domain, has seen a tremendous rise in recent years, underpinning many real-world scenarios. The key to its success lies in the shared common knowledge between the domains, a prerequisite in most transfer learning methodologies. These methods typically presuppose identical feature spaces and label spaces in both domains, known as homogeneous transfer learning, which, however, is not always a practical assumption. Oftentimes, the source and target domains vary in feature spaces, data distributions, and label spaces, making it challenging or costly to secure source domain data with identical feature and label spaces as the target domain. Arbitrary elimination of these differences is not always feasible or optimal. Thus, heterogeneous transfer learning, acknowledging and dealing with such disparities, has emerged as a promising approach for a variety of tasks. Despite the existence of a survey in 2017 on this topic, the fast-paced advances post-2017 necessitate an updated, in-depth review. We therefore present a comprehensive survey of recent developments in heterogeneous transfer learning methods, offering a systematic guide for future research. Our paper reviews methodologies for diverse learning scenarios, discusses the limitations of current studies, and covers various application contexts, including Natural Language Processing, Computer Vision, Multimodality, and Biomedicine, to foster a deeper understanding and spur future research.
Submitted 15 October, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Terrain-adaptive Central Pattern Generators with Reinforcement Learning for Hexapod Locomotion
Authors:
Qiyue Yang,
Yue Gao,
Shaoyuan Li
Abstract:
Inspired by biological motion generation, central pattern generators (CPGs) are frequently employed in legged robot locomotion control to produce natural gait patterns with low-dimensional control signals. However, their limited adaptability and stability over complex terrains hinder their application. To address this issue, this paper proposes a terrain-adaptive locomotion control method that incorporates a deep reinforcement learning (DRL) framework into the CPG, where the CPG model is responsible for generating synchronized signals that provide the basic locomotion gait, while DRL is integrated to enhance the robot's adaptability to uneven terrains by adjusting the parameters of the CPG mapping functions. Experiments conducted on a hexapod robot in the Isaac Gym simulation environment demonstrate the superiority of the proposed method in terrain adaptability, convergence rate, and reward design complexity.
Submitted 11 October, 2023;
originally announced October 2023.
-
Dual Radar: A Multi-modal Dataset with Dual 4D Radar for Autonomous Driving
Authors:
Xinyu Zhang,
Li Wang,
Jian Chen,
Cheng Fang,
Lei Yang,
Ziying Song,
Guangqi Yang,
Yichen Wang,
Xiaofei Zhang,
Jun Li,
Zhiwei Li,
Qingshan Yang,
Zhenlin Zhang,
Shuzhi Sam Ge
Abstract:
Radar has stronger adaptability in adverse scenarios for autonomous driving environmental perception compared to widely adopted cameras and LiDARs. Compared with commonly used 3D radars, the latest 4D radars have precise vertical resolution and higher point cloud density, making them highly promising sensors for autonomous driving in complex environmental perception. However, due to the much higher noise than LiDAR, manufacturers choose different filtering strategies, resulting in an inverse ratio between noise level and point cloud density. There is still a lack of comparative analysis on which method is beneficial for deep learning-based perception algorithms in autonomous driving. One of the main reasons is that current datasets only adopt one type of 4D radar, making it difficult to compare different 4D radars in the same scene. Therefore, in this paper, we introduce a novel large-scale multi-modal dataset featuring, for the first time, two types of 4D radars captured simultaneously. This dataset enables further research into effective 4D radar perception algorithms. Our dataset consists of 151 consecutive series, most of which last 20 seconds, and contains 10,007 meticulously synchronized and annotated frames. Moreover, our dataset captures a variety of challenging driving scenarios, including many road conditions, weather conditions, nighttime and daytime with different lighting intensities and periods. Our dataset annotates consecutive frames, which can be applied to 3D object detection and tracking, and also supports the study of multi-modal tasks. We experimentally validate our dataset, providing valuable results for studying different types of 4D radars. This dataset is released on https://github.com/adept-thu/Dual-Radar.
Submitted 9 November, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Learning bounded-degree polytrees with known skeleton
Authors:
Davin Choo,
Joy Qiping Yang,
Arnab Bhattacharyya,
Clément L. Canonne
Abstract:
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters is nearly tight.
Submitted 21 January, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning
Authors:
Wang Lu,
Hao Yu,
Jindong Wang,
Damien Teney,
Haohan Wang,
Yiqiang Chen,
Qiang Yang,
Xing Xie,
Xiangyang Ji
Abstract:
When personalized federated learning (FL) meets large foundation models, new challenges arise from various limitations in resources. In addition to typical limitations such as data, computation, and communication costs, access to the models is also often limited. This paper endeavors to solve both the challenges of limited resources and personalization, i.e., distribution shifts between clients. To do so, we propose a method named ZOOPFL that uses Zeroth-Order Optimization for Personalized Federated Learning. ZOOPFL avoids direct interference with the foundation models and instead learns to adapt its inputs through zeroth-order optimization. In addition, we employ simple yet effective linear projections to remap its predictions for personalization. To reduce the computation costs and enhance personalization, we propose input surgery to incorporate an auto-encoder with low-dimensional and client-specific embeddings. We provide theoretical support for ZOOPFL to analyze its convergence. Extensive empirical experiments on computer vision and natural language processing tasks using popular foundation models demonstrate its effectiveness for FL on black-box foundation models.
Submitted 8 October, 2023;
originally announced October 2023.
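The zeroth-order optimization mentioned in the ZooPFL abstract above can be illustrated with a generic two-point gradient estimator, which needs only function values and no access to the model's internals. This is a minimal sketch of the general technique, not the authors' method: the quadratic `black_box_loss`, its target vector, and all hyperparameters are hypothetical stand-ins for a loss evaluated through a model that can only be queried.

```python
import random


def zo_gradient(f, x, rng, mu=1e-3, num_dirs=20):
    """Estimate the gradient of f at x from function values only.

    Averages two-point finite differences taken along random Gaussian
    directions; no analytic gradient of f is ever used.
    """
    grad = [0.0] * len(x)
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        f_plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        slope = (f_plus - f_minus) / (2.0 * mu)  # directional derivative along u
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_dirs
    return grad


def black_box_loss(x):
    # Hypothetical stand-in: a loss we can evaluate but not differentiate.
    target = [1.0, -2.0, 0.5]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def zo_minimize(f, x0, lr=0.05, steps=300, seed=0):
    """Plain gradient descent driven by the zeroth-order estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = zo_gradient(f, x, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    x_opt = zo_minimize(black_box_loss, [0.0, 0.0, 0.0])
    print(x_opt, black_box_loss(x_opt))
```

In a setting like the one the abstract describes, `x` would presumably play the role of the learned input adaptation, with the foundation model hidden inside `f`; that mapping is an assumption made here for illustration.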
-
QuATON: Quantization Aware Training of Optical Neurons
Authors:
Hasindu Kariyawasam,
Ramith Hettiarachchi,
Quansan Yang,
Alex Matlock,
Takahiro Nambara,
Hiroyuki Kusaka,
Yuichiro Kunai,
Peter T C So,
Edward S Boyden,
Dushan Wadduwage
Abstract:
Optical processors, built with "optical neurons", can efficiently perform high-dimensional linear operations at the speed of light. Thus they are a promising avenue to accelerate large-scale linear computations. With the current advances in micro-fabrication, such optical processors can now be 3D fabricated, but with limited precision. This limitation translates to quantization of learnable parameters in optical neurons, and should be handled during the design of the optical processor in order to avoid a model mismatch. Specifically, optical neurons should be trained or designed within the physical constraints at a predefined quantized precision level. To address this critical issue, we propose a physics-informed quantization-aware training framework. Our approach accounts for physical constraints during the training process, leading to robust designs. We demonstrate that our approach can design state-of-the-art optical processors using diffractive networks for multiple physics-based tasks despite quantized learnable parameters. We thus lay the foundation upon which improved optical processors may be 3D fabricated in the future.
Submitted 21 March, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
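The idea of training parameters at a predefined quantized precision level, as the QuATON abstract above describes, can be illustrated with a generic straight-through-style update on a one-parameter toy model. The abstract does not say which quantization-aware scheme the authors use, so this is a swapped-in textbook technique under stated assumptions: the linear model `y = w * x`, the step size of 0.25, and the learning rate are all hypothetical.

```python
import math


def quantize(w, step):
    """Round w to the nearest multiple of step (round half up)."""
    return math.floor(w / step + 0.5) * step


def train_quantized_weight(xs, ys, step=0.25, lr=0.01, epochs=100):
    """Fit y = w * x where only quantized values of w are realizable.

    The forward pass uses the quantized weight; the gradient computed at
    the quantized weight is applied to a full-precision latent copy
    (a straight-through-style update).
    """
    w_latent = 0.0
    for _ in range(epochs):
        w_q = quantize(w_latent, step)  # value the "device" would actually use
        grad = sum(2.0 * (w_q * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w_latent -= lr * grad  # update the latent, not the quantized, weight
    return quantize(w_latent, step)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    ys = [0.75 * x for x in xs]  # ground-truth weight lies on the grid
    print(train_quantized_weight(xs, ys))
```

Keeping a full-precision latent weight lets small gradient steps accumulate until they cross a quantization boundary, which is why the quantized weight can still move even though each individual step is smaller than the grid spacing.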
-
Public verifiable measurement-only blind quantum computation based on entanglement witnesses
Authors:
Wen-Jie Liu,
Zi-Xian Li,
Wen-Bo Li,
Qi Yang
Abstract:
Recently, Sato et al. proposed a publicly verifiable blind quantum computation (BQC) protocol by inserting a third-party arbiter. However, it is not truly publicly verifiable, because the arbiter is determined in advance and participates in the whole process. In this paper, a publicly verifiable protocol for measurement-only BQC is proposed. The fidelity between arbitrary states and the graph states of 2-colorable graphs is estimated by measuring the entanglement witnesses of the graph states, so as to verify the correctness of the prepared graph states. Compared with the previous protocol, our protocol is publicly verifiable in the true sense, since it allows other random clients to execute the public verification. It is also more efficient: the number of local measurements is $O(n^3 \log n)$ and the number of graph-state copies is $O(n^2 \log n)$.
Submitted 3 October, 2023;
originally announced October 2023.
-
Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation
Authors:
Shenzhi Wang,
Chang Liu,
Zilong Zheng,
Siyuan Qi,
Shuo Chen,
Qisen Yang,
Andrew Zhao,
Chaofei Wang,
Shiji Song,
Gao Huang
Abstract:
Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state. After integrating ReCon with different LLMs, extensive experiment results from the Avalon game indicate its efficacy in aiding LLMs to discern and maneuver around deceptive information without extra fine-tuning and data. Finally, we offer a possible explanation for the efficacy of ReCon and explore the current limitations of LLMs in terms of safety, reasoning, speaking style, and format, potentially furnishing insights for subsequent research.
Submitted 24 October, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice
Authors:
Fernando Delgado,
Stephen Yang,
Michael Madaio,
Qian Yang
Abstract:
Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the "participatory turn" in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints.
Submitted 2 October, 2023;
originally announced October 2023.
-
Femtosecond electron diffraction reveals local disorder and local anharmonicity in thermoelectric SnSe
Authors:
Jingjun Li,
Yingpeng Qi,
Qing Yang,
Luye Yue,
Changyuan Yao,
Zijing Chen,
Sheng Meng,
Dao Xiang,
Jianming Cao
Abstract:
The microscopic arrangement of atoms and molecules is the determining factor in how materials behave and perform. Beyond the long-range periodicity, the local disorder with local structures deviating from the average lattice structure plays a vital role in determining the physical properties of the phonon, electron and spin subsystems in crystalline functional materials. Experimentally characterizing the 3D atomic configuration of such local disorder and correlating it with the advanced functions remains a major challenge. Time-domain evolution of the local disorder, either static or dynamical, is lost due to the characterization at equilibrium state with conventional probing techniques. With the combination of femtosecond electron diffraction, structure factor calculation and TDDFT-MD simulation, we exclusively identify the static local disorder and its local anharmonicity in thermoelectric SnSe. The ultrafast structural dynamics in the time domain reveal a dominant static off-symmetry displacement of Sn (~0.4 angstrom), and the anharmonicity of this local disorder induces an ultrafast atomic displacement within 100 fs after photoexcitation. The microscopic picture of the local anharmonicity indicates a direct and first signature of the THz Einstein oscillators in real space. Therefore, a glass-like thermal transport channel with the local disorder, the Einstein oscillators and the local anharmonicity updates the fundamental insight into the long-debated ultralow thermal conductivity in SnSe. The local disorder over one to a few unit cells is pervasive and indispensable in thermoelectric materials, multiferroic materials and correlated electronic materials. Our method of revealing the 3D local disorder and the local correlated interactions by ultrafast structural dynamics will inspire broad interest in the construction of structure-property relationships in materials science.
Submitted 2 October, 2023;
originally announced October 2023.
-
EventLFM: Event Camera integrated Fourier Light Field Microscopy for Ultrafast 3D imaging
Authors:
Ruipeng Guo,
Qianwan Yang,
Andrew S. Chang,
Guorong Hu,
Joseph Greene,
Christopher V. Gabel,
Sixian You,
Lei Tian
Abstract:
Ultrafast 3D imaging is indispensable for visualizing complex and dynamic biological processes. Conventional scanning-based techniques necessitate an inherent trade-off between acquisition speed and space-bandwidth product (SBP). Emerging single-shot 3D wide-field techniques offer a promising alternative but are bottlenecked by the synchronous readout constraints of conventional CMOS systems, thus restricting data throughput to maintain high SBP at limited frame rates. To address this, we introduce EventLFM, a straightforward and cost-effective system that overcomes these challenges by integrating an event camera with Fourier light field microscopy (LFM), a state-of-the-art single-shot 3D wide-field imaging technique. The event camera operates on a novel asynchronous readout architecture, thereby bypassing the frame rate limitations inherent to conventional CMOS systems. We further develop a simple and robust event-driven LFM reconstruction algorithm that can reliably reconstruct 3D dynamics from the unique spatiotemporal measurements captured by EventLFM. Experimental results demonstrate that EventLFM can robustly reconstruct fast-moving and rapidly blinking 3D fluorescent samples at kHz frame rates. Furthermore, we highlight EventLFM's capability for imaging of blinking neuronal signals in scattering mouse brain tissues and 3D tracking of GFP-labeled neurons in freely moving C. elegans. We believe that the combined ultrafast speed and large 3D SBP offered by EventLFM may open up new possibilities across many biomedical applications.
Submitted 3 April, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Nonlinear Multi-Carrier System with Signal Clipping: Measurement, Analysis, and Optimization
Authors:
Yuyang Du,
Liang Hao,
Yiming Lei,
Qun Yang,
Shiqi Xu
Abstract:
Signal clipping is a classic technique for reducing peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It has been widely applied in consumer electronic devices owing to its low complexity and high efficiency. Although clipping reduces the nonlinear distortion caused by power amplifiers (PAs), it induces additional clipping distortion. Optimizing the joint system performance with consideration of both PA nonlinearity and clipping distortion remains an open problem due to the complex PA modeling. In this paper, we analyze the PA nonlinearity through the Bessel-Fourier PA (BFPA) model and simplify its power expression using inter-modulation product (IMP) analysis. We derive expressions of the receiver signal-to-noise ratio (SNR) and system symbol error rate (SER) for the nonlinear clipped OFDM system. With the derivations, we investigate the optimal system setting to achieve the SER lower bound in a practical OFDM system that considers both PA nonlinearity and clipping distortion. The methods and results presented in this paper can serve as a useful reference for the system-level optimization of clipped OFDM systems with nonlinear PA.
Submitted 16 February, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
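The effect of amplitude clipping on PAPR, which the abstract above takes as its starting point, can be seen on a single synthetic OFDM symbol. This is a minimal sketch assuming QPSK subcarriers, a naive inverse DFT, and a threshold expressed as a hypothetical `clipping_ratio` times the RMS amplitude; it does not model the PA nonlinearity or the BFPA analysis from the paper.

```python
import cmath
import math
import random


def ofdm_symbol(num_subcarriers=64, seed=0):
    """Build one time-domain OFDM symbol from random QPSK subcarriers."""
    rng = random.Random(seed)
    n = num_subcarriers
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
            for _ in range(n)]
    # Naive O(n^2) inverse DFT, scaled so average power stays near 1.
    return [sum(qpsk[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            / math.sqrt(n) for t in range(n)]


def papr_db(signal):
    """Peak-to-average power ratio in decibels."""
    powers = [abs(s) ** 2 for s in signal]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


def clip_signal(signal, clipping_ratio):
    """Limit each sample's magnitude to clipping_ratio * RMS, keeping its phase."""
    rms = math.sqrt(sum(abs(s) ** 2 for s in signal) / len(signal))
    threshold = clipping_ratio * rms
    clipped = [s if abs(s) <= threshold else threshold * s / abs(s)
               for s in signal]
    return clipped, threshold


if __name__ == "__main__":
    x = ofdm_symbol()
    y, _ = clip_signal(x, clipping_ratio=1.2)
    print(f"PAPR before: {papr_db(x):.2f} dB, after: {papr_db(y):.2f} dB")
```

Clipping can never raise the PAPR of a given symbol, but the samples it alters are exactly the clipping distortion the abstract refers to, so a lower threshold trades peak reduction against added distortion.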
-
DeformUX-Net: Exploring a 3D Foundation Backbone for Medical Image Segmentation with Depthwise Deformable Convolution
Authors:
Ho Hin Lee,
Quan Liu,
Qi Yang,
Xin Yu,
Shunxing Bao,
Yuankai Huo,
Bennett A. Landman
Abstract:
The application of 3D ViTs to medical image segmentation has seen remarkable strides, somewhat overshadowing the budding advancements in Convolutional Neural Network (CNN)-based models. Large kernel depthwise convolution has emerged as a promising technique, showcasing capabilities akin to hierarchical transformers and facilitating an expansive effective receptive field (ERF) vital for dense predictions. Despite this, existing core operators, ranging from global-local attention to large kernel convolution, exhibit inherent trade-offs and limitations (e.g., global-local range trade-off, aggregating attentional features). We hypothesize that deformable convolution can be an exploratory alternative to combine all advantages from the previous operators, providing long-range dependency, adaptive spatial aggregation and computational efficiency as a foundation backbone. In this work, we introduce 3D DeformUX-Net, a pioneering volumetric CNN model that adeptly navigates the shortcomings traditionally associated with ViTs and large kernel convolution. Specifically, we revisit volumetric deformable convolution in a depth-wise setting to adapt long-range dependency with computational efficiency. Inspired by the concepts of structural re-parameterization for convolution kernel weights, we further generate the deformable tri-planar offsets by adapting a parallel branch (starting from $1\times1\times1$ convolution), providing adaptive spatial aggregation across all channels. Our empirical evaluations reveal that the 3D DeformUX-Net consistently outperforms existing state-of-the-art ViTs and large kernel convolution models across four challenging public datasets, spanning various scales from organs (KiTS: 0.680 to 0.720, MSD Pancreas: 0.676 to 0.717, AMOS: 0.871 to 0.902) to vessels (e.g., MSD hepatic vessels: 0.635 to 0.671) in mean Dice.
Submitted 3 October, 2023; v1 submitted 29 September, 2023;
originally announced October 2023.
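The mean Dice figures quoted in the abstract above are overlap scores between predicted and reference segmentation masks. The sketch below shows the standard Dice formula on flattened binary masks; treating two empty masks as a perfect match is a convention chosen here, and the abstract does not specify the authors' exact evaluation protocol.

```python
def dice_score(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pairs):
    """Average Dice over (prediction, reference) mask pairs."""
    scores = [dice_score(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    cases = [([1, 1, 0, 0], [1, 0, 0, 0]), ([0, 1, 1, 0], [0, 1, 1, 0])]
    print(mean_dice(cases))
```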
-
Semantic Communication with Probability Graph: A Joint Communication and Computation Design
Authors:
Zhouxiang Zhao,
Zhaohui Yang,
Quoc-Viet Pham,
Qianqian Yang,
Zhaoyang Zhang
Abstract:
In this paper, we present a probability graph-based semantic information compression system for scenarios where the base station (BS) and the user share common background knowledge. We employ probability graphs to represent the shared knowledge between the communicating parties. During the transmission of specific text data, the BS first extracts semantic information from the text, which is represented by a knowledge graph. Subsequently, the BS omits certain relational information based on the shared probability graph to reduce the data size. Upon receiving the compressed semantic data, the user can automatically restore missing information using the shared probability graph and predefined rules. This approach brings additional computational resource consumption while effectively reducing communication resource consumption. Considering the limitations of wireless resources, we address the problem of joint communication and computation resource allocation design, aiming at minimizing the total communication and computation energy consumption of the network while adhering to latency, transmit power, and semantic constraints. Simulation results demonstrate the effectiveness of the proposed system.
Submitted 5 October, 2023; v1 submitted 16 September, 2023;
originally announced October 2023.
-
Quantum Privacy-preserving Two-party Circle Intersection Protocol Based on Phase-encoded Query
Authors:
Zi-Xian Li,
Qi Yang,
Bao Feng,
Wen-Jie Liu
Abstract:
Privacy-preserving geometric intersection (PGI) is an important issue in secure multiparty computation (SMC). Existing quantum PGI protocols are mainly based on grid coding, which incurs high computational complexity. The phase-encoded query method, which has been used in some quantum SMC protocols, is suitable for solving the decision problem, but it needs to apply high-dimensional Oracle operators. In this paper, we use the principle of phase-encoded query to solve an important PGI problem, namely privacy-preserving two-party circle intersection. We study the implementation of the Oracle operator in detail, and achieve polynomial computational complexity by decomposing it into quantum arithmetic operations. Performance analysis shows that our protocol is correct and efficient, and can protect the privacy of all participants against internal and external attacks.
Submitted 29 September, 2023;
originally announced September 2023.
-
UAV-assisted Semantic Communication with Hybrid Action Reinforcement Learning
Authors:
Peiyuan Si,
Jun Zhao,
Kwok-Yan Lam,
Qing Yang
Abstract:
In this paper, we aim to explore the use of uplink semantic communications with the assistance of a UAV in order to improve data collection efficiency for metaverse users in remote areas. To reduce the time for uplink data collection while balancing the trade-off between reconstruction quality and computational energy cost, we propose a hybrid action reinforcement learning (RL) framework to make decisions on semantic model scale, channel allocation, transmission power, and UAV trajectory. The variables are classified into discrete and continuous types, which are optimized by two different RL agents to generate the combined action. Simulation results indicate that the proposed hybrid action reinforcement learning framework can effectively improve the efficiency of uplink semantic data collection under different parameter settings and outperforms the benchmark scenarios.
Submitted 1 December, 2023; v1 submitted 18 August, 2023;
originally announced September 2023.
-
Results on Elastic Cross Sections in Proton-Proton Collisions at $\sqrt{s} = 510$ GeV with the STAR Detector at RHIC
Authors:
STAR Collaboration,
M. I. Abdulhamid,
B. E. Aboona,
J. Adam,
L. Adamczyk,
J. R. Adams,
I. Aggarwal,
M. M. Aggarwal,
Z. Ahammed,
E. C. Aschenauer,
S. Aslam,
J. Atchison,
V. Bairathi,
J. G. Ball Cap,
K. Barish,
R. Bellwied,
P. Bhagat,
A. Bhasin,
S. Bhatta,
S. R. Bhosale,
J. Bielcik,
J. Bielcikova,
J. D. Brandenburg,
C. Broodo,
X. Z. Cai
, et al. (343 additional authors not shown)
Abstract:
We report results on an elastic cross section measurement in proton-proton collisions at a center-of-mass energy $\sqrt{s}=510$ GeV, obtained with the Roman Pot setup of the STAR experiment at the Relativistic Heavy Ion Collider (RHIC). The elastic differential cross section is measured in the four-momentum transfer squared range $0.23 \leq -t \leq 0.67$ GeV$^2$. We find that a constant slope $B$ does not fit the data in the aforementioned $t$ range, and we obtain a much better fit using a second-order polynomial for $B(t)$. The $t$ dependence of $B$ is determined using six subintervals of $t$ in the STAR measured $t$ range, and is in good agreement with the phenomenological models. The measured elastic differential cross section $\mathrm{d}σ/\mathrm{dt}$ agrees well with the results obtained at $\sqrt{s} = 546$ GeV for proton--antiproton collisions by the UA4 experiment. We also determine that the integrated elastic cross section within the STAR $t$-range is $σ^\mathrm{fid}_\mathrm{el} = 462.1 \pm 0.9 (\mathrm{stat.}) \pm 1.1 (\mathrm {syst.}) \pm 11.6 (\mathrm {scale})$~$μ\mathrm{b}$.
Submitted 6 May, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
On symbology and differential equations of Feynman integrals from Schubert analysis
Authors:
Song He,
Xuhang Jiang,
Jiahao Liu,
Qinglin Yang
Abstract:
We take the first step in generalizing the so-called "Schubert analysis", originally proposed in twistor space for four-dimensional kinematics, to the study of symbol letters and more detailed information on canonical differential equations for Feynman integral families in general dimensions with general masses. The basic idea is to work in embedding space and compute possible cross-ratios built from (Lorentz products of) maximal cut solutions for all integrals in the family. We demonstrate the power of the method using the most general one-loop integrals, as well as various two-loop planar integral families (such as sunrise, double-triangle and double-box) in general dimensions. Not only can we obtain all symbol letters as cross-ratios from maximal-cut solutions, but we also reproduce entries in the canonical differential equations satisfied by a basis of dlog integrals.
Submitted 28 September, 2023;
originally announced September 2023.
-
Influence Pathway Discovery on Social Media
Authors:
Xinyi Liu,
Ruijie Wang,
Dachun Sun,
Jinning Li,
Christina Youn,
You Lyu,
Jianyuan Zhan,
Dayou Wu,
Xinhe Xu,
Mingjun Liu,
Xinshuo Lei,
Zhihao Xu,
Yutong Zhang,
Zehao Li,
Qikai Yang,
Tarek Abdelzaher
Abstract:
This paper addresses influence pathway discovery, a key emerging problem in today's online media. We propose a discovery algorithm that leverages recently published work on unsupervised interpretable ideological embedding, a mapping of ideological beliefs (done in a self-supervised fashion) into interpretable low-dimensional spaces. Computing the ideological embedding at scale allows one to analyze correlations between the ideological positions of leaders, influencers, news portals, or population segments, deriving potential influence pathways. The work is motivated by the importance of social media as the preeminent means for global interactions and collaborations on today's Internet, as well as their frequent (mis-)use to wield influence that targets social beliefs and attitudes of selected populations. Tools that enable the understanding and mapping of influence propagation through population segments on social media are therefore increasingly important. In this paper, influence is measured by the perceived ideological shift over time that is correlated with influencers' activity. Correlated shifts in ideological embeddings indicate changes, such as swings/switching (among competing ideologies), polarization (depletion of neutral ideological positions), escalation/radicalization (shifts to more extreme versions of the ideology), or unification/cooldown (shifts towards more neutral stances). Case-studies are presented to explore selected influence pathways (i) in a recent French election, (ii) during political discussions in the Philippines, and (iii) for some Russian messaging during the Russia/Ukraine conflict.
Submitted 27 September, 2023;
originally announced September 2023.
-
SJTU-TMQA: A quality assessment database for static mesh with texture map
Authors:
Bingyang Cui,
Qi Yang,
Kaifa Yang,
Yiling Xu,
Xiaozhong Xu,
Shan Liu
Abstract:
In recent years, static meshes with texture maps have become one of the most prevalent digital representations of 3D shapes in various applications, such as animation, gaming, medical imaging, and cultural heritage applications. However, little research has been done on the quality assessment of textured meshes, which hinders the development of quality-oriented applications, such as mesh compression and enhancement. In this paper, we create a large-scale textured mesh quality assessment database, namely SJTU-TMQA, which includes 21 reference meshes and 945 distorted samples. The meshes are rendered into processed video sequences, and subjective experiments are then conducted to obtain mean opinion scores (MOS). The diversity of content and the accuracy of MOS are shown to validate the database's heterogeneity and reliability. The impact of various types of distortion on human perception is demonstrated. 13 state-of-the-art objective metrics are evaluated on SJTU-TMQA. The highest correlation reported is around 0.6, indicating the need for more effective objective metrics. The SJTU-TMQA is available at https://ccccby.github.io
Submitted 27 September, 2023;
originally announced September 2023.
-
IAIFNet: An Illumination-Aware Infrared and Visible Image Fusion Network
Authors:
Qiao Yang,
Yu Zhang,
Zijing Zhao,
Jian Zhang,
Shunli Zhang
Abstract:
Infrared and visible image fusion (IVIF) is used to generate fusion images with comprehensive features of both images, which is beneficial for downstream vision tasks. However, current methods rarely consider the illumination condition in low-light environments, and the targets in the fused images are often not prominent. To address the above issues, we propose an Illumination-Aware Infrared and Visible Image Fusion Network, named IAIFNet. In our framework, an illumination enhancement network first estimates the incident illumination maps of input images. Afterwards, with the help of the proposed adaptive differential fusion module (ADFM) and salient target aware module (STAM), an image fusion network effectively integrates the salient features of the illumination-enhanced infrared and visible images into a fusion image of high visual quality. Extensive experimental results verify that our method outperforms five state-of-the-art methods of fusing infrared and visible images.
Submitted 26 May, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion
Authors:
Qiao Yang,
Yu Zhang,
Jian Zhang,
Zijing Zhao,
Shunli Zhang,
Jinqiao Wang,
Junzhe Chen
Abstract:
Most existing learning-based infrared and visible image fusion (IVIF) methods produce fusion images with massive redundant information, leading to edge-blurring effects or objects that detectors cannot recognize. To alleviate these issues, we propose a semantic structure-preserving approach for IVIF, namely SSPFusion. First, we design a Structural Feature Extractor (SFE) to extract the structural features of infrared and visible images. Then, we introduce a multi-scale Structure-Preserving Fusion (SPF) module to fuse the structural features of infrared and visible images, while maintaining the consistency of semantic structures between the fusion and source images. Owing to these two effective modules, our method is able to generate high-quality fusion images from pairs of infrared and visible images, which can boost the performance of downstream computer-vision tasks. Experimental results on three benchmarks demonstrate that our method outperforms eight state-of-the-art image fusion methods in terms of both qualitative and quantitative evaluations. The code for our method, along with additional comparison results, will be made available at: https://github.com/QiaoYang-CV/SSPFUSION.
Submitted 26 December, 2023; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Longitudinal and transverse spin transfer to $Λ$ and $\overlineΛ$ hyperons in polarized $p$+$p$ collisions at $\sqrt{s} = 200$ GeV
Authors:
STAR Collaboration,
M. I. Abdulhamid,
B. E. Aboona,
J. Adam,
L. Adamczyk,
J. R. Adams,
I. Aggarwal,
M. M. Aggarwal,
Z. Ahammed,
D. M. Anderson,
E. C. Aschenauer,
S. Aslam,
J. Atchison,
V. Bairathi,
W. Baker,
J. G. Ball Cap,
K. Barish,
R. Bellwied,
P. Bhagat,
A. Bhasin,
S. Bhatta,
J. Bielcik,
J. Bielcikova,
J. D. Brandenburg,
X. Z. Cai
, et al. (357 additional authors not shown)
Abstract:
The longitudinal and transverse spin transfers to $Λ$ ($\overlineΛ$) hyperons in polarized proton-proton collisions are expected to be sensitive to the helicity and transversity distributions, respectively, of (anti-)strange quarks in the proton, and to the corresponding polarized fragmentation functions. We report improved measurements of the longitudinal spin transfer coefficient, $D_{LL}$, and the transverse spin transfer coefficient, $D_{TT}$, to $Λ$ and $\overlineΛ$ in polarized proton-proton collisions at $\sqrt{s}$ = 200 GeV by the STAR experiment at RHIC. The data set includes longitudinally polarized proton-proton collisions with an integrated luminosity of 52 pb$^{-1}$, and transversely polarized proton-proton collisions with a similar integrated luminosity. Both data sets have about twice the statistics of previous results and cover a kinematic range of $|η_{Λ(\overlineΛ)}|$ $<$ 1.2 and transverse momentum $p_{T,{Λ(\overlineΛ)}}$ up to 8 GeV/$c$. We also report the first measurements of the hyperon spin transfer coefficients $D_{LL}$ and $D_{TT}$ as a function of the fractional jet momentum $z$ carried by the hyperon, which can provide more direct constraints on the polarized fragmentation functions.
Submitted 7 December, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
SAMN: A Sample Attention Memory Network Combining SVM and NN in One Architecture
Authors:
Qiaoling Yang,
Linkai Luo,
Haoyu Zhang,
Hong Peng,
Ziyang Chen
Abstract:
Support vector machine (SVM) and neural networks (NN) have strong complementarity. SVM focuses on the inner operation among samples while NN focuses on the operation among the features within samples. Thus, it is promising and attractive to combine SVM and NN, as it may provide a more powerful function than SVM or NN alone. However, current work on combining them lacks true integration. To address this, we propose a sample attention memory network (SAMN) that effectively combines SVM and NN by incorporating sample attention module, class prototypes, and memory block to NN. SVM can be viewed as a sample attention machine. It allows us to add a sample attention module to NN to implement the main function of SVM. Class prototypes are representatives of all classes, which can be viewed as alternatives to support vectors. The memory block is used for the storage and update of class prototypes. Class prototypes and memory block effectively reduce the computational cost of sample attention and make SAMN suitable for multi-classification tasks. Extensive experiments show that SAMN achieves better classification performance than single SVM or single NN with similar parameter sizes, as well as the previous best model for combining SVM and NN. The sample attention mechanism is a flexible module that can be easily deepened and incorporated into neural networks that require it.
Submitted 25 September, 2023;
originally announced September 2023.
-
Reaction plane correlated triangular flow in Au+Au collisions at $\sqrt{s_{NN}}=3$ GeV
Authors:
STAR Collaboration,
M. I. Abdulhamid,
B. E. Aboona,
J. Adam,
L. Adamczyk,
J. R. Adams,
I. Aggarwal,
M. M. Aggarwal,
Z. Ahammed,
E. C. Aschenauer,
S. Aslam,
J. Atchison,
V. Bairathi,
J. G. Ball Cap,
K. Barish,
R. Bellwied,
P. Bhagat,
A. Bhasin,
S. Bhatta,
S. R. Bhosale,
J. Bielcik,
J. Bielcikova,
J. D. Brandenburg,
C. Broodo,
X. Z. Cai
, et al. (341 additional authors not shown)
Abstract:
We measure triangular flow relative to the reaction plane at 3 GeV center-of-mass energy in Au+Au collisions at the BNL Relativistic Heavy Ion Collider. A significant $v_3$ signal for protons is observed, which increases for higher rapidity, higher transverse momentum, and more peripheral collisions. The triangular flow is essentially rapidity-odd with a slope at mid-rapidity, $dv_3/dy|_{(y=0)}$, opposite in sign compared to the slope for directed flow. No significant $v_3$ signal is observed for charged pions and kaons. Comparisons with models suggest that a mean field potential is required to describe these results, and that the triangular shape of the participant nucleons is the result of stopping and nuclear geometry.
Submitted 19 April, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
NeighViz: Towards Better Understanding of Neighborhood Effects on Social Groups with Spatial Data
Authors:
Yue Yu,
Yifang Wang,
Qisen Yang,
Di Weng,
Yongjun Zhang,
Xiaogang Wu,
Yingcai Wu,
Huamin Qu
Abstract:
Understanding how local environments influence individual behaviors, such as voting patterns or suicidal tendencies, is crucial in social science to reveal and reduce spatial disparities and promote social well-being. With the increasing availability of large-scale individual-level census data, new analytical opportunities arise for social scientists to explore human behaviors (e.g., political engagement) among social groups at a fine-grained level. However, traditional statistical methods mostly focus on global, aggregated spatial correlations, which are limited in their ability to capture and compare the impact of local environments (e.g., neighborhoods) on human behaviors among social groups. In this study, we introduce a new analytical framework for analyzing multi-variate neighborhood effects between social groups. We then propose NeighViz, an interactive visual analytics system that helps social scientists explore, understand, and verify the influence of neighborhood effects on human behaviors. Finally, we use a case study to illustrate the effectiveness and usability of our system.
Submitted 20 September, 2023;
originally announced September 2023.
-
Deep conditional generative models for longitudinal single-slice abdominal computed tomography harmonization
Authors:
Xin Yu,
Qi Yang,
Yucheng Tang,
Riqiang Gao,
Shunxing Bao,
Leon Y. Cai,
Ho Hin Lee,
Yuankai Huo,
Ann Zenobia Moore,
Luigi Ferrucci,
Bennett A. Landman
Abstract:
Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leads to different organs/tissues being captured. To address this issue, we propose C-SliceGen, which takes an arbitrary axial slice in the abdominal region as a condition and generates a pre-defined vertebral level slice by estimating structural changes in the latent space. Our experiments on 2608 volumetric CT data from two in-house datasets and 50 subjects from the 2015 Multi-Atlas Abdomen Labeling Challenge (BTCV) dataset demonstrate that our model can generate high-quality images that are realistic and similar. We further evaluate our method's capability to harmonize longitudinal positional variation on 1033 subjects from the Baltimore Longitudinal Study of Aging (BLSA) dataset, which contains longitudinal single abdominal slices, and confirm that our method can harmonize the slice positional variance in terms of visceral fat area. This approach provides a promising direction for mapping slices from different vertebral levels to a target slice and reducing positional variance for single-slice longitudinal analysis. The source code is available at: https://github.com/MASILab/C-SliceGen.
Submitted 17 September, 2023;
originally announced September 2023.
-
Listening for the Axion Echo with the 21 CentiMeter Array
Authors:
Ariel Arza,
Quan Guo,
Lei Wu,
Qiaoli Yang,
Xiaolong Yang,
Qiang Yuan,
Bin Zhu
Abstract:
The axion is a hypothetical elementary particle that could solve the long-standing strong CP problem in particle physics and the dark matter mystery in the cosmos. Due to the stimulation of the ambient photons, the axion dark matter decay into photons is significantly enhanced so that its echo signal could be detected by terrestrial telescopes. As a pathfinder, we study the expected sensitivity of searching for the axion dark matter in the mass range between $0.41$ and $1.6μ\text{eV}$ with the 21 CentiMeter Array (21CMA). We aim to cover the whole 21CMA frequency range in two years by using a 1MW emitter. We find that the resulting sensitivity on the axion-photon coupling could surpass other existing limits by about one order of magnitude.
Submitted 13 September, 2023;
originally announced September 2023.
-
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Authors:
Zhichao Wu,
Qiulin Li,
Sixing Liu,
Qun Yang
Abstract:
In the text-to-speech (TTS) task, the latent diffusion model has excellent fidelity and generalization, but its expensive resource consumption and slow inference speed have always been a challenge. This paper proposes a Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (DCTTS). The following contributions are made by DCTTS: 1) The TTS diffusion model based on discrete space significantly lowers the computational consumption of the diffusion model and improves sampling speed; 2) The contrastive learning method based on discrete space is used to enhance the alignment connection between speech and text and improve sampling quality; and 3) It uses an efficient text encoder to simplify the model's parameters and increase computational efficiency. The experimental results demonstrate that the approach proposed in this paper has outstanding speech synthesis quality and sampling speed while significantly reducing the resource consumption of the diffusion model. The synthesized samples are available at https://github.com/lawtherWu/DCTTS.
Submitted 13 September, 2023;
originally announced September 2023.
-
Work Statistics and Adiabatic Assumption in Nonequilibrium Many-Body Theory
Authors:
Yi Zuo,
Qinghong Yang,
Bang-Gui Liu,
Dong E Liu
Abstract:
Keldysh field theory, based on adiabatic assumptions, serves as a widely used framework for addressing nonequilibrium many-body systems. Nonetheless, the validity of such adiabatic assumptions when addressing interacting Gibbs states remains a topic of contention. We use the knowledge of work statistics developed in nonequilibrium thermodynamics to study this problem. Consequently, we deduce a universal theorem delineating the characteristics of evolutions that transition an initial Gibbs state to another. Based on this theorem, we analytically ascertain that adiabatic evolutions fail to transition a non-interacting Gibbs state to its interacting counterpart. However, this adiabatic approach remains a superior approximation relative to its non-adiabatic counterpart. Numerics verifying our theory and predictions are also provided. Furthermore, our findings render insights into the preparation of Gibbs states within the domain of quantum computation.
Submitted 21 September, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Superfolded configuration induced low thermal conductivity in two-dimensional carbon allotropes revealed via machine learning force constant potential
Authors:
Linfeng Yu,
Kexin Dong,
Qi Yang,
Yi Zhang,
Xiong Zheng,
Huimin Wang,
Zhenzhen Qin,
Guangzhao Qin
Abstract:
Understanding the fundamental link between structure and functionalization is crucial for the design and optimization of functional materials, since different structural configurations could trigger materials to demonstrate diverse physical, chemical, and electronic properties. However, the correlation between crystal structure and thermal conductivity ($\kappa$) remains enigmatic. In this study, taking two-dimensional (2D) carbon allotropes as study cases, we utilize the phonon Boltzmann transport equation (BTE) along with a machine learning force constant potential to thoroughly explore the complex folding structure of pure sp2 hybridized carbon materials from the perspective of crystal structure, mode-level phonon resolved thermal transport, and atomic interactions, with the goal of identifying the underlying relationship between 2D geometry and $\kappa$. We propose two potential structure evolution mechanisms for targeted thermal transport properties: in-plane and out-of-plane folding evolutions, which are generally applicable to 2D carbon allotropes. It is revealed that the folded structure produces strong symmetry breaking, and simultaneously produces exceptionally strongly suppressed phonon group velocities, strong phonon-phonon scattering, and weak phonon hydrodynamics, which ultimately lead to low $\kappa$. The insight into the folded effect of atomic structures on thermal transport deepens our understanding of the relationship between structure and functionalization, which offers straightforward guidance for designing novel nanomaterials with targeted $\kappa$ and can propel developments in materials science and engineering.
Submitted 28 January, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Efficient Link Prediction in Continuous-Time Dynamic Networks using Optimal Transmission and Metropolis Hastings Sampling
Authors:
Ruizhi Zhang,
Wei Wei,
Qiming Yang,
Zhenyu Shi,
Xiangnan Feng,
Zhiming Zheng
Abstract:
Efficient link prediction in continuous-time dynamic networks is a challenging problem that has attracted much research attention in recent years. A widely used approach to dynamic network link prediction is to extract the local structure of the target link through temporal random walk on the network and learn node features using a coding model. However, this approach often assumes that candidate temporal neighbors follow certain types of distributions, which may be inappropriate for real-world networks, thereby incurring information loss. To address this limitation, we propose a framework in continuous-time dynamic networks based on Optimal Transmission (OT) and Metropolis Hastings (MH) sampling (COM). Specifically, we use optimal transmission theory to calculate the Wasserstein distance between the current node and the time-valid candidate neighbors to minimize information loss in node information propagation. Additionally, we employ the MH algorithm to obtain higher-order structural relationships in the vicinity of the target link, as it is a Markov Chain Monte Carlo method and can flexibly simulate target distributions with complex patterns. We demonstrate the effectiveness of our proposed method through experiments on eight datasets from different fields.
Submitted 10 September, 2023;
originally announced September 2023.
-
From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models
Authors:
Changming Xiao,
Qi Yang,
Feng Zhou,
Changshui Zhang
Abstract:
Diffusion models have revolutionized the field of text-to-image generation recently. The unique way of fusing text and image information contributes to their remarkable capability of generating highly text-related images. From another perspective, these generative models imply clues about the precise correlation between words and pixels. In this work, a simple but effective method is proposed to utilize the attention mechanism in the denoising network of text-to-image diffusion models. Without re-training or inference-time optimization, the semantic grounding of phrases can be attained directly. We evaluate our method on Pascal VOC 2012 and Microsoft COCO 2014 under the weakly supervised semantic segmentation setting, and our method achieves superior performance to prior methods. In addition, the acquired word-pixel correlation is found to be generalizable for the learned text embedding of customized generation methods, requiring only a few modifications. To validate our discovery, we introduce a new practical task called "personalized referring image segmentation" with a new dataset. Experiments in various situations demonstrate the advantages of our method compared to strong baselines on this task. In summary, our work reveals a novel way to extract the rich multi-modal knowledge hidden in diffusion models for segmentation.
Submitted 8 September, 2023;
originally announced September 2023.
-
Enhancing Hierarchical Transformers for Whole Brain Segmentation with Intracranial Measurements Integration
Authors:
Xin Yu,
Yucheng Tang,
Qi Yang,
Ho Hin Lee,
Shunxing Bao,
Yuankai Huo,
Bennett A. Landman
Abstract:
Whole brain segmentation with magnetic resonance imaging (MRI) enables the non-invasive measurement of brain regions, including total intracranial volume (TICV) and posterior fossa volume (PFV). Enhancing the existing whole brain segmentation methodology to incorporate intracranial measurements offers a heightened level of comprehensiveness in the analysis of brain structures. Despite its potential, the task of generalizing deep learning techniques for intracranial measurements faces data availability constraints due to limited manually annotated atlases encompassing whole brain and TICV/PFV labels. In this paper, we enhance the hierarchical transformer UNesT for whole brain segmentation to segment the whole brain with 133 classes and TICV/PFV simultaneously. To address the problem of data scarcity, the model is first pretrained on 4859 T1-weighted (T1w) 3D volumes sourced from 8 different sites. These volumes are processed through a multi-atlas segmentation pipeline for label generation, while TICV/PFV labels are unavailable. Subsequently, the model is finetuned with 45 T1w 3D volumes from Open Access Series Imaging Studies (OASIS) where both 133 whole brain classes and TICV/PFV labels are available. We evaluate our method with Dice similarity coefficients (DSC). We show that our model is able to conduct precise TICV/PFV estimation while maintaining the 132 brain regions performance at a comparable level. Code and trained model are available at: https://github.com/MASILab/UNesT/tree/main/wholebrainSeg.
Submitted 10 April, 2024; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning
Authors:
Qisen Yang,
Huanqian Wang,
Mukun Tong,
Wenjie Shi,
Gao Huang,
Shiji Song
Abstract:
The black-box nature of deep reinforcement learning (RL) hinders its use in real-world applications. Therefore, interpreting and explaining RL agents have been active research topics in recent years. Existing methods for post-hoc explanations usually adopt the action matching principle to enable an easy understanding of vision-based RL agents. In this paper, it is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents. It may lead to irrelevant or misplaced feature attribution when different DNNs' outputs lead to the same rewards or different rewards result from the same outputs. Therefore, we propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents as well. To ensure reward consistency during interpretable feature discovery, a novel framework (RL interpreting RL, denoted as RL-in-RL) is proposed to solve the gradient disconnection from actions to rewards. We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment. The results show that our method manages to keep reward (or return) consistency and achieves high-quality feature attribution. Further, a series of analytical experiments validate our assumption of the action matching principle's limitations.
Submitted 4 September, 2023;
originally announced September 2023.
-
Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
Authors:
Qisen Yang,
Shenzhi Wang,
Qihang Zhang,
Gao Huang,
Shiji Song
Abstract:
Offline reinforcement learning (RL) optimizes the policy on a previously collected dataset without any interactions with the environment, yet usually suffers from the distributional shift problem. To mitigate this issue, a typical solution is to impose a policy constraint on a policy improvement objective. However, existing methods generally adopt a "one-size-fits-all" practice, i.e., keeping only a single improvement-constraint balance for all the samples in a mini-batch or even the entire offline dataset. In this work, we argue that different samples should be treated with different policy constraint intensities. Based on this idea, a novel plug-in approach named Guided Offline RL (GORL) is proposed. GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and policy constraint for every sample. We theoretically prove that the guidance provided by our method is rational and near-optimal. Extensive experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements.
Submitted 4 September, 2023;
originally announced September 2023.
-
Critical roles of edge turbulent transport in the formation of high-field-side high-density front and density limit disruption in J-TEXT tokamak
Authors:
Peng Shi,
Yuhan Wang,
Li Gao,
Hongjuan Sun,
Qinghu Yang,
Xin Xu,
Chengshuo Shen,
Yanqiu Chen,
Qinlin Tao,
Zhipeng Chen,
Haosheng Wu,
Lu Wang,
Zhongyong Chen,
Nengchao Wang,
Zhoujun Yang,
Jingchun Li,
Yonghua Ding,
Yuan Pan,
J-TEXT team
Abstract:
This article presents an in-depth study of the sequence of events leading to density limit disruption in J-TEXT tokamak plasmas, with an emphasis on boundary turbulent transport and the high-field-side high-density (HFSHD) front. These phenomena were extensively investigated by using Langmuir probe and Polarimeter-interferometer diagnostics.
Submitted 1 September, 2023;
originally announced September 2023.
-
Comparing Spatial Navigation and Human Environment Interaction in Virtual Reality vs. Identical Real Environments across the Adult Lifespan
Authors:
Saleh Kalantari,
Bill Tong Xu,
Armin Mostafavi,
Anne Seoyoung Lee,
Qi Yang
Abstract:
Virtual reality (VR) is increasingly being used as a research platform for investigating human responses to environmental variables. While VR provides tremendous advantages in terms of variable isolation and manipulation, and ease of data-collection, some researchers have expressed concerns about the ecological validity of VR-based findings. In the current study we replicated a real-world, multi-level educational facility in VR, and compared data collected in the VR and real-world environments as participants (n=36) completed identical wayfinding tasks. We found significant differences in all of the measures used, including distance covered, number of mistakes made, time for task completion, spatial memory, extent of backtracking, observation of directional signs, perceived uncertainty levels, perceived cognitive workload, and perceived task difficulty. We also analyzed potential age-related effects to look for heightened VR/real response discrepancies among older adult participants (>55 years) compared to younger adults. This analysis yielded no significant effects of age. Finally, we examined the spatial distribution of self-reported wayfinding uncertainty across the building floorplan, finding that areas in which uncertainty was most pronounced were similar between the real-world and VR settings. Thus, participants appeared to be responding to the same environmental features in the real and VR conditions, but the extent of these responses was significantly different. Overall, the findings suggest that when VR is used to contrast varying environmental design conditions the resulting data should be interpreted cautiously and should not be generalized into real-world conclusions without further validation.
Submitted 30 August, 2023;
originally announced August 2023.
-
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling
Authors:
Shimin Zhang,
Qu Yang,
Chenxiang Ma,
Jibin Wu,
Haizhou Li,
Kay Chen Tan
Abstract:
The identification of sensory cues associated with potential opportunities and dangers is frequently complicated by unrelated events that separate useful cues by long delays. As a result, it remains a challenging task for state-of-the-art spiking neural networks (SNNs) to establish long-term temporal dependency between distant cues. To address this challenge, we propose a novel biologically inspired Two-Compartment Leaky Integrate-and-Fire spiking neuron model, dubbed TC-LIF. The proposed model incorporates carefully designed somatic and dendritic compartments that are tailored to facilitate learning long-term temporal dependencies. Furthermore, a theoretical analysis is provided to validate the effectiveness of TC-LIF in propagating error gradients over an extended temporal duration. Our experimental results, on a diverse range of temporal classification tasks, demonstrate superior temporal classification capability, rapid training convergence, and high energy efficiency of the proposed TC-LIF model. Therefore, this work opens up a myriad of opportunities for solving challenging temporal processing tasks on emerging neuromorphic computing systems. Our code is publicly available at https://github.com/ZhangShimin1/TC-LIF.
Submitted 17 February, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Ground-to-Aerial Person Search: Benchmark Dataset and Approach
Authors:
Shizhou Zhang,
Qingchun Yang,
De Cheng,
Yinghui Xing,
Guoqiang Liang,
Peng Wang,
Yanning Zhang
Abstract:
In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities appearing in both the UAVs and ground surveillance cameras. To our knowledge, this is the first dataset for cross-platform intelligent surveillance applications, where the UAVs could work as a powerful complement to the ground surveillance cameras. To more realistically simulate actual cross-platform Ground-to-Aerial surveillance scenarios, the surveillance cameras are fixed about 2 meters above the ground, while the UAVs capture videos of persons at different locations, with a variety of view-angles, flight attitudes and flight modes. Therefore, the dataset has the following unique characteristics: 1) drastic view-angle changes between query and gallery person images from cross-platform cameras; 2) diverse resolutions, poses and views of the person images under 9 rich real-world scenarios. On the basis of the G2APS benchmark dataset, we present a detailed analysis of current two-step and end-to-end person search methods, and further propose a simple yet effective knowledge distillation scheme on the head of the ReID network, which achieves state-of-the-art performance on both G2APS and the previous two public person search datasets, i.e., PRW and CUHK-SYSU. The dataset and source code are available at https://github.com/yqc123456/HKD_for_person_search.
Submitted 24 August, 2023;
originally announced August 2023.
-
A Survey for Federated Learning Evaluations: Goals and Measures
Authors:
Di Chai,
Leye Wang,
Liu Yang,
Junxue Zhang,
Kai Chen,
Qiang Yang
Abstract:
Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data. However, evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in the existing studies and then explore the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms in terms of their utility, efficiency, and security. Finally, we discuss several challenges and future research directions for FL evaluation.
Submitted 23 March, 2024; v1 submitted 22 August, 2023;
originally announced August 2023.
-
KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases
Authors:
Xintao Wang,
Qianwen Yang,
Yongting Qiu,
Jiaqing Liang,
Qianyu He,
Zhouhong Gu,
Yanghua Xiao,
Wei Wang
Abstract:
Large language models (LLMs) have demonstrated impressive impact in the field of natural language processing, but they still struggle with several issues, such as completeness, timeliness, faithfulness and adaptability. While recent efforts have focused on connecting LLMs with external knowledge sources, the integration of knowledge bases (KBs) remains understudied and faces several challenges. In this paper, we introduce KnowledGPT, a comprehensive framework to bridge LLMs with various knowledge bases, facilitating both the retrieval and storage of knowledge. The retrieval process employs program-of-thought prompting, which generates search language for KBs in code format with pre-defined functions for KB operations. Besides retrieval, KnowledGPT offers the capability to store knowledge in a personalized KB, catering to individual user demands. With extensive experiments, we show that by integrating LLMs with KBs, KnowledGPT properly answers a broader range of questions requiring world knowledge compared with vanilla LLMs, utilizing both knowledge existing in widely-known KBs and knowledge extracted into personalized KBs.
Submitted 17 August, 2023;
originally announced August 2023.
-
Boundary-RL: Reinforcement Learning for Weakly-Supervised Prostate Segmentation in TRUS Images
Authors:
Weixi Yi,
Vasilis Stavrinides,
Zachary M. C. Baum,
Qianye Yang,
Dean C. Barratt,
Matthew J. Clarkson,
Yipeng Hu,
Shaheer U. Saeed
Abstract:
We propose Boundary-RL, a novel weakly supervised segmentation method that utilises only patch-level labels for training. We envision the segmentation as a boundary detection problem, rather than a pixel-level classification as in previous works. This outlook on segmentation may allow for boundary delineation under challenging scenarios such as where noise artefacts may be present within the region-of-interest (ROI) boundaries, where traditional pixel-level classification-based weakly supervised methods may not be able to effectively segment the ROI. Of particular interest, ultrasound images, where intensity values represent acoustic impedance differences between boundaries, may also benefit from the boundary delineation approach. Our method uses reinforcement learning to train a controller function to localise boundaries of ROIs using a reward derived from a pre-trained boundary-presence classifier. The classifier indicates when an object boundary is encountered within a patch, as the controller modifies the patch location in a sequential Markov decision process. The classifier itself is trained using only binary patch-level labels of object presence, which are the only labels used during training of the entire boundary delineation framework, and serves as a weak signal to inform the boundary delineation. The use of a controller function ensures that a sliding window over the entire image is not necessary. It also prevents possible false-positive or -negative cases by minimising the number of patches passed to the boundary-presence classifier. We evaluate our proposed approach for a clinically relevant task of prostate gland segmentation on trans-rectal ultrasound images. We show improved performance compared to other tested weakly supervised methods that use the same labels, e.g., multiple instance learning.
Submitted 22 August, 2023;
originally announced August 2023.
-
Emergence of charge density wave and superconducting phase transitions through Lorentz-invariant interactions in the Haldane-Hubbard model
Authors:
Qiao Yang,
Yu-Biao Wu,
Lin Zhuang,
Ji-Min Zhao,
Wu-Ming Liu
Abstract:
We derive Lorentz-invariant four-fermion interactions, including Nambu-Jona-Lasinio type and superconducting type, which are widely studied in high-energy physics, from the honeycomb lattice Hamiltonian with Hubbard interaction. We investigate the phase transitions induced by these two interactions and consider the effects of the chemical potential and magnetic flux (Haldane mass term) on these phase transitions. We find that the charge-density-wave and superconductivity generated by the attractive interactions are mainly controlled by the chemical potential, while the magnetic flux delimits the domain of phase transition. Our analysis underscores the influence of the initial topological state on the phase transitions, a facet largely overlooked in prior studies. We present experimental protocols using cold atoms to verify our theoretical results.
Submitted 1 June, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation
Authors:
Hanbing Liu,
Jun-Yan He,
Zhi-Qi Cheng,
Wangmeng Xiang,
Qize Yang,
Wenhao Chai,
Gaoang Wang,
Xu Bao,
Bin Luo,
Yifeng Geng,
Xuansong Xie
Abstract:
Existing 3D human pose estimators face challenges in adapting to new datasets due to the lack of 2D-3D pose pairs in training sets. To overcome this issue, we propose the Multi-Hypothesis Pose Synthesis Domain Adaptation (PoSynDA) framework to bridge this data disparity gap in the target domain. Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain. By incorporating a multi-hypothesis network, PoSynDA generates diverse pose hypotheses and aligns them with the target domain. To do this, it first utilizes target-specific source augmentation to obtain the target domain distribution data from the source domain by decoupling the scale and position parameters. The process is then further refined through the teacher-student paradigm and low-rank adaptation. With extensive comparison on benchmarks such as Human3.6M and MPI-INF-3DHP, PoSynDA demonstrates competitive performance, even comparable to the target-trained MixSTE model. This work paves the way for the practical application of 3D human pose estimation in unseen domains. The code is available at https://github.com/hbing-l/PoSynDA.
Submitted 16 October, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration
Authors:
Liyan Wang,
Qinyu Yang,
Cong Wang,
Wei Wang,
Jinshan Pan,
Zhixun Su
Abstract:
Recent years have witnessed the remarkable performance of diffusion models in various vision tasks. However, for image restoration that aims to recover clear images with sharper details from given degraded observations, diffusion-based methods may fail to recover promising results due to inaccurate noise estimation. Moreover, simple constraining noises cannot effectively learn complex degradation information, which subsequently hinders the model capacity. To solve the above problems, we propose a coarse-to-fine diffusion Transformer (C2F-DFT) for image restoration. Specifically, our C2F-DFT contains diffusion self-attention (DFSA) and diffusion feed-forward network (DFN) within a new coarse-to-fine training scheme. The DFSA and DFN respectively capture the long-range diffusion dependencies and learn hierarchy diffusion representation to facilitate better restoration. In the coarse training stage, our C2F-DFT estimates noises and then generates the final clean image by a sampling algorithm. To further improve the restoration quality, we propose a simple yet effective fine training scheme. It first exploits the coarse-trained diffusion model with fixed steps to generate restoration results, which are then constrained with the corresponding ground-truth ones to optimize the models and remedy the unsatisfactory results affected by inaccurate noise estimation. Extensive experiments show that C2F-DFT significantly outperforms the diffusion-based restoration method IR-SDE and achieves competitive performance compared with Transformer-based state-of-the-art methods on 3 tasks, including image deraining, image deblurring, and real image denoising. Code is available at https://github.com/wlydlut/C2F-DFT.
Submitted 8 October, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Detecting Quadratically Coupled Ultra-light Dark Matter with Stimulated Annihilation
Authors:
Yuanlin Gong,
Xin Liu,
Lei Wu,
Qiaoli Yang,
Bin Zhu
Abstract:
Ultra-light Dark Matter (ULDM) is one of the most promising DM candidates. Due to the Bose enhancement, we find the annihilation rate of the ULDM in the presence of background photon radiation can be greatly enhanced and produce a distinctive reflected electromagnetic wave with an angular frequency equal to the ULDM mass. We propose to utilize such stimulated annihilation to probe the ULDM with the electromagnetic quadratic coupling by emitting a radio beam into space. With a 50 MW emitter, we forecast the sensitivity to the quadratic coupling in different local halo models for low-frequency radio telescopes, such as LOFAR, UTR-2 and ngLOBO.
Submitted 12 February, 2024; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Growth of millimeter-sized high-quality CuFeSe$_2$ single crystals by the molten salt method and study of their semiconducting behavior
Authors:
Mingwei Ma,
Binbin Ruan,
Menghu Zhou,
Yadong Gu,
Qingxin Dong,
Qingsong Yang,
Qiaoyu Wang,
Lewei Chen,
Yunqing Shi,
Junkun Yi,
Genfu Chen,
Zhian Ren
Abstract:
A eutectic AlCl$_3$/KCl molten salt method in a horizontal configuration was employed to grow millimeter-sized and compositionally homogeneous CuFeSe$_2$ single crystals, owing to the continuous growth process in a temperature-gradient-induced solution convection. The typical as-grown CuFeSe$_2$ single crystals in cubic forms are nearly 1.6$\times$1.2$\times$1.0 mm$^3$ in size. The chemical composition and homogeneity of the crystals were examined by both inductively coupled plasma atomic emission spectroscopy and energy dispersive spectrometer, giving Cu:Fe:Se = 0.96:1.00:1.99, consistent with the stoichiometric composition of CuFeSe$_2$. The magnetic measurements suggest a ferrimagnetic or weak ferromagnetic transition below T$_C$ = 146 K, and the resistivity reveals a semiconducting behavior and an abrupt increase below T$_C$.
Submitted 16 August, 2023;
originally announced August 2023.