FedRC: A Rapid-Converged Hierarchical Federated Learning Framework in Street Scene Semantic Understanding

Authors: Wei-Bin Kou, Qingfeng Lin, Ming Tang, Shuai Wang, Guangxu Zhu, Yik-Chung Wu

Abstract: Street Scene Semantic Understanding (denoted as TriSU) is a crucial but complex task for world-wide distributed autonomous driving (AD) vehicles (e.g., Tesla). Its inference model generalizes poorly due to inter-city domain shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization, but suffers from a slow convergence rate because of the heterogeneity of vehicles' surroundings across cities. Going beyond existing HFL works that have deficient capabilities in complex tasks, we propose a rapid-converged heterogeneous HFL framework (FedRC) to address the inter-city data heterogeneity and accelerate the HFL model convergence rate. In our proposed FedRC framework, both a single RGB image and an RGB dataset are modelled as Gaussian distributions in the HFL aggregation weight design. This approach not only differentiates each RGB sample instead of typically equalizing them, but also considers both data volume and statistical properties rather than simply taking data quantity into consideration. Extensive experiments on the TriSU task using across-city datasets demonstrate that FedRC converges faster than the state-of-the-art benchmark by 38.7%, 37.5%, 35.5%, and 40.6% in terms of mIoU, mPrecision, mRecall, and mF1, respectively. Furthermore, qualitative evaluations in the CARLA simulation environment confirm that the proposed FedRC framework delivers top-tier performance.

Submitted 1 July, 2024; originally announced July 2024.

Comments: This work has been accepted by 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2407.00955 [pdf, other]

Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy

Authors: Xiang Jiao, Dingzhu Wen, Guangxu Zhu, Wei Jiang, Wu Luo, Yuanming Shi

Abstract: Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the effective and efficient execution of the inference task underpinned by the network, measured by, e.g., the inference accuracy and latency. In this paper, a task-oriented over-the-air computation scheme is proposed for a multi-device artificial intelligence system. Particularly, a novel tractable inference accuracy metric is proposed for classification tasks, which is called minimum pair-wise discriminant gain. Unlike prior work measuring the average of all class pairs in feature space, it measures the minimum distance of all class pairs. By maximizing the minimum pair-wise discriminant gain instead of its average counterpart, any pair of classes can be better separated in the feature space, thus leading to a balanced and improved inference accuracy for all classes. Besides, this paper jointly optimizes the minimum discriminant gain of all feature elements instead of separately maximizing that of each element as in existing designs. As a result, the transmit power can be adaptively allocated to the feature elements according to their different contributions to the inference accuracy, opening an extra degree of freedom to improve inference performance. Extensive experiments are conducted using a concrete use case of human motion recognition to verify the superiority of the proposed design over the benchmarking scheme.

Submitted 1 July, 2024; originally announced July 2024.

Comments: This paper was accepted by IEEE Transactions on Vehicular Technology on June 30, 2024
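
The contrast between the average and the minimum pair-wise criterion in the abstract above can be illustrated with a small sketch. The abstract does not give the formula, so the gain used here (squared centroid distance scaled by a per-feature variance, summed over feature elements) is an assumption, and the names `pair_gain`, `all_pair_gains`, `centroids`, and `var` are hypothetical.

```python
from itertools import combinations


def pair_gain(mu_a, mu_b, var):
    """Assumed gain for one class pair: variance-scaled squared centroid distance."""
    return sum((x - y) ** 2 / v for x, y, v in zip(mu_a, mu_b, var))


def all_pair_gains(centroids, var):
    """Gain for every unordered class pair, keyed by the pair of class indices."""
    pairs = combinations(range(len(centroids)), 2)
    return {(i, j): pair_gain(centroids[i], centroids[j], var) for i, j in pairs}


# Toy data (made up): classes 0 and 2 nearly overlap, class 1 is far from both.
centroids = [[0.0, 0.0], [4.0, 0.0], [0.5, 0.0]]
var = [1.0, 1.0]

gains = all_pair_gains(centroids, var)
average_gain = sum(gains.values()) / len(gains)
minimum_gain = min(gains.values())
worst_pair = min(gains, key=gains.get)

print(average_gain, minimum_gain, worst_pair)
```

In this toy setup the average looks healthy because one class is far from the other two, while the minimum exposes the poorly separated pair. That is the motivation the abstract gives for maximizing the minimum instead of the average.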

arXiv:2407.00297 [pdf]

UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

Authors: Guanghao Zhu, Lin Liu, Jing Zhang, Xiaohui Du, Ruqian Hao, Juanxiu Liu

Abstract: Facial nerve segmentation is crucial for preoperative path planning in cochlear implantation surgery. Recently, researchers have proposed some segmentation methods, such as atlas-based and deep learning-based methods. However, since the facial nerve is a tubular organ with a diameter of only 1.0-1.5 mm, it is challenging to locate and segment the facial nerve in CT scans. In this work, we propose an uncertainty-aware dual-stream network (UADSN). UADSN consists of a 2D segmentation stream and a 3D segmentation stream. Predictions from the two streams are used to identify uncertain regions, and a consistency loss is employed to supervise the segmentation of these regions. In addition, we introduce channel squeeze & spatial excitation modules into the skip connections of U-shaped networks to extract meaningful spatial information. To account for topology preservation, a clDice loss is introduced into the supervised loss function. Experimental results on the facial nerve dataset demonstrate the effectiveness of UADSN and our submodules.

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2407.00270 [pdf, ps, other]

Regularity of integral closures of edge ideals of weighted oriented graphs

Authors: Nguyen Cong Minh, Thanh Vu, Guangjun Zhu

Abstract: We prove that the regularity cannot increase when taking the integral closure for edge ideals of arbitrary weighted oriented graphs.

Submitted 28 June, 2024; originally announced July 2024.

MSC Class: 13B22; 13D02; 13F55

arXiv:2406.19931 [pdf, other]

Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition

Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Haolin Wang, Shaojie Tang, Guogang Zhu, Hao Su

Abstract: To address data heterogeneity, the key strategy of Personalized Federated Learning (PFL) is to decouple general knowledge (shared among clients) and client-specific knowledge, as the latter can have a negative impact on collaboration if not removed. Existing PFL methods primarily adopt a parameter partitioning approach, where the parameters of a model are designated as one of two types: parameters shared with other clients to extract general knowledge and parameters retained locally to learn client-specific knowledge. However, as these two types of parameters are put together like a jigsaw puzzle into a single model during the training process, each parameter may simultaneously absorb both general and client-specific knowledge, thus struggling to separate the two types of knowledge effectively. In this paper, we introduce FedDecomp, a simple but effective PFL paradigm that employs parameter additive decomposition to address this issue. Instead of assigning each parameter of a model as either a shared or personalized one, FedDecomp decomposes each parameter into the sum of two parameters: a shared one and a personalized one, thus achieving a more thorough decoupling of shared and personalized knowledge compared to the parameter partitioning method. In addition, as we find that retaining local knowledge of specific clients requires much lower model capacity compared with general knowledge across all clients, we let the matrix containing personalized parameters be low rank during the training process. Moreover, a new alternating training strategy is proposed to further improve the performance. Experimental results across multiple datasets and varying degrees of data heterogeneity demonstrate that FedDecomp outperforms state-of-the-art methods by up to 4.9%.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 12 pages, 8 figures
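
The additive, low-rank decomposition described in the abstract above can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: the personalized matrix is written as a product `B @ A` of two thin factors (one common way to constrain rank, which the abstract does not confirm), and all names (`effective_weight`, `shared`, `low_rank_b`, `low_rank_a`) are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def effective_weight(shared, low_rank_b, low_rank_a):
    """Weight used by one client: shared part plus low-rank personalized part."""
    return matadd(shared, matmul(low_rank_b, low_rank_a))


random.seed(0)
d_out, d_in, rank = 4, 6, 1  # toy sizes chosen for illustration

# Shared matrix: in this sketch, the part that would be aggregated across clients.
shared = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Personalized factors: in this sketch, the part that would stay on the client.
low_rank_b = [[random.gauss(0, 1) for _ in range(rank)] for _ in range(d_out)]
low_rank_a = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]

weight = effective_weight(shared, low_rank_b, low_rank_a)

shared_params = d_out * d_in
personal_params = rank * (d_out + d_in)
print(shared_params, personal_params)
```

Every entry of `weight` is a sum of a shared and a personalized term, in contrast to partitioning, where each entry belongs to exactly one group. The low-rank factors also carry far fewer parameters than the full matrix, which matches the abstract's observation that client-specific knowledge needs less capacity.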

arXiv:2406.19649 [pdf]

AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

Authors: Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong Liu, Lin Liu

Abstract: Semi-supervised learning (SSL) has shown considerable potential in medical image segmentation, primarily leveraging consistency regularization and pseudo-labeling. However, many SSL approaches only pay attention to low-level consistency and overlook the significance of pseudo-label reliability. Therefore, in this work, we propose an adversarial self-training consistency framework (AstMatch). First, we design an adversarial consistency regularization (ACR) approach to enhance knowledge transfer and strengthen prediction consistency under varying perturbation intensities. Second, we apply a feature matching loss for adversarial training to incorporate high-level consistency regularization. Additionally, we present the pyramid channel attention (PCA) and efficient channel and spatial attention (ECSA) modules to improve the discriminator's performance. Finally, we propose an adaptive self-training (AST) approach to ensure the pseudo-labels' quality. The proposed AstMatch has been extensively evaluated against cutting-edge SSL methods on three publicly available datasets. The experimental results under different labeled ratios indicate that AstMatch outperforms other existing methods, achieving new state-of-the-art performance. Our code will be available at https://github.com/GuanghaoZhu663/AstMatch.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18810 [pdf]

Assisting Tibetan Students in Learning Quantum Mechanics via Mathematica

Authors: Guangtian Zhu, Jing Hu, Chun Du

Abstract: Undergraduate students of physics in Tibet have great difficulty learning quantum mechanics (QM). We attempt to use PER-based methods to help Tibetan students learn QM. In this preliminary study, we incorporate Mathematica in a QM course at Tibet University and record students' learning experiences. Tibetan students tend to have subjective feelings of learning Mathematica, whereas Han students (majority) are more focused on the operational techniques of Mathematica. The results also suggest that both Tibetan students and Han students show limited improvement in time-independent Schrodinger equations after learning QM with Mathematica. Further effort is needed to improve the academic literacy skills of physics students in Tibet.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18577 [pdf]

Measurement of dynamic nonlocal deformation using nanodiamond sensors

Authors: Yue Cui, Weng-Hang Leong, Guoli Zhu, Ren-Bao Liu, Quan Li

Abstract: Nonlocal deformation sensing achieved by integrating atomic force microscopy indentation with nanodiamond-based orientation tracking features high precision and high spatial resolution, providing a useful technique for studying the mechanical properties of soft biological systems. However, this technique is currently limited to lifeless systems because it cannot differentiate the indentation-induced deformation from that associated with live activities or other external perturbations. Here we develop a dynamic nonlocal deformation sensing method using oscillatory nanoindentation and spectroscopic analysis to overcome this limitation. The method realizes both temporally and spatially resolved mechanical analysis, with tens of microsecond time-lag precision, nanometer vertical deformation precision, and sub-hundred nanometer lateral spatial resolution, leading to the disclosure of surface/interface effects in the mechanical response of viscoelastic materials and live cells. Neglecting surface tension would underestimate the liquid-like characteristics of the materials. This work demonstrates nanodiamond sensors as a useful tool for spatial-temporal mechanical analysis of soft, complex bio-relevant materials.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 33 pages (4 figures) + 26 pages (20 figures)

arXiv:2406.17538 [pdf, other]

SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Authors: Guanghao Zhu, Lin Liu, Yuhao Hu, Haixin Sun, Fang Liu, Xiaohui Du, Ruqian Hao, Juanxiu Liu, Yong Liu, Hao Deng, Jing Zhang

Abstract: Micro-expressions (MEs) are subtle facial movements that occur spontaneously when people try to conceal their real emotions. Micro-expression recognition (MER) is crucial in many fields, including criminal analysis and psychotherapy. However, MER is challenging since MEs have low intensity and ME datasets are small in size. To this end, a three-stream temporal-shift attention network based on self-knowledge distillation (SKD-TSTSAN) is proposed in this paper. Firstly, to address the low intensity of ME muscle movements, we utilize learning-based motion magnification modules to enhance the intensity of ME muscle movements. Secondly, we employ efficient channel attention (ECA) modules in the local-spatial stream to make the network focus on facial regions that are highly relevant to MEs. In addition, temporal shift modules (TSMs) are used in the dynamic-temporal stream, which enables temporal modeling with no additional parameters by mixing ME motion information from two different temporal domains. Furthermore, we introduce self-knowledge distillation (SKD) into the MER task by introducing auxiliary classifiers and using the deepest section of the network for supervision, encouraging all blocks to fully explore the features of the training set. Finally, extensive experiments are conducted on four ME datasets: CASME II, SAMM, MMEW, and CAS(ME)3. The experimental results demonstrate that our SKD-TSTSAN outperforms other existing methods and achieves new state-of-the-art performance. Our code will be available at https://github.com/GuanghaoZhu663/SKD-TSTSAN.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16967 [pdf, other]

Remaining useful life prediction of rolling bearings based on refined composite multi-scale attention entropy and dispersion entropy

Authors: Yunchong Long, Qinkang Pang, Guangjie Zhu, Junxian Cheng, Xiangshun Li

Abstract: Remaining useful life (RUL) prediction based on vibration signals is crucial for ensuring the safe operation and effective health management of rotating machinery. Existing studies often extract health indicators (HI) from time domain and frequency domain features to analyze complex vibration signals, but these features may not accurately capture the degradation process. In this study, we propose a degradation feature extraction method called Fusion of Multi-Modal Multi-Scale Entropy (FMME), which utilizes multi-modal Refined Composite Multi-scale Attention Entropy (RCMATE) and Fluctuation Dispersion Entropy (RCMFDE), to solve the problem that the existing degradation features cannot accurately reflect the degradation process. Firstly, the Empirical Mode Decomposition (EMD) is employed to decompose the dual-channel vibration signals of bearings into multiple modes. The main modes are then selected for further analysis. The subsequent step involves the extraction of RCMATE and RCMFDE from each mode, followed by wavelet denoising. Next, a novel metric is proposed to evaluate the quality of degradation features. The attention entropy and dispersion entropy of the optimal scales under different modes are fused using Laplacian Eigenmap (LE) to obtain the health indicators. Finally, RUL prediction is performed through the similarity of health indicators between fault samples and bearings to be predicted. Experimental results demonstrate that the proposed method yields favorable outcomes across diverse operating conditions.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 12 pages, 9 figures

arXiv:2406.16929 [pdf, other]

Modelling the 5G Energy Consumption using Real-world Data: Energy Fingerprint is All You Need

Authors: Tingwei Chen, Yantao Wang, Hanzhi Chen, Zijian Zhao, Xinhao Li, Nicola Piovesan, Guangxu Zhu, Qingjiang Shi

Abstract: The introduction of fifth-generation (5G) radio technology has revolutionized communications, bringing unprecedented automation, capacity, connectivity, and ultra-fast, reliable communications. However, this technological leap comes with a substantial increase in energy consumption, presenting a significant challenge. To improve the energy efficiency of 5G networks, it is imperative to develop sophisticated models that accurately reflect the influence of base station (BS) attributes and operational conditions on energy usage. Importantly, addressing the complexity and interdependencies of these diverse features is particularly challenging, both in terms of data processing and model architecture design. This paper proposes a novel 5G base station energy consumption modelling method by learning from a real-world dataset used in the ITU 5G Base Station Energy Consumption Modelling Challenge, in which our model ranked second. Unlike existing methods that omit the Base Station Identifier (BSID) information and thus fail to capture the unique energy fingerprint of different base stations, we incorporate the BSID into the input features and encode it with an embedding layer for precise representation. Additionally, we introduce a novel masked training method alongside an attention mechanism to further boost the model's generalization capabilities and accuracy. After evaluation, our method demonstrates significant improvements over existing models, reducing Mean Absolute Percentage Error (MAPE) from 12.75% to 4.98%, leading to a performance gain of more than 60%.

Submitted 13 June, 2024; originally announced June 2024.
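
The "more than 60%" figure in the abstract above is consistent with a relative reduction in MAPE. The sketch below uses the standard MAPE definition and the two percentages quoted in the abstract; the function names and the toy readings are hypothetical and not taken from the paper.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def relative_reduction(before, after):
    """Fraction by which an error metric dropped relative to its starting value."""
    return (before - after) / before


# Toy energy readings (made up) to show how MAPE is computed.
toy_mape = mape([100.0, 200.0], [90.0, 220.0])

# The two MAPE values quoted in the abstract.
gain = relative_reduction(12.75, 4.98)

print(toy_mape, round(gain * 100, 1))
```

Here `gain` evaluates to roughly 0.609, i.e. a relative error reduction of about 61%, in line with the stated figure.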

arXiv:2406.12820 [pdf, other]

Realizing string-net condensation: Fibonacci anyon braiding for universal gates and sampling chromatic polynomials

Authors: Zlatko K. Minev, Khadijeh Najafi, Swarnadeep Majumder, Juven Wang, Ady Stern, Eun-Ah Kim, Chao-Ming Jian, Guanyu Zhu

Abstract: Fibonacci string-net condensate, a complex topological state that supports non-Abelian anyon excitations, holds promise for fault-tolerant universal quantum computation. However, its realization by a static-lattice Hamiltonian has remained elusive due to the inherent high-order interactions demanded. Here, we introduce a scalable dynamical string-net preparation (DSNP) approach, suitable even for near-term quantum processors, that can dynamically prepare the state through reconfigurable graphs. DSNP enables the creation and manipulation of the Fibonacci string-net condensate (Fib-SNC). Using a superconducting quantum processor, we couple the DSNP approach with a composite error-mitigation strategy on deep circuits to successfully create, measure, and braid Fibonacci anyons in two spatial dimensions (2D), demonstrating their potential for universal quantum computation. To this end, we measure anyon charges for two species of anyons associated with the doubled topological quantum field theory underlying Fib-SNC, with an average experimental accuracy of 94%. We validate that a scalable 2D braiding operation on a logical qubit encoded on three anyons yields the golden ratio $φ$ with 98% average accuracy and 8% measurement uncertainty. We further sample the Fib-SNC wavefunction to estimate the chromatic polynomial at $φ+2$ for various graphs. Given the established computational hardness of the chromatic polynomial, the wavefunction amplitude is classically hard to evaluate. Our results establish the first proof of principle that scalable DSNP can open doors to fault-tolerant universal quantum computation and to classically-hard problems.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 4 pages and 4 figures with Supplemental Materials (47 pages, 18 Figures)

arXiv:2406.11264 [pdf, ps, other]

Input-to-State Stabilization of 1-D Parabolic PDEs under Output Feedback Control

Authors: Yongchun Bi, Jun Zheng, Guchuan Zhu

Abstract: This paper addresses the problem of input-to-state stabilization for a class of parabolic equations with time-varying coefficients, as well as Dirichlet and Robin boundary disturbances. By using time-invariant kernel functions, which can reduce the complexity in control design and implementation, an observer-based output feedback controller is designed via backstepping. By using the generalized Lyapunov method, which can be used to handle Dirichlet boundary terms, the input-to-state stability of the closed-loop system under output feedback control, as well as the state estimation error system, is established in the spatial $L^\infty$-norm. Numerical simulations are conducted to confirm the theoretical results and to illustrate the effectiveness of the proposed control scheme.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.09829 [pdf, other]

Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

Authors: Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao

Abstract: Open-vocabulary semantic segmentation is a challenging task, which requires the model to output semantic masks of an image beyond a closed-set vocabulary. Although many efforts have been made to utilize powerful CLIP models to accomplish this task, they still easily overfit to training classes due to the natural gaps in semantic information between training and new classes. To overcome this challenge, we propose a novel framework for open-vocabulary semantic segmentation called EBSeg, incorporating an Adaptively Balanced Decoder (AdaB Decoder) and a Semantic Structure Consistency loss (SSC Loss). The AdaB Decoder is designed to generate different image embeddings for both training and new classes. Subsequently, these two types of embeddings are adaptively balanced to fully exploit their ability to recognize training classes and generalization ability for new classes. To learn a consistent semantic structure from CLIP, the SSC Loss aligns the inter-class affinity in the image feature space with that in the text feature space of CLIP, thereby improving the generalization ability of our model. Furthermore, we employ a frozen SAM image encoder to complement the spatial information that CLIP features lack due to the low training image resolution and image-level supervision inherent in CLIP. Extensive experiments conducted across various benchmarks demonstrate that the proposed EBSeg outperforms the state-of-the-art methods. Our code and trained models will be available here: https://github.com/slonetime/EBSeg.

Submitted 14 June, 2024; originally announced June 2024.

Comments: CVPR2024

arXiv:2406.08901 [pdf, other]

Fractional Chern insulator candidate in twisted bilayer checkboard lattice

Authors: Jia-Zheng Ma, Rui-Zhen Huang, Guo-Yi Zhu, Ji-Yao Chen, Dao-Xin Yao

Abstract: We investigate a fractional Chern insulator (FCI) candidate arising from Moiré bands with higher Chern number C=2 on a magic angle twisted bilayer checkboard lattice (MATBCB). There are two nearly flat low-lying bands in the single-particle energy spectrum under the first magic angle $φ\approx 1.608^{\circ}$ and chiral limit. We find that MATBCB hosts a nearly uniform Berry curvature distribution and exhibits a tiny violation of the quantum geometric trace condition in the first moiré Brillouin Zone (mBZ), indicating that there is a nearly ideal quantum geometry in MATBCB at the single-particle level. Turning on projected Coulomb interactions, we perform exact diagonalization and find a ten-fold ground state quasi-degeneracy in the many-body energy spectrum with filling fraction $ν=1/5$. The ten-fold quasi-degenerate ground states further show spectral flow under flux pumping. By diagnosing the particle entanglement spectrum (PES) of the ground states, we obtain a clear PES gap and quasi-hole state counting consistent with the Halperin spin-singlet generalized Pauli principle, suggesting that a fractional Chern insulator is realized in this system.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 9 figures

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: In this work we search for signals generated by ultra-heavy dark matter in the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter because they have low fluxes of astrophysical $γ$-ray background and large amounts of dark matter. By analyzing more than 700 days of observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.06110 [pdf, other]

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

Authors: Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang

Abstract: To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also investigate the issue of poor model responses when both instructions and context are compressed in downstream tasks, and propose an instruction reconstruction method to mitigate this problem. We validated the effectiveness of our approach on multiple tasks, achieving a compression rate of up to 32x on text reconstruction tasks with a BLEU4 score close to 0.95, and nearly 100% accuracy on a passkey retrieval task with a sequence length of 1M. Finally, our method demonstrated competitive performance in long-text question-answering tasks compared to non-compressed methods, while significantly saving storage resources in long-text inference tasks. Our code, models, and demo are available at https://github.com/WUHU-G/RCC_Transformer

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05130 [pdf, other]

An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

Authors: Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Víctor Gutiérrez-Basulto, Jeff Z. Pan

Abstract: Multimodal large language models (MLLMs) fine-tuned with multimodal instruction datasets have demonstrated remarkable capabilities in multimodal tasks. However, fine-tuning all parameters of MLLMs has become challenging as they usually contain billions of parameters. To address this issue, we study parameter-efficient fine-tuning (PEFT) methods for MLLMs. We aim to identify effective methods for enhancing the performance of MLLMs in scenarios where only a limited number of parameters are trained. This paper conducts empirical studies using four popular PEFT methods to fine-tune the LLM component of open-source MLLMs. We present a comprehensive analysis that encompasses various aspects, including the impact of PEFT methods on various models, parameters and location of the PEFT module, size of fine-tuning data, model stability based on PEFT methods, MLLM's generalization, and hallucination. We evaluated four PEFT methods on seven datasets from two different categories: unseen and seen datasets. Across all experiments, we show that the adapter is the best-performing PEFT method. At the same time, fine-tuning the connector layers leads to improved performance in most MLLMs. Code and data are available at https://github.com/alenai97/PEFT-MLLM.git.

Submitted 7 June, 2024; originally announced June 2024.

Comments: ACL finding 2024

arXiv:2406.01085 [pdf, other]

FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation

Authors: Hanlin Gu, Jiahuan Luo, Yan Kang, Yuan Yao, Gongxi Zhu, Bowen Li, Lixin Fan, Qiang Yang

Abstract: Federated learning (FL) has emerged as a collaborative approach that allows multiple clients to jointly learn a machine learning model without sharing their private data. The concern about privacy leakage, albeit demonstrated under specific conditions, has triggered numerous follow-up studies designing powerful attacking methods and effective defending mechanisms aiming to thwart these attacking methods. Nevertheless, privacy-preserving mechanisms employed in these defending methods invariably lead to compromised model performance due to a fixed obfuscation applied to private data or gradients. In this article, we therefore propose a novel adaptive obfuscation mechanism, coined FedAdOb, to protect private data without sacrificing original model performance. Technically, FedAdOb utilizes passport-based adaptive obfuscation to ensure data privacy in both horizontal and vertical federated learning settings. The privacy-preserving capabilities of FedAdOb, specifically with regard to private features and labels, are theoretically proven through Theorems 1 and 2. Furthermore, extensive experimental evaluations conducted on various datasets and network architectures demonstrate the effectiveness of FedAdOb by manifesting its superior trade-off between privacy preservation and model performance, surpassing existing methods.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00255 [pdf]

Measuring eye-tracking accuracy and its impact on usability in Apple Vision Pro

Authors: Zehao Huang, Gancheng Zhu, Xiaoting Duan, Rong Wang, Yongkai Li, Shuai Zhang, Zhiguo Wang

Abstract: With built-in eye-tracking cameras, the Apple Vision Pro (AVP) enables gaze-based interaction, eye image rendering on external screens, and iris recognition for device unlocking. One of the technological advancements of the AVP is its heavy reliance on gaze- and gesture-based interaction. However, limited information is available regarding the specifics of the eye-tracking device in the AVP, and raw gaze data is inaccessible to developers. This study evaluated the eye-tracking accuracy of the AVP, leveraging foveated rendering, and examined how tracking accuracy relates to user-reported usability. The results revealed an overall gaze error of 2.5° (or 61.95 pixels) within a tested field of view (FOV) of approximately 34° x 18°. As expected, the lowest gaze error was observed in the central FOV, with higher gaze errors in peripheral areas. The usability and learnability scores of the AVP, measured using the standard System Usability Scale (SUS), were 73 and 70, respectively. Importantly, no statistically reliable correlation between gaze error and usability scores was found. These results suggest that the eye-tracking accuracy of the AVP is comparable to other VR/AR headsets. While eye-tracking accuracy is critical for gaze-based interaction, it is not the sole determinant of user experience in AR/VR.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 10 pages, 7 figures and 2 tables

arXiv:2405.20795 [pdf, other]

InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding

Authors: Huaxiang Zhang, Yaojia Mu, Guo-Niu Zhu, Zhongxue Gan

Abstract: Accurate visual understanding is imperative for advancing autonomous systems and intelligent robots. Despite the powerful capabilities of vision-language models (VLMs) in processing complex visual scenes, precisely recognizing obscured or ambiguously presented visual elements remains challenging. To tackle such issues, this paper proposes InsightSee, a multi-agent framework to enhance VLMs' interpretative capabilities in handling complex visual understanding scenarios. The framework comprises a description agent, two reasoning agents, and a decision agent, which are integrated to refine the process of visual information interpretation. The design of these agents and the mechanisms by which they can be enhanced in visual information processing are presented. Experimental results demonstrate that the InsightSee framework not only boosts performance on specific visual tasks but also retains the original models' strength. The proposed framework outperforms state-of-the-art algorithms in 6 out of 9 benchmark tests, with a substantial advancement in multimodal understanding.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.19789 [pdf, other]

Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning

Authors: Guogang Zhu, Xuefeng Liu, Xinghao Wu, Shaojie Tang, Chao Tang, Jianwei Niu, Hao Su

Abstract: Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model. In FSSL, the heterogeneous data can introduce prediction bias into the model, causing the model's prediction to skew towards certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the models themselves are biased, merely constraining their consistency is not sufficient to alleviate prediction bias. In this paper, we explore this bias from a Bayesian perspective and demonstrate that it principally originates from label prior bias within the training data. Building upon this insight, we propose a debiasing method for FSSL named FedDB. FedDB utilizes the Average Prediction Probability of Unlabeled Data (APP-U) to approximate the biased prior. During local training, FedDB employs APP-U to refine pseudo-labeling through Bayes' theorem, thereby significantly reducing the label prior bias. Concurrently, during the model aggregation, FedDB uses APP-U from participating clients to formulate unbiased aggregate weights, thereby effectively diminishing bias in the global model. Experimental results show that FedDB can surpass existing FSSL methods. The code is available at https://github.com/GuogangZhu/FedDB.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024

arXiv:2405.18826 [pdf, ps, other]

Isovalent alloying assisted anomalous valley Hall effect in hexagonal antiferromagnetic monolayer

Authors: San-Dong Guo, Liguo Zhang, Xiao-Shu Guo, Gangqiang Zhu

Abstract: Exploring the combination of antiferromagnetic (AFM) spintronics and the anomalous valley Hall effect (AVHE) is one of the most important questions for valleytronic applications. The key to addressing this issue is to achieve spin splitting around the valleys in AFM systems. Here, we propose a possible way of achieving AVHE in a hexagonal AFM monolayer, which involves isovalent alloying. This can break the combined symmetry ($PT$ symmetry) of spatial inversion ($P$) and time reversal ($T$), giving rise to spin splitting. More specifically, the large spin splitting around the Fermi energy level is due to $d$ orbital mismatch among the different transition metal ions. Based on first-principles calculations, the proposed way can be verified in an out-of-plane AFM $\mathrm{CrMoC_2S_6}$ monolayer, which possesses spontaneous valley polarization and spin splitting, providing the possibility to realize AVHE. It is also proved that tensile strain can strengthen the valley splitting and maintain the out-of-plane AFM ordering. Our work provides an experimentally feasible way of developing AFM valleytronic devices.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 6 pages, 7 figures

arXiv:2405.15474 [pdf, other]

Unlearning during Learning: An Efficient Federated Machine Unlearning Method

Authors: Hanlin Gu, Gongxi Zhu, Jie Zhang, Xinyuan Zhao, Yuxing Han, Lixin Fan, Qiang Yang

Abstract: In recent years, Federated Learning (FL) has garnered significant attention as a distributed machine learning paradigm. To facilitate the implementation of the right to be forgotten, the concept of federated machine unlearning (FMU) has also emerged. However, current FMU approaches often involve additional time-consuming steps and may not offer comprehensive unlearning capabilities, which renders them less practical in real FL scenarios. In this paper, we introduce FedAU, an innovative and efficient FMU framework aimed at overcoming these limitations. Specifically, FedAU incorporates a lightweight auxiliary unlearning module into the learning process and employs a straightforward linear operation to facilitate unlearning. This approach eliminates the requirement for extra time-consuming steps, rendering it well-suited for FL. Furthermore, FedAU exhibits remarkable versatility. It not only enables multiple clients to carry out unlearning tasks concurrently but also supports unlearning at various levels of granularity, including individual data samples, specific classes, and even at the client level. We conducted extensive experiments on MNIST, CIFAR10, and CIFAR100 datasets to evaluate the performance of FedAU. The results demonstrate that FedAU effectively achieves the desired unlearning effect while maintaining model accuracy.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024

arXiv:2405.13751 [pdf, other]

GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games

Authors: Aoran Mei, Jianhua Wang, Guo-Niu Zhu, Zhongxue Gan

Abstract: With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional task planning strategies, VLMs are strong in multimodal information parsing and code generation and show remarkable efficiency. Although VLMs demonstrate great potential in robotic task planning, they suffer from challenges like hallucination, semantic complexity, and limited context. To handle such issues, this paper proposes a multi-agent framework, i.e., GameVLM, to enhance the decision-making process in robotic task planning. In this study, VLM-based decision and expert agents are presented to conduct the task planning. Specifically, decision agents are used to plan the task, and the expert agent is employed to evaluate these task plans. Zero-sum game theory is introduced to resolve inconsistencies among different agents and determine the optimal solution. Experimental results on real robots demonstrate the efficacy of the proposed framework, with an average success rate of 83.3%.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13623 [pdf, other]

doi 10.1103/PhysRevLett.132.193602

Nonreciprocal Superradiant Phase Transitions and Multicriticality in a Cavity QED System

Authors: Gui-Lei Zhu, Chang-Sheng Hu, Wei Qin, Xin-You Lü, Franco Nori

Abstract: We demonstrate the emergence of nonreciprocal superradiant phase transitions and novel multicriticality in a cavity quantum electrodynamics (QED) system, where a two-level atom interacts with two counter-propagating modes of a whispering-gallery-mode (WGM) microcavity. The cavity rotates at a certain angular velocity, and is directionally squeezed by a unidirectional parametric pumping $χ^{(2)}$ nonlinearity. The combination of cavity rotation and directional squeezing leads to nonreciprocal first- and second-order superradiant phase transitions. These transitions do not require ultrastrong atom-field couplings and can be easily controlled by the external pump field. Through a full quantum description of the system Hamiltonian, we identify two types of multicritical points in the phase diagram, both of which exhibit controllable nonreciprocity. These results open a new door for all-optical manipulation of superradiant transitions and multicritical behaviors in light-matter systems, with potential applications in engineering various integrated nonreciprocal quantum devices.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 18 pages, 10 figures

Journal ref: Phys. Rev. Lett. 132, 193602 (2024)

arXiv:2405.13048 [pdf]

Human-Generative AI Collaborative Problem Solving: Who Leads and How Students Perceive the Interactions

Authors: Gaoxia Zhu, Vidya Sudarshan, Jason Fok Kow, Yew Soon Ong

Abstract: This research investigates distinct human-generative AI collaboration types and students' interaction experiences when collaborating with generative AI (i.e., ChatGPT) for problem-solving tasks and how these factors relate to students' sense of agency and perceived collaborative problem solving. By analyzing the surveys and reflections of 79 undergraduate students, we identified three human-generative AI collaboration types: even contribution, human leads, and AI leads. Notably, our study shows that 77.21% of students perceived they led or had even contributed to collaborative problem-solving when collaborating with ChatGPT. On the other hand, 15.19% of the human participants indicated that the collaborations were led by ChatGPT, indicating a potential tendency for students to rely on ChatGPT. Furthermore, 67.09% of students perceived their interaction experiences with ChatGPT to be positive or mixed. We also found a positive correlation between positive interaction experience and a sense of positive agency. The results of this study contribute to our understanding of the collaboration between students and generative AI and highlight the need to study further why some students let ChatGPT lead collaborative problem-solving and how to enhance their interaction experience through curriculum and technology design.

Submitted 18 May, 2024; originally announced May 2024.

Comments: This paper appears at the IEEE Conference on Artificial Intelligence (CAI) 2024

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.11719 [pdf, other]

Non-Abelian Self-Correcting Quantum Memory

Authors: Po-Shen Hsin, Ryohei Kobayashi, Guanyu Zhu

Abstract: We construct a family of infinitely many new candidate non-Abelian self-correcting topological quantum memories in $D\geq 5+1$ spacetime dimensions without particle excitations using local commuting non-Pauli stabilizer lattice models and field theories of $\mathbb{Z}_2^3$ higher-form gauge fields with nontrivial topological action. We call such non-Pauli stabilizer models magic stabilizer codes. The family of topological orders has Abelian electric excitations and non-Abelian magnetic excitations that obey Ising-like fusion rules, generalizing the dihedral group $\mathbb{D}_8$ gauge theory in 2+1d. The simplest example includes a new non-Abelian self-correcting memory in 5+1d with Abelian loop excitations and non-Abelian membrane excitations. We use a Peierls argument to demonstrate the self-correction property and the thermal stability, and devise a probabilistic local cellular-automaton decoder.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 27 pages, 2 figures

arXiv:2405.09136 [pdf, other]

doi 10.1017/jfm.2024.350

Self-diffusiophoretic propulsion of a spheroidal particle in a shear-thinning fluid

Authors: Guangpu Zhu, Brandon van Gogh, Lailai Zhu, On Shun Pak, Yi Man

Abstract: Shear-thinning viscosity is a non-Newtonian behaviour that active particles often encounter in biological fluids such as blood and mucus. The fundamental question of how this ubiquitous non-Newtonian rheology affects the propulsion of active particles has attracted substantial interest. In particular, spherical Janus particles driven by self-diffusiophoresis, a major physico-chemical propulsion mechanism of synthetic active particles, were shown to always swim slower in a shear-thinning fluid than in a Newtonian fluid. In this work, we move beyond the spherical limit to examine the effect of particle eccentricity on self-diffusiophoretic propulsion in a shear-thinning fluid. We use a combination of asymptotic analysis and numerical simulations to show that shear-thinning rheology can enhance self-diffusiophoretic propulsion of a spheroidal particle, in stark contrast to previous findings for the spherical case. A systematic characterization of the dependence of the propulsion speed on the particle's active surface coverage has also uncovered an intriguing feature associated with the propulsion speeds of a pair of complementarily coated particles not previously reported. Symmetry arguments are presented to elucidate how this new feature emerges as a combined effect of anisotropy of the spheroidal geometry and nonlinearity in fluid rheology.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 21 Pages, 8 figures

Journal ref: Journal of Fluid Mechanics, 986, A39 (2024)

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.05122 [pdf]

Knowledge Gaps and Research Needs for Modeling CO2 Mineralization in the Basalt-CO2-Water System: A Review of Laboratory Experiments

Authors: Peng Lu, John Apps, Guanru Zhang, Alexander Gysi, Chen Zhu

Abstract: Carbon capture and storage in basalt is being actively investigated as a scalable climate change mitigation option. Accurate geochemical modeling prediction of the extent and rate of CO2 mineralization is a critical component in assessing the local and global feasibility and efficacy of this strategy. In this study, we review basalt-CO2-water interaction experimental studies conducted during the last two decades to determine whether they provide useable information for geochemical modeling. Most of the cited experiments generate data on the temporal evolution of water composition, and a few provide identification of secondary precipitates and their compositions, offering empirical and semi-quantitative information about the reactivity of basalts and the likelihood of secondary carbonate mineralization at various temperatures, pHs, and pCO2 conditions. However, most experiments provide insufficient information on the properties and quantity of secondary minerals formed, prohibiting accurate mass balance calculations and hence more quantitative geochemical modeling studies. Primary Ca, Mg, and Fe-bearing minerals in basalt control the availability of major ions released into aqueous solution for carbonate precipitation, and many secondary minerals, i.e., smectites, Ca-Mg-Fe carbonates, and zeolites, provide sinks for the same major ions, some of which are difficult to quantify experimentally. Thus, we have a multi-source and multi-sink inverse mass balance problem with insufficient constraints on the bulk system in which the temporal evolution of major ions does not provide sufficient information on which mineral(s) dissolve or the sequence of dissolution and precipitation reactions. Going forward, we propose that future experimental work should focus on trace elements and multiple isotopic tracers and better characterize the solid reaction products with modern analytical instruments.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04146 [pdf, other]

pFedLVM: A Large Vision Model (LVM)-Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving

Authors: Wei-Bin Kou, Qingfeng Lin, Ming Tang, Sheng Xu, Rongguang Ye, Yang Leng, Shuai Wang, Guofa Li, Zhenyu Chen, Guangxu Zhu, Yik-Chung Wu

Abstract: Deep learning-based Autonomous Driving (AD) models often exhibit poor generalization due to data heterogeneity in an ever domain-shifting environment. While Federated Learning (FL) could improve the generalization of an AD model (known as FedAD system), conventional models often struggle with under-fitting as the amount of accumulated training data progressively increases. To address this issue, instead of conventional small models, employing Large Vision Models (LVMs) in FedAD is a viable option for better learning of representations from a vast volume of data. However, implementing LVMs in FedAD introduces three challenges: (I) the extremely high communication overheads associated with transmitting LVMs between participating vehicles and a central server; (II) the lack of computing resources to deploy LVMs on each vehicle; (III) the performance drop due to the LVM focusing on shared features but overlooking local vehicle characteristics. To overcome these challenges, we propose pFedLVM, an LVM-Driven, Latent Feature-Based Personalized Federated Learning framework. In this approach, the LVM is deployed only on the central server, which effectively alleviates the computational burden on individual vehicles. Furthermore, the central server and vehicles exchange learned features rather than LVM parameters, which significantly reduces communication overhead. In addition, we utilize both shared features from all participating vehicles and individual characteristics from each vehicle to establish a personalized learning mechanism. This enables each vehicle's model to learn features from others while preserving its personalized characteristics, thereby outperforming globally shared models trained in general FL. Extensive experiments demonstrate that pFedLVM outperforms the existing state-of-the-art approaches.

Submitted 17 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: This paper was submitted to CVPR 2024 in Nov. 2023

arXiv:2404.19182 [pdf, other]

Robust Proximity Detection using On-Device Gait Monitoring

Authors: Yuqian Hu, Guozhen Zhu, Beibei Wang, K. J. Ray Liu

Abstract: Proximity detection in indoor environments based on WiFi signals has gained significant attention in recent years. Existing works rely on the dynamic signal reflections and their extracted features are dependent on motion strength. To address this issue, we design a robust WiFi-based proximity detector by considering gait monitoring. Specifically, we propose a gait score that accurately evaluates gait presence by leveraging the speed estimated from the autocorrelation function (ACF) of channel state information (CSI). By combining this gait score with a proximity feature, our approach effectively distinguishes different transition patterns, enabling more reliable proximity detection. In addition, to enhance the stability of the detection process, we employ a state machine and extract temporal information, ensuring continuous proximity detection even during subtle movements. Extensive experiments conducted in different environments demonstrate an overall detection rate of 92.5% and a low false alarm rate of 1.12% with a delay of 0.825s.

Submitted 29 April, 2024; originally announced April 2024.

Comments: This work has been accepted in IEEE 9th World Forum on Internet of Things (WFIoT)

arXiv:2404.13671 [pdf, other]

FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

Authors: Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang

Abstract: Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies directly without access to any known normal or abnormal samples within the target item categories. Existing approaches typically rely on the robust generalization capabilities of multimodal pretrained models, computing similarities between manually crafted textual features representing "normal" or "abnormal" semantics and image features to detect anomalies and localize anomalous patches. However, the generic descriptions of "abnormal" often fail to precisely match diverse types of anomalies across different object categories. Additionally, computing feature similarities for single patches struggles to pinpoint specific locations of anomalies with various sizes and scales. To address these issues, we propose a novel ZSAD method called FiLo, comprising two components: adaptively learned Fine-Grained Description (FG-Des) and position-enhanced High-Quality Localization (HQ-Loc). FG-Des introduces fine-grained anomaly descriptions for each category using Large Language Models (LLMs) and employs adaptively learned textual templates to enhance the accuracy and interpretability of anomaly detection. HQ-Loc, utilizing Grounding DINO for preliminary localization, position-enhanced text prompts, and Multi-scale Multi-shape Cross-modal Interaction (MMCI) module, facilitates more accurate localization of anomalies of different sizes and shapes. Experimental results on datasets like MVTec and VisA demonstrate that FiLo significantly improves the performance of ZSAD in both detection and localization, achieving state-of-the-art performance with an image-level AUC of 83.9% and a pixel-level AUC of 95.9% on the VisA dataset.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.06007 [pdf, other]

Collaborative Edge AI Inference over Cloud-RAN

Authors: Pengfei Zhang, Dingzhu Wen, Guangxu Zhu, Qimei Chen, Kaifeng Han, Yuanming Shi

Abstract: In this paper, a cloud radio access network (Cloud-RAN) based collaborative edge AI inference architecture is proposed. Specifically, geographically distributed devices capture real-time noise-corrupted sensory data samples and extract the noisy local feature vectors, which are then aggregated at each remote radio head (RRH) to suppress sensing noise. To realize efficient uplink feature aggregation, we allow each RRH to receive local feature vectors from all devices over the same resource blocks simultaneously by leveraging an over-the-air computation (AirComp) technique. Thereafter, these aggregated feature vectors are quantized and transmitted to a central processor (CP) for further aggregation and downstream inference tasks. Our aim in this work is to maximize the inference accuracy via a surrogate accuracy metric called discriminant gain, which measures the discernibility of different classes in the feature space. The key challenges lie in simultaneously suppressing the coupled sensing noise, the AirComp distortion caused by hostile wireless channels, and the quantization error resulting from the limited capacity of fronthaul links. To address these challenges, this work proposes a joint transmit precoding, receive beamforming, and quantization error control scheme to enhance the inference accuracy. Extensive numerical experiments demonstrate the effectiveness and superiority of our proposed optimization algorithm compared to various baselines.

Submitted 9 April, 2024; originally announced April 2024.

Comments: This paper is accepted by IEEE Transactions on Communications on 08-Apr-2024

arXiv:2404.05033 [pdf, other]

Magic Boundaries of 3D Color Codes

Authors: Zijian Song, Guanyu Zhu

Abstract: We investigate boundaries of 3D color codes and provide a systematic classification into 101 distinct boundary types. The elementary types of boundaries are codimension-1 (2D) boundaries that condense electric particle ($Z$-type) or magnetic flux ($X$-type) excitations in the 3D color code, including the $Z$-boundary condensing only electric particles, the $X$-boundary condensing only the magnetic flux, and other boundaries condensing both electric and magnetic excitations. Two novel types of boundaries can be generated based on certain elementary types. The first type is generated by applying transversal-$T$ gate on the entire code in the presence of the $X$-boundary, which effectively sweeps the codimension-1 (2D) $T$-domain wall across the system and attaches it to the $X$-boundary. Since the $T$-domain wall cannot condense on the $X$-boundary, a new magic boundary is produced, where the boundary stabilizers contain $XS$-stabilizers going beyond the conventional Pauli stabilizer formalism and hence contain `magic'. Neither electric nor magnetic excitations can condense on such a magic boundary, and only the composite of the magnetic flux and codimension-2 (1D) $S$-domain wall can condense on it, which makes the magic boundary go beyond the classification of the Lagrangian subgroup. The second type is generated by applying transversal-$S$ gate on a codimension-1 (2D) submanifold in the presence of certain codimension-1 (2D) boundaries, which effectively sweeps the $S$-domain wall across this submanifold and attaches it onto the boundary. This generates a codimension-2 (1D) nested boundary at the intersection. We also connect these novel boundaries to their previously discovered counterpart in the $\mathbb{Z}_2^3$ gauge theory equivalent to three copies of 3D toric codes, where the $S$ and $T$ domain walls correspond to gauged symmetry-protected topological (SPT) defects.

Submitted 9 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: 42 pages, 12 figures

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04685 [pdf]

Recent Advances in Nanophotonics for Optofluidics

Authors: Sen Yang, Chuchuan Hong, Guodong Zhu, Theodore H. Anyika, Ikjun Hong, Justus C. Ndukaife

Abstract: Optofluidics is dedicated to achieving integrated control of particle and fluid motion, particularly on the micrometer scale, by utilizing light to direct fluid flow and particle motion. The field has seen significant growth recently, driven by the concerted efforts of researchers across various scientific disciplines, notably for its successful applications in biomedical science. In this review, we explore a range of optofluidic architectures developed over the past decade, with a primary focus on mechanisms for precise control of micro and nanoscale biological objects and their applications in sensing. Regarding nanoparticle manipulation, we delve into mechanisms based on optical nanotweezers using nanolocalized light fields and light-based hybrid effects with dramatically improved performance and capabilities. In the context of sensing, we emphasize those works that used optofluidics to aggregate molecules or particles to promote sensing and detection. Additionally, we highlight emerging research directions, encompassing both fundamental principles and practical applications in the field.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.01591 [pdf, other]

Language Model Guided Interpretable Video Action Reasoning

Authors: Ning Wang, Guangming Zhu, HS Li, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun

Abstract: While neural networks have excelled in video action recognition tasks, their black-box nature often obscures the understanding of their decision-making processes. Recent approaches used inherently interpretable models to analyze video actions in a manner akin to human reasoning. These models, however, usually fall short in performance compared to their black-box counterparts. In this work, we present a new framework named Language-guided Interpretable Action Recognition framework (LaIAR). LaIAR leverages knowledge from language models to enhance both the recognition capabilities and the interpretability of video models. In essence, we redefine the problem of understanding video model decisions as a task of aligning video and language models. Using the logical reasoning captured by the language model, we steer the training of the video model. This integrated approach not only improves the video model's adaptability to different domains but also boosts its overall performance. Extensive experiments on two complex video action datasets, Charades & CAD-120, validate the improved performance and interpretability of our LaIAR framework. The code of LaIAR is available at https://github.com/NingWang2049/LaIAR.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.00836 [pdf, ps, other]

Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm

Authors: Zhonghao Lyu, Yuchen Li, Guangxu Zhu, Jie Xu, H. Vincent Poor, Shuguang Cui

Abstract: In some applications, edge learning is experiencing a shift in focus from conventional learning from scratch to new two-stage learning unifying pre-training and task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on local pre-stored general data, and then task-specific fine-tuning is performed at edge devices based on the pre-trained model via federated edge learning. For the two-stage learning model, we first analyze the convergence behavior (in terms of the average squared gradient norm bound), which characterizes the impacts of various system parameters such as the number of learning rounds and batch sizes in the two stages on the convergence rate. Based on our analytical results, we then propose a joint communication and computation resource management design to minimize an average squared gradient norm bound, subject to constraints on the transmit power, overall system energy consumption, and training delay. The decision variables include the number of learning rounds, batch sizes, clock frequencies, and transmit power control for both pre-training and fine-tuning stages. Finally, numerical results are provided to evaluate the effectiveness of our proposed design. It is shown that the proposed joint resource management over the pre-training and fine-tuning stages well balances the system performance trade-off among the training accuracy, delay, and energy consumption. The proposed design is also shown to effectively leverage the inherent trade-off between pre-training and fine-tuning, which arises from the differences in data distribution between pre-stored general data versus real-time task-specific data, thus efficiently optimizing overall system performance.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.16397 [pdf, other]

RadioGAT: A Joint Model-based and Data-driven Framework for Multi-band Radiomap Reconstruction via Graph Attention Networks

Authors: Xiaojie Li, Songyang Zhang, Hang Li, Xiaoyang Li, Lexi Xu, Haigao Xu, Hui Mei, Guangxu Zhu, Nan Qi, Ming Xiao

Abstract: Multi-band radiomap reconstruction (MB-RMR) is a key component in wireless communications for tasks such as spectrum management and network planning. However, traditional machine-learning-based MB-RMR methods, which rely heavily on simulated data or complete structured ground truth, face significant deployment challenges. These challenges stem from the differences between simulated and actual data, as well as the scarcity of real-world measurements. To address these challenges, our study presents RadioGAT, a novel framework based on Graph Attention Network (GAT) tailored for MB-RMR within a single area, eliminating the need for multi-region datasets. RadioGAT innovatively merges model-based spatial-spectral correlation encoding with data-driven radiomap generalization, thus minimizing the reliance on extensive data sources. The framework begins by transforming sparse multi-band data into a graph structure through an innovative encoding strategy that leverages radio propagation models to capture the spatial-spectral correlation inherent in the data. This graph-based representation not only simplifies data handling but also enables tailored label sampling during training, significantly enhancing the framework's adaptability for deployment. Subsequently, the GAT is employed to generalize the radiomap information across various frequency bands. Extensive experiments using raytracing datasets based on real-world environments have demonstrated RadioGAT's enhanced accuracy in supervised learning settings and its robustness in semi-supervised scenarios. These results underscore RadioGAT's effectiveness and practicality for MB-RMR in environments with limited data availability.

Submitted 24 March, 2024; originally announced March 2024.

Comments: submitted to IEEE journal for possible publication

arXiv:2403.15145 [pdf, ps, other]

Robust Resource Allocation for STAR-RIS Assisted SWIPT Systems

Authors: Guangyu Zhu, Xidong Mu, Li Guo, Ao Huang, Shibiao Xu

Abstract: A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted simultaneous wireless information and power transfer (SWIPT) system is proposed. More particularly, an STAR-RIS is deployed to assist in the information/power transfer from a multi-antenna access point (AP) to multiple single-antenna information users (IUs) and energy users (EUs), where two practical STAR-RIS operating protocols, namely energy splitting (ES) and time switching (TS), are employed. Under the imperfect channel state information (CSI) condition, a multi-objective optimization problem (MOOP) framework, that simultaneously maximizes the minimum data rate and minimum harvested power, is employed to investigate the fundamental rate-energy trade-off between IUs and EUs. To obtain the optimal robust resource allocation strategy, the MOOP is first transformed into a single-objective optimization problem (SOOP) via the ε-constraint method, which is then reformulated by approximating semi-infinite inequality constraints with the S-procedure. For ES, an alternating optimization (AO)-based algorithm is proposed to jointly design AP active beamforming and STAR-RIS passive beamforming, where a penalty method is leveraged in STAR-RIS beamforming design. Furthermore, the developed algorithm is extended to optimize the time allocation policy and beamforming vectors in a two-layer iterative manner for TS. Numerical results reveal that: 1) deploying STAR-RISs achieves a significant performance gain over conventional RISs, especially in terms of harvested power for EUs; 2) the ES protocol obtains a better user fairness performance when focusing only on IUs or EUs, while the TS protocol yields a better balance between IUs and EUs; 3) the imperfect CSI affects IUs more significantly than EUs, whereas TS can confer a more robust design to attenuate these effects.

Submitted 22 March, 2024; originally announced March 2024.
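
The ε-constraint step named in the abstract above (turning a multi-objective problem into a single-objective one) can be illustrated with a toy sketch. This is not the paper's algorithm: the two objective functions, the single scalar variable `t`, and the grid search are all hypothetical stand-ins chosen only to show the idea of keeping one objective and bounding the other from below by ε.

```python
import math

# Hypothetical toy objectives over one resource-split variable t in [0, 1].
# They only mimic a rate/energy trade-off; they are not taken from the paper.
def rate(t):
    return math.log2(1.0 + 10.0 * t)   # grows as more resource goes to information

def harvested_power(t):
    return 1.0 - t                      # shrinks as more resource goes to information

def epsilon_constraint(eps, steps=1000):
    """Maximize rate(t) subject to harvested_power(t) >= eps via grid search."""
    best_t, best_rate = None, -math.inf
    for i in range(steps + 1):
        t = i / steps
        if harvested_power(t) >= eps and rate(t) > best_rate:
            best_t, best_rate = t, rate(t)
    return best_t, best_rate

def trade_off_curve(eps_values):
    """Sweep eps to trace one point of the rate-energy trade-off per value."""
    return [(eps, *epsilon_constraint(eps)) for eps in eps_values]

if __name__ == "__main__":
    for eps, t, r in trade_off_curve([0.0, 0.25, 0.5, 0.75]):
        print(f"eps={eps:.2f}  t={t:.2f}  rate={r:.3f}")
```

Sweeping ε traces out the trade-off curve one point at a time; a tighter power requirement (larger ε) leaves less room for the rate objective.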

arXiv:2403.12400 [pdf, other]

Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing

Authors: Zijian Zhao, Tingwei Chen, Fanyi Meng, Hang Li, Xiaoyang Li, Guangxu Zhu

Abstract: Despite the development of various deep learning methods for Wi-Fi sensing, package loss often results in noncontinuous estimation of the Channel State Information (CSI), which negatively impacts the performance of the learning models. To overcome this challenge, we propose a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) for CSI recovery, named CSI-BERT. CSI-BERT can be trained in a self-supervised manner on the target dataset without the need for additional data. Furthermore, unlike traditional interpolation methods that focus on one subcarrier at a time, CSI-BERT captures the sequential relationships across different subcarriers. Experimental results demonstrate that CSI-BERT achieves lower error rates and faster speed compared to traditional interpolation methods, even when facing high loss rates. Moreover, by harnessing the recovered CSI obtained from CSI-BERT, other deep learning models like Residual Network and Recurrent Neural Network can achieve an average increase in accuracy of approximately 15% in Wi-Fi sensing tasks. The collected dataset WiGesture and code for our model are publicly available at https://github.com/RS2002/CSI-BERT.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 6 pages, accepted by IEEE INFOCOM Deepwireless Workshop 2024
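
For context, the kind of per-subcarrier interpolation baseline that the abstract above contrasts CSI-BERT with might look like the following minimal sketch. It is an assumption-laden illustration, not code from the paper or its repository: the data layout (a time-ordered list of packets, with `None` marking a lost packet) and the function names are hypothetical.

```python
def interpolate_series(samples):
    """Fill None entries of one subcarrier's time series by linear interpolation.

    Gaps at either end are filled with the nearest received value.
    """
    known = [i for i, v in enumerate(samples) if v is not None]
    if not known:
        raise ValueError("no received samples to interpolate from")
    filled = list(samples)
    for i, v in enumerate(samples):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = samples[right]
        elif right is None:
            filled[i] = samples[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * samples[left] + w * samples[right]
    return filled

def recover_csi(packets, num_subcarriers):
    """Recover lost packets one subcarrier at a time (hypothetical layout).

    packets: time-ordered list; each item is a list of per-subcarrier
    amplitudes, or None if that packet was lost.
    """
    columns = []
    for k in range(num_subcarriers):
        series = [None if p is None else p[k] for p in packets]
        columns.append(interpolate_series(series))
    # Transpose back to one row per packet.
    return [list(row) for row in zip(*columns)]

if __name__ == "__main__":
    print(recover_csi([[1.0, 10.0], None, [3.0, 30.0]], num_subcarriers=2))
```

Each subcarrier is treated independently here, which is exactly the limitation the abstract points to: no information is shared across subcarriers.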

arXiv:2403.11693 [pdf, other]

Beamforming Design for Semantic-Bit Coexisting Communication System

Authors: Maojun Zhang, Guangxu Zhu, Richeng Jin, Xiaoming Chen, Qingjiang Shi, Caijun Zhong, Kaibin Huang

Abstract: Semantic communication (SemCom) is emerging as a key technology for future sixth-generation (6G) systems. Unlike traditional bit-level communication (BitCom), SemCom directly optimizes performance at the semantic level, leading to superior communication efficiency. Nevertheless, the task-oriented nature of SemCom renders it challenging to completely replace BitCom. Consequently, it is desired to consider a semantic-bit coexisting communication system, where a base station (BS) serves SemCom users (sem-users) and BitCom users (bit-users) simultaneously. Such a system faces severe and heterogeneous inter-user interference. In this context, this paper provides a new semantic-bit coexisting communication framework and proposes a spatial beamforming scheme to accommodate both types of users. Specifically, we consider maximizing the semantic rate for semantic users while ensuring the quality-of-service (QoS) requirements for bit-users. Due to the intractability of obtaining the exact closed-form expression of the semantic rate, a data-driven method is first applied to attain an approximated expression via data fitting. With the resulting complex transcendental function, majorization minimization (MM) is adopted to convert the original formulated problem into a multiple-ratio problem, which allows fractional programming (FP) to be used to further transform the problem into an inhomogeneous quadratically constrained quadratic programming (QCQP) problem. Solving the problem leads to a semi-closed form solution with undetermined Lagrangian factors that can be updated by a fixed point algorithm. Extensive simulation results demonstrate that the proposed beamforming scheme significantly outperforms conventional beamforming algorithms such as zero-forcing (ZF), maximum ratio transmission (MRT), and weighted minimum mean-square error (WMMSE).

Submitted 22 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Submitted to IEEE for possible publication

arXiv:2403.11550 [pdf, other]

TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling

Authors: Weiran Chen, Xin Li, Jiaqi Su, Guiqian Zhu, Ying Li, Yi Ji, Chunping Liu

Abstract: As a cross-modal task, visual storytelling aims to generate a story for an ordered image sequence automatically. Different from the image captioning task, visual storytelling requires not only modeling the relationships between objects in the image but also mining the connections between adjacent images. Recent approaches primarily utilize either end-to-end frameworks or multi-stage frameworks to generate relevant stories, but they usually overlook latent topic information. In this paper, in order to generate a more coherent and relevant story, we propose a novel method, Topic Aware Reinforcement Network for VIsual StoryTelling (TARN-VIST). In particular, we pre-extract the topic information of stories from both visual and linguistic perspectives. Then we apply two topic-consistent reinforcement learning rewards to identify the discrepancy between the generated story and the human-labeled story so as to refine the whole generation process. Extensive experimental results on the VIST dataset and human evaluation demonstrate that our proposed model outperforms most of the competitive models across multiple evaluation metrics.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10493 [pdf, other]

MusicHiFi: Fast High-Fidelity Stereo Vocoding

Authors: Ge Zhu, Juan-Pablo Caceres, Zhiyao Duan, Nicholas J. Bryan

Abstract: Diffusion-based audio and music generation models commonly generate music by constructing an image representation of audio (e.g., a mel-spectrogram) and then converting it to audio using a phase reconstruction model or vocoder. Typical vocoders, however, produce monophonic audio at lower resolutions (e.g., 16-24 kHz), which limits their effectiveness. We propose MusicHiFi -- an efficient high-fidelity stereophonic vocoder. Our method employs a cascade of three generative adversarial networks (GANs) that convert low-resolution mel-spectrograms to audio, upsample to high-resolution audio via bandwidth expansion, and upmix to stereophonic audio. Compared to previous work, we propose 1) a unified GAN-based generator and discriminator architecture and training procedure for each stage of our cascade, 2) a new fast, near downsampling-compatible bandwidth extension module, and 3) a new fast downmix-compatible mono-to-stereo upmixer that ensures the preservation of monophonic content in the output. We evaluate our approach using both objective and subjective listening tests and find our approach yields comparable or better audio quality, better spatialization control, and significantly faster inference speed compared to past work. Sound examples are at https://MusicHiFi.github.io/web/.

Submitted 20 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.10010 [pdf, other]

doi 10.1103/PhysRevLett.132.131002

Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, et al. (256 additional authors not shown)

Abstract: We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.

Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

Journal ref: Physical Review Letters 132, 131002 (2024)

arXiv:2403.06361

See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI

Authors: Yulong Liu, Yongqiang Ma, Guibo Zhu, Haodong Jing, Nanning Zheng

Abstract: Deciphering visual content from functional Magnetic Resonance Imaging (fMRI) helps illuminate the human vision system. However, the scarcity of fMRI data and noise hamper brain decoding model performance. Previous approaches primarily employ subject-specific models, sensitive to training sample size. In this paper, we explore a straightforward but overlooked solution to address data scarcity. We propose shallow subject-specific adapters to map cross-subject fMRI data into unified representations. Subsequently, a shared deeper decoding model decodes cross-subject features into the target feature space. During training, we leverage both visual and textual supervision for multi-modal brain decoding. Our model integrates a high-level perception decoding pipeline and a pixel-wise reconstruction pipeline guided by high-level perceptions, simulating bottom-up and top-down processes in neuroscience. Empirical experiments demonstrate robust neural representation learning across subjects for both pipelines. Moreover, merging high-level and low-level information improves both low-level and high-level reconstruction metrics. Additionally, we successfully transfer learned general knowledge to new subjects by training new adapters with limited training data. Compared to previous state-of-the-art methods, notably pre-training-based methods (Mind-Vis and fMRI-PTE), our approach achieves comparable or superior results across diverse tasks, showing promise as an alternative method for cross-subject fMRI data pre-training.
Our code and pre-trained weights will be publicly released at https://github.com/YulongBonjour/See_Through_Their_Minds.

Submitted 13 June, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: A versatile brain decoding method learning from cross-subject fMRI data

arXiv:2403.04767

Robust teleportation of a surface code and cascade of topological quantum phase transitions

Authors: Finn Eckstein, Bo Han, Simon Trebst, Guo-Yi Zhu

Abstract: Teleportation is a facet where quantum measurements can act as a powerful resource in quantum physics, as local measurements allow to steer quantum information in a non-local way. While this has long been established for a single Bell pair, the teleportation of a fault-tolerant logical qubit presents a fundamentally different challenge as it requires the teleportation of a many-qubit state. Here we investigate a tangible protocol for teleporting a long-range entangled surface code state using elementary Bell measurements and its stability in the presence of tunable coherent errors. We relate the underlying threshold problem to the physics of anyon condensation under weak measurements and map it to a variant of the Ashkin-Teller model of statistical mechanics with Nishimori type disorder, which gives rise to a cascade of phase transitions. Tuning the angle of the local Bell measurements, we find a continuously varying threshold. Notably, the threshold moves to infinity for the $X+Z$ angle along the self-dual line -- indicating an optimal protocol that is fault-tolerant even in the presence of coherent noise. Our teleportation protocol, which can be readily implemented in dynamically configurable Rydberg atom arrays, thereby gives guidance for a practical demonstration of the power of quantum measurements.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 5 + 4 pages; 3 + 4 figures

Showing 1–50 of 630 results for author: Zhu, G