Search | arXiv e-print repository

On Robust Wasserstein Barycenter: The Model and Algorithm

Authors: Xu Wang, Jiawei Huang, Qingyuan Yang, **peng Zhang

Abstract: The Wasserstein barycenter problem is to compute the average of $m$ given probability measures, which has been widely studied in many different areas; however, real-world data sets are often noisy and huge, which impedes its applications in practice. Hence, in this paper, we focus on improving the computational efficiency of two types of robust Wasserstein barycenter problem (RWB): fixed-support RWB (fixed-RWB) and free-support RWB (free-RWB); actually, the former is a subroutine of the latter. Firstly, we improve efficiency through model reducing; we reduce RWB as an augmented Wasserstein barycenter problem, which works for both fixed-RWB and free-RWB. Especially, fixed-RWB can be computed within $\widetilde{O}(\frac{mn^2}{ε_+})$ time by using an off-the-shelf solver, where $ε_+$ is the pre-specified additive error and $n$ is the size of locations of input measures. Then, for free-RWB, we leverage a quality guaranteed data compression technique, coreset, to accelerate computation by reducing the data set size $m$. It shows that running algorithms on the coreset is enough instead of on the original data set. Next, by combining the model reducing and coreset techniques above, we propose an algorithm for free-RWB by updating the weights and locations alternatively. Finally, our experiments demonstrate the efficiency of our techniques.

Submitted 25 December, 2023; originally announced December 2023.

Comments: Algorithms for accelerating robust Wasserstein barycenter problem

arXiv:2312.15584 [pdf]

doi 10.1103/PhysRevB.109.174412

Controllable magnon frequency comb in synthetic ferrimagnets

Authors: Y. Liu, T. T. Liu, Q. Q. Yang, G. Tian, Z. P. Hou, D. Y. Chen, Z. Fan, M. Zeng, X. B. Lu, X. S. Gao, M. H. Qin, J. M. Liu

Abstract: Magnon frequency comb provides opportunities for exploring magnon nonlinear effects and measuring the transmission magnon frequency in magnets, whose controllability becomes vital for modulating the operating frequency and improving the measurement accuracy. Nevertheless, such controllable frequency comb remains to be explored. In this work, we investigate theoretically and numerically the skyrmion-induced magnon frequency comb effect generated by interaction between the magnon excitation mode and skyrmion breathing mode in synthetic ferrimagnets. It is revealed that both the skyrmion breathing mode and the magnon frequency gap closely depend on the net angular momentum δs, emphasizing the pivotal role of δs as an effective control parameter in governing the comb teeth. With the increase of δs, the skyrmion size decreases, which results in the enlargement of the breathing frequency and the distance between the comb teeth. Moreover, the dependences of the magnon frequency gap on δs and the inter-layer coupling allow one to modulate the comb lowest coherent frequency via structural control. Consequently, the coherent modes generated by the comb may range from gigahertz to terahertz frequencies, serving as a bridge between microwave and terahertz waves. Thus, this work represents a substantial advance in understanding the magnon frequency comb effect in ferrimagnets.

Submitted 11 March, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Comments: 27 pages, 8 figures

Journal ref: Physical Review B 109, 174412 (2024)

arXiv:2312.11583 [pdf, other]

AI-Based Energy Transportation Safety: Pipeline Radial Threat Estimation Using Intelligent Sensing System

Authors: Chengyuan Zhu, Yiyuan Yang, Kaixiang Yang, Haifeng Zhang, Qinmin Yang, C. L. Philip Chen

Abstract: The application of artificial intelligence technology has greatly enhanced and fortified the safety of energy pipelines, particularly in safeguarding against external threats. The predominant methods involve the integration of intelligent sensors to detect external vibration, enabling the identification of event types and locations, thereby replacing manual detection methods. However, practical implementation has exposed a limitation in current methods - their constrained ability to accurately discern the spatial dimensions of external signals, which complicates the authentication of threat events. Our research endeavors to overcome the above issues by harnessing deep learning techniques to achieve a more fine-grained recognition and localization process. This refinement is crucial in effectively identifying genuine threats to pipelines, thus enhancing the safety of energy transportation. This paper proposes a radial threat estimation method for energy pipelines based on distributed optical fiber sensing technology. Specifically, we introduce a continuous multi-view and multi-domain feature fusion methodology to extract comprehensive signal features and construct a threat estimation and recognition network. The utilization of collected acoustic signal data is optimized, and the underlying principle is elucidated. Moreover, we incorporate the concept of transfer learning through a pre-trained model, enhancing both recognition accuracy and training efficiency. Empirical evidence gathered from real-world scenarios underscores the efficacy of our method, notably in its substantial reduction of false alarms and remarkable gains in recognition accuracy. More generally, our method exhibits versatility and can be extrapolated to a broader spectrum of recognition tasks and scenarios.

Submitted 25 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

arXiv:2312.11111 [pdf, other]

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Authors: Cheng Li, **dong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

Abstract: Emotion significantly impacts our daily behaviors and interactions. While recent generative AI models, such as large language models, have shown impressive performance in various tasks, it remains unclear whether they truly comprehend emotions. This paper aims to address this gap by incorporating psychological theories to gain a holistic understanding of emotions in generative AI models. Specifically, we propose three approaches: 1) EmotionPrompt to enhance AI model performance, 2) EmotionAttack to impair AI model performance, and 3) EmotionDecode to explain the effects of emotional stimuli, both benign and malignant. Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it. Additionally, EmotionDecode reveals that AI models can comprehend emotional stimuli akin to the mechanism of dopamine in the human brain. Our work heralds a novel avenue for exploring psychology to enhance our understanding of generative AI models.

Submitted 7 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: International Conference on Machine Learning (ICML) 2024; an extension to EmotionPrompt (arXiv:2307.11760)

arXiv:2312.11102 [pdf, ps, other]

Holographic Imaging with XL-MIMO and RIS: Illumination and Reflection Design

Authors: Giulia Torcolacci, Anna Guerra, Haiyang Zhang, Francesco Guidi, Qianyu Yang, Yonina C. Eldar, Davide Dardari

Abstract: This paper addresses a near-field imaging problem utilizing extremely large-scale multiple-input multiple-output (XL-MIMO) antennas and reconfigurable intelligent surfaces (RISs) already in place for wireless communications. To this end, we consider a system with a fixed transmitting antenna array illuminating a region of interest (ROI) and a fixed receiving antenna array inferring the ROI's scattering coefficients. Leveraging XL-MIMO and high frequencies, the ROI is situated in the radiative near-field region of both antenna arrays, thus enhancing the degrees of freedom (DoF) (i.e., the channel matrix rank) of the illuminating and sensing channels available for imaging, here referred to as holographic imaging. To further boost the imaging performance, we optimize the illuminating waveform by solving a min-max optimization problem having the upper bound of the mean squared error (MSE) of the image estimate as the objective function. Additionally, we address the challenge of non-line-of-sight (NLOS) scenarios by considering the presence of a RIS and deriving its optimal reflection coefficients. Numerical results investigate the interplay between illumination optimization, geometric configuration (monostatic and bistatic), the DoF of the illuminating and sensing channels, image estimation accuracy, and image complexity.

Submitted 13 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10640 [pdf, other]

Inverse design of coherent supercontinuum generation using free-form nanophotonic waveguides

Authors: Chia-Yi Lee, Yanwu Liu, Yinke Cheng, Cheng-Hao Lao, Qihuang Gong, Qi-Fan Yang

Abstract: Many key functionalities of optical frequency combs such as self-referencing and broad spectral access rely on coherent supercontinuum generation (SCG). While nanophotonic waveguides have emerged as a compact and power-efficient platform for SCG, their geometric degrees of freedom have not been fully utilized due to the underlying nonlinear and stochastic physics. Here, we introduce inverse design to unlock free-form waveguides for coherent SCG. The efficacy of our design is numerically and experimentally demonstrated on Si3N4 waveguides, producing flat and coherent spectra from visible to mid-infrared wavelengths. Our work has direct applications in developing chip-based broadband light sources for spectroscopy, metrology, and sensing across multiple spectral regimes.

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.10472 [pdf, other]

Analyzing Generalization in Policy Networks: A Case Study with the Double-Integrator System

Authors: Ruining Zhang, Haoran Han, Maolong Lv, Qisong Yang, Jian Cheng

Abstract: Extensive utilization of deep reinforcement learning (DRL) policy networks in diverse continuous control tasks has raised questions regarding performance degradation in expansive state spaces where the input state norm is larger than that in the training environment. This paper aims to uncover the underlying factors contributing to such performance deterioration when dealing with expanded state spaces, using a novel analysis technique known as state division. In contrast to prior approaches that employ state division merely as a post-hoc explanatory tool, our methodology delves into the intrinsic characteristics of DRL policy networks. Specifically, we demonstrate that the expansion of state space induces the activation function $\tanh$ to exhibit saturability, resulting in the transformation of the state division boundary from nonlinear to linear. Our analysis centers on the paradigm of the double-integrator system, revealing that this gradual shift towards linearity imparts a control behavior reminiscent of bang-bang control. However, the inherent linearity of the division boundary prevents the attainment of an ideal bang-bang control, thereby introducing unavoidable overshooting. Our experimental investigations, employing diverse RL algorithms, establish that this performance phenomenon stems from inherent attributes of the DRL policy network, remaining consistent across various optimization algorithms.

Submitted 31 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.10305 [pdf, other]

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Authors: Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

Abstract: Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference speech, which are irrelevant to speaker identity, can lead to speaker confusion within the speech extraction network. To overcome this challenge, we propose a self-supervised disentangled representation learning method. Our approach tackles this issue through a two-phase process, utilizing a reference speech encoding network and a global information disentanglement network to gradually disentangle the speaker identity information from other irrelevant factors. We exclusively employ the disentangled speaker identity information to guide the speech extraction network. Moreover, we introduce the adaptive modulation Transformer to ensure that the acoustic representation of the mixed signal remains undisturbed by the speaker embeddings. This component incorporates speaker embeddings as conditional information, facilitating natural and efficient guidance for the speech extraction network. Experimental results substantiate the effectiveness of our meticulously crafted approach, showcasing a substantial reduction in the likelihood of speaker confusion.

Submitted 19 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2312.10031 [pdf, other]

Secular Dynamics of Compact Three-Planet Systems

Authors: Qing Yang, Daniel Tamayo

Abstract: The secular Laplace-Lagrange orbital solution, decomposing eccentricities into a set of uniformly precessing eigenmodes is a classical result that is typically solved numerically. However, in the limit where orbits are closely spaced, several simplifications make it possible to make analytical progress. We derive simple expressions for the eccentricity eigenmodes in a co-planar 3-planet system where the middle planet is massless, and show that these approximate the true eigenmodes of more general systems with 3 massive planets in various limits. These results provide intuition for the secular dynamics of real systems, and have applications for understanding the stability boundary for compact multi-planet systems.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 10 pages, submitted to ApJ

arXiv:2312.09007 [pdf, other]

LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution

Authors: Hongwei Cui, Yuyang Du, Qun Yang, Yulin Shao, Soung Chang Liew

Abstract: The exploration of large language models (LLMs) for task planning and IoT automation has recently gained significant attention. However, existing works suffer from limitations in terms of resource accessibility, complex task planning, and efficiency. In this paper, we present LLMind, an LLM-based AI agent framework that enables effective collaboration among IoT devices for executing complex tasks. Inspired by the functional specialization theory of the brain, our framework integrates an LLM with domain-specific AI modules, enhancing its capabilities. Complex tasks, which may involve collaborations of multiple domain-specific AI modules and IoT devices, are executed through a control script generated by the LLM using a Language-Code transformation approach, which first converts language descriptions to an intermediate finite-state machine (FSM) before final precise transformation to code. Furthermore, the framework incorporates a novel experience accumulation mechanism to enhance response speed and effectiveness, allowing the framework to evolve and become progressively sophisticated through continuing user and machine interactions.

Submitted 20 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.06723 [pdf, other]

Learning to See Low-Light Images via Feature Domain Adaptation

Authors: Qirui Yang, Qihua Cheng, Huan**g Yue, Le Zhang, Yihao Liu, **gyu Yang

Abstract: Raw low light image enhancement (LLIE) has achieved much better performance than the sRGB domain enhancement methods due to the merits of raw data. However, the ambiguity between noisy to clean and raw to sRGB mappings may mislead the single-stage enhancement networks. The two-stage networks avoid ambiguity by decoupling the two mappings but usually have large computing complexity. To solve this problem, we propose a single-stage network empowered by Feature Domain Adaptation (FDA) to decouple the denoising and color mapping tasks in raw LLIE. The denoising encoder is supervised by the clean raw image, and then the denoised features are adapted for the color mapping task by an FDA module. We propose a Lineformer to serve as the FDA, which can well explore the global and local correlations with fewer line buffers (friendly to the line-based imaging process). During inference, the raw supervision branch is removed. In this way, our network combines the advantage of a two-stage enhancement process with the efficiency of single-stage inference. Experiments on four benchmark datasets demonstrate that our method achieves state-of-the-art performance with fewer computing costs (60% FLOPs of the two-stage method DNF). Our codes will be released after the acceptance of this work.

Submitted 19 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.06614 [pdf, other]

AttenScribble: Attentive Similarity Learning for Scribble-Supervised Medical Image Segmentation

Authors: Mu Tian, Qinzhu Yang, Yi Gao

Abstract: The success of deep networks in medical image segmentation relies heavily on massive labeled training data. However, acquiring dense annotations is a time-consuming process. Weakly-supervised methods normally employ less expensive forms of supervision, among which scribbles started to gain popularity lately thanks to its flexibility. However, due to lack of shape and boundary information, it is extremely challenging to train a deep network on scribbles that generalizes on unlabeled pixels. In this paper, we present a straightforward yet effective scribble supervised learning framework. Inspired by recent advances of transformer based segmentation, we create a pluggable spatial self-attention module which could be attached on top of any internal feature layers of arbitrary fully convolutional network (FCN) backbone. The module infuses global interaction while keeping the efficiency of convolutions. Descended from this module, we construct a similarity metric based on normalized and symmetrized attention. This attentive similarity leads to a novel regularization loss that imposes consistency between segmentation prediction and visual affinity. This attentive similarity loss optimizes the alignment of FCN encoders, attention mapping and model prediction. Ultimately, the proposed FCN+Attention architecture can be trained end-to-end guided by a combination of three learning objectives: partial segmentation loss, a customized masked conditional random fields and the proposed attentive similarity loss. Extensive experiments on public datasets (ACDC and CHAOS) showed that our framework not just out-performs existing state-of-the-art, but also delivers close performance to fully-supervised benchmark. Code will be available upon publication.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 11 pages, 3 figures, a modified version was submitted to Computerized Medical Imaging and Graphics and is under review

arXiv:2312.06462 [pdf, other]

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral relatiOns. For the first time, our framework explores three types of bilateral entanglements within AVS: pixel entanglement, modality entanglement, and temporal entanglement. Regarding pixel entanglement, we employ a Siam-Encoder Module (SEM) that leverages prior knowledge to generate more precise visual features from the foundational model. For modality entanglement, we design a Bilateral-Fusion Module (BFM), enabling COMBO to align corresponding visual and auditory signals bi-directionally. As for temporal entanglement, we introduce an innovative adaptive inter-frame consistency loss according to the inherent rules of temporal. Comprehensive experiments and ablation studies on AVSBench-object (84.7 mIoU on S4, 59.2 mIoU on MS3) and AVSBench-semantic (42.1 mIoU on AVSS) datasets demonstrate that COMBO surpasses previous state-of-the-art methods. Code and more results will be publicly available at https://yannqi.github.io/AVS-COMBO/.

Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Highlight. 13 pages, 10 figures

arXiv:2312.06097 [pdf, other]

doi 10.1007/s11433-023-2219-7

The FAST all sky HI survey (FASHI): The first release of catalog

Authors: Chuan-Peng Zhang, M. Zhu, P. Jiang, C. Cheng, J. Wang, J. Wang, J. -L. Xu, X. -L. Liu, N. -P. Yu, L. Qian, H. Yu, M. Ai, Y. **g, C. Xu, Z. Liu, X. Guan, C. Sun, Q. Yang, M. Huang, Q. Hao, FAST Collaboration

Abstract: The FAST All Sky HI survey (FASHI) was designed to cover the entire sky observable by the Five-hundred-meter Aperture Spherical radio Telescope (FAST), spanning approximately 22000 square degrees of declination between -14 deg and +66 deg, and in the frequency range of 1050-1450 MHz, with the expectation of eventually detecting more than 100000 HI sources. Between August 2020 and June 2023, FASHI had covered more than 7600 square degrees, which is approximately 35% of the total sky observable by FAST. It has a median detection sensitivity of around 0.76 mJy/beam and a spectral line velocity resolution of ~6.4 km/s at a frequency of ~1.4 GHz. As of now, a total of 41741 extragalactic HI sources have been detected in the frequency range 1305.5-1419.5 MHz, corresponding to a redshift limit of z<0.09. By cross-matching FASHI sources with the Siena Galaxy Atlas (SGA) and the Sloan Digital Sky Survey (SDSS) catalogs, we found that 16972 (40.7%) sources have spectroscopic redshifts and 10975 (26.3%) sources have only photometric redshifts. Most of the remaining 13794 (33.0%) HI sources are located in the direction of the Galactic plane, making their optical counterparts difficult to identify due to high extinction or high contamination of Galactic stellar sources. Based on current survey results, the FASHI survey is an unprecedented blind extragalactic HI survey. It has higher spectral and spatial resolution and broader coverage than the Arecibo Legacy Fast ALFA Survey (ALFALFA). When completed, FASHI will provide the largest extragalactic HI catalog and an objective view of HI content and large-scale structure in the local universe.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 22 pages, 12 figures, published in SCPMA. All catalogs are available at https://zcp521.github.io/fashi and https://fast.bao.ac.cn/cms/article/271/

Journal ref: Sci. China-Phys. Mech. Astron. 67, 219511 (2024)

arXiv:2312.05886 [pdf, ps, other]

Edge-connectivity keeping trees in $k$-edge-connected graphs

Authors: Qing Yang, Yingzhi Tian

Abstract: Mader [J. Combin. Theory Ser. B 40 (1986) 152-158] proved that every $k$-edge-connected graph $G$ with minimum degree at least $k+1$ contains a vertex $u$ such that $G-\{u\}$ is still $k$-edge-connected. In this paper, we prove that every $k$-edge-connected graph $G$ with minimum degree at least $k+2$ contains an edge $uv$ such that $G-\{u,v\}$ is $k$-edge-connected for any positive integer $k$. In addition, we show that for any tree $T$ of order $m$, every $k$-edge-connected graph $G$ with minimum degree greater than $4(k+m)^2$ contains a subtree $T'$ isomorphic to $T$ such that $G-V(T')$ is $k$-edge-connected.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.05062 [pdf, ps, other]

Deep Learning Enabled Semantic Communication Systems for Video Transmission

Authors: Zhenguo Zhang, Qianqian Yang, Shibo He, Jiming Chen

Abstract: Semantic communication has emerged as a promising approach for improving efficient transmission in the next generation of wireless networks. Inspired by the success of semantic communication in different areas, we aim to provide a new semantic communication scheme from the semantic level. In this paper, we propose a novel DL-based semantic communication system for video transmission, which compacts semantic-related information to improve transmission efficiency. In particular, we utilize the Bi-optical flow to estimate residual information of inter-frame details. We also propose a feature choice module and a feature fusion module to drop semantically redundant features while paying more attention to the important semantic-related content. We employ a frame prediction module to reconstruct semantic features of the prediction frame from the received signal at the receiver. To enhance the system's robustness, we propose a noise attention module that assigns different importance weights to the extracted features. Simulation results indicate that our proposed method outperforms existing approaches in terms of transmission efficiency, achieving about 33.3% reduction in the number of transmitted symbols while improving the peak signal-to-noise ratio (PSNR) performance by an average of 0.56 dB.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.05024 [pdf, other]

A Unified Framework for Unsupervised Domain Adaptation based on Instance Weighting

Authors: ****g Zhu, Feiyang Ye, Qiao Xiao, Pengxin Guo, Yu Zhang, Qiang Yang

Abstract: Despite the progress made in domain adaptation, solving Unsupervised Domain Adaptation (UDA) problems with a general method under complex conditions caused by label shifts between domains remains a formidable task. In this work, we comprehensively investigate four distinct UDA settings including closed set domain adaptation, partial domain adaptation, open set domain adaptation, and universal domain adaptation, where shared common classes between source and target domains coexist alongside domain-specific private classes. The prominent challenges inherent in diverse UDA settings center around the discrimination of common/private classes and the precise measurement of domain discrepancy. To surmount these challenges effectively, we propose a novel yet effective method called Learning Instance Weighting for Unsupervised Domain Adaptation (LIWUDA), which caters to various UDA settings. Specifically, the proposed LIWUDA method constructs a weight network to assign weights to each instance based on its probability of belonging to common classes, and designs Weighted Optimal Transport (WOT) for domain alignment by leveraging instance weights. Additionally, the proposed LIWUDA method devises a Separate and Align (SA) loss to separate instances with low similarities and align instances with high similarities. To guide the learning of the weight network, Intra-domain Optimal Transport (IOT) is proposed to enforce the weights of instances in common classes to follow a uniform distribution. Through the integration of those three components, the proposed LIWUDA method demonstrates its capability to address all four UDA settings in a unified manner. Experimental evaluations conducted on three benchmark datasets substantiate the effectiveness of the proposed LIWUDA method.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04822 [pdf, other]

SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles

Authors: Deyuan Qu, Qi Chen, Tianyu Bai, Andy Qin, Hongsheng Lu, Heng Fan, Song Fu, Qing Yang

Abstract: Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles. However, the absence of feature maps shared from other vehicles can lead to a significant decline in object detection performance for cooperative perception models compared to standalone 3D detection models. This drawback impedes the adoption of cooperative perception as vehicle resources are often insufficient to concurrently employ two perception models. To tackle this issue, we present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of the state-of-the-art standalone perception backbones and enhances them with a novel Dual-Perception Network (DP-Net) designed to facilitate both individual and cooperative perception. In addition to its lightweight nature with only 0.13M parameters, DP-Net is robust and retains crucial gradient information during feature map fusion. As demonstrated in a comprehensive evaluation on the OPV2V dataset, thanks to DP-Net, SiCP surpasses state-of-the-art cooperative perception solutions while preserving the performance of standalone perception solutions.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04366 [pdf, other]

Chirality induced spin selectivity in chiral crystals

Authors: Qun Yang, Yongkang Li, Claudia Felser, Binghai Yan

Abstract: Chirality is a fundamental property of great importance in physics, chemistry, and biology, and has recently been found to generate unexpected spin polarization for electrons passing through organic molecules, known as chirality-induced spin selectivity (CISS). CISS shows promising application potential in spintronic devices, spin-controlled chemistry, and enantiomer separation. It focuses on organic molecules that exhibit poor electronic conductivity and inherent complexities, such as the debated role of SOC at the molecule-metal interface. In this work, we go beyond organic molecules and study chiral solids with excellent electrical conductivity, intrinsic SOC, and topological electronic structures. We demonstrate that electrons exhibit both spin and orbital polarization as they pass through chiral crystals. Both polarizations increase with material thickness before saturating to the bulk values. The spin polarization is proportional to intrinsic SOC while the orbital polarization is insensitive to SOC. The large spin polarization comes with strong electrical magnetochiral anisotropy in the magneto-transport of these chiral crystals (e.g., RhSi). Our work reveals the interplay of chirality, electron spin, and orbital in chiral crystals, paving the way for developing chiral solids for chirality-induced phenomena.

Submitted 23 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 23 pages, 5 figures

arXiv:2312.00360 [pdf, other]

Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

Authors: Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan

Abstract: Multimodal (e.g., RGB-Depth/RGB-Thermal) fusion has shown great potential for improving semantic segmentation in complex scenes (e.g., indoor/low-light conditions). Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy for achieving multimodal semantic segmentation, which is training-costly due to the massive parameter updates in feature extraction and fusion. To address this issue, we propose a surprisingly simple yet effective dual-prompt learning network (dubbed DPLNet) for training-efficient multimodal (e.g., RGB-D/T) semantic segmentation. The core of DPLNet is to directly adapt a frozen pre-trained RGB model to multimodal semantic segmentation, reducing parameter updates. For this purpose, we present two prompt learning modules, comprising multimodal prompt generator (MPG) and multimodal feature adapter (MFA). MPG works to fuse the features from different modalities in a compact manner and is inserted from shallow to deep stages to generate the multi-level multimodal prompts that are injected into the frozen backbone, while MFA adapts prompted multimodal features in the frozen backbone for better multimodal semantic segmentation. Since both the MPG and MFA are lightweight, only a few trainable parameters (3.88M, 4.4% of the pre-trained backbone parameters) are introduced for multimodal feature fusion and learning. Using a simple decoder (3.27M parameters), DPLNet achieves new state-of-the-art performance or is on a par with other complex approaches on four RGB-D/T semantic segmentation datasets while satisfying parameter efficiency. Moreover, we show that DPLNet is general and applicable to other multimodal tasks such as salient object detection and video semantic segmentation. Without special design, DPLNet outperforms many complicated models. Our code will be available at github.com/ShaohuaDong2021/DPLNet.

Submitted 3 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: 11 pages, 4 figures, 9 tables

arXiv:2311.17431 [pdf, other]

Grounding Foundation Models through Federated Transfer Learning: A General Framework

Authors: Yan Kang, Tao Fan, Hanlin Gu, Xiao** Zhang, Lixin Fan, Qiang Yang

Abstract: Foundation Models (FMs) such as GPT-4 encoded with vast knowledge and powerful emergent abilities have achieved remarkable success in various natural language processing and computer vision tasks. Grounding FMs by adapting them to domain-specific tasks or augmenting them with domain-specific knowledge enables us to exploit the full potential of FMs. However, grounding FMs faces several challenges, stemming primarily from constrained computing resources, data privacy, model heterogeneity, and model ownership. Federated Transfer Learning (FTL), the combination of federated learning and transfer learning, provides promising solutions to address these challenges. In recent years, the need for grounding FMs leveraging FTL, coined FTL-FM, has arisen strongly in both academia and industry. Motivated by the strong growth in FTL-FM research and the potential impact of FTL-FM on industrial applications, we propose an FTL-FM framework that formulates problems of grounding FMs in the federated learning setting, construct a detailed taxonomy based on the FTL-FM framework to categorize state-of-the-art FTL-FM works, and comprehensively overview FTL-FM works based on the proposed taxonomy. We also establish correspondences between FTL-FM and conventional phases of adapting FM so that FM practitioners can align their research works with FTL-FM. In addition, we overview advanced efficiency-improving and privacy-preserving techniques because efficiency and privacy are critical concerns in FTL-FM. Last, we discuss opportunities and future research directions of FTL-FM.

Submitted 29 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: In progress

arXiv:2311.17401 [pdf, ps, other]

Gene-MOE: A sparsely gated prognosis and classification framework exploiting pan-cancer genomic information

Authors: Xiangyu Meng, Xue Li, Qing Yang, Huanhuan Dai, Lian Qiao, Hongzhen Ding, Long Hao, Xun Wang

Abstract: Benefiting from the advancements in deep learning, various genomic analytical techniques, such as survival analysis, classification of tumors and their subtypes, and exploration of specific pathways, have significantly enhanced our understanding of the biological mechanisms driving cancer. However, the overfitting issue, arising from the limited number of patient samples, poses a challenge in improving the accuracy of genome analysis by deepening the neural network. Furthermore, it remains uncertain whether novel approaches such as the sparsely gated mixture of expert (MOE) and self-attention mechanisms can improve the accuracy of genomic analysis. In this paper, we introduce a novel sparsely gated RNA-seq analysis framework called Gene-MOE. This framework exploits the potential of the MOE layers and the proposed mixture of attention expert (MOAE) layers to enhance the analysis accuracy. Additionally, it addresses overfitting challenges by integrating pan-cancer information from 33 distinct cancer types through pre-training. We pre-trained Gene-MOE on TCGA pan-cancer RNA-seq dataset with 33 cancer types. Subsequently, we conducted experiments involving cancer classification and survival analysis based on the pre-trained Gene-MOE. According to the survival analysis results on 14 cancer types, Gene-MOE outperformed state-of-the-art models on 12 cancer types. Through detailed feature analysis, we found that the Gene-MOE model could learn rich feature representations of high-dimensional genes. According to the classification results, the total accuracy of the classification model for 33 cancer classifications reached 95.8%, representing the best performance compared to state-of-the-art models. These results indicate that Gene-MOE holds strong potential for use in cancer classification and survival analysis.

Submitted 18 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.16938 [pdf, other]

New insights into the doubly charmed exotic mesons

Authors: Di Guo, Qin-He Yang, Ling-Yun Dai, A. P. Szczepaniak

Abstract: Using effective Lagrangians constrained by the heavy quark spin symmetry and chiral symmetry, for the light quarks, we analyze the $D^0 D^0π^+$, $\bar{D}^0D^0π^0$ and $D^0\bar{D}^{*0}$ invariant mass spectra. Performing a simultaneous analysis of the doubly charmed and charm-anti-charm states gives further insights into the nature of the $T^+_{cc}$ and $χ^0_{c1}(3872)$, exotic hadrons. We find that both states lie below the respective $DD^*$/$D\bar{D}^*$ thresholds. Also, the contributions of the triangle and box diagrams are negligible.

Submitted 10 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 5 pages, 5 figures

arXiv:2311.16919 [pdf, other]

Contribution of coherent electron production to measurements of heavy-flavor decayed electrons in heavy-ion collisions

Authors: Shenghui Zhang, Rongrong Ma, Yuan**g Ji, Zebo Tang, Qian Yang, Yifei Zhang, Wangwei Zha

Abstract: Heavy quarks, produced at early stages of heavy-ion collisions, are an excellent probe of the Quark-Gluon Plasma (QGP) also created in these collisions. Electrons from open heavy-flavor hadron decays (HFE) are good proxies for heavy quarks, and have been measured extensively in the last two decades to study QGP properties. These measurements are traditionally carried out by subtracting all known background sources from the inclusive electron sample. More recently, a significant enhancement of $e^+e^-$ pair production at very low transverse momenta was observed in peripheral heavy-ion collisions. The production characteristics are consistent with coherent photon-photon interactions, which should also constitute a background source to the HFE measurements. In this article, we provide theoretical predictions for the contribution of coherent electron production to HFE as a function of transverse momentum, centrality and collision energy in Au+Au and Pb+Pb collisions.

Submitted 15 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.16167 [pdf, other]

Moving Sampling Physics-informed Neural Networks induced by Moving Mesh PDE

Authors: Yu Yang, Qihong Yang, Yangtao Deng, Qiaolin He

Abstract: In this work, we propose an end-to-end adaptive sampling neural network (MMPDE-Net) based on the moving mesh method, which can adaptively generate new sampling points by solving the moving mesh PDE. This model focuses on improving the quality of sampling points generation. Moreover, we develop an iterative algorithm based on MMPDE-Net, which makes the sampling points more precise and controllable. Since MMPDE-Net is a framework independent of the deep learning solver, we combine it with physics-informed neural networks (PINN) to propose moving sampling PINN (MS-PINN) and demonstrate its effectiveness by error analysis under some assumptions. Finally, we demonstrate the performance improvement of MS-PINN compared to PINN through numerical experiments of four typical examples, which numerically verify the effectiveness of our method.

Submitted 9 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.15451 [pdf, other]

Uncertainty-aware Language Modeling for Selective Question Answering

Authors: Qi Yang, Shreya Ravikumar, Fynn Schmitt-Ulms, Satvik Lolla, Ege Demir, Iaroslav Elistratov, Alex Lavaee, Sadhana Lolla, Elaheh Ahmadi, Daniela Rus, Alexander Amini, Alejandro Perez

Abstract: We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs capable of estimating uncertainty with every prediction. Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems. We evaluate converted models on the selective question answering setting -- to answer as many questions as possible while maintaining a given accuracy, forgoing providing predictions when necessary. As part of our results, we test BERT and Llama 2 model variants on the SQuAD extractive QA task and the TruthfulQA generative QA task. We show that using the uncertainty estimates provided by our approach to selectively answer questions leads to significantly higher accuracy over directly using model probabilities.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.13381 [pdf, other]

Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training

Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen

Abstract: Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPU and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves at most 45.3% memory reduction and 8.03x inference speedup in practical settings.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 6 pages, 7 figures; Submitted to HotMobile 2024

arXiv:2311.13217 [pdf, other]

Controllable orbital angular momentum monopoles in chiral topological semimetals

Authors: Yun Yen, Jonas A. Krieger, Mengyu Yao, Iñigo Robredo, Kaustuv Manna, Qun Yang, Emily C. McFarlane, Chandra Shekhar, Horst Borrmann, Samuel Stolz, Roland Widmer, Oliver Gröning, Vladimir N. Strocov, Stuart S. P. Parkin, Claudia Felser, Maia G. Vergniory, Michael Schüler, Niels B. M. Schröter

Abstract: The emerging field of orbitronics aims at generating and controlling currents of electronic orbital angular momentum (OAM) for information processing. Structurally chiral topological crystals could be particularly suitable orbitronic materials because they have been predicted to host topological band degeneracies in reciprocal space that are monopoles of OAM. Around such a monopole, the OAM is locked isotropically parallel or antiparallel to the direction of the electron's momentum, which could be used to generate large and controllable OAM currents. However, OAM monopoles have not yet been directly observed in chiral crystals, and no handle to control their polarity has been discovered. Here, we use circular dichroism in angle-resolved photoelectron spectroscopy (CD-ARPES) to image OAM monopoles in the chiral topological semimetals PtGa and PdGa. Moreover, we also demonstrate that the polarity of the monopole can be controlled via the structural handedness of the host crystal by imaging OAM monopoles and anti-monopoles in the two enantiomers of PdGa, respectively. For most photon energies used in our study, we observe a sign change in the CD-ARPES spectrum when comparing positive and negative momenta along the light direction near the topological degeneracy. This is consistent with the conventional view that CD-ARPES measures the projection of the OAM monopole along the photon momentum. For some photon energies, however, this sign change disappears, which can be understood from our numerical simulations as the interference of polar atomic OAM contributions, consistent with the presence of OAM monopoles. Our results highlight the potential of chiral crystals for orbitronic device applications, and our methodology could enable the discovery of even more complicated nodal OAM textures that could be exploited for orbitronics.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 16 pages, 8 figures

arXiv:2311.11211 [pdf]

Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness

Authors: Gongbo Zhang, Qiao **, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng

Abstract: Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

Submitted 31 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.10944 [pdf, other]

Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks

Authors: Panfeng Li, Mohamed Abouelenien, Rada Mihalcea, Zhicheng Ding, Qikai Yang, Yiming Zhou

Abstract: Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contributions. First, we extract linguistic and physiological features from this data to train and construct the neural network models. Second, we propose a fused convolutional neural network model using both modalities in order to achieve an improved overall performance. Third, we compare our new approach with earlier methods designed for multimodal deception detection. We find that our system outperforms regular classification methods; our results indicate the feasibility of using neural networks for deception detection even in the presence of limited amounts of data.

Submitted 26 June, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: Accepted by 2024 5th International Conference on Information Science, Parallel and Distributed Systems

arXiv:2311.10070 [pdf, ps, other]

Conformally Covariant Boundary Operators and Sharp Higher Order Sobolev Trace Inequalities on Poincaré-Einstein Manifolds

Authors: Joshua Flynn, Guozhen Lu, Qiaohua Yang

Abstract: In this paper we introduce conformally covariant boundary operators for Poincaré-Einstein manifolds satisfying a mild spectral assumption. Using these boundary operators we set up higher order Dirichlet problems whose solutions are such that, when applied to by our boundary operators, they recover the fractional order GJMS operators on the conformal infinity of the manifold. We moreover obtain all related higher order trace Sobolev inequalities on these manifolds. In conjunction with Beckner's fractional Sobolev inequalities on the sphere, we obtain as an application the sharp higher order Sobolev trace inequalities on the ball.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 41 pages

arXiv:2311.09956 [pdf, ps, other]

Conformally Covariant Boundary Operators and Sharp Higher Order CR Sobolev Trace Inequalities on the Siegel Domain and Complex Ball

Authors: Joshua Flynn, Guozhen Lu, Qiaohua Yang

Abstract: We first introduce an appropriate family of conformally covariant boundary operators associated to the Siegel domain ${\mathcal U}^{n+1}$ with the Heisenberg group $\mathbb{H}^{n}$ as its boundary and the complex ball $\mathbb{B}_{\mathbb{C}}^{n+1}$ with the complex sphere $\mathbb{S}^{2n+1}$ as its boundary. We provide the explicit formulas of these conformally covariant boundary operators. Second, we establish all higher order extension theorems of Caffarelli-Silvestre type for the Siegel domain and complex ball. Third, we prove all higher order CR Sobolev trace inequalities for the Siegel domain ${\mathcal U}^{n+1}$ and the complex ball $\mathbb{B}_{\mathbb{C}}^{n+1}$. In particular, we generalize the Sobolev trace inequality for $γ\in (0, 1)$ in the CR setting by Frank-González-Monticelli-Tan to the case for all $γ\in (0, n+1)\backslash \mathbb{N}$. The family of higher order conformally covariant boundary operators we define are naturally intrinsic to the higher order Sobolev trace inequalities on both the Siegel domain ${\mathcal U}^{n+1}$ and complex ball $\mathbb{B}_{\mathbb{C}}^{n+1}$. Finally, we give an explicit solution to the scattering problem on the complex hyperbolic ball. More precisely, we obtain an integral representation and an expansion in terms of special functions for the solution to the scattering problem.

Submitted 22 January, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 40 pages. A new section (Section 8) is added. This section gives an explicit solution to the scattering problem on the complex hyperbolic ball

arXiv:2311.07919 [pdf, other]

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Authors: Yunfei Chu, ** Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, **gren Zhou

Abstract: Recently, instruction-following audio-language models have received broad attention for audio interaction with humans. However, the absence of pre-trained audio models capable of handling diverse audio types and tasks has hindered progress in this field. Consequently, most existing works have only been able to support a limited range of interaction capabilities. In this paper, we develop the Qwen-Audio model and address this limitation by scaling up audio-language pre-training to cover over 30 tasks and various audio types, such as human speech, natural sounds, music, and songs, to facilitate universal audio understanding abilities. However, directly co-training all tasks and datasets can lead to interference issues, as the textual labels associated with different datasets exhibit considerable variations due to differences in task focus, language, granularity of annotation, and text structure. To overcome the one-to-many interference, we carefully design a multi-task training framework by conditioning on a sequence of hierarchical tags to the decoder for encouraging knowledge sharing and avoiding interference through shared and specified tags respectively. Remarkably, Qwen-Audio achieves impressive performance across diverse benchmark tasks without requiring any task-specific fine-tuning, surpassing its counterparts. Building upon the capabilities of Qwen-Audio, we further develop Qwen-Audio-Chat, which allows for input from various audios and text inputs, enabling multi-turn dialogues and supporting various audio-central scenarios.

Submitted 21 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: The code, checkpoints and demo are released at https://github.com/QwenLM/Qwen-Audio

arXiv:2311.06750 [pdf, other]

Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark

Authors: Wenke Huang, Mang Ye, Zekun Shi, Guancheng Wan, He Li, Bo Du, Qiang Yang

Abstract: Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties. Recently, with the popularity of federated learning, an influx of approaches have delivered towards different realistic challenges. In this survey, we provide a systematic overview of the important and recent developments of research on federated learning. Firstly, we introduce the study history and terminology definition of this area. Then, we comprehensively review three basic lines of research: generalization, robustness, and fairness, by introducing their respective background concepts, task settings, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out several open issues in this field and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast advancing field: https://github.com/WenkeHuang/MarsFL.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 22 pages, 4 figures

arXiv:2311.06463 [pdf, other]

Self-suppressed quantum diffusion and fundamental noise limit of soliton microcombs

Authors: Xing Jin, Zhe Lv, Qihuang Gong, Qi-Fan Yang

Abstract: Quantum diffusion of soliton microcombs has long been recognized as their fundamental noise limit. Here we surpass such limit by utilizing dispersive wave dynamics in multimode microresonators. Through the recoil force provided by these dispersive waves, the quantum diffusion can be suppressed to a much lower level that forms the ultimate fundamental noise limit of soliton microcombs. Our findings enable coherence engineering of soliton microcombs in the quantum-limited regime, providing critical guidelines for using soliton microcombs to synthesize ultralow-noise microwave and optical signals.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 8 pages, 5 figures

arXiv:2311.05837 [pdf]

High-efficiency edge couplers enabled by vertically tapering on lithium-niobate photonic chips

Authors: Di Jia, Qiang Luo, Chen Yang, Rui Ma, Xuanyi Yu, Feng Gao, Qifan Yang, Fang Bo, Guoquan Zhang, Jingjun Xu

Abstract: In the past decade, photonic integrated circuits (PICs) based on thin-film lithium niobate (TFLN) have advanced in various fields, including optical communication, nonlinear photonics, and quantum optics. A critical component is an efficient edge coupler connecting PICs to light sources or detectors. Here, we propose an innovative edge coupler design with a wedge-shaped TFLN waveguide and a silicon oxynitride (SiON) cladding. Experimental results show that the coupling loss between the TFLN PIC and a 3-μm mode field diameter (MFD) lensed fiber is low at 1.52 dB/facet, with the potential for improvement to 0.43 dB/facet theoretically. The coupling loss between the edge coupler and a UHNA7 fiber with an MFD of 3.2 μm is reduced to 0.92 dB/facet. This design maintains robust fabrication and alignment tolerance. Importantly, the minimum linewidth of the TFLN waveguide of the coupler (600 nm) can be easily achieved using foundry-available i-line stepper lithography. This work benefits the development of TFLN integrated platforms, such as on-chip electro-optic modulators, frequency comb generation, and quantum sensors.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.05827 [pdf, other]

AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Zhiguo Shi, Jiming Chen

Abstract: It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to the limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communication overhead caused by the large amount of data transmitted from one device to another during training, as well as the sub-optimal partition point due to the inaccurate latency prediction of computation at each edge device can significantly slow down training. In this paper, we propose AccEPT, an acceleration scheme for accelerating the edge collaborative pipeline-parallel training. In particular, we propose a light-weight adaptive latency predictor to accurately estimate the computation latency of each layer at different devices, which also adapts to unseen devices through continuous learning. Therefore, the proposed latency predictor leads to better model partitioning which balances the computation loads across participating devices. Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data to be transmitted between devices during training. Our numerical results demonstrate that our proposed acceleration approach is able to significantly speed up edge pipeline parallel training up to 3 times faster in the considered experimental settings.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.03612 [pdf, other]

BlockEmulator: An Emulator Enabling to Test Blockchain Sharding Protocols

Authors: Huawei Huang, Guang Ye, Qinde Chen, Zhaokang Yin, Xiaofei Luo, Jianru Lin, Taotao Li, Qinglin Yang, Zibin Zheng

Abstract: Numerous blockchain simulators have been proposed to allow researchers to simulate mainstream blockchains. However, we have not yet found a testbed that enables researchers to develop and evaluate their new consensus algorithms or new protocols for blockchain sharding systems. To fill this gap, we develop BlockEmulator, which is designed as an experimental platform, particularly for emulating blockchain sharding mechanisms. BlockEmulator adopts a lightweight blockchain architecture such that developers can only focus on implementing their new protocols or mechanisms. Using layered modules and useful programming interfaces offered by BlockEmulator, researchers can implement a new protocol with minimum effort. Through experiments, we test various functionalities of BlockEmulator in two steps. Firstly, we prove the correctness of the emulation results yielded by BlockEmulator by comparing the theoretical analysis with the observed experiment results. Secondly, other experimental results demonstrate that BlockEmulator can facilitate the measurement of a series of metrics, including throughput, transaction confirmation latency, cross-shard transaction ratio, the queuing size of transaction pools, workload distribution across blockchain shards, etc. We have made BlockEmulator open-source in Github.

Submitted 11 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.03500 [pdf]

Predicting Age from White Matter Diffusivity with Residual Learning

Authors: Chenyu Gao, Michael E. Kim, Ho Hin Lee, Qi Yang, Nazirah Mohd Khairi, Praitayini Kanakaraj, Nancy R. Newlin, Derek B. Archer, Angela L. Jefferson, Warren D. Taylor, Brian D. Boyd, Lori L. Beason-Held, Susan M. Resnick, The BIOCARD Study Team, Yuankai Huo, Katherine D. Van Schaik, Kurt G. Schilling, Daniel Moyer, Ivana Išgum, Bennett A. Landman

Abstract: Imaging findings inconsistent with those expected at specific chronological age ranges may serve as early indicators of neurological disorders and increased mortality risk. Estimation of chronological age, and deviations from expected results, from structural MRI data has become an important task for developing biomarkers that are sensitive to such deviations. Complementary to structural analysis, diffusion tensor imaging (DTI) has proven effective in identifying age-related microstructural changes within the brain white matter, thereby presenting itself as a promising additional modality for brain age prediction. Although early studies have sought to harness DTI's advantages for age estimation, there is no evidence that the success of this prediction is owed to the unique microstructural and diffusivity features that DTI provides, rather than the macrostructural features that are also available in DTI data. Therefore, we seek to develop white-matter-specific age estimation to capture deviations from normal white matter aging. Specifically, we deliberately disregard the macrostructural information when predicting age from DTI scalar images, using two distinct methods. The first method relies on extracting only microstructural features from regions of interest. The second applies 3D residual neural networks (ResNets) to learn features directly from the images, which are non-linearly registered and warped to a template to minimize macrostructural variations.
When tested on unseen data, the first method yields mean absolute error (MAE) of 6.11 years for cognitively normal participants and MAE of 6.62 years for cognitively impaired participants, while the second method achieves MAE of 4.69 years for cognitively normal participants and MAE of 4.96 years for cognitively impaired participants. We find that the ResNet model captures subtler, non-macrostructural features for brain age prediction.

Submitted 21 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: SPIE Medical Imaging: Image Processing. San Diego, CA. February 2024 (accepted as poster presentation)

arXiv:2311.03301 [pdf, other]

Ziya2: Data-centric Learning is All LLMs Need

Authors: Ruyi Gan, Ziwei Wu, Renliang Sun, Junyu Lu, Xiaojun Wu, Dixiang Zhang, Kunhao Pan, Junqing He, Yuanhe Tian, Ping Yang, Qi Yang, Hao Wang, Jiaxing Zhang, Yan Song

Abstract: Various large language models (LLMs) have been proposed in recent years, including closed- and open-source ones, continually setting new records on multiple benchmarks. However, the development of LLMs still faces several issues, such as high cost of training models from scratch, and continual pre-training leading to catastrophic forgetting, etc. Although many such issues are addressed along the line of research on LLMs, an important yet practical limitation is that many studies overly pursue enlarging model sizes without comprehensively analyzing and optimizing the use of pre-training data in their learning process, as well as appropriate organization and leveraging of such data in training LLMs under cost-effective settings. In this work, we propose Ziya2, a model with 13 billion parameters adopting LLaMA2 as the foundation model, and further pre-trained on 700 billion tokens, where we focus on pre-training techniques and use data-centric optimization to enhance the learning process of Ziya2 on different stages. We define three data attributes and firstly establish data-centric scaling laws to illustrate how different data impacts LLMs. Experiments show that Ziya2 significantly outperforms other models in multiple benchmarks especially with promising results compared to representative open-source ones. Ziya2 (Base) is released at https://huggingface.co/IDEA-CCNL/Ziya2-13B-Base and https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary.

Submitted 4 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02947 [pdf]

Multi-view learning for automatic classification of multi-wavelength auroral images

Authors: Qiuju Yang, Hang Su, Lili Liu, Yixuan Wang, Ze-Jun Hu

Abstract: Auroral classification plays a crucial role in polar research. However, current auroral classification studies are predominantly based on images taken at a single wavelength, typically 557.7 nm. Images obtained at other wavelengths have been comparatively overlooked, and the integration of information from multiple wavelengths remains an underexplored area. This limitation results in low classification rates for complex auroral patterns. Furthermore, these studies, whether employing traditional machine learning or deep learning approaches, have not achieved a satisfactory trade-off between accuracy and speed. To address these challenges, this paper proposes a lightweight auroral multi-wavelength fusion classification network, MLCNet, based on a multi-view approach. Firstly, we develop a lightweight feature extraction backbone, called LCTNet, to improve the classification rate and cope with the increasing amount of auroral observation data. Secondly, considering the existence of multi-scale spatial structures in auroras, we design a novel multi-scale reconstructed feature module named MSRM. Finally, to highlight the discriminative information between auroral classes, we propose a lightweight attention feature enhancement module called LAFE. The proposed method is validated using observational data from the Arctic Yellow River Station during 2003-2004. Experimental results demonstrate that the fusion of multi-wavelength information effectively improves the auroral classification performance.
In particular, our approach achieves state-of-the-art classification accuracy compared to previous auroral classification studies, and superior results in terms of accuracy and computational efficiency compared to existing multi-view methods.

Submitted 6 November, 2023; originally announced November 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2311.01145 [pdf, ps, other]

Simpler Distribution Testing with Little Memory

Authors: Clément L. Canonne, Joy Qiping Yang

Abstract: We consider the question of distribution testing (specifically, uniformity and closeness testing) in the streaming setting, i.e., under stringent memory constraints. We improve on the results of Diakonikolas, Gouleakis, Kane, and Rao (2019) by providing considerably simpler algorithms, which remove some restrictions on the range of parameters and match their lower bounds.

Submitted 2 November, 2023; originally announced November 2023.

Comments: To appear in the 2024 SIAM Symposium on Simplicity in Algorithms (SOSA'24)

arXiv:2310.18358 [pdf, other]

A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models

Authors: Yuanfeng Song, Yuanqin He, Xuefang Zhao, Hanlin Gu, Di Jiang, Haijun Yang, Lixin Fan, Qiang Yang

Abstract: The springing up of Large Language Models (LLMs) has shifted the community from single-task-orientated natural language processing (NLP) research to a holistic end-to-end multi-task learning paradigm. Along this line of research endeavors in the area, LLM-based prompting methods have attracted much attention, partially due to the technological advantages brought by prompt engineering (PE) as well as the underlying NLP principles disclosed by various prompting methods. Traditional supervised learning usually requires training a model based on labeled data and then making predictions. In contrast, PE methods directly use the powerful capabilities of existing LLMs (i.e., GPT-3 and GPT-4) via composing appropriate prompts, especially under few-shot or zero-shot scenarios. Facing the abundance of studies related to the prompting and the ever-evolving nature of this field, this article aims to (i) illustrate a novel perspective to review existing PE methods, within the well-established communication theory framework; (ii) facilitate a better/deeper understanding of developing trends of existing PE methods used in four typical tasks; (iii) shed light on promising research directions for future PE methods.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.17966 [pdf, other]

Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

Authors: Shenzhi Wang, Qisen Yang, Jiawei Gao, Matthieu Gaetan Lin, Hao Chen, Liwei Wu, Ning Jia, Shiji Song, Gao Huang

Abstract: Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances. FamO2O utilizes a universal model to train a family of policies with different improvement/constraint intensities, and a balance model to select a suitable policy for each state. Theoretically, we prove that state-adaptive balances are necessary for achieving a higher policy performance upper bound. Empirically, extensive experiments show that FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark. Codes are available at https://github.com/LeapLabTHU/FamO2O.

Submitted 30 October, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 spotlight. 24 pages, 13 figures

arXiv:2310.15973 [pdf, ps, other]

Explicit Formulas of Fractional GJMS operators on hyperbolic spaces and sharp fractional Poincaré-Sobolev and Hardy-Sobolev-Maz'ya inequalities

Authors: Guozhen Lu, Qiaohua Yang

Abstract: Using the scattering theory on the hyperbolic space $\mathbb{H}^n$, we give the explicit formulas of the fractional GJMS operators $P_γ$ for all $γ\in(0,\frac{n}{2})\setminus\mathbb{N}$ on $\mathbb{H}^n$. These $P_γ$ for $γ\in(0,\frac{n}{2})\setminus\mathbb{N}$ are neither conformal to the fractional Laplacians on $\mathbb{R}^n_{+}$ nor on $\mathbb{B}^n$ in $\mathbb{R}^{n}$ though $P_γ$ are conformal to $(-Δ)^γ$ via half space model and ball model of hyperbolic spaces when $γ\in\mathbb{N}$. To circumvent this, we introduce another family of fractional operators $\tilde{P}_γ$ on $\mathbb{H}^n$ which are conformal to the fractional Laplacians on $\mathbb{R}^n_{+}$ and $\mathbb{B}^n$. It is worthwhile to note that $\tilde{P}_γ\not =P_γ$ unless $γ$ is an integer. We establish the fractional Poincaré-Sobolev inequalities associated with both $P_γ$ and $\tilde{P}_γ$ on $\mathbb{H}^n$. In particular, when $n\geq 3$ and $\frac{n-1}{2}\leq γ<\frac{n}{2}$, we prove that the sharp constants in the $γ$-th order of Poincaré-Sobolev inequalities on the hyperbolic space associated with $P_γ$ and $\tilde{P}_γ$ coincide with the best $γ$-th order Sobolev constant in the $n$-dimensional Euclidean space $\mathbb{R}^n$.
We also establish fractional Hardy-Sobolev-Maz'ya inequality on $\mathbb{R}^{n}_+$ and $\mathbb{B}^n$ and prove that the sharp constants in the $γ$-th order Hardy-Sobolev-Maz'ya inequalities on half space $\mathbb{R}^{n}_+$ and unit ball $\mathbb{B}^n$ are the same as the best $γ$-th order Sobolev constants in $\mathbb{R}^n$ when $n\geq 3$ and $\frac{n-1}{2}\leq γ<\frac{n}{2}$. Our methods crucially rely on the Helgason-Fourier analysis on hyperbolic spaces and delicate analysis of special functions.

Submitted 29 December, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Theorem 4.11 is improved to include more general $ν$

arXiv:2310.14978 [pdf, other]

LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding

Authors: Qu Yang, Malu Zhang, Jibin Wu, Kay Chen Tan, Haizhou Li

Abstract: The biological neurons use precise spike times, in addition to the spike firing rate, to communicate with each other. The time-to-first-spike (TTFS) coding is inspired by such biological observation. However, there is a lack of effective solutions for training TTFS-based spiking neural network (SNN). In this paper, we put forward a simple yet effective network conversion algorithm, which is referred to as LC-TTFS, by addressing two main problems that hinder an effective conversion from a high-performance artificial neural network (ANN) to a TTFS-based SNN. We show that our algorithm can achieve a near-perfect mapping between the activation values of an ANN and the spike times of an SNN on a number of challenging AI tasks, including image classification, image reconstruction, and speech enhancement. With TTFS coding, we can achieve up to orders of magnitude saving in computation over ANN and other rate-based SNNs. The study, therefore, paves the way for deploying ultra-low-power TTFS-based SNNs on power-constrained edge computing platforms.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.14954 [pdf, other]

doi 10.1109/LSP.2023.3327585

Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition

Authors: Peng Fan, Changhao Shan, Sining Sun, Qing Yang, Jianwei Zhang

Abstract: Recently, Conformer as a backbone network for end-to-end automatic speech recognition achieved state-of-the-art performance. The Conformer block leverages a self-attention mechanism to capture global information, along with a convolutional neural network to capture local information, resulting in improved performance. However, the Conformer-based model encounters an issue with the self-attention mechanism, as computational complexity grows quadratically with the length of the input sequence. Inspired by previous Connectionist Temporal Classification (CTC) guided blank skipping during decoding, we introduce intermediate CTC outputs as guidance into the downsampling procedure of the Conformer encoder. We define the frame with non-blank output as key frame. Specifically, we introduce the key frame-based self-attention (KFSA) mechanism, a novel method to reduce the computation of the self-attention mechanism using key frames. The structure of our proposed approach comprises two encoders. Following the initial encoder, we introduce an intermediate CTC loss function to compute the label frame, enabling us to extract the key frames and blank frames for KFSA. Furthermore, we introduce the key frame-based downsampling (KFDS) mechanism to operate on high-dimensional acoustic features directly and drop the frames corresponding to blank labels, which results in new acoustic feature sequences as input to the second encoder.
The proposed method achieves comparable or higher performance than vanilla Conformer and other similar work such as Efficient Conformer. Meanwhile, it can discard more than 60% of useless frames during model training and inference, which significantly accelerates inference. The code for this work is available at https://github.com/scufan1990/Key-Frame-Mechanism-For-Efficient-Conformer

Submitted 28 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: This manuscript has been accepted by IEEE Signal Processing Letters for publication

arXiv:2310.13173 [pdf, ps, other]

Trudinger-Moser and Hardy-Trudinger-Moser inequalities for the Aharonov-Bohm Magnetic field

Authors: Guozhen Lu, Qiaohua Yang

Abstract: The main results of this paper concern sharp constant of the Trudinger-Moser inequality in $\mathbb{R}^{2}$ for Aharonov-Bohm magnetic fields. This is a borderline case of the Hardy type inequalities for Aharonov-Bohm magnetic fields in $\mathbb{R}^2$ studied by A. Laptev and T. Weidl. As an application, we obtain the exact asymptotic estimates on best constants of magnetic Hardy-Sobolev inequalities. In order to achieve our goal, we introduce a new operator $T_{a}$ on the unit circle $\mathbb{S}^{1}$ and give the asymptotic estimates of the heat kernel $e^{tT_{a}}$ via the Poisson summation formula. Finally, we show that such Trudinger-Moser inequalities in the unit ball $\mathbb{B}^{2}$ can be improved via subtraction of an additional Hardy term to derive a Hardy-Trudinger-Moser inequality.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 25 pages

arXiv:2310.12674 [pdf, other]

Observation of the Antimatter Hypernucleus $^4_{\barΛ}\overline{\hbox{H}}$

Authors: STAR Collaboration, M. I. Abdulhamid, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, E. C. Aschenauer, S. Aslam, J. Atchison, V. Bairathi, J. G. Ball Cap, K. Barish, R. Bellwied, P. Bhagat, A. Bhasin, S. Bhatta, S. R. Bhosale, J. Bielcik, J. Bielcikova, J. D. Brandenburg, C. Broodo, X. Z. Cai , et al. (342 additional authors not shown)

Abstract: At the origin of the Universe, asymmetry between the amount of created matter and antimatter led to the matter-dominated Universe as we know today. The origins of this asymmetry remain not completely understood yet. High-energy nuclear collisions create conditions similar to the Universe microseconds after the Big Bang, with comparable amounts of matter and antimatter. Much of the created antimatter escapes the rapidly expanding fireball without annihilating, making such collisions an effective experimental tool to create heavy antimatter nuclear objects and study their properties, hoping to shed some light on existing questions on the asymmetry between matter and antimatter. Here we report the first observation of the antimatter hypernucleus \hbox{$^4_{\barΛ}\overline{\hbox{H}}$}, composed of a $\barΛ$, an antiproton and two antineutrons. The discovery was made through its two-body decay after production in ultrarelativistic heavy-ion collisions by the STAR experiment at the Relativistic Heavy Ion Collider. In total, 15.6 candidate \hbox{$^4_{\barΛ}\overline{\hbox{H}}$} antimatter hypernuclei are obtained with an estimated background count of 6.4. The lifetimes of the antihypernuclei \hbox{$^3_{\barΛ}\overline{\hbox{H}}$} and \hbox{$^4_{\barΛ}\overline{\hbox{H}}$} are measured and compared with the lifetimes of their corresponding hypernuclei, testing the symmetry between matter and antimatter.
Various production yield ratios among (anti)hypernuclei and (anti)nuclei are also measured and compared with theoretical model predictions, shedding light on their production mechanisms.

Submitted 8 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: 28 pages, 5 figures in the main paper; 16 pages, 5 figures in the methods part

arXiv:2310.10049 [pdf, other]

FATE-LLM: A Industrial Grade Federated Learning Framework for Large Language Models

Authors: Tao Fan, Yan Kang, Guoqiang Ma, Weijing Chen, Wenbin Wei, Lixin Fan, Qiang Yang

Abstract: Large Language Models (LLMs), such as ChatGPT, LLaMA, GLM, and PaLM, have exhibited remarkable performances across various tasks in recent years. However, LLMs face two main challenges in real-world applications. One challenge is that training LLMs consumes vast computing resources, preventing LLMs from being adopted by small and medium-sized enterprises with limited computing resources. Another is that training LLM requires a large amount of high-quality data, which are often scattered among enterprises. To address these challenges, we propose FATE-LLM, an industrial-grade federated learning framework for large language models. FATE-LLM (1) facilitates federated learning for large language models (coined FedLLM); (2) promotes efficient training of FedLLM using parameter-efficient fine-tuning methods; (3) protects the intellectual property of LLMs; (4) preserves data privacy during training and inference through privacy-preserving mechanisms. We release the code of FATE-LLM at https://github.com/FederatedAI/FATE-LLM to facilitate the research of FedLLM and enable a broad range of industrial applications.

Submitted 16 October, 2023; originally announced October 2023.

Showing 151–200 of 1,410 results for author: Yang, Q