Search | arXiv e-print repository

Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs

Authors: Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang

Abstract: The scaling law, a strategy that involves the brute-force scaling of the training dataset and learnable parameters, has become a prevalent approach for developing stronger learning models. In this paper, we examine its rationale in terms of learning from relational graphs. We demonstrate that directly adhering to such a scaling law does not necessarily yield stronger models due to architectural incompatibility and representation bottlenecks. To tackle this challenge, we propose a novel framework for learning from relational graphs via knowledge-aware parsimony learning. Our method draws inspiration from the duality between data and knowledge inherent in these graphs. Specifically, we first extract knowledge (like symbolic logic and physical laws) during the learning process, and then apply combinatorial generalization to the task at hand. This extracted knowledge serves as the "building blocks" for achieving parsimony learning. By applying this philosophy to architecture, parameters, and inference, we can effectively achieve versatile, sample-efficient, and interpretable learning. Experimental results show that our proposed framework surpasses methods that strictly follow the traditional scaling-up roadmap. This highlights the importance of incorporating knowledge in the development of next-generation learning technologies.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.19703 [pdf, other]

Vision Transformer with Key-select Routing Attention for Single Image Dehazing

Authors: Lihan Tong, Weijia Li, Qingxia Yang, Liyuan Chen, Peng Chen

Abstract: We present Ksformer, utilizing Multi-scale Key-select Routing Attention (MKRA) for intelligent selection of key areas through multi-channel, multi-scale windows with a top-k operator, and Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features, outperforming other dehazing methods in tests.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures, IEICE Trans. Information and Systems

Report number: Vol.E107-D, No.11, pp.-, Nov. 2024; MSC Class: 68U10 (Primary); ACM Class: I.4

arXiv:2406.18862 [pdf, other]

Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study

Authors: Peikun Chen, Sining Sun, Changhao Shan, Qing Yang, Lei Xie

Abstract: Unified speech-text models like SpeechGPT, VioLA, and AudioPaLM have shown impressive performance across various speech-related tasks, especially in Automatic Speech Recognition (ASR). These models typically adopt a unified method to model discrete speech and text tokens, followed by training a decoder-only transformer. However, they are all designed for non-streaming ASR tasks, where the entire speech utterance is needed during decoding. Hence, we introduce a decoder-only model exclusively designed for streaming recognition, incorporating a dedicated boundary token to facilitate streaming recognition and employing causal attention masking during the training phase. Furthermore, we introduce right-chunk attention and various data augmentation techniques to improve the model's contextual modeling abilities. While achieving streaming speech recognition, experiments on the AISHELL-1 and -2 datasets demonstrate the competitive performance of our streaming approach with non-streaming decoder-only counterparts.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted for Interspeech 2024
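
Note on the entry above: the abstract mentions causal attention masking. As a minimal, generic sketch (not the authors' implementation; the function name `causal_mask` is hypothetical), a causal mask lets position i attend only to positions j <= i, so no future frame or token is visible while decoding. The right-chunk attention named in the abstract is not modeled here.

```python
def causal_mask(n):
    """Return an n x n boolean mask; mask[i][j] is True when
    position i is allowed to attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Print the mask for a 4-step sequence: 1 = visible, 0 = masked.
for row in causal_mask(4):
    print("".join("1" if allowed else "0" for allowed in row))
```

Each row of the printed mask has one more visible position than the row before it, which is the lower-triangular pattern decoder-only models typically use.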

arXiv:2406.17404 [pdf, other]

Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training

Authors: Yixuan Wang, Xianzhen Luo, Fuxuan Wei, Yijun Liu, Qingfu Zhu, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

Abstract: Existing speculative decoding methods typically require additional model structure and training processes to assist the model for draft token generation. This makes the migration of acceleration methods to the new model more costly and more demanding on device memory. To address this problem, we propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning stage of the large language model. The training method simply introduces some noise at the input for the model to learn the denoising task. It significantly enhances the parallel decoding capability of the model without affecting the original task capability. In addition, we propose a tree-based retrieval-augmented Jacobi (TR-Jacobi) decoding strategy to further improve the inference speed of MSN models. Experiments in both the general and code domains have shown that MSN can improve inference speed by 2.3-2.7x without compromising model performance. The MSN model also achieves comparable acceleration ratios to the SOTA model with additional model structure on Spec-Bench.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 11 pages, 6 figures

arXiv:2406.16520 [pdf]

Gigantic-oxidative atomically layered epitaxy for designed complex oxides

Authors: Guangdi Zhou, Haoliang Huang, Fengzhe Wang, Heng Wang, Qishuo Yang, Zihao Nie, Wei Lv, Cui Ding, Yueying Li, Danfeng Li, Yujie Sun, Junhao Lin, Guang-Ming Zhang, Qi-Kun Xue, Zhuoyu Chen

Abstract: In designing material functionality within the intricate realm of transition metal oxides, lattice structure and d-orbital occupancy are two principal determinants of the correlated physical properties, such as superconductivity. However, the modulation of these two factors is inherently limited by the need to balance thermodynamic stability, kinetic mobility, and synthesis precision, particularly for oxidation-demanding phases. We introduce a methodology, namely the gigantic-oxidative atomically layered epitaxy (GOAL-Epitaxy), enhancing oxidation power 3-4 orders of magnitude beyond oxide molecular beam epitaxy (OMBE) and pulsed laser deposition (PLD), while ensuring atomic-layer-by-layer growth of designed complex structures. Consequently, thermodynamic stability is markedly augmented at elevated temperatures, improving growth kinetics. We demonstrate the accurate synthesis of complex nickelates and cuprates, especially an artificially designed structure as a parent of high-temperature superconductivity, in which alternating single and double NiO2 layers possess distinct nominal d-orbital occupancy. The GOAL-Epitaxy enables material discovery within the vastly broadened growth parameter space.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16442 [pdf, other]

EmoLLM: Multimodal Emotional Understanding Meets Large Language Models

Authors: Qu Yang, Mang Ye, Bo Du

Abstract: Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored. Thus, it impedes their ability to effectively understand and react to the intricate emotions expressed by humans through multimodal media. To bridge this gap, we introduce EmoBench, the first comprehensive benchmark designed specifically to evaluate the emotional capabilities of MLLMs across five popular emotional tasks, using a diverse dataset of 287k images and videos paired with corresponding textual instructions. Meanwhile, we propose EmoLLM, a novel model for multimodal emotional understanding, incorporating two core techniques: 1) Multi-perspective Visual Projection, which captures diverse emotional cues from visual data from multiple perspectives; and 2) EmoPrompt, which guides MLLMs to reason about emotions in the correct direction. Experimental results demonstrate that EmoLLM significantly elevates multimodal emotional understanding performance, with an average improvement of 12.1% across multiple foundation models on EmoBench. Our work contributes to the advancement of MLLMs by facilitating a deeper and more nuanced comprehension of intricate human emotions, paving the way for the development of artificial emotional intelligence capabilities with wide-ranging applications in areas such as human-computer interaction, mental health support, and empathetic AI systems. Code, data, and model will be released.

Submitted 29 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 9 pages

arXiv:2406.16278 [pdf, ps, other]

Sharp fractional Sobolev and related inequalities on H-type groups

Authors: Yaojun Wang, Qiaohua Yang

Abstract: We determine the sharp constants for the fractional Sobolev inequalities associated with the conformally invariant fractional powers $\mathcal{L}_{s}(0<s<1)$ of the sublaplacian on H-type groups. From these inequalities we derive a sharp log-Sobolev inequality by considering a limiting case and a sharp Sobolev trace inequality. The latter extends to this context the result of Frank, González, Monticelli and Tan (Adv. Math, 2015).

Submitted 27 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.16271 [pdf, other]

Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation

Authors: Xueyu Liu, Guangze Shi, Rui Wang, Yexin Lai, Jianan Zhang, Lele Sun, Quan Yang, Yongfei Wu, Ming Li, Weixia Han, Wen Zheng

Abstract: Assessment of the glomerular basement membrane (GBM) in transmission electron microscopy (TEM) is crucial for diagnosing chronic kidney disease (CKD). The lack of domain-independent automatic segmentation tools for the GBM necessitates an AI-based solution to automate the process. In this study, we introduce GBMSeg, a training-free framework designed to automatically segment the GBM in TEM images guided only by a one-shot annotated reference. Specifically, GBMSeg first exploits the robust feature matching capabilities of the pretrained foundation model to generate initial prompt points, then introduces a series of novel automatic prompt engineering techniques across the feature and physical space to optimize the prompt scheme. Finally, GBMSeg employs a class-agnostic foundation segmentation model with the generated prompt scheme to obtain accurate segmentation results. Experimental results on our collected 2538 TEM images confirm that GBMSeg achieves superior segmentation performance with a Dice similarity coefficient (DSC) of 87.27% using only one labeled reference image in a training-free manner, outperforming recently proposed one-shot or few-shot methods. In summary, GBMSeg introduces a distinctive automatic prompt framework that facilitates robust domain-independent segmentation performance without training, particularly advancing the automatic prompting of foundation segmentation models for medical images. Future work involves automating the thickness measurement of segmented GBM and quantifying pathological indicators, holding significant potential for advancing pathology assessments in clinical applications. The source code is available on https://github.com/SnowRain510/GBMSeg

Submitted 23 June, 2024; originally announced June 2024.

Comments: Accepted for MICCAI2024
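
Note on the entry above: the Dice similarity coefficient (DSC) quoted in the abstract is a standard overlap metric, DSC = 2|A ∩ B| / (|A| + |B|), for a predicted mask A and a reference mask B. The following is a minimal sketch on flattened binary masks. It is not taken from the GBMSeg repository; the function name `dice_coefficient` and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two equal-length binary masks
    given as flat sequences of 0/1 (or False/True) values."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(f"DSC = {dice_coefficient(pred, target):.4f}")
```

In this toy case the masks share two foreground pixels out of three each, so the score is 2*2/(3+3), about 0.667. A reported DSC of 87.27% corresponds to 0.8727 on this scale.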

arXiv:2406.13007 [pdf, other]

NTIRE 2024 Challenge on Night Photography Rendering

Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, **gyuan Xiao, et al. (25 additional authors not shown)

Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algorithms was also measured alongside the quality of their output. To evaluate the results, a sufficient number of viewers were asked to assess the visual quality of the proposed solutions, considering the subjective nature of the task. There were 2 nominations: quality and efficiency. Top 5 solutions in terms of output quality were sorted by evaluation time (see Fig. 1). The top ranking participants' solutions effectively represent the state-of-the-art in nighttime photography rendering. More results can be found at https://nightimaging.org.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 10 pages, 10 figures

arXiv:2406.12726 [pdf, other]

ED-sKWS: Early-Decision Spiking Neural Networks for Rapid,and Energy-Efficient Keyword Spotting

Authors: Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li

Abstract: Keyword Spotting (KWS) is essential in edge computing requiring rapid and energy-efficient responses. Spiking Neural Networks (SNNs) are well-suited for KWS for their efficiency and temporal capacity for speech. To further reduce the latency and energy consumption, this study introduces ED-sKWS, an SNN-based KWS model with an early-decision mechanism that can stop speech processing and output the result before the end of speech utterance. Furthermore, we introduce a Cumulative Temporal (CT) loss that can enhance prediction accuracy at both the intermediate and final timesteps. To evaluate early-decision performance, we present the SC-100 dataset including 100 speech commands with beginning and end timestamp annotation. Experiments on the Google Speech Commands v2 and our SC-100 datasets show that ED-sKWS maintains competitive accuracy with 61% timesteps and 52% energy consumption compared to SNN models without early-decision mechanism, ensuring rapid response and energy efficiency.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH2024

arXiv:2406.12403 [pdf, other]

PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models

Authors: Tao Fan, Yan Kang, Wei**g Chen, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

Abstract: In the context of real-world applications, leveraging large language models (LLMs) for domain-specific tasks often faces two major challenges: domain-specific knowledge privacy and constrained resources. To address these issues, we propose PDSS, a privacy-preserving framework for step-by-step distillation of LLMs. PDSS works on a server-client architecture, wherein the client transmits perturbed prompts to the server's LLM for rationale generation. The generated rationales are then decoded by the client and used to enrich the training of a task-specific small language model (SLM) within a multi-task learning paradigm. PDSS introduces two privacy protection strategies: the Exponential Mechanism Strategy and the Encoder-Decoder Strategy, balancing prompt privacy and rationale usability. Experiments demonstrate the effectiveness of PDSS in various text generation tasks, enabling the training of a task-specific SLM with enhanced performance while prioritizing data privacy protection.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12254 [pdf, other]

Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael Kim, Shunxing Bao, Ann Xenobia Moore, Luigi Ferrucci, Bennett A. Landman

Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, and these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations, to guide the 2D student by learning intra- and inter-class correlation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model. Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods can benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach demonstrates considerable efficacy in low-data regimes, outperforming the model trained with all available training subjects even when utilizing only 200 training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11967 [pdf, other]

Elf autoencoder: unsupervised exploration of flat-band materials using electronic band structure fingerprints

Authors: Henry Kelbrick Pentz, Thomas Warford, Ivan Timokhin, Qian Yang, Anupam Bhattacharya, Artem Mishchenko

Abstract: Two-dimensional materials with flat electronic bands are promising for realizing exotic quantum phenomena such as unconventional superconductivity and nontrivial topology, but exploring their vast chemical space remains challenging. Here, we introduce an unsupervised convolutional autoencoder agent (elf) that operates on electronic band structure images and is capable of mapping band features and extracting the latent space representation as a fingerprint, enabling autonomous clustering of materials with common electronic properties beyond traditional chemical paradigms. Unsupervised visualisation of the latent space then helps to uncover hidden chemical trends and identify promising candidates based on similarities to well-studied exemplars. Our framework paves the way for the accelerated discovery of novel flat-band materials with desirable electronic characteristics. It complements high-throughput ab initio methods by rapidly screening candidates and guides further investigations into the mechanisms governing the emergence of flat-band physics. We believe the elf autoencoder will be a valuable tool for the autonomous discovery of previously unexplored flat-band materials, aiding in the unbiased identification of compounds with desirable electronic properties in vast 2D chemical space.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11158 [pdf, other]

Dynamic Modeling and Control for an Offshore Semisubmersible Floating Wind Turbine

Authors: Yingjie Gong, Qinmin Yang, Hua Geng, Wenchao Meng, Lin Wang

Abstract: Floating wind turbines (FWTs) hold significant potential for the exploitation of offshore renewable energy resources. Nevertheless, prior to the construction of FWTs, it is imperative to tackle several critical challenges, especially the issue of performance degradation under combined wind and wave loads. This study begins with the development of a simplified nonlinear dynamical model for a semi-submersible FWT. In particular, both the rotor dynamics and the finite rotations of the platform are considered in the presented modeling approach, thereby effectively capturing the complex interplay between the platform, tower, nacelle, and rotor under combined wind and wave loads. Subsequently, based on the developed FWT model, a novel adaptive nonlinear pitch controller is formulated with the goal of striking a trade-off between regulating power generation and reducing platform motion. Notably, the proposed control strategy adopts a continuous control approach, which helps circumvent the chattering phenomenon commonly associated with sliding mode control. Furthermore, the controller integrates an online approximator and a robust integral of the sign of the tracking error, facilitating real-time learning of unknown system dynamics while compensating for bounded disturbances. Finally, both the accuracy of the established nonlinear FWT model in predicting key dynamics and the superiority of the presented pitch controller are validated through comprehensive comparative studies.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10715 [pdf, other]

Chip-scale generation of 60-mode continuous-variable cluster states

Authors: Ze Wang, Kangkang Li, Yue Wang, Xin Zhou, Yinke Cheng, Boxuan **g, Fengxiao Sun, **cheng Li, Zhilin Li, Qihuang Gong, Qiongyi He, Bei-Bei Li, Qi-Fan Yang

Abstract: Increasing the number of entangled entities is crucial for achieving exponential computational speedups and secure quantum networks. Despite recent progress in generating large-scale entanglement through continuous-variable (CV) cluster states, translating these technologies to photonic chips has been hindered by decoherence, limiting the number of entangled entities to 8. Here, we demonstrate 60-mode CV cluster states in a chip-based optical microresonator pumped by chromatic lasers. Resonantly-enhanced four-wave mixing processes establish entanglement between equidistant spectral quantum modes (qumodes), forming a quantum analogue of optical frequency combs. Decoherence is minimized to achieve unprecedented two-mode raw squeezing (>3 dB) from a chip. Using bichromatic and trichromatic pump lasers, we realize one- and two-dimensional cluster states with up to 60 qumodes. Our work provides a compact and scalable platform for constructing large-scale entangled quantum resources, which are appealing for performing computational and communicational tasks with quantum advantages.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10616 [pdf, other]

doi 10.1145/3637528.3671660

HiFGL: A Hierarchical Framework for Cross-silo Cross-device Federated Graph Learning

Authors: Zhuoning Guo, Duanyi Yao, Qiang Yang, Hao Liu

Abstract: Federated Graph Learning (FGL) has emerged as a promising way to learn high-quality representations from distributed graph data with privacy preservation. Although considerable efforts have been made for FGL under either the cross-device or the cross-silo paradigm, how to effectively capture graph knowledge in a more complicated cross-silo cross-device environment remains an under-explored problem. This task is challenging because of the inherent hierarchy and heterogeneity of decentralized clients, diversified privacy constraints in different clients, and the cross-client graph integrity requirement. To this end, in this paper, we propose a Hierarchical Federated Graph Learning (HiFGL) framework for cross-silo cross-device FGL. Specifically, we devise a unified hierarchical architecture to safeguard federated GNN training on heterogeneous clients while ensuring graph integrity. Moreover, we propose a Secret Message Passing (SecMP) scheme to shield unauthorized access to subgraph-level and node-level sensitive information simultaneously. Theoretical analysis proves that HiFGL achieves multi-level privacy preservation with complexity guarantees. Extensive experiments on real-world datasets validate the superiority of the proposed framework against several baselines. Furthermore, HiFGL's versatile nature allows for its application in either solely cross-silo or cross-device settings, further broadening its utility in real-world FGL applications.

Submitted 15 June, 2024; originally announced June 2024.

Comments: Accepted by SIGKDD 2024

arXiv:2406.10606 [pdf, other]

Semantic Communication for Edge Intelligence Enabled Autonomous Driving System

Authors: Yunqi Feng, Hesheng Shen, Zhendong Shan, Qianqian Yang, Xiufang Shi

Abstract: Expected to provide higher transportation efficiency and security, autonomous driving has attracted substantial attention from both industry and academia. Meanwhile, the emergence of edge intelligence has further introduced significant advancements to this field. However, the crucial demands of ultra-reliable and low-latency communications (URLLC) among the vehicles and edge servers have hindered the development of autonomous driving. In this article, we provide a brief overview of edge intelligence enabled autonomous driving systems and current vehicle-to-everything (V2X) technologies. Moreover, challenges associated with massive data transmission in autonomous driving are highlighted from three perspectives: multi-modal data transmission and fusion, multi-user collaboration and connection, and multi-task training and execution. To cope with these challenges, we propose to incorporate semantic communication into autonomous driving to achieve highly efficient and task-oriented data transmission. Unlike traditional communications, semantic communication extracts task-relevant semantic features from multi-sensory data. Specifically, a unified multi-user semantic communication system for transmitting multi-modal data and performing multi-task execution is designed for collaborative data transmission and decision making in autonomous driving. Simulation results demonstrate that the proposed system can significantly reduce data transmission volume without compromising task performance, as evidenced by the realization of a cooperative multi-vehicle target classification and detection task.

Submitted 15 June, 2024; originally announced June 2024.

Comments: This paper has been submitted to IEEE Network Magazine and is undergoing major revisions

arXiv:2406.10540 [pdf, other]

Generating and Evolving Reward Functions for Highway Driving with Large Language Models

Authors: Xu Han, Qiannan Yang, Xianda Chen, Xiaowen Chu, Meixin Zhu

Abstract: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in many practices. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving. This framework utilizes the coding capabilities of LLMs, proven in other areas, to generate and evolve reward functions for highway scenarios. The framework starts with instructing LLMs to create an initial reward function code based on the driving environment and task descriptions. This code is then refined through iterative cycles involving RL training and LLMs' reflection, which benefits from their ability to review and improve the output. We have also developed a specific prompt template to improve LLMs' understanding of complex driving simulations, ensuring the generation of effective and error-free code. Our experiments in a highway driving simulator across three traffic configurations show that our method surpasses expert handcrafted reward functions, achieving a 22% higher average success rate. This not only indicates safer driving but also suggests significant gains in development productivity.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures

arXiv:2406.10469 [pdf, other]

Object-Attribute-Relation Representation based Video Semantic Communication

Authors: Qiyuan Du, Yi** Duan, Qianqian Yang, Xiaoming Tao, Mérouane Debbah

Abstract: With the rapid growth of multimedia data volume, there is an increasing need for efficient video transmission in applications such as virtual reality and future video streaming services. Semantic communication is emerging as a vital technique for ensuring efficient and reliable transmission in low-bandwidth, high-noise settings. However, most current approaches focus on joint source-channel coding (JSCC) that depends on end-to-end training. These methods often lack an interpretable semantic representation and struggle with adaptability to various downstream tasks. In this paper, we introduce the use of object-attribute-relation (OAR) as a semantic framework for videos to facilitate low bit-rate coding and enhance the JSCC process for more effective video transmission. We utilize OAR sequences for both low bit-rate representation and generative video reconstruction. Additionally, we incorporate OAR into the image JSCC model to prioritize communication resources for areas more critical to downstream tasks. Our experiments on traffic surveillance video datasets assess the effectiveness of our approach in terms of video transmission performance. The empirical findings demonstrate that our OAR-based video coding method not only outperforms H.265 coding at lower bit-rates but also synergizes with JSCC to deliver robust and efficient video transmission.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.10277 [pdf]

Tellegen responses in metamaterials

Authors: Qingdong Yang, Xinhua Wen, Zhongfu Li, Oubo You, Shuang Zhang

Abstract: The Tellegen medium has long been a topic of debate, with its existence being contested over several decades. It was first proposed by Tellegen in 1948 and is characterized by a real-valued cross coupling between electric and magnetic responses, distinguishing it from the well-known chiral medium that has imaginary coupling coefficients. Significantly, Tellegen responses are closely linked to axion dynamics, an extensively studied subject in condensed matter physics. Here, we report the realization of Tellegen metamaterials in the microwave region through a judicious combination of subwavelength metallic resonators, gyromagnetic materials, and permanent magnet discs. We observe the key signature of the Tellegen response, i.e., a Kerr rotation for the reflected wave, while the polarization remains the same in the transmission direction. The retrieved effective Tellegen parameter is several orders of magnitude greater than that of natural materials. Our work opens the door to a variety of nonreciprocal photonic devices and may provide a platform for studying axion physics.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 19 pages, 4 figures

arXiv:2406.08278 [pdf, other]

doi 10.1088/1674-4527/ad5398

HiFAST : An HI Data Calibration and Imaging Pipeline for FAST II. Flux Density Calibration

Authors: Ziming Liu, Jie Wang, Yingjie **g, Zhi-Yu Zhang, Chen Xu, Tiantian Liang, Qingze Chen, Ningyu Tang, Qingliang Yang

Abstract: Accurate flux density calibration is essential for precise analysis and interpretation of observations across different observation modes and instruments. In this research, we first introduce the flux calibration model incorporated in the HIFAST pipeline, designed for processing HI 21-cm spectra. Furthermore, we investigate different calibration techniques and assess the dependence of the gain parameter on time and environmental factors. A comparison is carried out in various observation modes (e.g. tracking and scanning modes) to determine the flux density gain ($G$), revealing insignificant discrepancies in $G$ among different methods. Long-term monitoring data shows a linear correlation between $G$ and atmospheric temperature. After subtracting the $G$--Temperature dependence, the dispersion of $G$ is reduced to $<$3% over a one-year time scale. The stability of the receiver response of FAST is considered sufficient to facilitate HI observations that can accommodate a moderate error in flux calibration (e.g., $>\sim5\%$) when utilizing a constant $G$ for calibration purposes. Our study will serve as a useful addition to the results provided by Jiang et al. (2020). Detailed measurement of $G$ for the 19 beams of FAST, covering the frequency range 1000 MHz -- 1500 MHz can be found on the HIFAST homepage: https://hifast.readthedocs.io/fluxgain.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 14 pages, 15 figures, accepted by RAA

arXiv:2406.05811 [pdf, other]

CLT for Generalized Linear Spectral Statistics of High-dimensional Sample Covariance Matrices and Applications

Authors: Yanlin Hu, Qing Yang, Xiao Han

Abstract: In this paper, we introduce the $\mathbf{G}$eneralized $\mathbf{L}$inear $\mathbf{S}$pectral $\mathbf{S}$tatistics (GLSS) of a high-dimensional sample covariance matrix $\mathbf{S}_n$, denoted as $\operatorname{tr}f(\mathbf{S}_n)\mathbf{B}_n$, which effectively captures distinct spectral properties of $\mathbf{S}_n$ by involving an ancillary matrix $\mathbf{B}_n$ and a test function $f$. The joint asymptotic normality of GLSS associated with different test functions is established under weak assumptions on $\mathbf{B}_n$ and the underlying distribution, when the dimension $n$ and sample size $N$ are comparable. Specifically, we allow the rank of $\mathbf{B}_n$ to diverge with $n$. The convergence rate of GLSS is determined by $\sqrt{{N}/{\operatorname{rank}(\mathbf{B}_n)}}$. As a natural application, we propose a novel approach based on GLSS for hypothesis testing on eigenspaces of spiked covariance matrices. The theoretical accuracy of the results established for GLSS and the advantages of the newly suggested testing procedure are demonstrated through various numerical studies.
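One way to read the GLSS definition, assuming the standard spectral decomposition of the symmetric matrix $\mathbf{S}_n$ (the eigenvalue and eigenvector symbols $\lambda_i$ and $\mathbf{u}_i$ are notation introduced here for illustration, not taken from the abstract):

```latex
\mathbf{S}_n = \sum_{i=1}^{n} \lambda_i \,\mathbf{u}_i \mathbf{u}_i^{\top}
\quad\Longrightarrow\quad
\operatorname{tr} f(\mathbf{S}_n)\mathbf{B}_n
  = \sum_{i=1}^{n} f(\lambda_i)\, \mathbf{u}_i^{\top} \mathbf{B}_n \mathbf{u}_i .
```

With $\mathbf{B}_n = \mathbf{I}_n$ the weights $\mathbf{u}_i^{\top}\mathbf{B}_n\mathbf{u}_i$ all equal 1 and the expression reduces to the classical linear spectral statistic $\sum_i f(\lambda_i)$; a general $\mathbf{B}_n$ weights each eigenvalue by how its eigenvector aligns with $\mathbf{B}_n$.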

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04601 [pdf, other]

Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning

Authors: Zheng Huang, Qihui Yang, Dawei Zhou, Yujun Yan

Abstract: Although most graph neural networks (GNNs) can operate on graphs of any size, their classification performance often declines on graphs larger than those encountered during training. Existing methods insufficiently address the removal of size information from graph representations, resulting in sub-optimal performance and reliance on backbone models. In response, we propose DISGEN, a novel and model-agnostic framework designed to disentangle size factors from graph representations. DISGEN employs size- and task-invariant augmentations and introduces a decoupling loss that minimizes shared information in hidden representations, with theoretical guarantees for its effectiveness. Our empirical results show that DISGEN outperforms the state-of-the-art models by up to 6% on real-world datasets, underscoring its effectiveness in enhancing the size generalizability of GNNs. Our codes are available at: https://github.com/GraphmindDartmouth/DISGEN.

Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04323 [pdf, other]

ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

Authors: Qianlan Yang, Yu-Xiong Wang

Abstract: Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, coined as Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to effectively handle varying trajectory lengths and mitigate distribution shifts between online and offline data. Because of its simplicity, ATraDiff seamlessly integrates with a wide spectrum of RL methods. Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io .

Submitted 6 June, 2024; originally announced June 2024.

Comments: ICML 2024 Accepted

arXiv:2406.04025 [pdf]

The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses

Authors: Caimei Yang, Qihang Yang, Xingzhi Su, Chenxi Fu, Xiaoyi Wang, Ying Yan, Zaijiang Man

Abstract: There have been apparently conflicting claims over the syntax-semantics relationship in child acquisition. However, few of them have assessed the child's path toward the acquisition of recursive relative clauses (RRCs). The authors of the current paper conducted experiments to investigate 3- to 11-year-olds' most-structured elicited production of eight Mandarin RRCs in a 4 (syntactic types) × 2 (semantic conditions) design. The four syntactic types were RRCs with a subject-gapped RC embedded in an object-gapped RC (SORRCs), RRCs with an object-gapped RC embedded in another object-gapped RC (OORRCs), RRCs with an object-gapped RC embedded in a subject-gapped RC (OSRRCs), and RRCs with a subject-gapped RC embedded in another subject-gapped RC (SSRRCs). Each syntactic type was put in two conditions differing in internal semantics: irreversible internal semantics (IIS) and reversible internal semantics (RIS). For example, "the balloon that [the girl that _ eats the banana] holds _" is an SORRC in the IIS condition; "the monkey that [the dog that _ bites the pig] hits _" is an SORRC in the RIS condition. For each target, the participants were provided with a speech-visual stimulus constructing a condition of irreversible external semantics (IES). The results showed that SSRRCs, OSRRCs and SORRCs in the IIS-IES condition were produced two years earlier than their counterparts in the RIS-IES condition. Thus, a 2-stage development path is proposed: the language acquisition device starts with the interface between (irreversible) syntax and IIS, and ends with the interface between syntax and IES, both abiding by the syntax-semantic interface principle.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03868 [pdf, other]

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

Authors: Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, **yi Deng, Yang Hu, Shouyi Yin

Abstract: Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often incorporate numerous cores or tiles even extending to wafer-scale, substantial on-chip bandwidth, and distributed memory systems. This results in an exceedingly complex design space. Moreover, conducting actual training experiments to find optimal configurations is impractical due to time constraints. Hence, predicting the optimal mapping of various parallelisms to such tiled system architectures becomes crucial. In this study, leveraging an analysis of existing mainstream DL model training strategies, we introduce a performance simulator named PALM. PALM targets both the training and inference processes for tiled accelerators, aiming to inspire the design of current and future accelerators. Specifically, (i) we establish a scheduling mechanism among tiled accelerators based on an event-driven framework; (ii) we support user-configurable pipeline, tensor, and data parallelism on tiled accelerators, determining the absolute performance throughput under these parallelism strategies; (iii) we model the interaction of on-chip SRAM, NoC, and off-chip DRAM during operator execution. This work is available here: https://github.com/fangjh21/PALM.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 11 pages

arXiv:2406.02916 [pdf, other]

Real-time Motion Planning for autonomous vehicles in dynamic environments

Authors: Mohammad Dehghani Tezerjani, Dominic Carrillo, Deyuan Qu, Sudip Dhakal, Amir Mirzaeinia, Qing Yang

Abstract: Recent advancements in self-driving car technologies have enabled them to navigate autonomously through various environments. However, one of the critical challenges in autonomous vehicle operation is trajectory planning, especially in dynamic environments with moving obstacles. This research aims to tackle this challenge by proposing a robust algorithm tailored for autonomous cars operating in dynamic environments with moving obstacles. The algorithm introduces two main innovations. Firstly, it defines path density by adjusting the number of waypoints along the trajectory, optimizing their distribution for accuracy in curved areas and reducing computational complexity in straight sections. Secondly, it integrates hierarchical motion planning algorithms, combining global planning with an enhanced $A^*$ graph-based method and local planning using the time elastic band algorithm with moving obstacle detection considering different motion models. The proposed algorithm is adaptable for different vehicle types and mobile robots, making it versatile for real-world applications. Simulation results demonstrate its effectiveness across various conditions, promising safer and more efficient navigation for autonomous vehicles in dynamic environments. These modifications significantly improve trajectory planning capabilities, addressing a crucial aspect of autonomous vehicle technology.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2406.02852 [pdf]

Isolated anions induced high ionic conductivity

Authors: Qifan Yang, **g Xu, Yuqi Wang, Xiao Fu, Ruijuan Xiao, Hong Li

Abstract: One of the key materials in solid-state lithium batteries is fast ion conductors. However, the Li+ ion transport in inorganic crystals involves complex factors, making it a mystery to find and design ion conductors with low migration barriers. In this work, a distinctive structural characteristic involving isolated anions has been discovered to enhance high ionic conductivity in crystals. It is an effective way to create a smooth energy potential landscape and construct local pathways for lithium ion migration. By adjusting the spacing and arrangement of the isolated anions, these local pathways can connect with each other, leading to high ion conductivity. By designing different space groups and local environments of the Se2- anions in the Li8SiSe6 composition, combined with the ion transport properties obtained from AIMD simulations, we define isolated anions and find that local environment with higher point group symmetry promotes the formation of cage-like local transport channels. Additionally, the appropriate distance between neighboring isolated anions can create coplanar connections between adjacent cage-like channels. Furthermore, different types of isolated anions can be used to control the distribution of cage-like channels in the lattice. Based on the structural characteristic of isolated anions, we discovered compounds with isolated N3-, Cl-, I-, and S2- features from the crystal structure databases. The confirmation of ion transport in these structures validates the proposed design method of using isolated anions as structural features for fast ion conductors and leads to the discovery of several new fast ion conductor materials.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02224 [pdf, other]

FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models

Authors: Tao Fan, Guoqiang Ma, Yan Kang, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

Abstract: Recent research in federated large language models (LLMs) has primarily focused on enabling clients to fine-tune their locally deployed homogeneous LLMs collaboratively or on transferring knowledge from server-based LLMs to small language models (SLMs) at downstream clients. However, a significant gap remains in the simultaneous mutual enhancement of both the server's LLM and clients' SLMs. To bridge this gap, we propose FedMKT, a parameter-efficient federated mutual knowledge transfer framework for large and small language models. This framework is designed to adaptively transfer knowledge from the server's LLM to clients' SLMs while concurrently enriching the LLM with clients' unique domain insights. We facilitate token alignment using minimum edit distance (MinED) and then selective mutual knowledge transfer between client-side SLMs and a server-side LLM, aiming to collectively enhance their performance. Through extensive experiments across three distinct scenarios, we evaluate the effectiveness of FedMKT using various public LLMs and SLMs on a range of NLP text generation tasks. Empirical results demonstrate that FedMKT simultaneously boosts the performance of both LLMs and SLMs.
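The token-alignment step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "minimum edit distance" means the standard Levenshtein distance and that each token from one vocabulary is mapped to its nearest token in the other; the function names `edit_distance` and `align_tokens` and the toy vocabularies are hypothetical and are not taken from the FedMKT code.

```python
from typing import Dict, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming with a rolling row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def align_tokens(source_vocab: Sequence[str],
                 target_vocab: Sequence[str]) -> Dict[str, str]:
    """Map each source token to the target token with the smallest edit distance.

    Illustrative assumption: ties are broken by order in target_vocab.
    """
    return {s: min(target_vocab, key=lambda t: edit_distance(s, t))
            for s in source_vocab}


if __name__ == "__main__":
    # Toy vocabularies, invented for illustration only.
    print(align_tokens(["utilize", "colour"], ["utilise", "color", "use"]))
```

A nearest-neighbour mapping like this costs one distance computation per token pair, so a practical system would likely restrict the candidate set; the abstract does not say how FedMKT handles this.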

Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01956 [pdf, other]

Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt

Authors: Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li

Abstract: This paper presents a novel approach to enhance image-to-image generation by leveraging the multimodal capabilities of the Large Language and Vision Assistant (LLaVA). We propose a framework where LLaVA analyzes input images and generates textual descriptions, hereinafter LLaVA-generated prompts. These prompts, along with the original image, are fed into the image-to-image generation pipeline. This enriched representation guides the generation process towards outputs that exhibit a stronger resemblance to the input image. Extensive experiments demonstrate the effectiveness of LLaVA-generated prompts in promoting image similarity. We observe a significant improvement in the visual coherence between the generated and input images compared to traditional methods. Future work will explore fine-tuning LLaVA prompts for increased control over the creative process. By providing more specific details within the prompts, we aim to achieve a delicate balance between faithfulness to the original image and artistic expression in the generated outputs.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by 2024 5th International Conference on Information Science, Parallel and Distributed Systems

arXiv:2406.01422 [pdf, other]

How to Understand Whole Software Repository?

Authors: Yingwei Ma, Qing** Yang, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li

Abstract: Recently, Large Language Model (LLM) based agents have significantly advanced the development of Automatic Software Engineering (ASE). Although their effectiveness has been verified, the designs of the existing methods mainly focus on the local information of codes, e.g., issues, classes, and functions, leading to limitations in capturing the global context and interdependencies within the software system. From the practical experiences of human SE developers, we argue that an excellent understanding of the whole repository will be the critical path to ASE. However, understanding the whole repository raises various challenges, e.g., the extremely long code input, the noisy code information, the complex dependency relationships, etc. To this end, we develop a novel ASE method named RepoUnderstander by guiding agents to comprehensively understand whole repositories. Specifically, we first condense the critical information of the whole repository into a repository knowledge graph in a top-down mode to decrease the complexity of the repository. Subsequently, we give the agents the ability to understand the whole repository by proposing a Monte Carlo tree search based repository exploration strategy. In addition, to better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan. Then, they can manipulate the tools to dynamically acquire information and generate the patches to solve real-world GitHub issues. Extensive experiments demonstrate the superiority and effectiveness of the proposed RepoUnderstander. It achieved an 18.5% relative improvement on the SWE-bench Lite benchmark compared to SWE-agent.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01085 [pdf, other]

FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation

Authors: Hanlin Gu, Jiahuan Luo, Yan Kang, Yuan Yao, Gongxi Zhu, Bowen Li, Lixin Fan, Qiang Yang

Abstract: Federated learning (FL) has emerged as a collaborative approach that allows multiple clients to jointly learn a machine learning model without sharing their private data. The concern about privacy leakage, albeit demonstrated under specific conditions, has triggered numerous follow-up research in designing powerful attacking methods and effective defending mechanisms aiming to thwart these attacking methods. Nevertheless, privacy-preserving mechanisms employed in these defending methods invariably lead to compromised model performances due to a fixed obfuscation applied to private data or gradients. In this article, we, therefore, propose a novel adaptive obfuscation mechanism, coined FedAdOb, to protect private data without yielding original model performances. Technically, FedAdOb utilizes passport-based adaptive obfuscation to ensure data privacy in both horizontal and vertical federated learning settings. The privacy-preserving capabilities of FedAdOb, specifically with regard to private features and labels, are theoretically proven through Theorems 1 and 2. Furthermore, extensive experimental evaluations conducted on various datasets and network architectures demonstrate the effectiveness of FedAdOb by manifesting its superior trade-off between privacy preservation and model performance, surpassing existing methods.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20681 [pdf, other]

No Free Lunch Theorem for Privacy-Preserving LLM Inference

Authors: Xiao** Zhang, Yulin Fei, Yan Kang, Wei Chen, Lixin Fan, Hai **, Qiang Yang

Abstract: Individuals and businesses have benefited significantly from Large Language Models (LLMs) including PaLM, Gemini and ChatGPT in various ways. For example, LLMs enhance productivity, reduce costs, and enable us to focus on more valuable tasks. Furthermore, LLMs possess the capacity to sift through extensive datasets, uncover underlying patterns, and furnish critical insights that propel the frontiers of technology and science. However, LLMs also pose privacy concerns. Users' interactions with LLMs may expose their sensitive personal or company information. A lack of robust privacy safeguards and legal frameworks could permit the unwarranted intrusion or improper handling of individual data, thereby risking infringements of privacy and the theft of personal identities. To ensure privacy, it is essential to minimize the dependency between shared prompts and private information. Various randomization approaches have been proposed to protect prompts' privacy, but they may incur utility loss compared to unprotected LLM prompting. Therefore, it is essential to evaluate the balance between the risk of privacy leakage and loss of utility when designing effective protection mechanisms. The current study develops a framework for privacy-protected inference with Large Language Models (LLMs) and lays down a solid theoretical basis for examining the interplay between privacy preservation and utility. The core insight is encapsulated within a theorem called the NFL (No-Free-Lunch) Theorem.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.18802 [pdf, other]

Enhancing Security and Privacy in Federated Learning using Update Digests and Voting-Based Defense

Authors: Wenjie Li, Kai Fan, **gyuan Zhang, Hui Li, Wei Yang Bryan Lim, Qiang Yang

Abstract: Federated Learning (FL) is a promising privacy-preserving machine learning paradigm that allows data owners to collaboratively train models while keeping their data localized. Despite its potential, FL faces challenges related to the trustworthiness of both clients and servers, especially in the presence of curious or malicious adversaries. In this paper, we introduce a novel framework named Federated Learning with Update Digest (FLUD), which addresses the critical issues of privacy preservation and resistance to Byzantine attacks within distributed learning environments. FLUD utilizes an innovative approach, the $\mathsf{LinfSample}$ method, allowing clients to compute the $l_{\infty}$ norm across sliding windows of updates as an update digest. This digest enables the server to calculate a shared distance matrix, significantly reducing the overhead associated with Secure Multi-Party Computation (SMPC) by three orders of magnitude while effectively distinguishing between benign and malicious updates. Additionally, FLUD integrates a privacy-preserving, voting-based defense mechanism that employs optimized SMPC protocols to minimize communication rounds. Our comprehensive experiments demonstrate FLUD's effectiveness in countering Byzantine adversaries while incurring low communication and runtime overhead. FLUD offers a scalable framework for secure and reliable FL in distributed environments, facilitating its application in scenarios requiring robust data management and security.
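The update digest described in this abstract (an $l_{\infty}$ norm taken over sliding windows of a client update) can be illustrated with a minimal sketch. The function name `linf_digest` and the `window` and `stride` parameters are assumptions made for illustration; the paper's actual $\mathsf{LinfSample}$ procedure may sample or window the update differently.

```python
def linf_digest(update, window, stride):
    """Return the l-infinity norm (largest absolute value) of each sliding window.

    Hypothetical helper: `window` and `stride` are illustrative parameters,
    not values taken from the FLUD paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not update:
        return []
    digest = []
    # Slide over the flattened update; a short update yields a single window.
    last_start = max(len(update) - window, 0)
    for start in range(0, last_start + 1, stride):
        chunk = update[start:start + window]
        digest.append(max(abs(x) for x in chunk))
    return digest


# A six-entry update summarized by two non-overlapping windows of three entries.
example_update = [0.1, -0.5, 0.2, 0.05, -0.3, 0.4]
print(linf_digest(example_update, window=3, stride=3))
```

The digest is much shorter than the update itself, which is consistent with the abstract's point that comparing digests is cheaper than comparing full updates.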

Submitted 29 May, 2024; originally announced May 2024.

Comments: 14 pages

arXiv:2405.18776 [pdf, other]

LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models

Authors: Qin Yang, Meisam Mohammad, Han Wang, Ali Payani, Ashish Kundu, Kai Shu, Yan Yan, Yuan Hong

Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) and its variants have been proposed to ensure rigorous privacy for fine-tuning large-scale pre-trained language models. However, they rely heavily on the Gaussian mechanism, which may overly perturb the gradients and degrade the accuracy, especially in stronger privacy regimes (e.g., the privacy budget $ε< 3$). To address such limitations, we propose a novel Language Model-based Optimal Differential Privacy (LMO-DP) mechanism, which takes the first step to enable the tight composition of accurately fine-tuning (large) language models with a sub-optimal DP mechanism, even in strong privacy regimes (e.g., $0.1\leq ε<3$). Furthermore, we propose a novel offline optimal noise search method to efficiently derive the sub-optimal DP that significantly reduces the noise magnitude. For instance, fine-tuning RoBERTa-large (with 300M parameters) on the SST-2 dataset can achieve an accuracy of 92.20% (given $ε=0.3$, $δ=10^{-10}$) by drastically outperforming the Gaussian mechanism (e.g., $\sim 50\%$ for small $ε$ and $δ$). We also draw similar findings on the text generation tasks on GPT-2. Finally, to our best knowledge, LMO-DP is also the first solution to accurately fine-tune Llama-2 with strong differential privacy guarantees. The code will be released soon and available upon request.
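For context, the Gaussian-mechanism baseline that this abstract contrasts against can be sketched as the standard DP-SGD step: clip each per-example gradient to a fixed norm, sum, add Gaussian noise scaled to that norm, and average. This is a sketch of the baseline only, not of LMO-DP; the function name and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip_and_noise(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD style aggregation step with the Gaussian mechanism.

    Assumed, illustrative signature: gradients are plain lists of floats and
    `rng` is a random.Random instance supplying the noise.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(grad):
            total[i] += x * scale
    # Noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * clip_norm
    count = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / count for t in total]


grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_mean = clip_and_noise(grads, clip_norm=1.0, noise_multiplier=1.0,
                            rng=random.Random(0))
print(noisy_mean)
```

A larger `noise_multiplier` gives a stronger privacy guarantee at the cost of a noisier gradient, which is the accuracy degradation the abstract attributes to the Gaussian mechanism at small $ε$.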

Submitted 29 May, 2024; originally announced May 2024.

Comments: 18 pages, 15 figures

arXiv:2405.17660 [pdf, other]

LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking

Authors: Shaohua Dong, Yunhe Feng, Qing Yang, Yuewei Lin, Heng Fan

Abstract: High-performance Transformer trackers have shown excellent results, yet they often bear a heavy computational load. Observing that a smaller input can immediately and conveniently reduce computations without changing the model, an easy solution is to adopt the low-resolution input for efficient Transformer tracking. Albeit faster, this substantially hurts tracking accuracy due to information loss in low-resolution tracking. In this paper, we aim to mitigate such information loss to boost the performance of the low-resolution Transformer tracking via dual knowledge distillation from a frozen high-resolution (but not a larger) Transformer tracker. The core lies in two simple yet effective distillation modules, comprising query-key-value knowledge distillation (QKV-KD) and discrimination knowledge distillation (Disc-KD), across resolutions. The former, from the global view, allows the low-resolution tracker to inherit the features and interactions from the high-resolution tracker, while the latter, from the target-aware view, enhances the target-background distinguishing capacity via imitating discriminative regions from its high-resolution counterpart. With the dual knowledge distillation, our Low-Resolution Transformer Tracker (LoReTrack) enjoys not only high efficiency owing to reduced computation but also enhanced accuracy by distilling knowledge from the high-resolution tracker. In extensive experiments, LoReTrack with a 256x256 resolution consistently improves on the baseline with the same resolution, and shows competitive or even better results compared to the 384x384 high-resolution Transformer tracker, while running 52% faster and saving 56% MACs. Moreover, LoReTrack is resolution-scalable. With a 128x128 resolution, it runs at 25 fps on a CPU with 64.9%/46.4% SUC scores on LaSOT/LaSOText, surpassing all other CPU real-time trackers. Code will be released.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17522 [pdf, other]

Efficient Model Compression for Hierarchical Federated Learning

Authors: Xi Zhu, Songcan Yu, Junbo Wang, Qinglin Yang

Abstract: Federated learning (FL), as an emerging collaborative learning paradigm, has garnered significant attention due to its capacity to preserve privacy within distributed learning systems. In these systems, clients collaboratively train a unified neural network model using their local datasets and share model parameters rather than raw data, enhancing privacy. Predominantly, FL systems are designed for mobile and edge computing environments where training typically occurs over wireless networks. Consequently, as model sizes increase, the conventional FL frameworks increasingly consume substantial communication resources. To address this challenge and improve communication efficiency, this paper introduces a novel hierarchical FL framework that integrates the benefits of clustered FL and model compression. We present an adaptive clustering algorithm that identifies a core client and dynamically organizes clients into clusters. Furthermore, to enhance transmission efficiency, each core client implements a local aggregation with compression (LC aggregation) algorithm after collecting compressed models from other clients within the same cluster. Simulation results affirm that our proposed algorithms not only maintain comparable predictive accuracy but also significantly reduce energy consumption relative to existing FL mechanisms.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17221 [pdf, other]

Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture

Authors: **yi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, **xi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin

Abstract: Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks encompassing both AI and general computational processes. In response, we have conceptualized "Orchestrated AI Workflows," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticated workflows. Specifically, we find that the intrinsic Dual Dynamicity of Orchestrated AI Workflows, namely dynamic execution times and frequencies of Task Blocks, can be effectively represented using the Orchestrated Workflow Graph. Furthermore, the intrinsic Dual Dynamicity poses challenges to existing spatial architecture, namely Indiscriminate Resource Allocation, Reactive Load Rebalancing, and Contagious PEA Idleness. To overcome these challenges, we present Octopus, a scale-out spatial architecture and a suite of advanced scheduling strategies optimized for executing Orchestrated AI Workflows, such as the Discriminate Dual-Scheduling Mechanism, Adaptive TBU Scheduling Strategy, and Proactive Cluster Scheduling Strategy. Our evaluations demonstrate that Octopus significantly outperforms traditional architectures in handling the dynamic demands of Orchestrated AI Workflows, and possesses robust scalability in large scale hardware such as wafer-scale chip.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.16944 [pdf]

Even- and Odd-denominator Fractional Quantum Anomalous Hall Effect in Graphene Moire Superlattices

Authors: Jian Xie, Zihao Huo, Xin Lu, Zuo Feng, Zaizhe Zhang, Wenxuan Wang, Qiu Yang, Kenji Watanabe, Takashi Taniguchi, Kaihui Liu, Zhida Song, X. C. Xie, Jianpeng Liu, Xiaobo Lu

Abstract: Fractional quantum anomalous Hall effect (FQAHE), a transport effect with fractionally quantized Hall plateau emerging under zero magnetic field, provides a radically new opportunity to engineer topological quantum electronics. By construction of topological flat band with moire engineering, intrinsic FQAHE has been observed in twisted MoTe2 system and rhombohedral pentalayer graphene/hBN moire superlattices with anomalous Hall resistivity quantization number C <= 2/3 including the gapless composite Fermi-liquid state with C = 1/2. Here we experimentally demonstrate a new system of rhombohedral hexalayer graphene (RHG)/hBN moire superlattices showing both fractional and integer quantum anomalous Hall effects when the lowest flat Chern band is fractionally and fully filled at zero magnetic field. The zero-field Hall resistance Rho_xy = h/(Ce^2) is quantized to values corresponding to C = 3/5, 2/3, 5/7, 3/4, 7/9 and 1 at moire filling factors v = 3/5, 2/3, 5/7, 3/4, 7/9 and 1, respectively. Particularly, the C = 3/4 FQAHE state at v = 3/4 moire filling, featuring a minimum of longitudinal resistance Rho_xx and fractionally quantized Hall resistance Rho_xy = 4h/(3e^2), is observed for the first time under zero magnetic field. Such a state may be similar to the C = 3/4 fractional quantum Hall (FQHE) state recently observed at high magnetic fields [9,10] and possibly host fractional charge excitations obeying non-Abelian statistics. By tuning the electrical and magnetic fields at 0 < v < 1, we have observed a sign reversal of the Hall resistivity for v = 2/3 state, indicating a transition from quasi-electron-like excitations to quasi-hole ones. Our experiment has established RHG/hBN moire superlattices as a promising platform to explore quasi-particles with fractional charge excitations and non-Abelian anyons at zero magnetic field.
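As a worked illustration of the quantization relation quoted in this abstract (Hall resistance equal to h divided by C times e squared), the plateau values for the listed C can be computed from the SI-defined constants. The function name is an assumption for illustration; the constants are the exact values fixed by the 2019 SI redefinition.

```python
from fractions import Fraction

PLANCK_H = 6.62607015e-34       # Planck constant in J*s (exact by SI definition)
ELEMENTARY_E = 1.602176634e-19  # elementary charge in C (exact by SI definition)


def hall_resistance_ohm(c):
    """Quantized Hall resistance h / (C * e^2) in ohms for quantization number C."""
    return PLANCK_H / (float(c) * ELEMENTARY_E ** 2)


# Quantization numbers listed in the abstract.
for c in [Fraction(3, 5), Fraction(2, 3), Fraction(5, 7),
          Fraction(3, 4), Fraction(7, 9), Fraction(1, 1)]:
    print(f"C = {c}: {hall_resistance_ohm(c):.1f} ohm")
```

For C = 1 this gives about 25.8 kilohm, and for C = 3/4 it gives 4/3 of that value, matching the 4h/(3e^2) plateau named in the abstract.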

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16138 [pdf, ps, other]

doi 10.1103/PhysRevB.109.L180508

Anomalous isotope Effect in d-wave superconductors on the square lattice

Authors: Gan Sun, Qing-Geng Yang, Da Wang, Qiang-Hua Wang

Abstract: Isotope effect with a large coefficient $α=-\partial \ln T_c/\partial \ln M$ is usually taken as evidence of phonon-mediated superconductors in the Bardeen-Cooper-Schrieffer (BCS) theory. However, in cuprates, which are now widely believed to be strong-correlation-induced d-wave superconductors, $α$ is experimentally observed to be quite small at optimal doping, but keeps growing with decreasing $T_c$ upon doping, even after exceeding the BCS value $1/2$. Such an anomalous isotope effect seems to challenge the non-phonon picture and still leave room for the phonon-dominated mechanism. In this work, we show that the anomalous dependence of $α$ on $T_c$ can actually be obtained in spin-fluctuation-induced d-wave superconductors, by studying the Hubbard model on square lattices with functional renormalization group. We have considered two types of electron-phonon couplings (EPCs). The first type couples to electron densities, including the Holstein, breathing and buckling phonons, called Holstein-like. For all these EPCs, $α$ is negative and drops towards $-\infty$ with decreasing $T_c$ upon doping. By contrast, for the other type of Peierls-like EPC coupling to electron hoppings on the nearest bonds, also called Su-Schrieffer-Heeger (SSH) phonon, $α$ is positive, grows with decreasing $T_c$ and tends to diverge as $T_c\to0$, in qualitative agreement with the experiments. The difference between these two types of EPCs can be understood by their isotope effects on spin fluctuations. From this study, we conclude that the SSH phonon can explain the anomalous isotope effect in cuprates, although it is not the leading pairing mechanism.
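For reference, the BCS value $1/2$ mentioned in this abstract follows from the textbook weak-coupling estimate of $T_c$. The short derivation below is a standard result added for orientation, not taken from the paper, and it assumes the coupling $N(0)V$ does not depend on the ionic mass $M$.

```latex
% Weak-coupling BCS estimate, with Debye frequency \omega_D \propto M^{-1/2}
k_B T_c \simeq 1.13\,\hbar\omega_D \exp\!\left(-\frac{1}{N(0)V}\right),
\qquad \omega_D \propto M^{-1/2}
% Taking logarithms, with N(0)V assumed independent of M:
\ln T_c = -\tfrac{1}{2}\ln M + \text{const}
\quad\Longrightarrow\quad
\alpha = -\frac{\partial \ln T_c}{\partial \ln M} = \frac{1}{2}
```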

Submitted 25 May, 2024; originally announced May 2024.

Journal ref: Phys. Rev. B 109, L180508 (2024)

arXiv:2405.15474 [pdf, other]

Unlearning during Learning: An Efficient Federated Machine Unlearning Method

Authors: Hanlin Gu, Gongxi Zhu, Jie Zhang, Xinyuan Zhao, Yuxing Han, Lixin Fan, Qiang Yang

Abstract: In recent years, Federated Learning (FL) has garnered significant attention as a distributed machine learning paradigm. To facilitate the implementation of the right to be forgotten, the concept of federated machine unlearning (FMU) has also emerged. However, current FMU approaches often involve additional time-consuming steps and may not offer comprehensive unlearning capabilities, which renders them less practical in real FL scenarios. In this paper, we introduce FedAU, an innovative and efficient FMU framework aimed at overcoming these limitations. Specifically, FedAU incorporates a lightweight auxiliary unlearning module into the learning process and employs a straightforward linear operation to facilitate unlearning. This approach eliminates the requirement for extra time-consuming steps, rendering it well-suited for FL. Furthermore, FedAU exhibits remarkable versatility. It not only enables multiple clients to carry out unlearning tasks concurrently but also supports unlearning at various levels of granularity, including individual data samples, specific classes, and even at the client level. We conducted extensive experiments on MNIST, CIFAR10, and CIFAR100 datasets to evaluate the performance of FedAU. The results demonstrate that FedAU effectively achieves the desired unlearning effect while maintaining model accuracy.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024

arXiv:2405.14488 [pdf, other]

MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability

Authors: Yanrui Du, Sendong Zhao, Danyang Zhao, Ming Ma, Yuhan Chen, Liangyu Huo, Qing Yang, Dongliang Xu, Bing Qin

Abstract: Large Language Models (LLMs) are increasingly deployed in various applications. As their usage grows, concerns regarding their safety are rising, especially in maintaining harmless responses when faced with malicious instructions. Many defense strategies have been developed to enhance the safety of LLMs. However, our research finds that existing defense strategies lead LLMs to predominantly adopt a rejection-oriented stance, thereby diminishing the usability of their responses to benign instructions. To solve this problem, we introduce the MoGU framework, designed to enhance LLMs' safety while preserving their usability. Our MoGU framework transforms the base LLM into two variants: the usable LLM and the safe LLM, and further employs dynamic routing to balance their contribution. When encountering malicious instructions, the router will assign a higher weight to the safe LLM to ensure that responses are harmless. Conversely, for benign instructions, the router prioritizes the usable LLM, facilitating usable and helpful responses. On various open-sourced LLMs, we compare multiple defense strategies to verify the superiority of our MoGU framework. Besides, our analysis provides key insights into the effectiveness of MoGU and verifies that our designed routing mechanism can effectively balance the contribution of each variant by assigning weights. Our work released the safer Llama2, Vicuna, Falcon, Dolphin, and Baichuan2.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14212 [pdf, other]

Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

Authors: Haoran Li, Xinyuan Zhao, Dadi Guo, Hanlin Gu, Ziqian Zeng, Yuxing Han, Yangqiu Song, Lixin Fan, Qiang Yang

Abstract: As large language models (LLMs) demonstrate unparalleled performance and generalization ability, LLMs are widely used and integrated into various applications. When it comes to sensitive domains, as commonly described in federated learning scenarios, directly using external LLMs on private data is strictly prohibited by stringent data security and privacy regulations. For local clients, the utilization of LLMs to improve the domain-specific small language models (SLMs), characterized by limited computational resources and domain-specific data, has attracted considerable research attention. By observing that LLMs can empower domain-specific SLMs, existing methods predominantly concentrate on leveraging the public data or LLMs to generate more data to transfer knowledge from LLMs to SLMs. However, due to the discrepancies between LLMs' generated data and clients' domain-specific data, these methods cannot yield substantial improvements in the domain-specific tasks. In this paper, we introduce a Federated Domain-specific Knowledge Transfer (FDKT) framework, which enables domain-specific knowledge transfer from LLMs to SLMs while preserving clients' data privacy. The core insight is to leverage LLMs to augment data based on domain-specific few-shot demonstrations, which are synthesized from private domain data using differential privacy. Such synthetic samples share similar data distribution with clients' private data and allow the server LLM to generate particular knowledge to improve clients' SLMs. The extensive experimental results demonstrate that the proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5% with a privacy budget of less than 10, compared to local training on private data.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13519 [pdf]

Multi-fidelity topology optimization of flow boiling heat transfer in microchannels

Authors: Yi Yuan, Li Chen, Qirui Yang, Lingran Gu, Wen-Quan Tao

Abstract: Topology optimization (TO) is a powerful method to design innovative structures with improved heat transfer performance. In the present study, a multi-fidelity TO method with a delicately defined objective function is developed for flow boiling heat transfer in microchannels. Low-fidelity TO is conducted for the reduced-order process of single-phase laminar convective heat transfer, which generates a set of structure candidates for subsequent high-fidelity evaluation of flow boiling heat transfer. To avoid the possible iteration between the low-fidelity TO and high-fidelity evaluation which leads to inefficient solution of the multi-fidelity TO, distributions of velocity, temperature and two-phase in microchannels with single-phase and/or flow boiling heat transfer are investigated and compared in detail, based on which a new objective function is delicately defined, which can be employed in the low-fidelity TO yet can stand for the performance of the high-fidelity problem. With the help of the new objective function, the efficiency of the multi-fidelity TO is significantly improved and TO structures are designed with hot spots eliminated, thermal resistance reduced and temperature uniformity improved. The present work provides a new method for TO of complicated heat and mass transfer problems. Keywords: topology optimization, flow boiling, multi-fidelity optimization, microchannels, convective heat transfer

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13483 [pdf, other]

Distributed Indirect Source Coding with Decoder Side Information

Authors: Jiancheng Tang, Qianqian Yang, Deniz Gündüz

Abstract: This paper studies a variant of the rate-distortion problem motivated by task-oriented semantic communication and distributed learning problems, where $M$ correlated sources are independently encoded for a central decoder. The decoder has access to a correlated side information in addition to the messages received from the encoders, and aims to recover a latent random variable correlated with the sources observed by the encoders within a given distortion constraint rather than recovering the sources themselves. We provide bounds on the rate-distortion region for this scenario in general, and characterize the rate-distortion function exactly when the sources are conditionally independent given the side information.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11493 [pdf, other]

Point Cloud Compression with Implicit Neural Representations: A Unified Framework

Authors: Hongning Ruan, Yulin Shao, Qianqian Yang, Liang Zhao, Dusit Niyato

Abstract: Point clouds have become increasingly vital across various applications thanks to their ability to realistically depict 3D objects and scenes. Nevertheless, effectively compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we present a pioneering point cloud compression framework capable of handling both geometry and attribute components. Unlike traditional approaches and existing learning-based methods, our framework utilizes two coordinate-based neural networks to implicitly represent a voxelized point cloud. The first network generates the occupancy status of a voxel, while the second network determines the attributes of an occupied voxel. To tackle an immense number of voxels within the volumetric space, we partition the space into smaller cubes and focus solely on voxels within non-empty cubes. By feeding the coordinates of these voxels into the respective networks, we reconstruct the geometry and attribute components of the original point cloud. The neural network parameters are further quantized and compressed. Experimental results underscore the superior performance of our proposed method compared to the octree-based approach employed in the latest G-PCC standards. Moreover, our method exhibits high universality when contrasted with existing learning-based techniques.
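The partitioning step in this abstract (splitting the voxel space into smaller cubes and keeping only the non-empty ones) can be sketched as a simple grouping of integer voxel coordinates. The function name and the `cube_size` parameter are illustrative assumptions; the paper's actual cube layout and the two neural networks are not reproduced here.

```python
def non_empty_cubes(voxels, cube_size):
    """Group occupied voxels by the cube that contains them.

    Hypothetical helper: `voxels` is an iterable of integer (x, y, z) tuples,
    and `cube_size` is the edge length of each cube in voxels. Cubes that
    contain no occupied voxel never appear in the result.
    """
    if cube_size <= 0:
        raise ValueError("cube_size must be positive")
    cubes = {}
    for x, y, z in voxels:
        # Integer division maps a voxel coordinate to its cube index.
        key = (x // cube_size, y // cube_size, z // cube_size)
        cubes.setdefault(key, []).append((x, y, z))
    return cubes


occupied = [(0, 0, 0), (1, 2, 3), (9, 9, 9), (8, 0, 0)]
for cube, members in sorted(non_empty_cubes(occupied, cube_size=4).items()):
    print(cube, members)
```

Only the cubes returned here would need their voxel coordinates fed to the networks, which is how the abstract's restriction to non-empty cubes reduces the number of voxels to evaluate.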

Submitted 19 May, 2024; originally announced May 2024.

Comments: 6 Pages, 6 Figures, submitted to IEEE ICCC

arXiv:2405.10762 [pdf]

Research on Credit Risk Early Warning Model of Commercial Banks Based on Neural Network Algorithm

Authors: Yu Cheng, Qin Yang, Liyang Wang, Ao Xiang, **gyu Zhang

Abstract: In the realm of globalized financial markets, commercial banks are confronted with an escalating magnitude of credit risk, thereby imposing heightened requisites upon the security of bank assets and financial stability. This study harnesses advanced neural network techniques, notably the Backpropagation (BP) neural network, to pioneer a novel model for preempting credit risk in commercial banks. The discourse initially scrutinizes conventional financial risk preemptive models, such as ARMA, ARCH, and Logistic regression models, critically analyzing their real-world applications. Subsequently, the exposition elaborates on the construction process of the BP neural network model, encompassing network architecture design, activation function selection, parameter initialization, and objective function construction. Through comparative analysis, the superiority of neural network models in preempting credit risk in commercial banks is elucidated. The experimental segment selects specific bank data, validating the model's predictive accuracy and practicality. Research findings evince that this model efficaciously enhances the foresight and precision of credit risk management.

Submitted 30 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09853 [pdf, other]

Chiral symmetry breaking in the pseudo-quantum electrodynamics with non-Abelian four-fermion interactions

Authors: Qiao Yang, Yu-Biao Wu, Wu-Ming Liu

Abstract: In the context of 2+1 dimensional Dirac materials, we consider electromagnetic interactions alongside a type of spin-dependent Hubbard interaction. The former is described by PQED theory, while the latter corresponds to an effective theory represented by the $SU(N_c)$ Thirring model. Employing Hubbard-Stratonovich transformation and large N expansion in the model yields a non-local $SU(N_c)$ Yang-Mills action. Subsequently, we solve Schwinger-Dyson equations to obtain the self-energy function of the fermion propagator, from which we determine the critical fermion flavor number $N^c_f$ and critical fine structure constant $α_c$ indicative of chiral symmetry breaking. Our findings suggest that as the non-Abelian color number $N_c$ increases, the minimum value of the critical fermion flavor number monotonically increases, while the maximum value of the critical fine structure constant decreases accordingly, rendering the system more susceptible to chiral symmetry breaking.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures

arXiv:2405.09234 [pdf, other]

Enhancing Image Privacy in Semantic Communication over Wiretap Channels leveraging Differential Privacy

Authors: Weixuan Chen, Shunpu Tang, Qianqian Yang

Abstract: Semantic communication (SemCom) enhances transmission efficiency by sending only task-relevant information compared to traditional methods. However, transmitting semantic-rich data over insecure or public channels poses security and privacy risks. This paper addresses the privacy problem of transmitting images over wiretap channels and proposes a novel SemCom approach ensuring privacy through a differential privacy (DP)-based image protection and deprotection mechanism. The method utilizes the GAN inversion technique to extract disentangled semantic features and applies a DP mechanism to protect sensitive features within the extracted semantic information. To address the non-invertibility of DP, we introduce two neural networks to approximate the DP application and removal processes, offering a privacy protection level close to that by the original DP process. Simulation results validate the effectiveness of our method in preventing eavesdroppers from obtaining sensitive information while maintaining high-fidelity image reconstruction at the legitimate receiver.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.07639 [pdf]

Unveiling the Magmatic Architecture Beneath Oceanus Procellarum: Insights from GRAIL Mission Data

Authors: Meixia Geng, Qingjie Yang, Chaouki Kasmi, J. Kim Welford, Alexander L. Peace

Abstract: The Oceanus Procellarum region, characterized by its vast basaltic plains and pronounced volcanic activity, serves as a focal point for understanding the volcanic history of the Moon. Leveraging the Gravity Recovery and Interior Laboratory (GRAIL) mission data, we imaged the magmatic structures beneath the Oceanus Procellarum region. Our 3D density models uncover pronounced linear magmatic structures along the Procellarum's western border and significant intrusions within the northern and southern Marius Hills. Crucially, they reveal three narrow near-horizontal sheeted magmatic structures, 80-150 km long, extending from near-surface to 6-7 km depth, which we identified as sill-like magmatic conduits. These magmatic conduits connect the Marius Hills' northern and southern intrusions and bridge them with the Procellarum's western border structures. These discoveries suggest that sill-like magmatic conduits likely serve as central pathways facilitating magma transport across various volcanic systems and furthermore indicate widespread magmatic connectivity beneath the Oceanus Procellarum.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 30 pages, 6 figures, and 1 table

Showing 1–50 of 1,410 results for author: Yang, Q