Search | arXiv e-print repository

I Still See You: Why Existing IoT Traffic Reshaping Fails

Authors: Su Wang, Keyang Yu, Qi Li, Dong Chen

Abstract: The Internet traffic data produced by Internet of Things (IoT) devices are collected by Internet Service Providers (ISPs) and device manufacturers, and often shared with third parties to maintain and enhance user services. Unfortunately, on-path adversaries could infer and fingerprint users' sensitive private information, such as occupancy and user activities, by analyzing these network traffic traces. While there is a growing body of literature on defending against this side-channel attack, malicious IoT traffic analytics (TA), there is currently no systematic method to compare and evaluate the comprehensiveness of these existing studies. To address this problem, we design a new low-cost, open-source system framework, the IoT Traffic Exposure Monitoring Toolkit (ITEMTK), that enables people to comprehensively examine and validate prior attack models and their defense approaches. In particular, we also design a novel image-based attack capable of inferring sensitive user information, even when users employ the most robust preventative measures in their smart homes. Researchers could leverage our new image-based attack to systematize and understand the existing literature on IoT traffic analysis attacks and prevention studies. Our results show that current defense approaches are not sufficient to protect IoT device user privacy. IoT devices are significantly vulnerable to our new image-based user privacy inference attacks, posing a grave threat to IoT device user privacy. We also highlight potential future improvements to enhance the defense approaches. ITEMTK's flexibility allows other researchers to extend it easily by integrating new TA attack models and prevention methods to benchmark their future work.

Submitted 14 June, 2024; originally announced June 2024.

Comments: EWSN'24 paper accepted, to appear

arXiv:2406.10185 [pdf, other]

Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

Authors: Jiawei Chen, Dingkang Yang, Tong Wu, Yue Jiang, Xiaolu Hou, Mingcheng Li, Shunli Wang, Dongling Xiao, Ke Li, Lihua Zhang

Abstract: Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications, including medical visual question answering and imaging report generation. While these models inherit the robust capabilities of foundational Large Language Models (LLMs), they also inherit susceptibility to hallucinations, a significant concern in high-stakes medical contexts where the margin for error is minimal. However, there are currently no dedicated methods or benchmarks for hallucination detection and evaluation in the medical field. To bridge this gap, we introduce Med-HallMark, the first benchmark specifically designed for hallucination detection and evaluation within the medical multimodal domain. This benchmark provides multi-tasking hallucination support, multifaceted hallucination data, and hierarchical hallucination categorization. Furthermore, we propose the MediHall Score, a new medical evaluative metric designed to assess LVLMs' hallucinations through a hierarchical scoring system that considers the severity and type of hallucination, thereby enabling a granular assessment of potential clinical impacts. We also present MediHallDetector, a novel Medical LVLM engineered for precise hallucination detection, which employs multitask training for hallucination detection. Through extensive experimental evaluations, we establish baselines for popular LVLMs using our benchmark. The findings indicate that MediHall Score provides a more nuanced understanding of hallucination impacts compared to traditional metrics and demonstrate the enhanced performance of MediHallDetector. We hope this work can significantly improve the reliability of LVLMs in medical applications. All resources of this work will be released soon.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.10146 [pdf]

Multimodal Radiomics Model for Predicting Gold Nanoparticles Accumulation in Mouse Tumors

Authors: Jiajia Tang, Jie Zhang, Jiulou Zhang, Yuxia Tang, Hao Ni, Shouju Wang

Abstract: Background: Nanoparticles can accumulate in solid tumors, serving as diagnostic or therapeutic agents for cancer. Clinical translation is challenging due to low accumulation in tumors and heterogeneity between tumor types and individuals. Tools to identify this heterogeneity and predict nanoparticle accumulation are needed. Advanced imaging techniques combined with radiomics and AI may offer a solution. Methods: 183 mice were used to create seven subcutaneous tumor models, with three sizes (15 nm, 40 nm, 70 nm) of gold nanoparticles injected via the tail vein. Accumulation was measured using ICP-OES. Data were divided into training and test sets (7:3). Tumors were categorized into high and low uptake groups based on the median value of the training set. Before injection, multimodal imaging data (CT, B-mode ultrasound, SWE, CEUS) were acquired, and radiomics features were extracted. A radiomics signature was built using the LASSO and RFE algorithms. This signature, together with tumor type and mean values from CT and SWE, was used to construct the best model with an SVM. For each tumor in the test set, the radiomics signature predicted gold nanoparticle uptake. Model performance was evaluated by AUC. Results: Significant variability in gold nanoparticle accumulation was observed among tumors (P < 0.001). The median accumulation in the training set was 3.37% ID/g. Nanoparticle size was not a main determinant of uptake (P > 0.05). The composite model based on the radiomics signature outperformed the basic model in both training (AUC 0.93 vs. 0.68) and testing (0.78 vs. 0.61) datasets. Conclusion: The composite model identifies tumor heterogeneity and predicts high uptake of gold nanoparticles, improving patient stratification and supporting nanomedicine's clinical application.
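
The 7:3 split and median-based grouping described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `split_and_label`, the fixed random seed, the sample values, and the choice to treat values strictly above the median as "high uptake" are all assumptions made for illustration. The one point taken from the abstract is that the threshold comes from the training set only.

```python
import random
import statistics


def split_and_label(uptake, train_fraction=0.7, seed=0):
    """Shuffle samples, split them 7:3, and label each one as high (1)
    or low (0) uptake relative to the training-set median.

    Hypothetical helper: the tie rule (strictly greater = high) is an
    assumption, not something stated in the abstract.
    """
    rng = random.Random(seed)
    indices = list(range(len(uptake)))
    rng.shuffle(indices)

    n_train = round(train_fraction * len(indices))
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    # The threshold is computed from training samples only, so the test
    # set does not influence how its own labels are defined.
    threshold = statistics.median(uptake[i] for i in train_idx)

    labels = {i: int(uptake[i] > threshold) for i in indices}
    return train_idx, test_idx, threshold, labels


# Made-up uptake values (% ID/g), for illustration only.
uptake = [1.2, 5.0, 3.4, 0.8, 2.9, 4.1, 3.3, 6.2, 2.0, 3.9]
train_idx, test_idx, threshold, labels = split_and_label(uptake)
print(len(train_idx), len(test_idx), threshold)
```

Reusing the training-set median for the test set is what keeps the test labels independent of the test data.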

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09836 [pdf, other]

Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks

Authors: Zhiwei Zhang, Minhua Lin, Junjie Xu, Zongyu Wu, Enyan Dai, Suhang Wang

Abstract: Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification. However, recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption. Despite initial efforts to defend against specific graph backdoor attacks, there is no work on defending against various types of backdoor attacks where generated triggers have different properties. Hence, we first empirically verify that prediction variance under edge dropping is a crucial indicator for identifying poisoned nodes. With this observation, we propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones. Furthermore, we introduce a novel robust training strategy to efficiently counteract the impact of the triggers. Extensive experiments on real-world datasets show that our framework can effectively identify poisoned nodes, significantly degrade the attack success rate, and maintain clean accuracy when defending against various types of graph backdoor attacks with different properties.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09813 [pdf, other]

Diffuse X-ray Explorer: a high-resolution X-ray spectroscopic sky surveyor on the China Space Station

Authors: Hai **, Junjie Mao, Liubiao Chen, Naihui Chen, Wei Cui, Bo Gao, **** Li, Xinfeng Li, Jiejia Liu, Jia Quan, Chunyang Jiang, Guole Wang, Le Wang, Qian Wang, Sifan Wang, Aimin Xiao, Shuo Zhang

Abstract: DIffuse X-ray Explorer (DIXE) is a proposed high-resolution X-ray spectroscopic sky surveyor on the China Space Station (CSS). DIXE will focus on studying hot baryons in the Milky Way. Galactic hot baryons like the X-ray emitting Milky Way halo and eROSITA bubbles are best observed in the sky survey mode with a large field of view. DIXE will take advantage of the orbital motion of the CSS to scan a large fraction of the sky. High-resolution X-ray spectroscopy, enabled by superconducting microcalorimeters based on the transition-edge sensor (TES) technology, will probe the physical properties (e.g., temperature, density, elemental abundances, kinematics) of the Galactic hot baryons. This will complement the high-resolution imaging data obtained with the eROSITA mission. Here we present the preliminary design of DIXE. The payload consists mainly of a detector assembly and a cryogenic cooling system. The key components of the detector assembly are a microcalorimeter array and frequency-domain multiplexing readout electronics. To provide a working temperature for the detector assembly, the cooling system consists of an adiabatic demagnetization refrigerator and a mechanical cryocooler system.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 6 figures, the full version is published by Journal of Low Temperature Physics

arXiv:2406.09678 [pdf, ps, other]

Convergence rate of nonlinear delayed McKean-Vlasov SDEs driven by fractional Brownian motions

Authors: Shengrong Wang, Jie Xie, Li Tan

Abstract: In this paper, our main aim is to investigate the strong convergence for a McKean-Vlasov stochastic differential equation with super-linear delay driven by fractional Brownian motion with Hurst exponent $H\in(1/2, 1)$. After establishing existence and uniqueness of the exact solution, we analyze its properties, including boundedness of moments and propagation of chaos. We also give the Euler-Maruyama (EM) scheme and show that the numerical solution converges strongly to the exact solution. Furthermore, a corresponding numerical example is given to illustrate the theory.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09475 [pdf, other]

Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching fraction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09408 [pdf, other]

Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

Authors: Sheng-Yu Wang, Aaron Hertzmann, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang

Abstract: The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image. We can define "influence" by saying that, for a given output, if a model is retrained from scratch without that output's most influential images, the model should then fail to generate that output image. Unfortunately, directly searching for these influential images is computationally infeasible, since it would require repeatedly retraining from scratch. We propose a new approach that efficiently identifies highly-influential images. Specifically, we simulate unlearning the synthesized image, proposing a method to increase the training loss on the output image, without catastrophic forgetting of other, unrelated concepts. Then, we find training images that are forgotten by proxy, identifying ones with significant loss deviations after the unlearning process, and label these as influential. We evaluate our method with a computationally intensive but "gold-standard" retraining from scratch and demonstrate our method's advantages over previous methods.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Project page: https://peterwang512.github.io/AttributeByUnlearning Code: https://github.com/PeterWang512/AttributeByUnlearning

arXiv:2406.09326 [pdf, other]

PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

Authors: Qijun Gan, Song Wang, Shengtao Wu, Jianke Zhu

Abstract: Recently, artificial intelligence techniques for education have received increasing attention, while designing effective musical instrument instruction systems remains an open problem. Although key presses can be directly derived from sheet music, the transitional movements among key presses require more extensive guidance in piano performance. In this work, we construct a piano-hand motion generation benchmark to guide hand movements and fingerings for piano playing. To this end, we collect an annotated dataset, PianoMotion10M, consisting of 116 hours of piano playing videos from a bird's-eye view with 10 million annotated hand poses. We also introduce a powerful baseline model that generates hand motions from piano audio through a position predictor and a position-guided gesture generator. Furthermore, a series of evaluation metrics are designed to assess the performance of the baseline model, including motion similarity, smoothness, positional accuracy of left and right hands, and overall fidelity of movement distribution. Although piano key presses corresponding to music scores or audio are already accessible, PianoMotion10M aims to provide guidance on piano fingering for instruction purposes. The dataset and source code can be accessed at https://agnjason.github.io/PianoMotion-page.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Codes and Dataset: https://agnjason.github.io/PianoMotion-page

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09304 [pdf]

Self-reconfigurable Multifunctional Memristive Nociceptor for Intelligent Robotics

Authors: Shengbo Wang, Mingchao Fang, Lekai Song, Cong Li, Jian Zhang, Arokia Nathan, Guohua Hu, Shuo Gao

Abstract: Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute has so far been omitted, but it is highly desired for artificial nociceptors. Inspired by these shortcomings, this article presents, for the first time, a Self-Directed Channel (SDC) memristor-based self-reconfigurable nociceptor, which is capable of perceiving hazardous pressure stimuli under different temperatures and demonstrates key features of tactile nociceptors, including 'threshold,' 'no-adaptation,' and 'sensitization.' The maximum amplification of hazardous external stimuli is 1000%, and its response characteristics dynamically adapt to current temperature conditions by automatically altering the generated modulation schemes for the memristor. The maximum difference ratio of the response of memristors at different temperatures is 500%, and this adaptability closely mimics the functions of biological tactile nociceptors, resulting in accurate danger perception in various conditions. Beyond temperature adaptation, this memristor-based nociceptor has the potential to integrate different sensory modalities by applying various sensors, thereby achieving human-like perception capabilities in real-world environments.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures

arXiv:2406.09064 [pdf, other]

The nature of the accretion physics in quiescent black hole system LB-1

Authors: Tong Su, Erlin Qiao, Song Wang

Abstract: LB-1 is a binary system that has drawn great attention since its discovery in 2019. The nature of the two components of LB-1 is not yet clear, although the system has been suggested to be, very possibly, a B-type star plus a black hole (BH). In this paper, we first calculate the wind mass-loss rate of the B-type star. We then calculate the mass capture rate by the BH. Taking this as the initial mass accretion rate, we calculate the truncation radius of the accretion disk and the corresponding emergent spectra of the accretion flow (comprising an inner advection-dominated accretion flow (ADAF) + an outer truncated accretion disk) within the framework of the disk evaporation model. It is found that the predicted truncation radius of the accretion disk with appropriate model parameters is consistent with observations inferred from the observed broad H$_α$ emission line. The predicted X-ray luminosity is definitely below the upper limit of $\sim 2\times 10^{31}$ erg/s estimated from the sensitivity of the Chandra X-ray Observatory. Finally, we argue that if the disk evaporation model indeed reflects the intrinsic physics of the accretion flow, the value of the viscosity parameter $α$ is constrained to be $α\gtrsim 0.05$ (with BH mass being $68M_{\rm \odot}$), or $α\gtrsim 0.003$ (with BH mass being $21M_{\rm \odot}$) to match the observed upper limit of the X-ray luminosity of LB-1.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 11 pages, 2 figures. Submitted to The Astrophysical Journal, comments are welcome

arXiv:2406.09044 [pdf, other]

MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

Authors: Hanqing Wang, Zeguan Xiao, Yixia Li, Shuo Wang, Guanhua Chen, Yun Chen

Abstract: Efficient finetuning of large language models (LLMs) aims to adapt the LLMs with reduced computation and memory cost. Previous LoRA-based approaches initialize the low-rank matrices with Gaussian distribution and zero values, while keeping the original weight matrices frozen. However, the trainable model parameters optimized in an unguided subspace might interfere with the well-learned subspace of the pretrained weight matrix. In this paper, we propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while keeping the principal singular components frozen. It is observed that the minor matrix corresponds to the noisy or long-tail information, while the principal matrix contains important knowledge. MiLoRA initializes the low-rank matrices within a subspace that is orthogonal to the principal matrix, thus the pretrained knowledge is expected to be well preserved. During finetuning, MiLoRA makes the most use of the less-optimized subspace for learning the finetuning dataset. Extensive experiments on commonsense reasoning, math reasoning and instruction following benchmarks demonstrate the superior performance of our method.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08903 [pdf, other]

Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

Authors: Bowen **, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun

Abstract: Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs (e.g., WizardMath for math problems). Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision. This method employs higher-bit representation for singular vectors corresponding to larger singular values. We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to full fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin. Additionally, we show that our method is compatible with various backbone LLMs, such as Llama-2, Llama-3, and Mistral, highlighting its generalizability.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 12 pages

arXiv:2406.08835 [pdf, other]

A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, **g Xiao

Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EfficientASR. It uses an Index Mapping Vector (IMV) based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with cross-entropy loss combined with alignment loss. The proposed EfficientASR achieves competitive results on the AISHELL-1 and AISHELL-2 benchmarks compared to the state-of-the-art (SOTA) models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test dataset, which outperforms the SOTA AR Conformer with about 30x inference speedup.
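
For readers unfamiliar with the metric quoted above, character error rate is conventionally the character-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the reference length. The sketch below illustrates that standard definition only. It is not the scoring script used for the reported AISHELL numbers, and the names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences,
    computed row by row with dynamic programming."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of four reference characters.
print(cer("abcd", "abxd"))
```

Because insertions count as errors while the denominator is the reference length, CER can exceed 1.0 for very long hypotheses.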

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08641 [pdf, ps, other]

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Authors: Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, **chuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe

Abstract: ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. We find performance improvements over the setup of ML-SUPERB. However, performance depends on the downstream model design. Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches to improve multilingual ASR performance.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.08377 [pdf, other]

DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

Authors: Juncheng Wu, Zhangkai Ni, Hanli Wang, Wenhan Yang, Yuyin Zhou, Shiqi Wang

Abstract: Image deep features extracted by pre-trained networks are known to contain rich and informative representations. In this paper, we present Deep Degradation Response (DDR), a method to quantify changes in image deep features under varying degradation conditions. Specifically, our approach facilitates flexible and adaptive degradation, enabling the controlled synthesis of image degradation through text-driven prompts. Extensive evaluations demonstrate the versatility of DDR as an image descriptor, with strong correlations observed with key image attributes such as complexity, colorfulness, sharpness, and overall quality. Moreover, we demonstrate the efficacy of DDR across a spectrum of applications. It excels as a blind image quality assessment metric, outperforming existing methodologies across multiple datasets. Additionally, DDR serves as an effective unsupervised learning objective in image restoration tasks, yielding notable advancements in image deblurring and single-image super-resolution. Our code will be made available.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08343 [pdf, other]

Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver

Authors: Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Abstract: Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underlying continuous dynamics and struggles with modelling complex system behaviour. Additionally, the architecture of digital computers, with separate storage and processing units, necessitates frequent data transfers and Analogue-Digital (A/D) conversion, thereby significantly increasing both time and energy costs. Here, we introduce a memristive neural ordinary differential equation (ODE) solver for digital twins, which is capable of capturing continuous-time dynamics and facilitates the modelling of complex systems using an infinite-depth model. By integrating storage and computation within analogue memristor arrays, we circumvent the von Neumann bottleneck, thus enhancing both speed and energy efficiency. We experimentally validate our approach by developing a digital twin of the HP memristor, which accurately extrapolates its nonlinear dynamics, achieving a 4.2-fold projected speedup and a 41.4-fold projected decrease in energy consumption compared to state-of-the-art digital hardware, while maintaining an acceptable error margin. Additionally, we demonstrate scalability through experimentally grounded simulations of Lorenz96 dynamics, exhibiting projected performance improvements of 12.6-fold in speed and 189.7-fold in energy efficiency relative to traditional digital approaches. By harnessing the capabilities of fully analogue computing, our breakthrough accelerates the development of digital twins, offering an efficient and rapid solution to meet the demands of Industry 4.0.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures

arXiv:2406.08229 [pdf, other]

doi 10.1145/3626772.3657720

GPT4Rec: Graph Prompt Tuning for Streaming Recommendation

Authors: Peiyan Zhang, Yuchen Yan, Xi Zhang, Liying Kang, Chaozhuo Li, Feiran Huang, Senzhang Wang, Sunghun Kim

Abstract: In the realm of personalized recommender systems, the challenge of adapting to evolving user preferences and the continuous influx of new users and items is paramount. Conventional models, typically reliant on a static training-test approach, struggle to keep pace with these dynamic demands. Streaming recommendation, particularly through continual graph learning, has emerged as a novel solution. However, existing methods in this area either rely on historical data replay, which is increasingly impractical due to stringent data privacy regulations; or are unable to effectively address the over-stability issue; or depend on model-isolation and expansion strategies. To tackle these difficulties, we present GPT4Rec, a Graph Prompt Tuning method for streaming Recommendation. Given the evolving user-item interaction graph, GPT4Rec first disentangles the graph patterns into multiple views. After isolating specific interaction patterns and relationships in different views, GPT4Rec utilizes lightweight graph prompts to efficiently guide the model across varying interaction patterns within the user-item graph. Firstly, node-level prompts are employed to instruct the model to adapt to changes in the attributes or properties of individual nodes within the graph. Secondly, structure-level prompts guide the model in adapting to broader patterns of connectivity and relationships within the graph. Finally, view-level prompts are innovatively designed to facilitate the aggregation of information from multiple disentangled views. These prompt designs allow GPT4Rec to synthesize a comprehensive understanding of the graph, ensuring that all vital aspects of the user-item interactions are considered and effectively integrated. Experiments on four diverse real-world datasets demonstrate the effectiveness and efficiency of our proposal.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by SIGIR 2024. arXiv admin note: text overlap with arXiv:2303.11700 by other authors

ACM Class: H.3.3

arXiv:2406.08225 [pdf, ps, other]

Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (636 additional authors not shown)

Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08224 [pdf, ps, other]

Toughness and spectral radius in graphs

Authors: Sufang Wang, Wei Zhang

Abstract: Let $t$ be a positive integer, and let $G$ be a connected graph of order $n$ with $n\geq t+2$. A graph $G$ is said to be $\frac{1}{t}$-tough if $|S|\geq\frac{1}{t}c(G-S)$ for every subset $S$ of $V(G)$ with $c(G-S)\geq2$, where $c(G-S)$ is the number of connected components in $G-S$. The adjacency matrix of $G$ is denoted by $A(G)$. Let $λ_1(G)\geqλ_2(G)\geq\dots\geqλ_n(G)$ be the eigenvalues of $A(G)$. In particular, the eigenvalue $λ_1(G)$ is called the spectral radius of $G$. In this paper, we prove that $G$ is a $\frac{1}{t}$-tough graph unless $G=K_1\vee(K_{n-t-1}\cup tK_1)$ if $λ_1(G)\geqη(t,n)$, where $η(t,n)$ is the largest root of $x^{3}-(n-t-2)x^{2}-(n-1)x+t(n-t-2)=0$.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 7 pages

MSC Class: 05C50; 90B99

arXiv:2406.08153 [pdf, ps, other]

Optimal control of quantum system in fermion fields: Pontryagin-type maximum principle(I)

Authors: Penghui Wang, Shan Wang

Abstract: In this paper, the Pontryagin-type maximum principle for optimal control of quantum stochastic systems in fermion fields is obtained. These systems have gained significant prominence in numerous quantum applications ranging from physical chemistry to multi-dimensional nuclear magnetic resonance experiments. Furthermore, we establish the existence and uniqueness of solutions to backward quantum stochastic differential equations driven by fermion Brownian motion. The application of noncommutative martingale inequalities and the martingale representation theorem enables this achievement.

Submitted 12 June, 2024; originally announced June 2024.

MSC Class: 47C15; 49K27; 81S25; 81Q93; 81V74

arXiv:2406.08081 [pdf]

CLDTA: Contrastive Learning based on Diagonal Transformer Autoencoder for Cross-Dataset EEG Emotion Recognition

Authors: Yuan Liao, Yuhong Zhang, Shenghuan Wang, Xiruo Zhang, Yiling Zhang, Wei Chen, Yuzhe Gu, Liya Huang

Abstract: Recent advances in non-invasive EEG technology have broadened its application in emotion recognition, yielding a multitude of related datasets. Yet, deep learning models struggle to generalize across these datasets due to variations in acquisition equipment and emotional stimulus materials. To address the pressing need for a universal model that fluidly accommodates diverse EEG dataset formats and bridges the gap between laboratory and real-world data, we introduce a novel deep learning framework: the Contrastive Learning based Diagonal Transformer Autoencoder (CLDTA), tailored for EEG-based emotion recognition. The CLDTA employs a diagonal masking strategy within its encoder to extract brain network knowledge from full-channel EEG data, facilitating transferability to datasets with fewer channels. An information separation mechanism improves model interpretability by enabling straightforward visualization of brain networks. The CLDTA framework employs contrastive learning to distill subject-independent emotional representations and uses a calibration prediction process to enable rapid adaptation of the model to new subjects with minimal samples, achieving accurate emotion recognition. Our analysis across the SEED, SEED-IV, SEED-V, and DEAP datasets highlights CLDTA's consistent performance and proficiency in detecting both task-specific and general features of EEG signals related to emotions, underscoring its potential to revolutionize emotion recognition research.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08068 [pdf, other]

Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey

Authors: Hao Yang, Yanyan Zhao, Yang Wu, Shilong Wang, Tian Zheng, Hongbo Zhang, Wanxiang Che, Bing Qin

Abstract: Compared to traditional sentiment analysis, which only considers text, multimodal sentiment analysis needs to consider emotional signals from multimodal sources simultaneously and is therefore more consistent with the way humans process sentiment in real-world scenarios. It involves processing emotional information from various sources such as natural language, images, videos, audio, physiological signals, etc. However, although other modalities also contain diverse emotional cues, natural language usually contains richer contextual information and therefore always occupies a crucial position in multimodal sentiment analysis. The emergence of ChatGPT has opened up immense potential for applying large language models (LLMs) to text-centric multimodal tasks. However, it is still unclear how existing LLMs can adapt better to text-centric multimodal sentiment analysis tasks. This survey aims to (1) present a comprehensive review of recent research in text-centric multimodal sentiment analysis tasks, (2) examine the potential of LLMs for text-centric multimodal sentiment analysis, outlining their approaches, advantages, and limitations, (3) summarize the application scenarios of LLM-based multimodal sentiment analysis technology, and (4) explore the challenges and potential research directions for multimodal sentiment analysis in the future.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07973 [pdf, other]

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

Authors: Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu

Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including machine translation, chatbots, and agents. However, LLMs have revealed a variety of privacy and security issues throughout their life cycle, drawing significant academic and industrial attention. Moreover, the risks faced by LLMs differ significantly from those encountered by traditional language models. Given that current surveys lack a clear taxonomy of unique threat models across diverse scenarios, we emphasize the unique privacy and security threats associated with five specific scenarios: pre-training, fine-tuning, retrieval-augmented generation systems, deployment, and LLM-based agents. Addressing the characteristics of each risk, this survey outlines potential threats and countermeasures. Research on attack and defense situations can offer feasible research directions, enabling more areas to benefit from LLMs.

Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07846 [pdf, other]

DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

Authors: Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

Abstract: Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180 ms. Nonetheless, the recognition-synthesis framework hinders end-to-end optimization, and the instability of the automatic speech recognition (ASR) model with short chunks makes it challenging to further reduce latency. To address these issues, we propose an end-to-end model, DualVC 3. With speaker-independent semantic tokens to guide the training of the content encoder, the dependency on ASR is removed and the model can operate under extremely small chunks, with cascading errors eliminated. A language model is trained on the content encoder output to produce pseudo context by iteratively predicting future frames, providing more contextual information for the decoder to improve conversion quality. Experimental results demonstrate that DualVC 3 achieves comparable performance to DualVC 2 in subjective and objective metrics, with a latency of only 50 ms.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.07529 [pdf, other]

MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Authors: Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio

Abstract: Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.

Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07390 [pdf, other]

DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

Authors: Sixian Wang, Jincheng Dai, Kailin Tan, Xiaoqi Qin, Kai Niu, Ping Zhang

Abstract: End-to-end visual communication systems typically optimize a trade-off between channel bandwidth costs and signal-level distortion metrics. However, under challenging physical conditions, this traditional discriminative communication paradigm often results in unrealistic reconstructions with perceptible blurring and aliasing artifacts, despite the inclusion of perceptual or adversarial losses for optimizing. This issue primarily stems from the receiver's limited knowledge about the underlying data manifold and the use of deterministic decoding mechanisms. To address these limitations, this paper introduces DiffCom, a novel end-to-end generative communication paradigm that utilizes off-the-shelf generative priors and probabilistic diffusion models for decoding, thereby improving perceptual quality without heavily relying on bandwidth costs and received signal quality. Unlike traditional systems that rely on deterministic decoders optimized solely for distortion metrics, our DiffCom leverages raw channel-received signal as a fine-grained condition to guide stochastic posterior sampling. Our approach ensures that reconstructions remain on the manifold of real data with a novel confirming constraint, enhancing the robustness and reliability of the generated outcomes. Furthermore, DiffCom incorporates a blind posterior sampling technique to address scenarios with unknown forward transmission characteristics. Extensive experimental validations demonstrate that DiffCom not only produces realistic reconstructions with details faithful to the original data but also achieves superior robustness against diverse wireless transmission degradations. Collectively, these advancements establish DiffCom as a new benchmark in designing generative communication systems that offer enhanced robustness and generalization superiorities.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07342 [pdf, other]

EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning

Authors: Yijun Hao, Shusen Yang, Fang Li, Yifan Zhang, Shibo Wang, Xuebin Ren

Abstract: In mobile edge computing (MEC), resource scheduling is crucial to task requests' performance and service providers' cost, involving multi-layer heterogeneous scheduling decisions. Existing schedulers typically adopt static timescales to regularly update scheduling decisions of each layer, without adaptive adjustment of timescales for different layers, resulting in potentially poor performance in practice. We notice that the adaptive timescales would significantly improve the trade-off between the operation cost and delay performance. Based on this insight, we propose EdgeTimer, the first work to automatically generate adaptive timescales to update multi-layer scheduling decisions using deep reinforcement learning (DRL). First, EdgeTimer uses a three-layer hierarchical DRL framework to decouple the multi-layer decision-making task into a hierarchy of independent sub-tasks for improving learning efficiency. Second, to cope with each sub-task, EdgeTimer adopts a safe multi-agent DRL algorithm for decentralized scheduling while ensuring system reliability. We apply EdgeTimer to a wide range of Kubernetes scheduling rules, and evaluate it using production traces with different workload patterns. Extensive trace-driven experiments demonstrate that EdgeTimer can learn adaptive timescales, irrespective of workload patterns and built-in scheduling rules. It obtains up to 9.1x more profit than existing approaches without sacrificing the delay performance.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07132 [pdf, ps, other]

Spanning trees and signless Laplacian spectral radius in graphs

Authors: Sufang Wang, Wei Zhang

Abstract: Let $G$ be a connected graph and let $k$ be a positive integer. Let $T$ be a spanning tree of $G$. The leaf degree of a vertex $v\in V(T)$ is defined as the number of leaves adjacent to $v$ in $T$. The leaf degree of $T$ is the maximum leaf degree among all the vertices of $T$. Let $A(G)$ be the adjacency matrix of $G$ and $D(G)$ be the diagonal degree matrix of $G$. Let $Q(G)=D(G)+A(G)$ be the signless Laplacian matrix of $G$. The largest eigenvalue of $Q(G)$, denoted by $q(G)$, is called the signless Laplacian spectral radius of $G$. In this paper, we investigate the connection between the spanning tree and the signless Laplacian spectral radius of $G$, and put forward a sufficient condition based upon the signless Laplacian spectral radius to guarantee that a graph $G$ contains a spanning tree with leaf degree at most $k$. Finally, we construct some extremal graphs to show that all the bounds obtained in this paper are sharp.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 12 pages

MSC Class: 05C50; 05C05; 90B99

arXiv:2406.07091 [pdf, other]

AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

Authors: Xing Zhang, Jiaxi Gu, Haoyu Zhao, Shicong Wang, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu, Yu-Gang Jiang

Abstract: Temporal Video Grounding (TVG) aims to localize a moment from an untrimmed video given the language description. Since the annotation of TVG is labor-intensive, TVG under limited supervision has received attention in recent years. The great success of vision-language pre-training guides TVG to follow the traditional "pre-training + fine-tuning" paradigm; however, the pre-training process would suffer from a lack of temporal modeling and fine-grained alignment due to the difference in the nature of the data between pre-training and testing. Besides, the large gap between pretext and downstream tasks makes zero-shot testing impossible for the pre-trained model. To avoid the drawbacks of the traditional paradigm, we propose AutoTVG, a new vision-language pre-training paradigm for TVG that enables the model to learn semantic alignment and boundary regression from automatically annotated untrimmed videos. To be specific, AutoTVG consists of a novel Captioned Moment Generation (CMG) module to generate captioned moments from untrimmed videos, and TVGNet with a regression head to predict localization results. Experimental results on Charades-STA and ActivityNet Captions show that, regarding zero-shot temporal video grounding, AutoTVG achieves highly competitive performance with in-distribution methods under out-of-distribution testing, and is superior to existing pre-training frameworks with much less training data.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Technique Report

arXiv:2406.06978 [pdf, other]

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Authors: Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez

Abstract: We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment influences the planning in an end-to-end manner instead of resorting to non-differentiable post-processing. This method achieves 1st place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions. Code will be available at https://github.com/NVlabs/Hydra-MDP.

Submitted 19 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: The 1st place solution of End-to-end Driving at Scale at the CVPR 2024 Autonomous Grand Challenge

arXiv:2406.06867 [pdf]

Electrically Tunable Magnetoconductance of Close-Packed CVD Bilayer Graphene Layer Stacking Walls

Authors: Qicheng Zhang, Sheng Wang, Zhaoli Gao, Sebastian Hurtado-Parra, Joel Berry, Zachariah Addison, Paul Masih Das, William M. Parkin, Marija Drndic, James M. Kikkawa, Feng Wang, Eugene J. Mele, A. T. Charlie Johnson, Zhengtang Luo

Abstract: Quantum valley Hall (QVH) domain wall states are a new class of one-dimensional (1D) one-way conductors that are topologically protected in the absence of valley mixing. Development beyond a single QVH channel raises important new questions as to how QVH channels in close spatial proximity interact with each other, and how that interaction may be controlled. Scalable epitaxial bilayer graphene synthesis produces layer stacking wall (LSW) bundles, where QVH channels are bound, providing an excellent platform to study QVH channel interactions. Here we show that distinct strain sources lead to the formation of both well-separated LSWs and close-packed LSW bundles. Comparative studies of electronic transport in these two regimes reveal that close-packed LSW bundles support electrically tunable magnetoconductance. The coexistence of different strain sources offers a potential pathway to realize a scalable quantum transport platform based on LSWs, where electrical tunability enables programmable functionality.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06822 [pdf, other]

An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection

Authors: Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, Yuan Hong

Abstract: Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CodeBreaker leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CodeBreaker stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CodeBreaker across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures, underscoring the critical need for more robust defenses for code completion.

Submitted 10 June, 2024; originally announced June 2024.

Comments: To appear in USENIX Security '24

arXiv:2406.06813 [pdf, other]

Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation

Authors: Dong Zhao, Shuang Wang, Qi Zang, Licheng Jiao, Nicu Sebe, Zhun Zhong

Abstract: We study source-free unsupervised domain adaptation (SFUDA) for semantic segmentation, which aims to adapt a source-trained model to the target domain without accessing the source data. Many works have been proposed to address this challenging problem, among which uncertainty-based self-training is a predominant approach. However, without comprehensive denoising mechanisms, they still largely fall into biased estimates when dealing with different domains and confirmation bias. In this paper, we observe that pseudo-label noise is mainly contained in unstable samples in which the predictions of most pixels undergo significant variations during self-training. Inspired by this, we propose a novel mechanism to denoise unstable samples with stable ones. Specifically, we introduce the Stable Neighbor Denoising (SND) approach, which effectively discovers highly correlated stable and unstable samples by nearest neighbor retrieval and guides the reliable optimization of unstable samples by bi-level learning. Moreover, we compensate for the stable set by object-level object paste, which can further eliminate the bias caused by less learned classes. Our SND enjoys two advantages. First, SND does not require a specific segmentor structure, endowing its universality. Second, SND simultaneously addresses the issues of class, domain, and confirmation biases during adaptation, ensuring its effectiveness. Extensive experiments show that SND consistently outperforms state-of-the-art methods in various SFUDA semantic segmentation settings. In addition, SND can be easily integrated with other approaches, obtaining further improvements.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 2024 Conference on Computer Vision and Pattern Recognition

Journal ref: (2024 Conference on Computer Vision and Pattern Recognition)

arXiv:2406.06567 [pdf, other]

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

Authors: Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun

Abstract: Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or necessitate substantial continued pre-training costs to restore performance. Based on the analysis of attention redundancy, we design a Decoupled-Head Attention (DHA) mechanism. DHA adaptively configures group sharing for key heads and value heads across various layers, achieving a better balance between performance and efficiency. Inspired by the observation of clustering similar heads, we propose to progressively transform the MHA checkpoint into the DHA model through linear fusion of similar head parameters step by step, retaining the parametric knowledge of the MHA checkpoint. We construct DHA models by transforming various scales of MHA checkpoints given target head budgets. Our experiments show that DHA remarkably requires a mere 0.25\% of the original model's pre-training budgets to achieve 97.6\% of performance while saving 75\% of KV cache. Compared to Group-Query Attention (GQA), DHA achieves a 5$\times$ training acceleration, a maximum of 13.93\% performance improvement under 0.01\% pre-training budget, and 4\% relative improvement under 0.05\% pre-training budget.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages, 9 figures, 3 tables

arXiv:2406.06564 [pdf, ps, other]

Revolutionizing Large Language Model Training through Dynamic Parameter Adjustment

Authors: Kaiye Zhou, Shucheng Wang

Abstract: In the era of large language models, the demand for efficient use of computational resources has become critically important. Although parameter-efficient fine-tuning techniques have achieved results comparable to full fine-tuning, their application during the pre-training phase poses significant challenges. Specifically, employing parameter-efficient strategies at the onset of pre-training can severely compromise efficiency, especially in larger models. In this paper, building upon the fine-tuning method LoRA, we introduce a novel parameter-efficient training technique that frequently alters the trainable part of the parameters, facilitating effective pre-training. Our method not only achieves memory reductions and computational overhead comparable to current state-of-the-art parameter-efficient algorithms during the pre-training phase but also maintains accuracy levels comparable to those of full pre-training. We provide both theoretical analyses and empirical evidence to demonstrate the effectiveness of our approach.

Submitted 3 June, 2024; originally announced June 2024.

Comments: This paper introduces an innovative parameter-efficient training method that dynamically switches parameters throughout the entire training period, achieving significant memory and computational savings

arXiv:2406.06559 [pdf, other]

Harnessing Business and Media Insights with Large Language Models

Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya, et al. (8 additional authors not shown)

Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users can further leverage natural language queries to directly visualize financial data, generating insightful charts and graphs to understand trends across diverse business sectors clearly. FALM fosters user trust and ensures output accuracy through three novel methods: 1) Time-aware reasoning guarantees accurate event registration and prioritizes recent updates. 2) Thematic trend analysis explicitly examines topic evolution over time, providing insights into emerging business landscapes. 3) Content referencing and task decomposition enhance answer fidelity and data visualization accuracy. We conduct both automated and human evaluations, demonstrating FALM's significant performance improvements over baseline methods while prioritizing responsible AI practices. These benchmarks establish FALM as a cutting-edge LLM in the business and media domains, with exceptional accuracy and trustworthiness.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.06490 [pdf, other]

How much longer do you have to drive than the crow has to fly?

Authors: Shanshan Wang, Henrik M. Bette, Michael Schreckenberg, Thomas Guhr

Abstract: When we travel by car from one location to another, our route is constrained by the road network. The resulting network distance is generally longer than the geodetic distance, i.e. the distance as the crow flies, between the two locations. We report a systematic relation between the statistical properties of these two distances. In empirical analyses for large motorway networks in various countries and areas, we work out distributions of network and geodetic distances and identify a surprisingly robust scaling property between them. A simple consequence is that we typically have to drive $1.3\pm0.1$ times longer than the crow flies. Moreover, we show that this scaling is not present in standard random networks; rather, it requires a certain non-randomness, namely adjacency. We develop a set of rules to build a realistic motorway network, also consistent with the scaling properties found empirically. We hypothesize that the scaling reflects, in a rather universal fashion, a compromise between two societal needs: high efficiency and accessibility on the one hand, and limitation of costs and other burdens on the other.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06462 [pdf, other]

VCR: Visual Caption Restoration

Authors: Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar, Jie Fu, Bang Liu, Yoshua Bengio

Abstract: We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.

Submitted 24 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: 17 pages, 2 figures

arXiv:2406.06446 [pdf, other]

Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Authors: **cheng Dai, Xiaoqi Qin, Sixian Wang, Lexi Xu, Kai Niu, ** Zhang

Abstract: Information theory and machine learning are inextricably linked and have even been referred to as "two sides of the same coin". One particularly elegant connection is the essential equivalence between probabilistic generative modeling and data compression or transmission. In this article, we reveal the dual-functionality of deep generative models that reshapes both data compression for efficiency and transmission error concealment for resiliency. We present how the contextual predictive capabilities of powerful generative models can be well positioned to be strong compressors and estimators. In this sense, we advocate for viewing the deep generative modeling problem through the lens of end-to-end communications, and evaluate the compression and error restoration capabilities of foundation generative models. We show that the kernel of many large generative models is a powerful predictor that can capture complex relationships among semantic latent variables, and the communication viewpoints provide novel insights into semantic feature tokenization, contextual learning, and usage of deep generative models. In summary, our article highlights the essential connections of generative AI to source and channel coding techniques, and motivates researchers to make further explorations in this emerging topic.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Publication in IEEE Wireless Communications

arXiv:2406.06118 [pdf, other]

Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

Abstract: The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06104 [pdf]

Correlated electrons of the flat band in charge density wave state of 4Hb-TaSexS2-x

Authors: Yanyan Geng, Jianfeng Guo, Fanyu Meng, Manyu Wang, Shuo Mi, Li Huang, Rui Xu, Fei Pang, Kai Liu, Shancai Wang, Hong-Jun Gao, Weichang Zhou, Wei Ji, Hechang Lei, Zhihai Cheng

Abstract: Many intriguing quantum states of matter, such as unconventional superconductivity, magnetic phases and fractional quantum Hall physics, emerge from the spatially-correlated localized electrons in the flat band of solid materials. By using scanning tunneling microscopy and spectroscopy (STM/STS), we report the real-space investigation of correlated electrons in the flat band of superlattice 4Hb-TaSexS2-x. In contrast with the pristine 4Hb-TaS2, the selenium (Se) substitutions significantly affect the interfacial transfer of correlated electrons between the CDW states of 1T- and 1H-TaS2 layers, and contribute real-space fractional electron-filling configurations with the distributed electron-filled and -void SoD clusters of 1T-layer. The site-specific STS spectra directly reveal their respective prominent spectral weight above EF and symmetric Mott-like spectra. In addition, the spatial distributions of these electron-filled SoDs in the 1T-layer of 4Hb-TaSe0.7S1.3 demonstrate different local short-range patterning, clearly indicating the complex neighboring interactions among the localized electrons in the flat band of 1T-layer. Our results not only provide an in-depth insight into correlated electrons in the flat CDW band, but also provide a simple platform to manipulate the electron-correlation-related quantum states.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 18 pages, 4 figures

arXiv:2406.05986 [pdf, other]

Neural-g: A Deep Learning Framework for Mixing Density Estimation

Authors: Shijie Wang, Saptarshi Chakraborty, Qian Qin, Ray Bai

Abstract: Mixing (or prior) density estimation is an important problem in machine learning and statistics, especially in empirical Bayes $g$-modeling where accurately estimating the prior is necessary for making good posterior inferences. In this paper, we propose neural-$g$, a new neural network-based estimator for $g$-modeling. Neural-$g$ uses a softmax output layer to ensure that the estimated prior is a valid probability density. Under default hyperparameters, we show that neural-$g$ is very flexible and capable of capturing many unknown densities, including those with flat regions, heavy tails, and/or discontinuities. In contrast, existing methods struggle to capture all of these prior shapes. We provide justification for neural-$g$ by establishing a new universal approximation theorem regarding the capability of neural networks to learn arbitrary probability mass functions. To accelerate convergence of our numerical implementation, we utilize a weighted average gradient descent approach to update the network parameters. Finally, we extend neural-$g$ to multivariate prior density estimation. We illustrate the efficacy of our approach through simulations and analyses of real datasets. A software package to implement neural-$g$ is publicly available at https://github.com/shijiew97/neuralG.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 40 pages, 8 figures, 5 tables

arXiv:2406.05974 [pdf, other]

Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning

Authors: Xin Wang, Zhiyun Song, Yitao Zhu, Sheng Wang, Lichi Zhang, Dinggang Shen, Qian Wang

Abstract: In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated. However, most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios. In this work, we propose a self-supervised super-resolution framework for inter-slice super-resolution of MR images. Our framework first pre-trains on a video dataset, as the temporal correlation of videos is found beneficial for modeling the spatial relation among MR slices. Then, we use a public high-quality MR dataset to fine-tune our pre-trained model, enhancing its awareness of medical data. Finally, given a target dataset at hand, we utilize self-supervised fine-tuning to further ensure our model works well with user-specific super-resolution tasks. The proposed method demonstrates superior performance compared to other self-supervised methods and also holds the potential to benefit various downstream applications.

Submitted 9 June, 2024; originally announced June 2024.

Comments: ISBI 2024

arXiv:2406.05838 [pdf, other]

Bubbles kick off primordial black holes to form more binaries

Authors: Zi-Yan Yuwen, Cristian Joana, Shao-Jiang Wang, Rong-Gen Cai

Abstract: Primordial black holes (PBHs) may form before cosmological first-order phase transitions, leading to inevitable collisions between PBHs and bubble walls. In this Letter, we have simulated for the first time the co-evolution of an expanding scalar wall passing through a black hole with full numerical relativity. This black hole-bubble wall collision yields multiple far-reaching phenomena including the PBH mass growth, gravitational wave radiations, and momentum recoil that endows PBHs with additional velocities, approximately doubling the formation rate for PBH binaries and hence strengthening the observational constraints on the PBH abundances.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 5 pages + supplemental material

arXiv:2406.05827 [pdf, ps, other]

Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05770 [pdf, other]

LAYCAST: LAYered CAvern Surface Tracker at future electron-positron colliders

Authors: Ye Lu, Ying-nan Mao, Kechen Wang, Zeren Simon Wang

Abstract: We propose a detector concept, LAYered CAvern Surface Tracker (LAYCAST), to be installed on the ceiling and the wall of the cavern hosting the main experiment of future electron-positron colliders such as CEPC and FCC-ee. With detailed and realistic considerations of the design of such a new experiment, the proposed detector is dedicated to extending the sensitivity reach of the main detector to various theoretical scenarios of long-lived particles (LLPs). We study carefully four such scenarios involving a light scalar boson $X$, the heavy neutral lepton $N$, the lightest neutralino $\tildeχ^0_1$ in the R-parity-violating supersymmetry, and the axion-like particle $a$. Long-lived light scalar bosons are considered to be produced from the Standard-Model (SM) Higgs boson's decay ($h \to X X$) at the center-of-mass energy $\sqrt{s} =$ 240 GeV, while the other three types of LLPs are produced either from $Z$-boson decays (viz. $Z \to ν\, N, ~\tildeχ^0_1\, \tildeχ^0_1 $) or direct scattering process ($ e^- e^+ \to ~γ\, a$) at $\sqrt{s} =$ 91.2 GeV, where $γ$ and $ν$ denote the SM photon and neutrino, respectively. With Monte-Carlo simulations, we derive the sensitivities of the proposed experiment to these LLPs and the corresponding signal-event numbers. Our findings show that LAYCAST can probe large new parameter space beyond both current bounds and the expected reach of the main experiments at CEPC and FCC-ee. Comparison with existing works in similar directions is also made.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 23 pages, 19 figures, 1 table

arXiv:2406.05763

WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

Authors: Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie

Abstract: With the development of large text-to-speech (TTS) models and the scale-up of training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the open-sourced WenetSpeech dataset. To tailor it for text-to-speech tasks, we refined WenetSpeech by adjusting segment boundaries, enhancing the audio quality, and eliminating speaker mixing within each segment. Following a more accurate transcription process and a quality-based data filtering process, the resulting WenetSpeech4TTS corpus contains 12,800 hours of paired audio-text data. Furthermore, we have created subsets of varying sizes, categorized by segment quality scores, to allow for TTS model training and fine-tuning. VALL-E and NaturalSpeech 2 systems are trained and fine-tuned on these subsets to validate the usability of WenetSpeech4TTS, establishing baselines on the benchmark for fair comparison of TTS systems. The corpus and corresponding benchmarks are publicly available on Hugging Face.

Submitted 19 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH2024

arXiv:2406.05744

Direct observations of cross-scale energy transfer in space plasmas

Authors: Jing-Huan Li, Xu-Zhi Zhou, Zhi-Yang Liu, Shan Wang, Yoshiharu Omura, Li Li, Chao Yue, Qiu-Gang Zong, Guan Le, Christopher T. Russell, James L. Burch

Abstract: The collisionless plasmas in space and astrophysical environments are intrinsically multiscale in nature, behaving as conducting fluids at macroscales and kinetically at microscales comparable to ion- and/or electron-gyroradii. A fundamental question in understanding the plasma dynamics is how energy is transported and dissipated across different scales. Here, we present spacecraft measurements in the solar wind upstream of the terrestrial bow shock, in which the macroscale ultra-low-frequency waves and microscale whistler waves simultaneously resonate with the ions. The ion acceleration from ultra-low-frequency waves leads to velocity distributions unstable to the growth of whistler waves, which in turn resonate with the electrons to complete cross-scale energy transfer. These observations, consistent with numerical simulations in the occurrence of phase-bunched ion and electron distributions, also highlight the importance of anomalous resonance, a nonlinear modification of the classical cyclotron resonance, in the cross-scale wave coupling and energy transfer processes.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 22 pages, 7 figures and supplementary material

Showing 101–150 of 8,583 results for author: Wang, S