Search | arXiv e-print repository

DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models

Authors: Chang-Han Yeh, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Ting-Hsuan Chen, Yu-Lun Liu

Abstract: This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models. Traditional video restoration methods often need retraining for different settings and struggle with limited generalization across various degradation types and datasets. Our approach uses a hierarchical token merging strategy for keyframes and local frames, combined with a hybrid c… ▽ More This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models. Traditional video restoration methods often need retraining for different settings and struggle with limited generalization across various degradation types and datasets. Our approach uses a hierarchical token merging strategy for keyframes and local frames, combined with a hybrid correspondence mechanism that blends optical flow and feature-based nearest neighbor matching (latent merging). We show that our method not only achieves top performance in zero-shot video restoration but also significantly surpasses trained models in generalization across diverse datasets and extreme degradations (8$\times$ super-resolution and high-standard deviation video denoising). We present evidence through quantitative metrics and visual comparisons on various challenging datasets. Additionally, our technique works with any 2D restoration diffusion model, offering a versatile and powerful tool for video enhancement tasks without extensive retraining. This research leads to more efficient and widely applicable video restoration technologies, supporting advancements in fields that require high-quality video output. See our project page for video results at https://jimmycv07.github.io/DiffIR2VR_web/. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01278 [pdf]

doi 10.1109/JSTARS.2021.3115637

Small Aerial Target Detection for Airborne Infrared Detection Systems using LightGBM and Trajectory Constraints

Authors: Xiaoliang Sun, Liangchao Guo, Wenlong Zhang, Zi Wang, Qifeng Yu

Abstract: Factors, such as rapid relative motion, clutter background, etc., make robust small aerial target detection for airborne infrared detection systems a challenge. Existing methods are facing difficulties when dealing with such cases. We consider that a continuous and smooth trajectory is critical in boosting small infrared aerial target detection performance. A simple and effective small aerial targ… ▽ More Factors, such as rapid relative motion, clutter background, etc., make robust small aerial target detection for airborne infrared detection systems a challenge. Existing methods are facing difficulties when dealing with such cases. We consider that a continuous and smooth trajectory is critical in boosting small infrared aerial target detection performance. A simple and effective small aerial target detection method for airborne infrared detection system using light gradient boosting model (LightGBM) and trajectory constraints is proposed in this article. First, we simply formulate target candidate detection as a binary classification problem. Target candidates in every individual frame are detected via interesting pixel detection and a trained LightGBM model. Then, the local smoothness and global continuous characteristic of the target trajectory are modeled as short-strict and long-loose constraints. The trajectory constraints are used efficiently for detecting the true small infrared aerial targets from numerous target candidates. Experiments on public datasets demonstrate that the proposed method performs better than other existing methods. Furthermore, a public dataset for small aerial target detection in airborne infrared detection systems is constructed. To the best of our knowledge, this dataset has the largest data scale and richest scene types within this field. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 15 pages,10 figures

Journal ref: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 14 9959-9973 2021

arXiv:2407.01219 [pdf, other]

Searching for Best Practices in Retrieval-Augmented Generation

Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuan**g Huang

Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong… ▽ More Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolonged response times. Typically, a RAG workflow involves multiple processing steps, each of which can be executed in various ways. Here, we investigate existing RAG approaches and their potential combinations to identify optimal RAG practices. Through extensive experiments, we suggest several strategies for deploying RAG that balance both performance and efficiency. Moreover, we demonstrate that multimodal retrieval techniques can significantly enhance question-answering capabilities about visual inputs and accelerate the generation of multimodal content using a "retrieval as generation" strategy. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01100 [pdf, other]

Eliminating Position Bias of Language Models: A Mechanistic Approach

Authors: Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji

Abstract: Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of… ▽ More Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. Specifically, we find that causal attention generally causes models to favor distant content, while relative positional encodings like RoPE prefer nearby ones based on the analysis of retrieval-augmented question answering (QA). Further, our empirical study on object detection reveals that position bias is also present in vision-language models (VLMs). Based on the above analyses, we propose to ELIMINATE position bias caused by different input segment orders (e.g., options in LM-as-a-judge, retrieved documents in QA) in a TRAINING-FREE ZERO-SHOT manner. Our method changes the causal attention to bidirectional attention between segments and utilizes model attention values to decide the relative orders of segments instead of using the order provided in input prompts, therefore enabling Position-INvariant inferencE (PINE) at the segment level. By eliminating position bias, models achieve better performance and reliability in downstream tasks where position bias widely exists, such as LM-as-a-judge and retrieval-augmented QA. Notably, PINE is especially useful when adapting LMs for evaluating reasoning pairs: it consistently provides 8 to 10 percentage points performance gains in most cases, and makes Llama-3-70B-Instruct perform even better than GPT-4-0125-preview on the RewardBench reasoning subset. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 18 pages, 5 figures

arXiv:2407.01048 [pdf, other]

Self-absorption of Hankel systems on monoids --a seemingly universal property

Authors: Yong Han, Yanqi Qiu, Zipeng Wang

Abstract: Given any cancellative monoid $\mathcal{M}$, we study the Hankel system determined by its multiplication table. We prove that the Hankel system admits self-absorption property provided that the monoid $\mathcal{M}$ has the local algebraic structure: \[ \big(ax = by, cx=dy, az=bw \,\, \text{in $\mathcal{M}$}\big)\Longrightarrow \big(cz=dw \,\, \text{in $\mathcal{M}$}\big). \] Our result holds for a… ▽ More Given any cancellative monoid $\mathcal{M}$, we study the Hankel system determined by its multiplication table. We prove that the Hankel system admits self-absorption property provided that the monoid $\mathcal{M}$ has the local algebraic structure: \[ \big(ax = by, cx=dy, az=bw \,\, \text{in $\mathcal{M}$}\big)\Longrightarrow \big(cz=dw \,\, \text{in $\mathcal{M}$}\big). \] Our result holds for all group-embeddable monoids and goes beyond. In particular, it works for all cancellative Abelian monoids and most common non-Abelian cancellative monoids such as $$ \mathrm{SL}_d(\mathbb{N}): = \big\{[a_{ij}]_{1\le i,j\le d}\in \mathrm{SL}_d(\mathbb{Z})\big| a_{ij} \in \mathbb{N}\big\}. $$ The Hankel system determined by the multiplication table of a monoid is further generalized to that determined by level sets of any abstract two-variable map. We introduce an algebraic notion of lunar maps and establish a stronger hereditary self-absorption property for the corresponding generalized Hankel systems. As a consequence, we prove the self-absorption property for arbitrary spatial compression of the regular representation system $\{λ_G(g)\}_{g\in G}$ of any discrete group $G$, as well as the Hankel system $\{Γ_\ell^Φ\}$ determined by the level sets of any rational map of the form $Φ(x,y)=a x^m + b y^n$ with $a,b,m,n\in \mathbb{Z}^*$: \[ Γ_\ell^Φ(x, y)= \mathbf{1}(a x^m + b y^n= \ell), \quad x, y\in \mathbb{N}^*, \, \ell\in Φ(\mathbb{N}^*\times \mathbb{N}^*). \] The self-absorption property is applied to the study of completely bounded Fourier multipliers between Hardy spaces. Further applications are: i) exact complete bounded norm of the Carleman embedding in any dimension; ii) mixed Fourier-Schur multiplier inequalities with critical exponent $4/3$; iii) failure of hyper-complete-contractivity for the Poisson semigroup. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 45 pages

arXiv:2407.01035 [pdf]

Off-site production of plasma-activated water for efficient sterilization: the crucial role of high-valence NOx and new chemical pathways

Authors: Zifeng Wang, Xiangyu Wang, Shenghang Xu, Renwu Zhou, Mingyan Zhang, Wanchun Li, Zizhu Zhang, Luge Wang, **kun Chen, Jishen Zhang, Li Guo, Dandan Pei, Dingxin Liu, Mingzhe Rong

Abstract: Efficient sterilization of pathogens with cleaner methods is a critical concern for environmental disinfection and clinical anti-infective treatment. Plasma-activated water (PAW) is a promising alternative to chemical disinfectants and antibiotics for its strong sterilization ability and not inducing any acute toxicity, and only water and air are consumed during production. For more efficient wate… ▽ More Efficient sterilization of pathogens with cleaner methods is a critical concern for environmental disinfection and clinical anti-infective treatment. Plasma-activated water (PAW) is a promising alternative to chemical disinfectants and antibiotics for its strong sterilization ability and not inducing any acute toxicity, and only water and air are consumed during production. For more efficient water activation, plasma sources are commonly placed near or fully in contact with water as possible, but the risks of electrode corrosion and metal contamination of water threaten the safety and stability of PAW production. Herein, plasma-activated gas rich in high-valence NOx is generated by a hybrid plasma configuration and introduced into water for off-site PAW production. Plasma-generated O3 is found to dominate the gas-phase reactions for the formation of high-valence NOx. With the time-evolution of O3 concentration, gaseous NO3 radicals are produced behind N2O5 formation, but will be decomposed before N2O5 quenching. By decoupling the roles of gaseous NO3, N2O5, and O3 in the water activation, results show that short-lived aqueous species induced by gaseous NO3 radicals play the most crucial role in PAW sterilization, and the acidic environment induced by N2O5 is also essential. Moreover, SEM photographs and biomacromolecule leakage assays demonstrate that PAW disrupts the cell membranes of bacteria to achieve inactivation. In real-life applications, an integrated device for off-site PAW production with a yield of 2 L/h and a bactericidal efficiency of >99.9% is developed. The PAW of 50mL produced in 3 minutes using this device is more effective in disinfection than 0.5% NaClO and 3% H2O2 with the same bacterial contact time. This work provides new avenues for efficient PAW production and deepens insights into the fundamental processes that govern the reactive chemistry in PAW sterilization. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00965 [pdf, other]

Measurement of the integrated luminosity of data samples collected during 2019-2022 by the Belle II experiment

Authors: The Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, J. K. Ahn, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, M. Barrett, J. Baudot, A. Baur, A. Beaubien , et al. (382 additional authors not shown)

Abstract: A series of data samples was collected with the Belle II detector at the SuperKEKB collider from March 2019 to June 2022. We determine the integrated luminosities of these data samples using three distinct methodologies involving Bhabha ($e^+e^- \to e^+e^-(nγ)$), digamma ($e^+e^- \to γγ(nγ)$), and dimuon ($e^+e^- \to μ^+ μ^- (nγ)$) events. The total integrated luminosity obtained with Bhabha, diga… ▽ More A series of data samples was collected with the Belle II detector at the SuperKEKB collider from March 2019 to June 2022. We determine the integrated luminosities of these data samples using three distinct methodologies involving Bhabha ($e^+e^- \to e^+e^-(nγ)$), digamma ($e^+e^- \to γγ(nγ)$), and dimuon ($e^+e^- \to μ^+ μ^- (nγ)$) events. The total integrated luminosity obtained with Bhabha, digamma, and dimuon events is (426.52 $\pm$ 0.03 $\pm$ 2.48)~fb$^{-1}$, (427.32 $\pm$ 0.03 $\pm$ 2.56)~fb$^{-1}$, and (424.84 $\pm$ 0.04 $\pm$ 3.88)~fb$^{-1}$, where the first uncertainties are statistical and the second are systematic. The resulting total integrated luminosity obtained from the combination of the three methods is (426.88 $\pm$ 1.93)~fb$^{-1}$. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 12 pages, 3 figures

Report number: Belle II Preprint 2024-019; KEK Preprint 2024-16

arXiv:2407.00929 [pdf, other]

High-sensitivity multichannel zero-to-ultralow field NMR with atomic magnetometer arrays

Authors: Blake Andrews, Matthew Lai, Zhen Wang, Norihisa Kato, Michael Tayler, Emanuel Druga, Ashok Ajoy

Abstract: Despite its versatility and high chemical specificity, conventional NMR spectroscopy is limited in measurement throughput due to the need for high-homogeneity magnetic fields, necessitating sequential sample analysis, and bulky devices. Here, we propose a multichannel NMR device that overcomes these limitations that leverages the zero-to-ultralow field (ZULF) regime, where simultaneous detection o… ▽ More Despite its versatility and high chemical specificity, conventional NMR spectroscopy is limited in measurement throughput due to the need for high-homogeneity magnetic fields, necessitating sequential sample analysis, and bulky devices. Here, we propose a multichannel NMR device that overcomes these limitations that leverages the zero-to-ultralow field (ZULF) regime, where simultaneous detection of multiple samples is carried out via an array of compact optically pumped magnetometers (OPMs). A magnetic field is used only for pre-polarization, permitting the use of large-bore, high-field, inhomogeneous magnets that can accommodate many samples concurrently. Through systematic advances, we demonstrate high-sensitivity, high resolution ZULF NMR spectroscopy with sensitivity comparable to benchtop NMR systems. The spectroscopy remains robust without the need for field shimming for periods on the order of weeks. We show the detection of ZULF NMR signals from organic molecules without isotopic enrichment, and demonstrate the parallelized detection of three distinct samples simultaneously as a proof-of-concept, with the potential to scale further to over 100 channels at a cost comparable to high-resolution liquid state NMR systems. This work sets the stage for using multichannel "NMR camera" devices for inline reaction monitoring, robotic chemistry, quality control, and high-throughput assays. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 6 pages, 5 figures

arXiv:2407.00922 [pdf]

Staying vigilant in the Age of AI: From content generation to content authentication

Authors: Yufan Li, Zhan Wang, Theo Papatheodorou

Abstract: This paper presents the Yangtze Sea project, an initiative in the battle against Generative AI (GAI)-generated fake con-tent. Addressing a pressing issue in the digital age, we investigate public reactions to AI-created fabrications through a structured experiment on a simulated academic conference platform. Our findings indicate a profound public challenge in discerning such content, highlighted… ▽ More This paper presents the Yangtze Sea project, an initiative in the battle against Generative AI (GAI)-generated fake con-tent. Addressing a pressing issue in the digital age, we investigate public reactions to AI-created fabrications through a structured experiment on a simulated academic conference platform. Our findings indicate a profound public challenge in discerning such content, highlighted by GAI's capacity for realistic fabrications. To counter this, we introduce an innovative approach employing large language models like ChatGPT for truthfulness assess-ment. We detail a specific workflow for scrutinizing the authenticity of everyday digital content, aimed at boosting public awareness and capability in identifying fake mate-rials. We apply this workflow to an agent bot on Telegram to help users identify the authenticity of text content through conversations. Our project encapsulates a two-pronged strategy: generating fake content to understand its dynamics and develo** assessment techniques to mitigate its impact. As part of that effort we propose the creation of speculative fact-checking wearables in the shape of reading glasses and a clip-on. As a computational media art initiative, this project under-scores the delicate interplay between technological progress, ethical consid-erations, and societal consciousness. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: ISEA 2024 full paper https://isea2024.isea-international.org/academic-program/ conference paper, 8 pages

arXiv:2407.00891 [pdf, other]

ZeroDDI: A Zero-Shot Drug-Drug Interaction Event Prediction Method with Semantic Enhanced Learning and Dual-Modal Uniform Alignment

Authors: Ziyan Wang, Zhankun Xiong, Feng Huang, Xuan Liu, Wen Zhang

Abstract: Drug-drug interactions (DDIs) can result in various pharmacological changes, which can be categorized into different classes known as DDI events (DDIEs). In recent years, previously unobserved/unseen DDIEs have been emerging, posing a new classification task when unseen classes have no labelled instances in the training stage, which is formulated as a zero-shot DDIE prediction (ZS-DDIE) task. Howe… ▽ More Drug-drug interactions (DDIs) can result in various pharmacological changes, which can be categorized into different classes known as DDI events (DDIEs). In recent years, previously unobserved/unseen DDIEs have been emerging, posing a new classification task when unseen classes have no labelled instances in the training stage, which is formulated as a zero-shot DDIE prediction (ZS-DDIE) task. However, existing computational methods are not directly applicable to ZS-DDIE, which has two primary challenges: obtaining suitable DDIE representations and handling the class imbalance issue. To overcome these challenges, we propose a novel method named ZeroDDI for the ZS-DDIE task. Specifically, we design a biological semantic enhanced DDIE representation learning module, which emphasizes the key biological semantics and distills discriminative molecular substructure-related semantics for DDIE representation learning. Furthermore, we propose a dual-modal uniform alignment strategy to distribute drug pair representations and DDIE semantic representations uniformly in a unit sphere and align the matched ones, which can mitigate the issue of class imbalance. Extensive experiments showed that ZeroDDI surpasses the baselines and indicate that it is a promising tool for detecting unseen DDIEs. Our code has been released in https://github.com/wzy-Sarah/ZeroDDI. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: Accepted by IJCAI2024

arXiv:2407.00840 [pdf, other]

MUSE-Net: Missingness-aware mUlti-branching Self-attention Encoder for Irregular Longitudinal Electronic Health Records

Authors: Zekai Wang, Tieming Liu, Bing Yao

Abstract: The era of big data has made vast amounts of clinical data readily available, particularly in the form of electronic health records (EHRs), which provides unprecedented opportunities for develo** data-driven diagnostic tools to enhance clinical decision making. However, the application of EHRs in data-driven modeling faces challenges such as irregularly spaced multi-variate time series, issues o… ▽ More The era of big data has made vast amounts of clinical data readily available, particularly in the form of electronic health records (EHRs), which provides unprecedented opportunities for develo** data-driven diagnostic tools to enhance clinical decision making. However, the application of EHRs in data-driven modeling faces challenges such as irregularly spaced multi-variate time series, issues of incompleteness, and data imbalance. Realizing the full data potential of EHRs hinges on the development of advanced analytical models. In this paper, we propose a novel Missingness-aware mUlti-branching Self-attention Encoder (MUSE-Net) to cope with the challenges in modeling longitudinal EHRs for data-driven disease prediction. The MUSE-Net leverages a multi-task Gaussian process (MGP) with missing value masks for data imputation, a multi-branching architecture to address the data imbalance problem, and a time-aware self-attention encoder to account for the irregularly spaced time interval in longitudinal EHRs. We evaluate the proposed MUSE-Net using both synthetic and real-world datasets. Experimental results show that our MUSE-Net outperforms existing methods that are widely used to investigate longitudinal signals. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00754 [pdf, ps, other]

Gene Regulatory Network Inference with Covariance Dynamics

Authors: Yue Wang, Peng Zheng, Yu-Chen Cheng, Zikun Wang, Aleksandr Aravkin

Abstract: Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does… ▽ More Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets. △ Less

Submitted 17 June, 2024; originally announced July 2024.

arXiv:2407.00681 [pdf, other]

Safe Reinforcement Learning for Power System Control: A Review

Authors: Peipei Yu, Zhenyi Wang, Hongcai Zhang, Yonghua Song

Abstract: The large-scale integration of intermittent renewable energy resources introduces increased uncertainty and volatility to the supply side of power systems, thereby complicating system operation and control. Recently, data-driven approaches, particularly reinforcement learning (RL), have shown significant promise in addressing complex control challenges in power systems, because RL can learn from i… ▽ More The large-scale integration of intermittent renewable energy resources introduces increased uncertainty and volatility to the supply side of power systems, thereby complicating system operation and control. Recently, data-driven approaches, particularly reinforcement learning (RL), have shown significant promise in addressing complex control challenges in power systems, because RL can learn from interactive feedback without needing prior knowledge of the system model. However, the training process of model-free RL methods relies heavily on random decisions for exploration, which may result in ``bad" decisions that violate critical safety constraints and lead to catastrophic control outcomes. Due to the inability of RL methods to theoretically ensure decision safety in power systems, directly deploying traditional RL algorithms in the real world is deemed unacceptable. Consequently, the safety issue in RL applications, known as safe RL, has garnered considerable attention in recent years, leading to numerous important developments. This paper provides a comprehensive review of the state-of-the-art safe RL techniques and discusses how these techniques can be applied to power system control problems such as frequency regulation, voltage control, and energy management. We then present discussions on key challenges and future research directions, related to convergence and optimality, training efficiency, universality, and real-world deployment. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00678 [pdf, other]

A Review of Image Processing Methods in Prostate Ultrasound

Authors: Haiqiao Wang, Hong Wu, Zhuoyuan Wang, Peiyan Yue, Dong Ni, Pheng-Ann Heng, Yi Wang

Abstract: Prostate cancer (PCa) poses a significant threat to men's health, with early diagnosis being crucial for improving prognosis and reducing mortality rates. Transrectal ultrasound (TRUS) plays a vital role in the diagnosis and image-guided intervention of PCa.To facilitate physicians with more accurate and efficient computer-assisted diagnosis and interventions, many image processing algorithms in T… ▽ More Prostate cancer (PCa) poses a significant threat to men's health, with early diagnosis being crucial for improving prognosis and reducing mortality rates. Transrectal ultrasound (TRUS) plays a vital role in the diagnosis and image-guided intervention of PCa.To facilitate physicians with more accurate and efficient computer-assisted diagnosis and interventions, many image processing algorithms in TRUS have been proposed and achieved state-of-the-art performance in several tasks, including prostate gland segmentation, prostate image registration, PCa classification and detection, and interventional needle detection.The rapid development of these algorithms over the past two decades necessitates a comprehensive summary. In consequence, this survey provides a systematic analysis of this field, outlining the evolution of image processing methods in the context of TRUS image analysis and meanwhile highlighting their relevant contributions. Furthermore, this survey discusses current challenges and suggests future research directions to possibly advance this field further. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00499 [pdf, other]

ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

Authors: Zhiyuan Wang, **hao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended… ▽ More Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended NLG tasks. We propose a sampling-based uncertainty measure leveraging self-consistency and develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the design of the CP algorithm. Experimental results indicate that our uncertainty measure generally surpasses prior state-of-the-art methods. Furthermore, we calibrate the prediction sets within the model's unfixed answer distribution and achieve strict control over the correctness coverage rate across 6 LLMs on 4 free-form NLG datasets, spanning general-purpose and medical domains, while the small average set size further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: 13 pages, 9 figures, 6 tables

arXiv:2407.00371 [pdf, other]

Axiomatization of Gradient Smoothing in Neural Networks

Authors: Linjiang Zhou, Xiaochuan Shi, Chao Ma, Zepeng Wang

Abstract: Gradients play a pivotal role in neural networks explanation. The inherent high dimensionality and structural complexity of neural networks result in the original gradients containing a significant amount of noise. While several approaches were proposed to reduce noise with smoothing, there is little discussion of the rationale behind smoothing gradients in neural networks. In this work, we propos… ▽ More Gradients play a pivotal role in neural networks explanation. The inherent high dimensionality and structural complexity of neural networks result in the original gradients containing a significant amount of noise. While several approaches were proposed to reduce noise with smoothing, there is little discussion of the rationale behind smoothing gradients in neural networks. In this work, we proposed a gradient smooth theoretical framework for neural networks based on the function mollification and Monte Carlo integration. The framework intrinsically axiomatized gradient smoothing and reveals the rationale of existing methods. Furthermore, we provided an approach to design new smooth methods derived from the framework. By experimental measurement of several newly designed smooth methods, we demonstrated the research potential of our framework. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00353 [pdf, ps, other]

New type of solutions for a critical Grushin-type problem with competing potentials

Authors: Wen**g Chen, Zexi Wang

Abstract: In this paper, we consider a critical Grushin-type problem with double potentials. By applying the reduction argument and local Pohouzaev identities, we construct a new family of solutions to this problem, which are concentrated at points lying on the top and the bottom circles of a cylinder. In this paper, we consider a critical Grushin-type problem with double potentials. By applying the reduction argument and local Pohouzaev identities, we construct a new family of solutions to this problem, which are concentrated at points lying on the top and the bottom circles of a cylinder. △ Less

Submitted 29 June, 2024; originally announced July 2024.

MSC Class: 35J15; 35B09; 35B33

arXiv:2407.00347 [pdf, ps, other]

Resource Allocation and Secure Wireless Communication in the Large Model-based Mobile Edge Computing System

Authors: Zefan Wang, Yitong Wang, Jun Zhao

Abstract: With the rapid advancement of large models and mobile edge computing, transfer learning, particularly through fine-tuning, has become crucial for adapting models to downstream tasks. Traditionally, this requires users to share their data with model owners for fine-tuning, which is not only costly but also raises significant privacy concerns. Furthermore, fine-tuning large-scale models is computati… ▽ More With the rapid advancement of large models and mobile edge computing, transfer learning, particularly through fine-tuning, has become crucial for adapting models to downstream tasks. Traditionally, this requires users to share their data with model owners for fine-tuning, which is not only costly but also raises significant privacy concerns. Furthermore, fine-tuning large-scale models is computationally intensive and often impractical for many users. To tackle these challenges, we introduce a system that combines offsite-tuning with physical-layer security, which provides local data owners with a lightweight adapter and a compressed emulator. Data owners then fine-tune the adapter locally and securely send it back to the model owners through a confidential channel for integration, ensuring privacy and resource conservation. Our paper focuses on optimizing computational resource allocation among data owners and the large model owner deployed on edge, and on the compression ratio of adapters. We incorporate a secrecy uplink channel to maximize the utility that we defined while minimizing system costs like energy consumption and delay. The optimization uses the Dinkelbach algorithm, fractional programming, successive convex approximation and alternating optimization. Experiments demonstrate our algorithm's superiority over existing methods. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00322 [pdf]

LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods

Authors: Zhenhua Wang, Guang Xu, Ming Ren

Abstract: With the ascent of large language models (LLM), natural language processing has witnessed enhancements, such as LLM-based data augmentation. Nonetheless, prior research harbors two primary concerns: firstly, a lack of contemplation regarding whether the natural language generated by LLM (LLMNL) truly aligns with human natural language (HNL), a critical foundational question; secondly, an oversight… ▽ More With the ascent of large language models (LLM), natural language processing has witnessed enhancements, such as LLM-based data augmentation. Nonetheless, prior research harbors two primary concerns: firstly, a lack of contemplation regarding whether the natural language generated by LLM (LLMNL) truly aligns with human natural language (HNL), a critical foundational question; secondly, an oversight that augmented data is randomly generated by LLM, implying that not all data may possess equal training value, that could impede the performance of classifiers. To address these challenges, we introduce the scaling laws to intrinsically calculate LLMNL and HNL. Through extensive experiments, we reveal slight deviations (approximately 0.2 Mandelbrot exponent) from Mandelbrot's law in LLMNL, underscore a complexity advantage in HNL, and supplement an interpretive discussion on language style. This establishes a solid foundation for LLM's expansion. Further, we introduce a novel data augmentation method for few-shot text classification, termed ZGPTDA, which leverages fuzzy computing mechanisms driven by the conformity to scaling laws to make decisions about GPT-4 augmented data. Extensive experiments, conducted in real-world scenarios, confirms the effectiveness (improving F1 of Bert and RoBerta by 7-10%) and competitiveness (surpassing recent AugGPT and GENCO methods by about 2% accuracy on DeBerta) of ZGPTDA. In addition, we reveal some interesting insights, e.g., Hilberg's law and Taylor's law can impart more benefits to text classification, etc. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00312 [pdf, other]

UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems

Authors: Zhi Zheng, Changliang Zhou, Tong Xialiang, Mingxuan Yuan, Zhenkun Wang

Abstract: Single-stage neural combinatorial optimization solvers have achieved near-optimal results on various small-scale combinatorial optimization (CO) problems without needing expert knowledge. However, these solvers exhibit significant performance degradation when applied to large-scale CO problems. Recently, two-stage neural methods with divide-and-conquer strategies have shown superiorities in addres… ▽ More Single-stage neural combinatorial optimization solvers have achieved near-optimal results on various small-scale combinatorial optimization (CO) problems without needing expert knowledge. However, these solvers exhibit significant performance degradation when applied to large-scale CO problems. Recently, two-stage neural methods with divide-and-conquer strategies have shown superiorities in addressing large-scale CO problems. Nevertheless, the efficiency of these methods highly relies on problem-specific heuristics in either the divide or the conquer procedure, which limits their applicability to general CO problems. Moreover, these methods employ separate training schemes and ignore the interdependencies between the dividing and conquering strategies, which often leads to sub-optimal solutions. To tackle these drawbacks, this article develops a unified neural divide-and-conquer framework (i.e., UDC) for solving general large-scale CO problems. UDC offers a Divide-Conquer-Reunion (DCR) training method to eliminate the negative impact of a sub-optimal dividing policy. Employing a high-efficiency Graph Neural Network (GNN) for global dividing and a fixed-length sub-path solver for conquering sub-problems, the proposed UDC framework demonstrates extensive applicability, achieving superior performance in 10 representative large-scale CO problems. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00136 [pdf, other]

Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere , et al. (495 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions… ▽ More Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components. △ Less

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2407.00114 [pdf, other]

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod… ▽ More We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimodal interaction data. First, we introduce a self-supervised approach to learn a behavior encoder that produces discretized tokens for behavior trajectories $τ$ = {$o_0$, $a_0$, $\dots$} and an imitation learning (IL) policy decoder conditioned on these tokens. These additional behavior tokens will be augmented to the vocabulary of pretrained Multimodal Language Models (MLMs). With this encoder, we then pack long-term multimodal interactions involving task instructions, memories, thoughts, observations, textual responses, behavior trajectories, etc. into unified token sequences and model them with autoregressive transformers. Thanks to the semantically meaningful behavior tokens, the resulting VLA model, OmniJARVIS, can reason (by producing chain-of-thoughts), plan, answer questions, and act (by producing behavior tokens for the IL policy decoder). OmniJARVIS demonstrates excellent performances on a comprehensive collection of atomic, programmatic, and open-ended tasks in open-world Minecraft. Our analysis further unveils the crucial design principles in interaction data formation, unified tokenization, and its scaling potentials. △ Less

Submitted 27 June, 2024; originally announced July 2024.

arXiv:2407.00024 [pdf, other]

LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild

Authors: Lang He, Kai Chen, Junnan Zhao, Yimeng Wang, Ercheng Pei, Haifeng Chen, Jiewei Jiang, Shiqing Zhang, Jie Zhang, Zhongmin Wang, Tao He, Prayag Tiwari

Abstract: Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection con… ▽ More Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection concerns, that data in this area is still scarce, presenting a challenge for the deep discriminative models used in detecting depression. To navigate these obstacles, a large-scale multimodal vlog dataset (LMVD), for depression recognition in the wild is built. In LMVD, which has 1823 samples with 214 hours of the 1475 participants captured from four multimedia platforms (Sina Weibo, Bilibili, Tiktok, and YouTube). A novel architecture termed MDDformer to learn the non-verbal behaviors of individuals is proposed. Extensive validations are performed on the LMVD dataset, demonstrating superior performance for depression detection. We anticipate that the LMVD will contribute a valuable function to the depression detection community. The data and code will released at the link: https://github.com/helang818/LMVD/. △ Less

Submitted 8 May, 2024; originally announced July 2024.

arXiv:2407.00014 [pdf]

Simplifying Kinematic Parameter Estimation in sEMG Prosthetic Hands: A Two-Point Approach

Authors: Gang Liu, Zhenxiang Wang, Ziyang He, Shanshan Guo, Rui Zhang, Dezhong Yao

Abstract: Regression-based sEMG prosthetic hands are widely used for their ability to provide continuous kinematic parameters. However, establishing these models traditionally requires complex kinematic sensor systems to collect corresponding kinematic data in synchronization with EMG, which is cumbersome and user-unfriendly. This paper presents a simplified approach utilizing only two data points to depict… ▽ More Regression-based sEMG prosthetic hands are widely used for their ability to provide continuous kinematic parameters. However, establishing these models traditionally requires complex kinematic sensor systems to collect corresponding kinematic data in synchronization with EMG, which is cumbersome and user-unfriendly. This paper presents a simplified approach utilizing only two data points to depict kinematic parameters. Finger flexion is recorded as 1, extension as -1, and a near-linear model is employed to interpolate intermediate values, offering a viable alternative for kinematic data. We validated the approach with twenty participants through offline analysis and online experiments. The offline analysis confirmed the model's capability to fill in intermediate points and the online experiments demonstrated that participants could control gestures, adjust force accurately. This study significantly reduces the complexity of collecting dynamic parameters in EMG-based regression prosthetics, thus enhancing usability for prosthetic hands. △ Less

Submitted 1 May, 2024; originally announced July 2024.

Comments: 13 pages

arXiv:2406.20092 [pdf, other]

LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression

Authors: Jieneng Chen, Luoxin Ye, Ju He, Zhao-Yang Wang, Daniel Khashabi, Alan Yuille

Abstract: While significant advancements have been made in compressed representations for text embeddings in large language models (LLMs), the compression of visual tokens in large multi-modal models (LMMs) has remained a largely overlooked area. In this work, we present the study on the analysis of redundancy concerning visual tokens and efficient training within these models. Our initial experiments show… ▽ More While significant advancements have been made in compressed representations for text embeddings in large language models (LLMs), the compression of visual tokens in large multi-modal models (LMMs) has remained a largely overlooked area. In this work, we present the study on the analysis of redundancy concerning visual tokens and efficient training within these models. Our initial experiments show that eliminating up to 70% of visual tokens at the testing stage by simply average pooling only leads to a minimal 3% reduction in visual question answering accuracy on the GQA benchmark, indicating significant redundancy in visual context. Addressing this, we introduce Visual Context Compressor, which reduces the number of visual tokens during training to enhance training efficiency without sacrificing performance. To minimize information loss caused by the compression on visual tokens while maintaining training efficiency, we develop LLaVolta as a lite training scheme. LLaVolta incorporates stage-wise visual context compression to progressively compress the visual tokens from heavily to lightly, and finally no compression at the end of training, yielding no loss of information when testing. Extensive experiments demonstrate that our approach enhances the performance of MLLMs in both image-language and video-language understanding, while also significantly cutting training costs. Code is available at https://github.com/Beckschen/LLaVolta △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: Code is available at https://github.com/Beckschen/LLaVolta

arXiv:2406.19769 [pdf, other]

Decision Transformer for IRS-Assisted Systems with Diffusion-Driven Generative Channels

Authors: Jie Zhang, Jun Li, Zhe Wang, Yu Han, Long Shi, Bin Cao

Abstract: In this paper, we propose a novel diffusion-decision transformer (D2T) architecture to optimize the beamforming strategies for intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) communication systems. The first challenge lies in the expensive computation cost to recover the real-time channel state information (CSI) from the received pilot signals, which usually requi… ▽ More In this paper, we propose a novel diffusion-decision transformer (D2T) architecture to optimize the beamforming strategies for intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) communication systems. The first challenge lies in the expensive computation cost to recover the real-time channel state information (CSI) from the received pilot signals, which usually requires prior knowledge of the channel distributions. To reduce the channel estimation complexity, we adopt a diffusion model to automatically learn the map** between the received pilot signals and channel matrices in a model-free manner. The second challenge is that, the traditional optimization or reinforcement learning (RL) algorithms cannot guarantee the optimality of the beamforming policies once the channel distribution changes, and it is costly to resolve the optimized strategies. To enhance the generality of the decision models over varying channel distributions, we propose an offline pre-training and online fine-tuning decision transformer (DT) framework, wherein we first pre-train the DT offline with the data samples collected by the RL algorithms under diverse channel distributions, and then fine-tune the DT online with few-shot samples under a new channel distribution for a generalization purpose. Simulation results demonstrate that, compared with retraining RL algorithms, the proposed D2T algorithm boosts the convergence speed by 3 times with only a few samples from the new channel distribution while enhancing the average user data rate by 6%. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19381 [pdf, other]

Spontaneous symmetry breaking in open quantum systems: strong, weak, and strong-to-weak

Authors: Ding Gu, Zijian Wang, Zhong Wang

Abstract: Depending on the coupling to the environment, symmetries of open quantum systems manifest in two distinct forms, the strong and the weak. We study the spontaneous symmetry breaking among phases with different symmetries. Concrete Liouvillian models with strong and weak symmetry are constructed, and different scenarios of symmetry-breaking transitions are investigated from complementary approaches.… ▽ More Depending on the coupling to the environment, symmetries of open quantum systems manifest in two distinct forms, the strong and the weak. We study the spontaneous symmetry breaking among phases with different symmetries. Concrete Liouvillian models with strong and weak symmetry are constructed, and different scenarios of symmetry-breaking transitions are investigated from complementary approaches. It is demonstrated that strong symmetry always spontaneously breaks into the corresponding weak symmetry. For strong $U(1)$ symmetry, we show that strong-to-weak symmetry breaking leads to gapless Goldstone modes dictating diffusion of the symmetry charge in translational invariant systems. We conjecture that this relation among strong-to-weak symmetry breaking, gapless modes, and symmetry-charge diffusion is general for continuous symmetries. It can be interpreted as an "enhanced Lieb-Schultz-Mattis (LSM) theorem" for open quantum systems, according to which the gapless spectrum does not require non-integer filling. We also investigate the scenario where the strong symmetry breaks completely. In the symmetry-broken phase, we identify an effective Keldysh action with two Goldstone modes, describing fluctuations of the order parameter and diffusive hydrodynamics of the symmetry charge, respectively. For a particular model studied here, we uncover a transition from a symmetric phase with a "Bose surface" to a symmetry-broken phase with long-range order induced by tuning the filling. It is also shown that the long-range order of $U(1)$ symmetry breaking is possible in spatial dimension $d\geq 3$, in both weak and strong symmetry cases. Our work outline the typical scenarios of spontaneous symmetry breaking in open quantum systems, and highlights their physical consequences. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 15 pages, 3 figures

arXiv:2406.19190 [pdf, ps, other]

Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (643 additional authors not shown)

Abstract: Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential dec… ▽ More Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures

arXiv:2406.19043 [pdf]

CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover high-quality, clinically interpretable images from undersampled measurements. However, the lack of publicly available cardiac MRI k-space dataset in terms of both quantity and diversity has severely hindered substantial technological progress, particularly for data-driven artificial intelligence. Here, we provide a standardized, diverse, and high-quality CMRxRecon2024 dataset to facilitate the technical development, fair evaluation, and clinical transfer of cardiac MRI reconstruction approaches, towards promoting the universal frameworks that enable fast and robust reconstructions across different cardiac MRI protocols in clinical practice. To the best of our knowledge, the CMRxRecon2024 dataset is the largest and most diverse publicly available cardiac k-space dataset. It is acquired from 330 healthy volunteers, covering commonly used modalities, anatomical views, and acquisition trajectories in clinical cardiac MRI workflows. Besides, an open platform with tutorials, benchmarks, and data processing tools is provided to facilitate data usage, advanced method development, and fair performance evaluation. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 19 pages, 3 figures, 2 tables

arXiv:2406.19013 [pdf, other]

Design and Implementation of a Scalable Correlator Based on ROACH2+GPU Cluster for Tianlai 96-Dual-Polarization Antenna Array

Authors: Zhao Wang, Ji-Xia Li, Ke Zhang, 1 Feng-Quan Wu, Hai-Jun Tian, Chen-Hui Niu, Ju-Yong Zhang, Zhi-** Chen, Dong-** Yu, Xue-Lei Chen

Abstract: The digital correlator is one of the most crucial data processing components of a radio telescope array. With the scale of radio interferometeric array growing, many efforts have been devoted to develo** a cost-effective and scalable correlator in the field of radio astronomy. In this paper, a 192-input digital correlator with six CASPER ROACH2 boards and seven GPU servers has been deployed as t… ▽ More The digital correlator is one of the most crucial data processing components of a radio telescope array. With the scale of radio interferometeric array growing, many efforts have been devoted to develo** a cost-effective and scalable correlator in the field of radio astronomy. In this paper, a 192-input digital correlator with six CASPER ROACH2 boards and seven GPU servers has been deployed as the digital signal processing system for Tianlai cylinder pathfinder located in Hongliuxia observatory. The correlator consists of 192 input signals (96 dual-polarization), 125-MHz bandwidth, and full-Stokes output. The correlator inherits the advantages of the CASPER system, for example, low cost, high performance, modular scalability, and a heterogeneous computing architecture. With a rapidly deployable ROACH2 digital sampling system, a commercially expandable 10 Gigabit switching network system, and a flexible upgradable GPU computing system, the correlator forms a low-cost and easily-upgradable system, poised to support scalable large-scale interferometeric array in the future. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 12 pages, 13 figures, accepted for publication in Frontiers in Astronomy and Space Sciences, section Astronomical Instrumentation

arXiv:2406.18964 [pdf, other]

DNLSAT: A Dynamic Variable Ordering MCSAT Framework for Nonlinear Real Arithmetic

Authors: Zhonghan Wang

Abstract: Satisfiability modulo nonlinear real arithmetic theory (SMT(NRA)) solving is essential to multiple applications, including program verification, program synthesis and software testing. In this context, recently model constructing satisfiability calculus (MCSAT) has been invented to directly search for models in the theory space. Although following papers discussed practical directions and updates… ▽ More Satisfiability modulo nonlinear real arithmetic theory (SMT(NRA)) solving is essential to multiple applications, including program verification, program synthesis and software testing. In this context, recently model constructing satisfiability calculus (MCSAT) has been invented to directly search for models in the theory space. Although following papers discussed practical directions and updates on MCSAT, less attention has been paid to the detailed implementation. In this paper, we present an efficient implementation of dynamic variable orderings of MCSAT, called dnlsat. We show carefully designed data structures and promising mechanisms, such as branching heuristic, restart, and lemma management. Besides, we also give a theoretical study of potential influences brought by the dynamic variablr ordering. The experimental evaluation shows that dnlsat accelerates the solving speed and solves more satisfiable instances than other state-of-the-art SMT solvers. Demonstration Video: https://youtu.be/T2Z0gZQjnPw Code: https://github.com/yogurt-shadow/dnlsat/tree/master/code Benchmark https://zenodo.org/records/10607722/files/QF_NRA.tar.zst?download=1 △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18849 [pdf, other]

Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

Authors: Jie Zhang, Zhongqi Wang, Mengqi Lei, Zheng Yuan, Bei Yan, Shiguang Shan, Xilin Chen

Abstract: Currently many benchmarks have been proposed to evaluate the perception ability of the Large Vision-Language Models (LVLMs). However, most benchmarks conduct questions by selecting images from existing datasets, resulting in the potential data leakage. Besides, these benchmarks merely focus on evaluating LVLMs on the realistic style images and clean scenarios, leaving the multi-stylized images and… ▽ More Currently many benchmarks have been proposed to evaluate the perception ability of the Large Vision-Language Models (LVLMs). However, most benchmarks conduct questions by selecting images from existing datasets, resulting in the potential data leakage. Besides, these benchmarks merely focus on evaluating LVLMs on the realistic style images and clean scenarios, leaving the multi-stylized images and noisy scenarios unexplored. In response to these challenges, we propose a dynamic and scalable benchmark named Dysca for evaluating LVLMs by leveraging synthesis images. Specifically, we leverage Stable Diffusion and design a rule-based method to dynamically generate novel images, questions and the corresponding answers. We consider 51 kinds of image styles and evaluate the perception capability in 20 subtasks. Moreover, we conduct evaluations under 4 scenarios (i.e., Clean, Corruption, Print Attacking and Adversarial Attacking) and 3 question types (i.e., Multi-choices, True-or-false and Free-form). Thanks to the generative paradigm, Dysca serves as a scalable benchmark for easily adding new subtasks and scenarios. A total of 8 advanced open-source LVLMs with 10 checkpoints are evaluated on Dysca, revealing the drawbacks of current LVLMs. The benchmark is released in \url{https://github.com/Benchmark-Dysca/Dysca}. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18744 [pdf, other]

Quantum Resources Required for Binding Affinity Calculations of Amyloid beta

Authors: Matthew Otten, Thomas W. Watts, Samuel D. Johnson, Rashmi Sundareswara, Zhihui Wang, Tarini S. Hardikar, Kenneth Heitritter, James Brown, Kanav Setia, Adam Holmes

Abstract: Amyloid beta, an intrinsically disordered protein, plays a seemingly important but not well-understood role in neurodegenerative diseases like Alzheimer's disease. A key feature of amyloid beta, which could lead to potential therapeutic intervention pathways, is its binding affinity to certain metal centers, like iron and copper. Numerically calculating such binding affinities is a computationally… ▽ More Amyloid beta, an intrinsically disordered protein, plays a seemingly important but not well-understood role in neurodegenerative diseases like Alzheimer's disease. A key feature of amyloid beta, which could lead to potential therapeutic intervention pathways, is its binding affinity to certain metal centers, like iron and copper. Numerically calculating such binding affinities is a computationally challenging task, involving strongly correlated metal centers. A key bottleneck in understanding the binding affinity is obtaining estimates of the ground state energy. Quantum computers have the potential to accelerate such calculations but it is important to understand the quantum resources required. In this work, we detail a computational workflow for binding affinity calculations for amyloid beta utilizing quantum algorithms, providing estimated quantum resources required, at both the logical and hardware level. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18676 [pdf, other]

Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation

Authors: Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, Ji-Rong Wen

Abstract: Retrieval-augmented generation (RAG) has demonstrated effectiveness in mitigating the hallucination problem of large language models (LLMs). However, the difficulty of aligning the retriever with the diverse LLMs' knowledge preferences inevitably poses an inevitable challenge in develo** a reliable RAG system. To address this issue, we propose DPA-RAG, a universal framework designed to align div… ▽ More Retrieval-augmented generation (RAG) has demonstrated effectiveness in mitigating the hallucination problem of large language models (LLMs). However, the difficulty of aligning the retriever with the diverse LLMs' knowledge preferences inevitably poses an inevitable challenge in develo** a reliable RAG system. To address this issue, we propose DPA-RAG, a universal framework designed to align diverse knowledge preferences within RAG systems. Specifically, we initially introduce a preference knowledge construction pipline and incorporate five novel query augmentation strategies to alleviate preference data scarcity. Based on preference data, DPA-RAG accomplishes both external and internal preference alignment: 1) It jointly integrate pair-wise, point-wise, and contrastive preference alignment abilities into the reranker, achieving external preference alignment among RAG components. 2) It further introduces a pre-aligned stage before vanilla Supervised Fine-tuning (SFT), enabling LLMs to implicitly capture knowledge aligned with their reasoning preferences, achieving LLMs' internal alignment. Experimental results across four knowledge-intensive QA datasets demonstrate that DPA-RAG outperforms all baselines and seamlessly integrates both black-box and open-sourced LLM readers. Further qualitative analysis and discussions also provide empirical guidance for achieving reliable RAG systems. Our code is publicly available at https://github.com/dongguanting/DPA-RAG. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.18605 [pdf, other]

The neutron array of the compact spectrometer for heavy ion experiments in Fermi energy region

Authors: Dawei Si, Sheng Xiao, Yuhao Qin, Yijie Wang, Junhuai Xu, Baiting Tian, Boyuan Zhang, Dong Guo, Qin Zhi, Xiaobao Wei, Yibo Hao, Zengxiang Wang, Tianren Zhuo, Yuansheng Yang, Xianglun Wei, Herun Yang, Peng Ma, Limin Duan, Fangfang Duan, Junbing Ma, Shiwei Xu, Zhen Bai, Guo Yang, Yanyun Yang, Zhigang Xiao

Abstract: The emission of neutrons from heavy ion reactions is an important observable for studying the asymmetric nuclear equation of state and the reaction dynamics. A 20-unit neutron array has been developed and mounted on the compact spectrometer for heavy ion experiments (CSHINE) to measure the neutron spectra, neutron-neutron and neutron-proton correlation functions. Each unit consists of a… ▽ More The emission of neutrons from heavy ion reactions is an important observable for studying the asymmetric nuclear equation of state and the reaction dynamics. A 20-unit neutron array has been developed and mounted on the compact spectrometer for heavy ion experiments (CSHINE) to measure the neutron spectra, neutron-neutron and neutron-proton correlation functions. Each unit consists of a $\rm 15\times 15\times 15~cm^3$ plastic scintillator coupled to a $ φ=52 ~\rm mm$ photomultiplier. The Geant4 simulation with optical process is performed to investigate the time resolution and the neutron detection efficiency. The inherent time resolution of 212 ps is obtained by cosmic ray coincidence test. The n-$γ$ discrimination and time-of-flight performance are given by $\rm ^{252}Cf$ radioactive source test and beam test. The neutron energy spectra have been obtained in the angle range $30^\circ \le θ_{\rm lab} \le 51^\circ$ in the beam experiment of $^{124}$Sn+$^{124}$Sn at 25 MeV/u with CSHINE. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 8 pages, 11 figures

arXiv:2406.18583 [pdf, other]

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

Authors: Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao

Abstract: Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lu… ▽ More Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lumina-Next, an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency. We begin with a comprehensive analysis of the Flag-DiT architecture and identify several suboptimal components, which we address by introducing the Next-DiT architecture with 3D RoPE and sandwich normalizations. To enable better resolution extrapolation, we thoroughly compare different context extrapolation methods applied to text-to-image generation with 3D RoPE, and propose Frequency- and Time-Aware Scaled RoPE tailored for diffusion transformers. Additionally, we introduced a sigmoid time discretization schedule to reduce sampling steps in solving the Flow ODE and the Context Drop method to merge redundant visual tokens for faster network evaluation, effectively boosting the overall sampling speed. Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities and multilingual generation using decoder-based LLMs as the text encoder, all in a zero-shot manner. To further validate Lumina-Next as a versatile generative framework, we instantiate it on diverse tasks including visual recognition, multi-view, audio, music, and point cloud generation, showcasing strong performance across these domains. By releasing all codes and model weights, we aim to advance the development of next-generation generative AI capable of universal modeling. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: Code at: https://github.com/Alpha-VLLM/Lumina-T2X

arXiv:2406.18521 [pdf, other]

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Authors: Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

Abstract: Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to ou… ▽ More Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to outperform strong proprietary models on these benchmarks, a simple stress test with slightly different charts or questions can deteriorate performance by up to 34.5%. In this work, we propose CharXiv, a comprehensive evaluation suite involving 2,323 natural, challenging, and diverse charts from arXiv papers. CharXiv includes two types of questions: 1) descriptive questions about examining basic chart elements and 2) reasoning questions that require synthesizing information across complex visual elements in the chart. To ensure quality, all charts and questions are handpicked, curated, and verified by human experts. Our results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary model (i.e., GPT-4o), which achieves 47.1% accuracy, and the strongest open-source model (i.e., InternVL Chat V1.5), which achieves 29.2%. All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs. We hope CharXiv facilitates future research on MLLM chart understanding by providing a more realistic and faithful measure of progress. Project page and leaderboard: https://charxiv.github.io/ △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: 121 pages, 90 figures

arXiv:2406.18517 [pdf, other]

Generalized Concentratable Entanglement via Parallelized Permutation Tests

Authors: Xiaoyu Liu, Johannes Knörzer, Zherui Jerry Wang, Jordi Tura

Abstract: Multipartite entanglement is an essential resource for quantum information theory and technologies, but its quantification has been a persistent challenge. Recently, Concentratable Entanglement (CE) has been introduced as a promising candidate for a multipartite entanglement measure, which can be efficiently estimated across two state copies. In this work, we introduce Generalized Concentratable E… ▽ More Multipartite entanglement is an essential resource for quantum information theory and technologies, but its quantification has been a persistent challenge. Recently, Concentratable Entanglement (CE) has been introduced as a promising candidate for a multipartite entanglement measure, which can be efficiently estimated across two state copies. In this work, we introduce Generalized Concentratable Entanglement (GCE) measures, highlight a natural correspondence to quantum Tsallis entropies, and conjecture a new entropic inequality that may be of independent interest. We show how to efficiently measure the GCE in a quantum computer, using parallelized permutation tests across a prime number of state copies. We exemplify the practicality of such computation for probabilistic entanglement concentration into W states with three state copies. Moreover, we show that an increased number of state copies provides an improved error bound on this family of multipartite entanglement measures in the presence of imperfections. Finally, we prove that GCE is still a well-defined entanglement monotone as its value, on average, does not increase under local operations and classical communication (LOCC). △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: 5 pages + 17 pages appendix; 4 + 2 figures; 1 table

arXiv:2406.18516 [pdf, other]

Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

Authors: Kang Liao, Zongsheng Yue, Zhouxia Wang, Chen Change Loy

Abstract: Although deep learning-based image restoration methods have made significant progress, they still struggle with limited generalization to real-world scenarios due to the substantial domain gap caused by training on synthetic data. Existing methods address this issue by improving data synthesis pipelines, estimating degradation kernels, employing deep internal learning, and performing domain adapta… ▽ More Although deep learning-based image restoration methods have made significant progress, they still struggle with limited generalization to real-world scenarios due to the substantial domain gap caused by training on synthetic data. Existing methods address this issue by improving data synthesis pipelines, estimating degradation kernels, employing deep internal learning, and performing domain adaptation and regularization. Previous domain adaptation methods have sought to bridge the domain gap by learning domain-invariant knowledge in either feature or pixel space. However, these techniques often struggle to extend to low-level vision tasks within a stable and compact framework. In this paper, we show that it is possible to perform domain adaptation via the noise-space using diffusion models. In particular, by leveraging the unique property of how the multi-step denoising process is influenced by auxiliary conditional inputs, we obtain meaningful gradients from noise prediction to gradually align the restored results of both synthetic and real-world data to a common clean distribution. We refer to this method as denoising as adaptation. To prevent shortcuts during training, we present useful techniques such as channel shuffling and residual-swap** contrastive learning. Experimental results on three classical image restoration tasks, namely denoising, deblurring, and deraining, demonstrate the effectiveness of the proposed method. Code will be released at: https://github.com/KangLiao929/Noise-DA/. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: Github Repository: https://github.com/KangLiao929/Noise-DA/

arXiv:2406.18227 [pdf, other]

GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

Authors: Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

Abstract: There are substantial instructional videos on the Internet, which provide us tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can lead to beginners struggling to learn new tasks due to the lack of relevant experience. Moreover, the specific steps without guidelines… ▽ More There are substantial instructional videos on the Internet, which provide us tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can lead to beginners struggling to learn new tasks due to the lack of relevant experience. Moreover, the specific steps without guidelines are trivial and unsystematic, making it difficult to provide a clear tutorial. To address these problems, we present the GUIDE (Guideline-Guided) dataset, which contains 3.5K videos of 560 instructional tasks in 8 domains related to our daily life. Specifically, we annotate each instructional task with a guideline, representing a common pattern shared by all task-related videos. On this basis, we annotate systematic specific steps, including their associated guideline steps, specific step descriptions and timestamps. Our proposed benchmark consists of three sub-tasks to evaluate comprehension ability of models: (1) Step Captioning: models have to generate captions for specific steps from videos. (2) Guideline Summarization: models have to mine the common pattern in task-related videos and summarize a guideline from them. (3) Guideline-Guided Captioning: models have to generate captions for specific steps under the guide of guideline. We evaluate plenty of foundation models with GUIDE and perform in-depth analysis. Given the diversity and practicality of GUIDE, we believe that it can be used as a better benchmark for instructional video comprehension. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: IJCAI 2024

arXiv:2406.18219 [pdf, other]

A Closer Look into Mixture-of-Experts in Large Language Models

Authors: Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu

Abstract: Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechani… ▽ More Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechanism of MoE still lacks further exploration, and its modularization degree remains questionable. In this paper, we make an initial attempt to understand the inner workings of MoE-based large language models. Concretely, we comprehensively study the parametric and behavioral features of three recent MoE-based models and reveal some intriguing observations, including (1) Neurons act like fine-grained experts. (2) The router of MoE usually selects experts with larger output norms. (3) The expert diversity increases as the layer increases, while the last layer is an outlier. Based on the observations, we also provide suggestions for a broad spectrum of MoE practitioners, such as router design and expert allocation. We hope this work could shed light on future research on the MoE framework and other modular architectures. Code is available at https://github.com/kamanphoebe/Look-into-MoEs. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18200 [pdf, other]

SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding

Authors: Zhenglin Wang, Jialong Wu, Yilong Lai, Congzhi Zhang, Deyu Zhou

Abstract: Large Language Models (LLMs) demonstrate remarkable emergent abilities across various tasks, yet fall short of complex reasoning and planning tasks. The tree-search-based reasoning methods address this by surpassing the capabilities of chain-of-thought prompting, encouraging exploration of intermediate steps. However, such methods introduce significant inference latency due to the systematic explo… ▽ More Large Language Models (LLMs) demonstrate remarkable emergent abilities across various tasks, yet fall short of complex reasoning and planning tasks. The tree-search-based reasoning methods address this by surpassing the capabilities of chain-of-thought prompting, encouraging exploration of intermediate steps. However, such methods introduce significant inference latency due to the systematic exploration and evaluation of multiple thought paths. This paper introduces SeeD, a novel and efficient inference framework to optimize runtime speed and GPU memory management concurrently. By employing a scheduled speculative execution, SeeD efficiently handles multiple iterations for the thought generation and the state evaluation, leveraging a rounds-scheduled strategy to manage draft model dispatching. Extensive experimental evaluations on three reasoning datasets demonstrate superior speedup performance of SeeD, providing a viable path for batched inference in training-free speculative decoding. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18183 [pdf, other]

Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of… ▽ More Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: 26 pages,5 tables, 4 figures

arXiv:2406.18155 [pdf, other]

SuperGrad: a differentiable simulator for superconducting processors

Authors: Ziang Wang, Feng Wu, Hui-Hai Zhao, Xin Wan, Xiaotong Ni

Abstract: One significant advantage of superconducting processors is their extensive design flexibility, which encompasses various types of qubits and interactions. Given the large number of tunable parameters of a processor, the ability to perform gradient optimization would be highly beneficial. Efficient backpropagation for gradient computation requires a tightly integrated software library, for which no… ▽ More One significant advantage of superconducting processors is their extensive design flexibility, which encompasses various types of qubits and interactions. Given the large number of tunable parameters of a processor, the ability to perform gradient optimization would be highly beneficial. Efficient backpropagation for gradient computation requires a tightly integrated software library, for which no open-source implementation is currently available. In this work, we introduce SuperGrad, a simulator that accelerates the design of superconducting quantum processors by incorporating gradient computation capabilities. SuperGrad offers a user-friendly interface for constructing Hamiltonians and computing both static and dynamic properties of composite systems. This differentiable simulation is valuable for a range of applications, including optimal control, design optimization, and experimental data fitting. In this paper, we demonstrate these applications through examples and code snippets. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: 23 pages, 7 figures, 3 tables, the code is available at https://github.com/iqubit-org/supergrad

arXiv:2406.18122 [pdf, other]

Poisoned LangChain: Jailbreak LLMs by LangChain

Authors: Ziqiu Wang, Jun Liu, Shengkai Zhang, Yang Yang

Abstract: With the development of natural language processing (NLP), large language models (LLMs) are becoming increasingly popular. LLMs are integrating more into everyday life, raising public concerns about their security vulnerabilities. Consequently, the security of large language models is becoming critically important. Currently, the techniques for attacking and defending against LLMs are continuously… ▽ More With the development of natural language processing (NLP), large language models (LLMs) are becoming increasingly popular. LLMs are integrating more into everyday life, raising public concerns about their security vulnerabilities. Consequently, the security of large language models is becoming critically important. Currently, the techniques for attacking and defending against LLMs are continuously evolving. One significant method type of attack is the jailbreak attack, which designed to evade model safety mechanisms and induce the generation of inappropriate content. Existing jailbreak attacks primarily rely on crafting inducement prompts for direct jailbreaks, which are less effective against large models with robust filtering and high comprehension abilities. Given the increasing demand for real-time capabilities in large language models, real-time updates and iterations of new knowledge have become essential. Retrieval-Augmented Generation (RAG), an advanced technique to compensate for the model's lack of new knowledge, is gradually becoming mainstream. As RAG enables the model to utilize external knowledge bases, it provides a new avenue for jailbreak attacks. In this paper, we conduct the first work to propose the concept of indirect jailbreak and achieve Retrieval-Augmented Generation via LangChain. Building on this, we further design a novel method of indirect jailbreak attack, termed Poisoned-LangChain (PLC), which leverages a poisoned external knowledge base to interact with large language models, thereby causing the large models to generate malicious non-compliant dialogues.We tested this method on six different large language models across three major categories of jailbreak issues. The experiments demonstrate that PLC successfully implemented indirect jailbreak attacks under three different scenarios, achieving success rates of 88.56%, 79.04%, and 82.69% respectively. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: 6 pages,2 figures,This paper is a submission to ACM TURC. It has been accepted by the editor of the organizer

arXiv:2406.18083 [pdf, other]

Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (643 additional authors not shown)

Abstract: Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0. 04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, an… ▽ More Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0. 04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: 19 pages, 2 figures

arXiv:2406.18078 [pdf, other]

Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction

Authors: Yice Zhang, Jie Zeng, Weiming Hu, Ziyi Wang, Shiwei Chen, Ruifeng Xu

Abstract: Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-tra… ▽ More Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels, aiming to filter out mismatches and thereby enhance the effectiveness of self-training. We highlight two critical aspects to ensure the scorer's effectiveness and reliability: the quality of the training dataset and its model architecture. To this end, we create a human-annotated comparison dataset and train a generative model on it using ranking-based objectives. Extensive experiments on public ASQP datasets reveal that using our scorer can greatly and consistently improve the effectiveness of self-training. Moreover, we explore the possibility of replacing humans with large language models for comparison dataset annotation, and experiments demonstrate its feasibility. We release our code and data at https://github.com/HITSZ-HLT/ST-w-Scorer-ABSA . △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 Main Conference

arXiv:2406.17923 [pdf, other]

PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning

Authors: Shiva Kumar Pentyala, Zhichao Wang, Bin Bi, Kiran Ramnath, Xiang-Bo Mao, Regunathan Radhakrishnan, Sitaram Asur, Na, Cheng

Abstract: Large language models (LLMs) have shown remarkable abilities in diverse natural language processing (NLP) tasks. The LLMs generally undergo supervised fine-tuning (SFT) followed by preference alignment to be usable in downstream applications. However, this sequential training pipeline leads to alignment tax that degrades the LLM performance. This paper introduces PAFT, a new PArallel training pa… ▽ More Large language models (LLMs) have shown remarkable abilities in diverse natural language processing (NLP) tasks. The LLMs generally undergo supervised fine-tuning (SFT) followed by preference alignment to be usable in downstream applications. However, this sequential training pipeline leads to alignment tax that degrades the LLM performance. This paper introduces PAFT, a new PArallel training paradigm for effective LLM Fine-Tuning, which independently performs SFT and preference alignment (e.g., DPO and ORPO, etc.) with the same pre-trained model on respective datasets. The model produced by SFT and the model from preference alignment are then merged into a final model by parameter fusing for use in downstream applications. This work reveals important findings that preference alignment like DPO naturally results in a sparse model while SFT leads to a natural dense model which needs to be sparsified for effective model merging. This paper introduces an effective interference resolution which reduces the redundancy by sparsifying the delta parameters. The LLM resulted from the new training paradigm achieved Rank #1 on the HuggingFace Open LLM Leaderboard. Comprehensive evaluation shows the effectiveness of the parallel training paradigm. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17841 [pdf, other]

Probing many-body Bell correlation depth with superconducting qubits

Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong **, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, **feng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing… ▽ More Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrating example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound with up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques. Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices, but also a valuable guide towards exploiting multipartite Bell correlation in a wide spectrum of practical applications. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 11 pages,6 figures + 14 pages, 6 figures

arXiv:2406.17801 [pdf, other]

A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge

Authors: Xiaopeng Wang, Yi Lu, Xin Qi, Zhiyong Wang, Yuankun Xie, Shuchen Shi, Ruibo Fu

Abstract: This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-… ▽ More This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-shot voice cloning on target speakers. Evaluation included both mono-lingual and cross-lingual synthesis across all seven languages, with subjective tests assessing naturalness and speaker similarity. Our system uses the VITS2 architecture, augmented with a multi-lingual ID and a BERT model to enhance contextual language comprehension. In Track 1, where no additional data usage was permitted, our model achieved a Speaker Similarity score of 4.02. In Track 2, which allowed the use of extra data, it attained a Speaker Similarity score of 4.17. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Showing 1–50 of 13,608 results for author: Wang, Z