-
Intrinsic supercurrent diode effect in NbSe2 nanobridge
Authors:
Yiwen Zhang,
Jiliang Cai,
Peng Dong,
Jiadian He,
Yifan Ding,
Jinghui Wang,
Xiang Zhou,
Kecheng Cao,
Yueshen Wu,
Jun Li
Abstract:
The significance of the superconducting diode effect lies in its potential application as a fundamental component in the development of next-generation superconducting circuit technology. The stringent operating conditions at low temperatures have posed challenges for the conventional semiconductor diode, primarily due to its exceptionally high resistivity. In response to this limitation, various approaches have emerged to achieve the superconducting diode effect, primarily involving the disruption of inversion symmetry in a two-dimensional superconductor through heterostructure fabrication. In this study, we present a direct observation of the supercurrent diode effect in a NbSe2 nanobridge with a length of approximately 15 nm, created using focused helium ion beam fabrication. Nonreciprocal supercurrents were identified, reaching a peak value of approximately 380 $μ$A for each bias polarity at $B_{z}^{max} =\pm 0.2$ mT. Notably, the nonreciprocal supercurrent can be toggled by altering the bias polarity. This discovery of the superconducting diode effect introduces a novel avenue and mechanism through nanofabrication on a superconducting flake, offering fresh perspectives for the development of superconducting devices and potential circuits.
Submitted 25 February, 2024;
originally announced February 2024.
-
Localization in Reconfigurable Intelligent Surface Aided mmWave Systems: A Multiple Measurement Vector Based Channel Estimation Method
Authors:
Kunlun Li,
Jiguang He,
Mohammed El-Hajjar,
Lie-Liang Yang
Abstract:
The sparsity of millimeter wave (mmWave) channels in the angular and temporal domains is beneficial to channel estimation, while the associated channel parameters can be utilized for localization. However, line-of-sight (LoS) blockage poses a significant challenge to localization in mmWave systems, potentially leading to substantial positioning errors. A promising solution is to employ a reconfigurable intelligent surface (RIS) to generate virtual line-of-sight (VLoS) paths to aid localization. Consequently, wireless localization in RIS-assisted mmWave systems has become an essential research issue. In this paper, a multiple measurement vector (MMV) model is constructed and a two-stage channel estimation based localization scheme is proposed. During the first stage, by exploiting the beamspace sparsity and employing a random RIS phase shift matrix, the channel parameters are estimated, based on which the precoder at the base station and the combiner at the user equipment (UE) are designed. Then, in the second stage, based on the designed precoding and combining matrices, the optimal phase shift matrix for the RIS is designed using the proposed modified temporally correlated multiple sparse Bayesian learning (TMSBL) algorithm. Afterwards, the channel parameters that embed location information, such as angle of reflection and time-of-arrival, are estimated to derive the location of the UE. We demonstrate the achievable performance of the proposed algorithm and compare it with state-of-the-art algorithms. Our studies show that the proposed localization scheme is capable of achieving centimeter-level localization accuracy when the LoS path is blocked. Furthermore, the proposed algorithm has a low computational complexity and outperforms the legacy algorithms in several respects.
Submitted 25 February, 2024;
originally announced February 2024.
-
Topological skyrmions in monolayer multiferroic MoPtGe2S6
Authors:
Zuxin Fu,
Kuanrong Hao,
Min Guo,
Jingjing He,
Xiaohong Yan,
Yangbo Zhou,
Lei Shen,
Jiaren Yuan
Abstract:
Two-dimensional (2D) multiferroic materials with coexisting ferroelectricity and ferromagnetism have garnered substantial attention for their intriguing physical properties and diverse promising applications in spintronics. For example, multiferroic materials with electronically controlled broken central symmetry provide a versatile platform for designing and manipulating topological skyrmions and diverse spintronic applications. Here, we investigate the complex magnetic properties of the room-temperature multiferroic material MoPtGe2S6 and its electrical control of topological skyrmions using first-principles calculations and atomistic micromagnetic simulations. A sizable Dzyaloshinskii-Moriya interaction (DMI) (2.1 meV) is found in the multiferroic material MoPtGe2S6 with an electrically polarized ground state. The magnetic skyrmions can be stabilized in monolayer MoPtGe2S6 under zero magnetic field, and the chirality of skyrmions can be reversed with electric field-induced flipping of electrical polarization due to the reversed chirality of the DMI. Furthermore, an external magnetic field can reverse the magnetization direction and topological charge of the skyrmions as well as tune the size of skyrmions. These results demonstrate that the monolayer MoPtGe2S6 can enrich the 2D skyrmion community and pave the way for electronically controlled spintronic devices.
Submitted 24 February, 2024;
originally announced February 2024.
-
Ultrastrong coupling between polar distortion and optical properties in ferroelectric MoBr$_2$O$_2$
Authors:
Zhaojun Li,
Lorenzo Varrassi,
Yali Yang,
Cesare Franchini,
Laurent Bellaiche,
Jiangang He
Abstract:
Tuning the properties of materials using external stimuli is crucial for developing versatile smart materials. A strong coupling among order parameters within a single-phase material constitutes a potent foundation for achieving precise property control. However, cross-coupling is quite weak in most single materials. Leveraging first-principles calculations, we demonstrate that the layered mixed-anion compound MoBr$_2$O$_2$ exhibits electric-field switchable spontaneous polarization and ultrastrong coupling between polar distortion and electronic structures as well as optical properties. It offers feasible avenues of achieving tunable Rashba spin-splitting, electrochromism, thermochromism, photochromism, and nonlinear optics by applying an external electric field to a single-domain sample, heating, as well as intense light illumination. Additionally, it exhibits an exceptionally large photostrictive effect. These findings not only showcase the feasibility of achieving multiple order parameter coupling within a single material, but also pave the way for comprehensive applications based on property control, such as energy harvesting, information processing, and ultrafast control.
Submitted 24 February, 2024;
originally announced February 2024.
-
ELAA Near-Field Localization and Sensing with Partial Blockage Detection
Authors:
Hui Chen,
Pinjun Zheng,
Yu Ge,
Ahmed Elzanaty,
Jiguang He,
Tareq Y. Al-Naffouri,
Henk Wymeersch
Abstract:
High-frequency communication systems bring extremely large aperture arrays (ELAA) and large bandwidths, integrating localization and (bi-static) sensing functions without extra infrastructure. Such systems are likely to operate in the near-field (NF), where the performance of localization and sensing is degraded if a simplified far-field channel model is considered. However, when taking advantage of the additional geometry information in the NF, e.g., the encapsulated information in the wavefront, localization and sensing performance can be improved. In this work, we formulate a joint synchronization, localization, and sensing problem in the NF. Considering the array size could be much larger than an obstacle, the effect of partial blockage (i.e., a portion of antennas are blocked) is investigated, and a blockage detection algorithm is proposed. The simulation results show that blockage greatly impacts performance for certain positions, and the proposed blockage detection algorithm can mitigate this impact by identifying the blocked antennas.
Submitted 24 February, 2024;
originally announced February 2024.
-
Modification of $χ_{c1}$(3872) and $ψ$(2$S$) production in $p$Pb collisions at $\sqrt{s_{NN}} = 8.16$ TeV
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1082 additional authors not shown)
Abstract:
The LHCb collaboration measures production of the exotic hadron $χ_{c1}$(3872) in proton-nucleus collisions for the first time. Comparison with the charmonium state $ψ$(2$S$) suggests that the exotic $χ_{c1}$(3872) experiences different dynamics in the nuclear medium than conventional hadrons, and comparison with data from proton-proton collisions indicates that the presence of the nucleus may modify $χ_{c1}$(3872) production rates. This is the first measurement of the nuclear modification factor of an exotic hadron.
Submitted 19 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation
Authors:
Mingxuan Yan,
Yi Wang,
Xuedou Xiao,
Zhiqing Luo,
Jianhua He,
Wei Wang
Abstract:
Offloading computing to edge servers is a promising solution to support growing video understanding applications at resource-constrained IoT devices. Recent efforts have been made to enhance the scalability of such systems by reducing inference costs on edge servers. However, existing research is not directly applicable to pixel-level vision tasks such as video semantic segmentation (VSS), partly due to the fluctuating VSS accuracy and segment bitrate caused by the dynamic video content. In response, we present Penance, a new edge inference cost reduction framework. By exploiting softmax outputs of VSS models and the prediction mechanism of H.264/AVC codecs, Penance optimizes model selection and compression settings to minimize the inference cost while meeting the required accuracy within the available bandwidth constraints. We implement Penance in a commercial IoT device with only CPUs. Experimental results show that Penance consumes a negligible 6.8% more computation resources than the optimal strategy while satisfying accuracy and bandwidth constraints with a low failure rate.
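The selection step described in this abstract (choose a model and compression setting that minimizes inference cost while meeting an accuracy target within the available bandwidth) can be illustrated with a minimal sketch. This is not the Penance implementation: the `Candidate` fields, the candidate values, and the `select_config` helper are hypothetical names and numbers introduced only for illustration, and the predicted accuracy and bitrate are assumed to be given.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One hypothetical (model, compression) pairing with predicted metrics."""
    model: str
    quantizer: int        # hypothetical H.264 quantization parameter
    cost: float           # predicted edge inference cost (arbitrary units)
    accuracy: float       # predicted segmentation accuracy in [0, 1]
    bitrate_kbps: float   # predicted segment bitrate


def select_config(candidates: Iterable[Candidate],
                  min_accuracy: float,
                  bandwidth_kbps: float) -> Optional[Candidate]:
    """Return the cheapest candidate meeting both constraints, or None."""
    feasible = [c for c in candidates
                if c.accuracy >= min_accuracy and c.bitrate_kbps <= bandwidth_kbps]
    # min() with default=None signals that no configuration is feasible.
    return min(feasible, key=lambda c: c.cost, default=None)


# Illustrative values only; they are not taken from the paper.
candidates = [
    Candidate("small", 30, cost=1.0, accuracy=0.62, bitrate_kbps=800),
    Candidate("small", 24, cost=1.0, accuracy=0.68, bitrate_kbps=1500),
    Candidate("large", 30, cost=3.5, accuracy=0.74, bitrate_kbps=800),
    Candidate("large", 24, cost=3.5, accuracy=0.79, bitrate_kbps=1500),
]

best = select_config(candidates, min_accuracy=0.70, bandwidth_kbps=1000)
print(best)
```

With these made-up numbers, only the large model at the coarser quantizer satisfies both constraints, so it is selected even though it is not the cheapest candidate overall.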
Submitted 27 March, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data
Authors:
Chengcheng Wei,
Ze Chen,
Songtan Fang,
Jiarong He,
Max Gao
Abstract:
This paper mainly describes a unified system for hallucination detection of LLMs, which wins the second prize in the model-agnostic track of the SemEval-2024 Task 6, and also achieves considerable results in the model-aware track. This task aims to detect hallucination with LLMs for three different text-generation tasks without labeled training data. We utilize prompt engineering and few-shot learning to verify the performance of different LLMs on the validation data. Then we select the LLMs with better performance to generate high-quality weakly supervised training data, which not only satisfies the consistency of different LLMs, but also satisfies the consistency of the optimal LLM with different sampling parameters. Furthermore, we fine-tune different LLMs using the constructed training data, and find that a relatively small LLM can achieve a competitive level of performance in hallucination detection when compared to the large LLMs and the prompt-based approaches using GPT-4.
Submitted 20 February, 2024;
originally announced February 2024.
-
Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection
Authors:
Min Zhang,
Jianfeng He,
Taoran Ji,
Chang-Tien Lu
Abstract:
The fairness and trustworthiness of Large Language Models (LLMs) are receiving increasing attention. Implicit hate speech, which employs indirect language to convey hateful intentions, occupies a significant portion of practice. However, the extent to which LLMs effectively address this issue remains insufficiently examined. This paper delves into the capability of LLMs to detect implicit hate speech (Classification Task) and express confidence in their responses (Calibration Task). Our evaluation meticulously considers various prompt patterns and mainstream uncertainty estimation methods. Our findings highlight that LLMs exhibit two extremes: (1) LLMs display excessive sensitivity towards groups or topics that may cause fairness issues, resulting in misclassifying benign statements as hate speech. (2) LLMs' confidence scores for each method excessively concentrate on a fixed range, remaining unchanged regardless of the dataset's complexity. Consequently, the calibration performance is heavily reliant on primary classification accuracy. These discoveries unveil new limitations of LLMs, underscoring the need for caution when optimizing models to ensure they do not veer towards extremes. This serves as a reminder to carefully consider sensitivity and confidence in the pursuit of model fairness.
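The second finding (confidence scores concentrated in a fixed range, so that calibration depends heavily on classification accuracy) can be illustrated with a small sketch. The abstract does not name the calibration metric used; the expected calibration error (ECE) below is a common choice and is an assumption here, as are the function name and the toy data.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Toy data: the same fixed confidence (0.9) on an "easy" and a "hard" dataset.
fixed_conf = [0.9] * 10
easy = [True] * 9 + [False]        # 90% accuracy
hard = [True] * 5 + [False] * 5    # 50% accuracy

print(expected_calibration_error(fixed_conf, easy))
print(expected_calibration_error(fixed_conf, hard))
```

Because the confidence never moves, the calibration error in this toy setting is determined entirely by how far the accuracy sits from 0.9: it is near zero on the easy set and about 0.4 on the hard one.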
Submitted 26 February, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Vector spectrometer with Hertz-level resolution and super-recognition capability
Authors:
Ting Qing,
Shupeng Li,
Huashan Yang,
Lihan Wang,
Yijie Fang,
Xiaohu Tang,
Meihui Cao,
Jianming Lu,
Jijun He,
Junqiu Liu,
Yueguang Lyu,
Shilong Pan
Abstract:
High-resolution optical spectrometers are crucial in revealing intricate characteristics of signals, determining laser frequencies, measuring physical constants, identifying substances, and advancing biosensing applications. Conventional spectrometers, however, often grapple with inherent trade-offs among spectral resolution, wavelength range, and accuracy. Furthermore, even at high resolution, resolving overlapping spectral lines during spectroscopic analyses remains a huge challenge. Here, we propose a vector spectrometer with ultrahigh resolution, combining broadband optical frequency hopping, ultrafine microwave-photonic scanning, and vector detection. A programmable frequency-hopping laser was developed, facilitating a sub-Hz linewidth and Hz-level frequency stability, an improvement of four and six orders of magnitude, respectively, compared to those of state-of-the-art tunable lasers. We also designed an asymmetric optical transmitter and receiver to eliminate measurement errors arising from modulation nonlinearity and multi-channel crosstalk. The resultant vector spectrometer exhibits an unprecedented frequency resolution of 2 Hz, surpassing the state-of-the-art by four orders of magnitude, over a 33-nm range. Through high-resolution vector analysis, we observed that group delay information enhances the separation capability of overlapping spectral lines by over 47%, significantly streamlining the real-time identification of diverse substances. Our technique fills the gap in optical spectrometers with resolutions below 10 kHz and enables vector measurement to embrace revolution in functionality.
Submitted 6 March, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Instruction Tuning for Secure Code Generation
Authors:
Jingxuan He,
Mark Vero,
Gabriela Krasnopolska,
Martin Vechev
Abstract:
Modern language models (LMs) have gained widespread acceptance in everyday and professional contexts, particularly in programming. An essential procedure enabling this adoption is instruction tuning, which substantially enhances LMs' practical utility by training them to follow user instructions and human preferences. However, existing instruction tuning schemes overlook a crucial aspect: the security of generated code. As a result, even the state-of-the-art instruction-tuned LMs frequently produce unsafe code, posing significant security risks. In this work, we introduce SafeCoder to address this gap. SafeCoder performs security-centric fine-tuning using a diverse and high-quality dataset that we collected using an automated pipeline. We integrate the security fine-tuning with standard instruction tuning, to facilitate a joint optimization of both security and utility. Despite its simplicity, we show that SafeCoder is effective across a variety of popular LMs and datasets. It is able to drastically improve security (by about 30%), while preserving utility.
Submitted 12 July, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Reinforcement Learning from Human Feedback with Active Queries
Authors:
Kaixuan Ji,
Jiafan He,
Quanquan Gu
Abstract:
Aligning large language models (LLM) with human preference plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF). Despite their superior performance, current RLHF approaches often require a large amount of human-labelled preference data, which is expensive to collect. In this paper, inspired by the success of active learning, we address this problem by proposing query-efficient RLHF methods. We first formalize the alignment problem as a contextual dueling bandit problem and design an active-query-based proximal policy optimization (APPO) algorithm with an $\tilde{O}(d^2/Δ)$ regret bound and an $\tilde{O}(d^2/Δ^2)$ query complexity, where $d$ is the dimension of feature space and $Δ$ is the sub-optimality gap over all the contexts. We then propose ADPO, a practical version of our algorithm based on direct preference optimization (DPO) and apply it to fine-tuning LLMs. Our experiments show that ADPO, while only making about half of queries for human preference, matches the performance of the state-of-the-art DPO method.
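As a rough illustration of the ingredients mentioned in this abstract, the sketch below computes the standard DPO pairwise loss from policy and reference log-probabilities and pairs it with a simple query rule. The query rule (ask for a human label only when the implicit reward margin is small) and its threshold are assumptions made for illustration; the abstract does not specify how ADPO decides when to query, and all function names here are hypothetical.

```python
import math


def implicit_margin(logp_chosen: float, logp_rejected: float,
                    ref_logp_chosen: float, ref_logp_rejected: float,
                    beta: float = 0.1) -> float:
    """Scaled difference of policy-vs-reference log-ratios for a response pair."""
    return beta * ((logp_chosen - ref_logp_chosen)
                   - (logp_rejected - ref_logp_rejected))


def dpo_loss(margin: float) -> float:
    """Standard DPO loss -log(sigmoid(margin)), written in a stable form."""
    # -log(sigmoid(m)) = log(1 + exp(-m)) = max(-m, 0) + log1p(exp(-|m|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def should_query(margin: float, threshold: float = 0.5) -> bool:
    """Hypothetical rule: request a human label only when the model is unsure."""
    return abs(margin) < threshold


# Toy log-probabilities; the values are illustrative, not from the paper.
m = implicit_margin(-12.0, -15.0, -13.0, -14.0, beta=0.1)
print(m, dpo_loss(m), should_query(m))
```

Under a rule of this kind, pairs on which the model already has a large margin would not consume a human label, which is one way a method could reduce the number of preference queries.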
Submitted 14 February, 2024;
originally announced February 2024.
-
Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge
Authors:
Jiancheng Yang,
Rui Shi,
Liang Jin,
Xiaoyang Huang,
Kaiming Kuang,
Donglai Wei,
Shixuan Gu,
Jianying Liu,
Pengfei Liu,
Zhizhong Chai,
Yongjie Xiao,
Hao Chen,
Liming Xu,
Bang Du,
Xiangyi Yan,
Hao Tang,
Adam Alessio,
Gregory Holste,
Jiapeng Zhang,
Xiaoming Wang,
Jianye He,
Lixuan Che,
Hanspeter Pfister,
Ming Li,
Bingbing Ni
Abstract:
Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts in this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable to or even better than that of human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which remains an interesting area for future work. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques. The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.
Submitted 14 February, 2024;
originally announced February 2024.
-
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Authors:
Yutao Hu,
Tianbin Li,
Quanfeng Lu,
Wenqi Shao,
Junjun He,
Yu Qiao,
Ping Luo
Abstract:
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which is essential in real-world medical applications. To solve this problem, in this paper, we introduce OmniMedVQA, a novel comprehensive medical Visual Question Answering (VQA) benchmark. This benchmark is collected from 73 different medical datasets, including 12 different modalities and covering more than 20 distinct anatomical regions. Importantly, all images in this benchmark are sourced from authentic medical scenarios, ensuring alignment with the requirements of the medical field and suitability for evaluating LVLMs. Through our extensive experiments, we have found that existing LVLMs struggle to address these medical VQA problems effectively. Moreover, surprisingly, medical-specialized LVLMs even perform worse than general-domain models, calling for a more versatile and robust LVLM in the biomedical field. The evaluation results not only reveal the current limitations of LVLMs in understanding real medical images but also highlight our dataset's significance. Our code and dataset are available at https://github.com/OpenGVLab/Multi-Modality-Arena.
Submitted 21 April, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path
Authors:
Qiwei Di,
Jiafan He,
Dongruo Zhou,
Quanquan Gu
Abstract:
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy. In this paper, we propose a new algorithm to eliminate these restrictive assumptions. Our algorithm is based on extended value iteration with a fine-grained variance-aware confidence set, where the variance is estimated recursively from high-order moments. Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes. Our regret upper bound matches the $Ω(dB_*\sqrt{K})$ lower bound of linear mixture SSPs in Min et al. (2022), which suggests that our algorithm is nearly minimax optimal.
Submitted 14 February, 2024;
originally announced February 2024.
-
Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
Authors:
Chenlu Ye,
Jiafan He,
Quanquan Gu,
Tong Zhang
Abstract:
This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-square regression is often employed for value function estimation. However, these techniques cannot be directly applied to model-based RL. In this paper, we focus on model-based RL and take the maximum likelihood estimation (MLE) approach to learn transition model. Our work encompasses both online and offline settings. In the online setting, we introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE. We prove that CR-OMLE achieves a regret of $\tilde{\mathcal{O}}(\sqrt{T} + C)$, where $C$ denotes the cumulative corruption level after $T$ episodes. We also prove a lower bound to show that the additive dependence on $C$ is optimal. We extend our weighting technique to the offline setting, and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE). Under a uniform coverage condition, CR-PMLE exhibits suboptimality worsened by $\mathcal{O}(C/n)$, nearly matching the lower bound. To the best of our knowledge, this is the first work on corruption-robust model-based RL algorithms with provable guarantees.
Submitted 14 February, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
RIS-Augmented Millimeter-Wave MIMO Systems for Passive Drone Detection
Authors:
Jiguang He,
Aymen Fakhreddine,
George C. Alexandropoulos
Abstract:
In the past decade, the number of amateur drones has been increasing, and this trend is expected to continue. The security issues brought by the abuse and misconduct of drones are becoming more severe and may have a negative impact on society. In this paper, we leverage existing cellular multiple-input multiple-output (MIMO) base station (BS) infrastructure, operating at millimeter wave (mmWave) frequency bands, for drone detection in a device-free manner with the aid of one reconfigurable intelligent surface (RIS), deployed in the proximity of the BS. We theoretically examine the feasibility of drone detection with the aid of the generalized likelihood ratio test (GLRT) and validate via simulations that the optimized deployment of an RIS can bring added benefits compared to RIS-free systems. In addition, the effect of RIS training beams, training overhead, and radar cross section is investigated in order to offer theoretical design guidance for the proposed cellular RIS-based passive drone detection system.
Submitted 11 February, 2024;
originally announced February 2024.
-
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Authors:
Xiaoyu Zhou,
Xingjian Ran,
Yajiao Xiong,
Jinlin He,
Zhiwei Lin,
Yongtao Wang,
Deqing Sun,
Ming-Hsuan Yang
Abstract:
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an instance-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. The source codes and models will be available at gala3d.github.io.
Submitted 11 June, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
Coverage and Rate Analysis for Distributed RISs-Assisted mmWave Communications
Authors:
Yuan Xu,
Chongwen Huang,
Wei Li,
Yongxu Zhu,
Zhaohui Yang,
Jiguang He,
Jun Yang,
Zhaoyang Zhang,
Chau Yuen,
Merouane Debbah
Abstract:
The millimeter wave (mmWave) band has received considerable interest due to its expansive bandwidth and high frequency. However, a noteworthy challenge arises from its vulnerability to blockages, leading to reduced coverage and achievable rates. To address these limitations, a potential solution is to deploy distributed reconfigurable intelligent surfaces (RISs), which comprise many low-cost, passive reflecting elements and can facilitate the establishment of extra communication links. In this paper, we leverage stochastic geometry to investigate the ergodic coverage probability and the achievable rate in both distributed RISs-assisted single-cell and multi-cell mmWave wireless communication systems. Specifically, we first establish the system model, in which the blockages, RISs, and users are stochastically distributed according to Poisson point processes. Then we give the association criterion and derive the association probabilities, the distance distributions, and the conditional coverage probabilities for two cases of association between base stations and users: without and with RISs. Finally, we use Campbell's theorem and the total probability theorem to obtain closed-form expressions of the ergodic coverage probability and the achievable rate. Simulation results verify the effectiveness of our analysis method, and demonstrate that by deploying distributed RISs, the ergodic coverage probability is improved by approximately 50%, and the achievable rate is increased by more than 1.5 times.
Submitted 8 February, 2024;
originally announced February 2024.
-
DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Authors:
Jianing He,
Qi Zhang,
Weiping Ding,
Duoqian Miao,
Jun Zhao,
Liang Hu,
Longbing Cao
Abstract:
Early exiting has demonstrated its effectiveness in accelerating the inference of pre-trained language models like BERT by dynamically adjusting the number of layers executed. However, most existing early exiting methods only consider local information from an individual test sample to determine their exiting indicators, failing to leverage the global information offered by the sample population. This leads to suboptimal estimation of prediction correctness, resulting in erroneous exiting decisions. To bridge the gap, we explore the necessity of effectively combining both local and global information to ensure reliable early exiting during inference. To this end, we leverage prototypical networks to learn class prototypes and devise a distance metric between samples and class prototypes. This enables us to utilize global information for estimating the correctness of early predictions. On this basis, we propose a novel Distance-Enhanced Early Exiting framework for BERT (DE$^3$-BERT). DE$^3$-BERT implements a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information to enhance the estimation of prediction correctness for more reliable early exiting decisions. Extensive experiments on the GLUE benchmark demonstrate that DE$^3$-BERT consistently outperforms state-of-the-art models under different speed-up ratios with minimal storage or computational overhead, yielding a better trade-off between model performance and inference efficiency. Additionally, an in-depth analysis further validates the generality and interpretability of our method.
Submitted 3 February, 2024;
originally announced February 2024.
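The hybrid exiting idea in the DE$^3$-BERT abstract above (an entropy-based local signal supplemented by a distance-to-prototype global signal) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the thresholds, the Euclidean distance, and the simple "both conditions must hold" rule are hypothetical and are not taken from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def nearest_prototype_distance(embedding, prototypes):
    """Euclidean distance from a sample embedding to its closest class prototype."""
    return min(math.dist(embedding, proto) for proto in prototypes)

def should_exit(probs, embedding, prototypes, max_entropy=0.3, max_distance=1.0):
    """Hypothetical hybrid rule: exit at this layer only if the prediction is
    confident (low entropy) AND the sample lies close to some class prototype.
    Thresholds are illustrative placeholders, not values from the paper."""
    confident = entropy(probs) <= max_entropy
    close = nearest_prototype_distance(embedding, prototypes) <= max_distance
    return confident and close

# Two hypothetical class prototypes in a 2-D embedding space.
prototypes = [[0.0, 0.0], [5.0, 5.0]]
print(should_exit([0.98, 0.02], [0.1, 0.0], prototypes))  # confident and near a prototype
print(should_exit([0.98, 0.02], [2.5, 2.5], prototypes))  # confident but far from both prototypes
```

The second call shows the case the abstract motivates: a prediction that looks confident locally is held back because the global, distance-based signal disagrees.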
-
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Authors:
Dongyang Liu,
Renrui Zhang,
Longtian Qiu,
Siyuan Huang,
Weifeng Lin,
Shitian Zhao,
Shijie Geng,
Ziyi Lin,
Peng Jin,
Kaipeng Zhang,
Wenqi Shao,
Chao Xu,
Conghui He,
Junjun He,
Hao Shao,
Pan Lu,
Hongsheng Li,
Yu Qiao,
Peng Gao
Abstract:
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR-intensive and Set-of-Mark datasets, extending the diversity and generality. By training over different base LLMs including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between multi-modal performance and the data and parameter scales. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory
Submitted 26 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Reinforcement Learning as a Catalyst for Robust and Fair Federated Learning: Deciphering the Dynamics of Client Contributions
Authors:
Jialuo He,
Wei Chen,
Xiao** Zhang
Abstract:
Recent advancements in federated learning (FL) have produced models that retain user privacy by training across multiple decentralized devices or systems holding local data samples. However, these strategies often neglect the inherent challenges of statistical heterogeneity and vulnerability to adversarial attacks, which can degrade model robustness and fairness. Personalized FL strategies offer some respite by adjusting models to fit individual client profiles, yet they tend to neglect server-side aggregation vulnerabilities. To address these issues, we propose Reinforcement Federated Learning (RFL), a novel framework that leverages deep reinforcement learning to adaptively optimize client contribution during aggregation, thereby enhancing both model robustness against malicious clients and fairness across participants under non-identically distributed settings. To achieve this goal, we propose a meticulous approach involving a Deep Deterministic Policy Gradient-based algorithm for continuous control of aggregation weights, an innovative client selection method based on model parameter distances, and a reward mechanism guided by validation set performance. Empirically, extensive experiments demonstrate that, in terms of robustness, RFL outperforms the state-of-the-art methods, while maintaining comparable levels of fairness, offering a promising solution to build resilient and fair federated systems.
Submitted 8 February, 2024;
originally announced February 2024.
-
Measurement of the Branching Fraction of $B^{0} \rightarrow J/ψπ^{0}$ Decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
J. A. Adams,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1067 additional authors not shown)
Abstract:
The ratio of branching fractions between $B^{0} \rightarrow J/ψπ^{0}$ and $B^{+} \rightarrow J/ψK^{*+}$ decays is measured with proton-proton collision data collected by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. The measured value is $\frac{\mathcal{B}_{B^{0} \rightarrow J/ψπ^{0}}}{\mathcal{B}_{B^{+} \rightarrow J/ψK^{*+}}} = (1.153 \pm 0.053 \pm 0.048 ) \times 10^{-2}$, where the first uncertainty is statistical and the second is systematic. The branching fraction for $B^{0} \rightarrow J/ψπ^{0}$ decays is determined using the branching fraction of the normalisation channel, resulting in $\mathcal{B}_{B^{0} \rightarrow J/ψπ^{0}} = (1.670 \pm 0.077 \pm 0.069 \pm 0.095) \times 10^{-5}$, where the last uncertainty corresponds to that of the external input. This result is consistent with the current world average value and competitive with the most precise single measurement to date.
Submitted 23 May, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
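As a quick consistency check on the numbers quoted in the abstract above, the short sketch below scales the uncertainties of the measured ratio to the absolute branching fraction. The normalisation branching fraction is not quoted in the abstract, so it is inferred here from the two central values; that inferred value and all variable names are assumptions for illustration, and the third (external-input) uncertainty is not recomputed.

```python
# Values quoted in the abstract.
ratio = 1.153e-2        # B(B0 -> J/psi pi0) / B(B+ -> J/psi K*+)
ratio_stat = 0.053e-2   # statistical uncertainty on the ratio
ratio_syst = 0.048e-2   # systematic uncertainty on the ratio
bf_signal = 1.670e-5    # quoted B(B0 -> J/psi pi0)

# Assumption: the normalisation-channel branching fraction is backed out
# from the two quoted central values, not taken from an external source.
implied_norm_bf = bf_signal / ratio

# Multiplying the ratio by a constant scales its uncertainties by the same constant.
bf_stat = ratio_stat * implied_norm_bf
bf_syst = ratio_syst * implied_norm_bf

print(f"implied normalisation BF: {implied_norm_bf:.3e}")
print(f"scaled stat: {bf_stat:.3e}, scaled syst: {bf_syst:.3e}")
```

The scaled statistical and systematic uncertainties agree with the quoted $0.077$ and $0.069$ (in units of $10^{-5}$) to within rounding of the published central values.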
-
Observation of the $B_c^+ \to J/ψπ^+ π^0$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
J. A. Adams,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1064 additional authors not shown)
Abstract:
The first observation of the $B_c^+ \to J/ψπ^+ π^0$ decay is reported with high significance using proton-proton collision data, corresponding to an integrated luminosity of 9 fb$^{-1}$, collected with the LHCb detector at centre-of-mass energies of 7, 8, and 13 TeV. The ratio of its branching fraction relative to the $B_c^+ \to J/ψπ^+$ channel is measured to be
$$
\frac{ {\cal{B}}( B_c^+ \to J/ψπ^+π^0 ) }
{ {\cal{B}}( B_c^+ \to J/ψπ^+ ) }
= 2.80 \pm 0.15 \pm 0.11 \pm 0.16 \,,
$$ where the first uncertainty is statistical, the second systematic and the third related to imprecise knowledge of the branching fractions for $B^+ \to J/ψK^{*+}$ and $B^+ \to J/ψK^+$ decays, which are used to determine the $π^0$ detection efficiency. The $π^+π^0$ mass spectrum is found to be consistent with the dominance of an intermediate $ρ^+$ contribution in accordance with a model based on QCD factorisation.
Submitted 15 May, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Learning on Multimodal Graphs: A Survey
Authors:
Ciyuan Peng,
Jiayuan He,
Feng Xia
Abstract:
Multimodal data pervades various domains, including healthcare, social media, and transportation, where multimodal graphs play a pivotal role. Machine learning on multimodal graphs, referred to as multimodal graph learning (MGL), is essential for successful artificial intelligence (AI) applications. The burgeoning research in this field encompasses diverse graph data types and modalities, learning techniques, and application scenarios. This survey paper conducts a comparative analysis of existing works in multimodal graph learning, elucidating how multimodal learning is achieved across different graph types and exploring the characteristics of prevalent learning techniques. Additionally, we delineate significant applications of multimodal graph learning and offer insights into future directions in this domain. Consequently, this paper serves as a foundational resource for researchers seeking to comprehend existing MGL techniques and their applicability across diverse scenarios.
Submitted 7 February, 2024;
originally announced February 2024.
-
PandaX-xT: a Multi-ten-tonne Liquid Xenon Observatory at the China Jinping Underground Laboratory
Authors:
PandaX Collaboration,
Abdusalam Abdukerim,
Zihao Bo,
Wei Chen,
Xun Chen,
Chen Cheng,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Lisheng Geng,
Karl Giboni,
Linhui Gu,
Xunan Guo,
Xuyuan Guo,
Zhichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Junting Huang,
Zhou Huang,
Ruquan Hou,
Yu Hou
, et al. (68 additional authors not shown)
Abstract:
We propose a major upgrade to the existing PandaX-4T experiment in the China Jinping Underground Laboratory. The new experiment, PandaX-xT, will be a multi-ten-tonne liquid xenon, ultra-low background, and general-purpose observatory. The full-scale PandaX-xT contains a 43-tonne liquid xenon active target. Such an experiment will significantly advance our fundamental understanding of particle physics and astrophysics. The sensitivity of dark matter direct detection will be improved by nearly two orders of magnitude compared to the current best limits, approaching the so-called "neutrino floor" for a dark matter mass above 10 GeV/$c^2$, providing a decisive test of the Weakly Interacting Massive Particle paradigm. By searching for the neutrinoless double beta decay of the $^{136}$Xe isotope in the detector, the effective Majorana neutrino mass can be measured to a [10 -- 41] meV/$c^2$ sensitivity, providing a key test of the Dirac/Majorana nature of neutrinos. Astrophysical neutrinos and other ultra-rare interactions can also be measured and searched for with an unprecedented background level, opening up new windows of discovery. Depending on the findings, PandaX-xT will seek the next stage upgrade utilizing isotopic separation on natural xenon.
Submitted 5 February, 2024;
originally announced February 2024.
-
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
Authors:
Zhiyuan Hu,
Chumin Liu,
Xidong Feng,
Yilun Zhao,
See-Kiong Ng,
Anh Tuan Luu,
Junxian He,
Pang Wei Koh,
Bryan Hooi
Abstract:
In the face of uncertainty, the ability to *seek information* is of fundamental importance. In many practical applications, such as medical diagnosis and troubleshooting, the information needed to solve the task is not initially given and has to be actively sought by asking follow-up questions (for example, a doctor asking a patient for more details about their symptoms). In this work, we introduce Uncertainty of Thoughts (UoT), an algorithm to augment large language models with the ability to actively seek information by asking effective questions. UoT combines 1) an *uncertainty-aware simulation approach* which enables the model to simulate possible future scenarios and how likely they are to occur, 2) *uncertainty-based rewards* motivated by information gain which incentivizes the model to seek information, and 3) a *reward propagation scheme* to select the optimal question to ask in a way that maximizes the expected reward. In experiments on medical diagnosis, troubleshooting, and the `20 Questions` game, UoT achieves an average performance improvement of 38.1% in the rate of successful task completion across multiple LLMs compared with direct prompting and also improves efficiency (i.e., the number of questions needed to complete the task). Our code has been released [here](https://github.com/zhiyuanhubj/UoT)
Submitted 30 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
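The information-gain motivation behind the UoT rewards described above can be illustrated with a small sketch for a `20 Questions`-style setting. It assumes a uniform prior over the remaining candidates and a noise-free yes/no answer, in which case the expected entropy reduction of a question equals the binary entropy of the split it induces. The function names are hypothetical, and this is not the paper's exact reward definition.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def expected_information_gain(n_yes, n_total):
    """Expected entropy reduction (bits) from a question that would be answered
    'yes' by n_yes of n_total equally likely candidates (uniform-prior assumption)."""
    return binary_entropy(n_yes / n_total)

# With 8 candidates left, an even split is the most informative question,
# while a question every candidate answers the same way tells us nothing.
for n_yes in (0, 1, 4):
    print(n_yes, expected_information_gain(n_yes, 8))
```

Under these assumptions, a planner that prefers questions with higher expected gain will favour balanced splits, which is the intuition behind rewarding uncertainty reduction.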
-
Data-induced multiscale losses and efficient multirate gradient descent schemes
Authors:
Juncai He,
Liangchen Liu,
Yen-Hsi Richard Tsai
Abstract:
This paper investigates the impact of multiscale data on machine learning algorithms, particularly in the context of deep learning. A dataset is multiscale if its distribution shows large variations in scale across different directions. This paper reveals multiscale structures in the loss landscape, including its gradients and Hessians inherited from the data. Correspondingly, it introduces a novel gradient descent approach, drawing inspiration from multiscale algorithms used in scientific computing. This approach seeks to transcend empirical learning rate selection, offering a more systematic, data-informed strategy to enhance training efficiency, especially in the later stages.
Submitted 6 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Production of open-charm pentaquark molecules in decay $B^0 \rightarrow \bar{D}^0 p \bar{p}$
Authors:
Shu-Yi Kong,
Jun-Tao Zhu,
Shu Chen,
Jun He
Abstract:
This study explores the production of open-charm pentaquark molecular states, specifically $N\bar{D}^*$ and $\bar{N}\bar{D}^*$, within the $B^0 \rightarrow \bar{D}^0 p \bar{p}$ decay process. We analyze the invariant mass spectrum of $p\bar{D}^0$ and $\bar{p}\bar{D}^0$, incorporating the rescattering process calculated using a quasipotential Bethe-Salpeter equation approach. Our findings suggest the potential identification of the isoscalar $\bar{N}\bar{D}^*$ molecule with $3/2^+$, serving as the antiparticle partner of the $Λ_c(2940)$, in the $\bar{p}\bar{D}^0$ mass distribution. Additionally, distinctive signals of the isovector $N\bar{D}^*$ molecule with $1/2^-$ may emerge in the $p\bar{D}^0$ invariant mass distribution. We highlight the significance of the three-body decay of the bottom meson as a valuable avenue for studying open-charm molecules and advocate for increased attention and more precise experimental measurements of the $B^0 \rightarrow \bar{D}^0 p \bar{p}$ process.
Submitted 4 February, 2024;
originally announced February 2024.
-
Multi-modal Causal Structure Learning and Root Cause Analysis
Authors:
Lecheng Zheng,
Zhengzhang Chen,
Jingrui He,
Haifeng Chen
Abstract:
Effective root cause analysis (RCA) is vital for swiftly restoring services, minimizing losses, and ensuring the smooth operation and management of complex systems. Previous data-driven RCA methods, particularly those employing causal discovery techniques, have primarily focused on constructing dependency or causal graphs for backtracking the root causes. However, these methods often fall short as they rely solely on data from a single modality, thereby resulting in suboptimal solutions. In this work, we propose Mulan, a unified multi-modal causal structure learning method for root cause localization. We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data. To explore intricate relationships across different modalities, we propose a contrastive learning-based approach to extract modality-invariant and modality-specific representations within a shared latent space. Additionally, we introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph. Finally, we employ random walk with restart to simulate system fault propagation and identify potential root causes. Extensive experiments on three real-world datasets validate the effectiveness of our proposed framework.
Submitted 4 February, 2024;
originally announced February 2024.
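The final step mentioned in the Mulan abstract above, random walk with restart, can be sketched generically as follows. This is a plain power-iteration version on a tiny hypothetical graph; the graph, the restart probability, the iteration count, and the function name are illustrative assumptions, and the sketch does not reproduce how the paper builds or weights its causal graph.

```python
def random_walk_with_restart(adjacency, start, restart_prob=0.15, iterations=100):
    """Visiting scores of a walker that follows edge weights and, with
    probability restart_prob, jumps back to the start node at each step."""
    n = len(adjacency)
    # Row-normalise edge weights into transition probabilities.
    # Rows with no outgoing edges stay all-zero, so their mass is dropped.
    transition = []
    for row in adjacency:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])

    restart = [0.0] * n
    restart[start] = 1.0
    scores = restart[:]
    for _ in range(iterations):
        stepped = [0.0] * n
        for i in range(n):
            for j in range(n):
                stepped[j] += scores[i] * transition[i][j]
        scores = [(1.0 - restart_prob) * stepped[j] + restart_prob * restart[j]
                  for j in range(n)]
    return scores

# Hypothetical 3-node chain: node 0 -> node 1 -> node 2, with a self-loop on node 2.
adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
scores = random_walk_with_restart(adjacency, start=0)
print(scores)
```

In a root-cause setting the walk would typically start from the node where the symptom is observed, and the highest-scoring nodes would be treated as candidate root causes; that usage is stated here as an assumption rather than as the paper's procedure.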
-
Measurements of the branching fraction ratio $\cal{B}(φ\to μ^+μ^-)/\cal{B}(φ\to e^+e^-)$ with charm meson decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1080 additional authors not shown)
Abstract:
Measurements of the branching fraction ratio ${\cal{B}(φ\to μ^+ μ^-)/\cal{B}(φ\to e^+e^-)}$ with ${D_{s}^{+} \to π^{+} φ}$ and ${D^{+} \to π^{+} φ}$ decays, denoted $R^{s}_{φπ}$ and $R^{d}_{φπ}$, are presented. The analysis is performed using a dataset corresponding to an integrated luminosity of 5.4$\,\rm{fb}^{-1}$ of $pp$ collision data collected with the LHCb experiment. The branching fractions are normalised with respect to the ${B^{+} \to K^{+} J/ψ(\to e^+e^-)}$ and ${B^{+} \to K^{+} J/ψ(\to μ^+μ^-)}$ decay modes. The combination of the results yields $$ R_{φπ} = 1.022 \pm 0.012 \,({\rm stat}) \, \pm 0.048 \,({\rm syst}). $$ The result is compatible with previous measurements of the $φ\to \ell^{+}\ell^{-}$ branching fractions and predictions based on the Standard Model.
Submitted 1 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Schur rings over Free Abelian Group of Rank Two
Authors:
Gang Chen,
Jiawei He,
Zhiman Wu
Abstract:
Schur rings are a type of subring of group rings afforded by a partition of the underlying group. In this paper, Schur rings over the free abelian group of rank two are classified under the assumption that one of the direct factors is a union of some basic sets. There are eight different types, all but one of which are traditional.
Submitted 1 February, 2024;
originally announced February 2024.
-
Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss
Authors:
Yahong Yang,
Juncai He
Abstract:
Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.
Submitted 12 May, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Study of $CP$ violation in $B^0_{(s)} \to D K^{*}(892)^0$ decays with $D \to K π( ππ)$, $ ππ( ππ)$, and $KK$ final states
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1072 additional authors not shown)
Abstract:
A measurement of $CP$-violating observables associated with the interference of $B^0\to D^0 K^{*}(892)^0$ and $B^0\to \bar{D}^0 K^*(892)^0$ decay amplitudes is performed in the $D^0 \to K^{\mp}π^{\pm}(π^+π^-),$ $D^0 \to π^+π^-(π^+π^-)$, and $D^0\to K^+K^-$ final states using data collected by the LHCb experiment corresponding to an integrated luminosity of $9$ $\text{fb}^{-1}$. $CP$-violating observables related to the interference of $B^0_s\to D^0 \bar{K}^*(892)^0$ and $B_s^0\to \bar{D}^0 \bar{K}^*(892)^0$ are also measured, but no evidence for interference is found. The $B^0$ observables are used to constrain the parameter space of the CKM angle $γ$ and the hadronic parameters $r_{B^0}^{DK^*}$ and $δ_{B^0}^{DK^*}$ with inputs from other measurements. In a combined analysis, these measurements allow for four solutions in the parameter space, only one of which is consistent with the world average.
Submitted 13 May, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
CyberCardia: Patient-specific electrophysiological heart model for assisting left atrium arrhythmia ablation
Authors:
Jiyue He
Abstract:
Atrial arrhythmia can be categorized into tachycardia, flutter, and fibrillation. Atrial fibrillation is a prevalent heart disease that results in weak and irregular contractions of the atria. It affects millions of people worldwide and contributes to hundreds of thousands of deaths annually. Cardiac ablation is among the most successful treatment options, involving the use of radio frequency energy to kill diseased cells or create lesion lines that obstruct abnormal activation waves. During the procedure, catheters are inserted into the left atrium to map the atrium geometry and record endocardium electrograms that are then converted into electroanatomical maps to pinpoint the arrhythmia source locations.
However, identifying these sources is challenging. The electrograms are asynchronous and can be susceptible to noise. The spatial distribution of sampling sites is non-uniform, which leads to inaccurate maps. Identifying arrhythmia source locations is not a trivial task. Therefore, an ablation procedure often lasts from 3 to 6 hours, and arrhythmia recurrence within 12 months after first ablation is around 50%. To address these challenges, we developed an integrated computational heart model for clinical left atrium arrhythmia ablation. Our system takes in the left atrium geometry and electrograms and processes them to extract regional tissue properties, which are used to tune a heart model, creating a patient-specific whole-atrium model. With this model, we can simulate and detect arrhythmia sources, and provide ablation assistance. To build such a system, we investigated the fiber effects on atrial activation patterns. We developed a fast heart model tuning method which takes only a few seconds of computation time on a personal computer, enabling real-time assistance during the ablation procedure. We achieved high accuracy in simulating arrhythmias, which we validated on patient data.
Submitted 6 January, 2024;
originally announced January 2024.
-
Physical Priors Augmented Event-Based 3D Reconstruction
Authors:
Jiaxu Wang,
Junhao He,
Ziyi Zhang,
Ren**g Xu
Abstract:
3D neural implicit representations play a significant role in many robotic applications. However, reconstructing neural radiance fields (NeRF) from realistic event data remains a challenge due to the sparsity and lack of information when only event streams are available. In this paper, we utilize motion, geometry, and density priors behind event data to impose strong physical constraints to augment NeRF training. The proposed novel pipeline can directly benefit from those priors to reconstruct 3D scenes without additional inputs. Moreover, we present a novel density-guided patch-based sampling strategy for robust and efficient learning, which not only accelerates training procedures but also is conducive to the expression of local geometries. More importantly, we establish the first large dataset for event-based 3D reconstruction, which contains 101 objects with various materials and geometries, along with ground-truth images and depth maps for all camera viewpoints, which significantly facilitates other research in related fields. The code and dataset will be publicly available at https://github.com/Mercerai/PAEv3d.
Submitted 30 January, 2024;
originally announced January 2024.
-
Type-based Neural Link Prediction Adapter for Complex Query Answering
Authors:
Lingning Song,
Yi Zu,
Shan Lu,
Jieyue He
Abstract:
Answering complex logical queries on incomplete knowledge graphs (KGs) is a fundamental and challenging task in multi-hop reasoning. Recent work defines this task as an end-to-end optimization problem, which significantly reduces the training cost and enhances the generalization of the model by using pretrained link predictors for query answering. However, most existing proposals ignore the critical semantic knowledge inherently available in KGs, such as type information, which could help answer complex logical queries. To this end, we propose TypE-based Neural Link Prediction Adapter (TENLPA), a novel model that constructs type-based entity-relation graphs to discover the latent relationships between entities and relations by leveraging type information in KGs. Meanwhile, in order to effectively combine type information with complex logical queries, an adaptive learning mechanism is introduced, which is trained by backpropagation during the complex query answering process to achieve adaptive adjustment of neural link predictors. Experiments on 3 standard datasets show that the TENLPA model achieves state-of-the-art performance on complex query answering with good generalization and robustness.
Submitted 29 January, 2024;
originally announced January 2024.
-
TransTroj: Transferable Backdoor Attacks to Pre-trained Models via Embedding Indistinguishability
Authors:
Hao Wang,
Tao Xiang,
Shangwei Guo,
Jialing He,
Hangcheng Liu,
Tianwei Zhang
Abstract:
Pre-trained models (PTMs) are extensively utilized in various downstream tasks. Adopting untrusted PTMs exposes users to backdoor attacks, where the adversary can compromise the downstream models by injecting backdoors into the PTM. However, existing backdoor attacks on PTMs are only partially task-agnostic, and the embedded backdoors are easily erased during the fine-tuning process. In this paper, we propose a novel transferable backdoor attack, TransTroj, that is simultaneously functionality-preserving, durable, and task-agnostic. In particular, we first formalize transferable backdoor attacks as the indistinguishability problem between poisoned and clean samples in the embedding space. We decompose the embedding indistinguishability into pre- and post-indistinguishability, representing the similarity of the poisoned and reference embeddings before and after the attack. Then, we propose a two-stage optimization that separately optimizes triggers and victim PTMs to achieve embedding indistinguishability. We evaluate TransTroj on four PTMs and six downstream tasks. Experimental results show that TransTroj significantly outperforms SOTA task-agnostic backdoor attacks (18%$\sim$99%, 68% on average) and exhibits superior performance under various system settings. The code is available at https://github.com/haowang-cqu/TransTroj.
Submitted 28 January, 2024;
originally announced January 2024.
-
A semidefinite programming approach for robust elliptic localization
Authors:
Wenxin Xiong,
Jiajun He,
Zhang-Lei Shi,
Keyuan Hu,
Hing Cheung So,
Chi-Sing Leung
Abstract:
This short communication addresses the problem of elliptic localization with outlier measurements, whose occurrences are prevalent in various location-enabled applications and can significantly compromise the positioning performance if not adequately handled. In contrast to the reliance on $M$-estimation adopted in the majority of existing solutions, we take a different path, specifically exploring the worst-case robust approximation criterion, to bolster resistance of the elliptic location estimator against outliers. From a geometric standpoint, our method boils down to pinpointing the Chebyshev center of the feasible set determined by the available bistatic ranges with bounded measurement errors. For a practical approach to the associated min-max problem, we convert it into the well-established convex optimization framework of semidefinite programming (SDP). Numerical simulations confirm that our SDP-based technique can outperform a number of existing elliptic localization schemes in terms of positioning accuracy in Gaussian mixture noise, a common type of impulsive interference in the context of range-based localization.
Submitted 28 January, 2024;
originally announced January 2024.
-
Multi-Trigger Backdoor Attacks: More Triggers, More Threats
Authors:
Yige Li,
Xingjun Ma,
Jiabo He,
Hanxun Huang,
Yu-Gang Jiang
Abstract:
Backdoor attacks have emerged as a primary threat to (pre-)training and deployment of deep neural networks (DNNs). While backdoor attacks have been extensively studied in a body of works, most of them were focused on single-trigger attacks that poison a dataset using a single type of trigger. Arguably, real-world backdoor attacks can be much more complex, e.g., the existence of multiple adversaries for the same dataset if it is of high value. In this work, we investigate the practical threat of backdoor attacks under the setting of multi-trigger attacks where multiple adversaries leverage different types of triggers to poison the same dataset. By proposing and investigating three types of multi-trigger attacks, including parallel, sequential, and hybrid attacks, we provide a set of important understandings of the coexisting, overwriting, and cross-activating effects between different triggers on the same dataset. Moreover, we show that single-trigger attacks tend to cause overly optimistic views of the security of current defense techniques, as all examined defense methods struggle to defend against multi-trigger attacks. Finally, we create a multi-trigger backdoor poisoning dataset to help future evaluation of backdoor attacks and defenses. Although our work is purely empirical, we hope it can help steer backdoor research toward more realistic settings.
Submitted 26 January, 2024;
originally announced January 2024.
-
LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering
Authors:
Sheng Hong,
Junjie He,
Xinhu Zheng,
Chunran Zheng,
Shaojie Shen
Abstract:
We introduce an integrated, precise LiDAR, Inertial, and Visual (LIV) multimodal sensor-fused mapping system that builds on differentiable Gaussians to improve the mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion.
This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initialization for the scene's surface Gaussians and the sensor's poses of each frame are obtained using a LiDAR-inertial system with the feature of size-adaptive voxels. Then, we optimize and refine the Gaussians using visual-derived photometric gradients to improve their quality and density.
Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes. It bolsters structure construction through LiDAR and facilitates real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes potentially for digital twins and virtual reality, while also holding potential applicability in real-time SLAM and robotics domains.
We release our software, hardware, and self-collected datasets to benefit the community.
Submitted 16 May, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
On the Scarcity of Dense Cores ($n>10^{5}$ cm$^{-3}$) in High Latitude Planck Galactic Cold Clumps
Authors:
Fengwei Xu,
Ke Wang,
Tie Liu,
David Eden,
Xunchuan Liu,
Mika Juvela,
**hua He,
Doug Johnstone,
Paul Goldsmith,
Guido Garay,
Yuefang Wu,
Archana Soam,
Alessio Traficante,
Isabelle Ristorcelli,
Edith Falgarone,
Huei-Ru Vivien Chen,
Naomi Hirano,
Yasuo Doi,
Woo** Kwon,
Glenn J. White,
Anthony Whitworth,
Patricio Sanhueza,
Mark G. Rawlings,
Dana Alina,
Zhiyuan Ren
, et al. (12 additional authors not shown)
Abstract:
High-latitude ($|b|>30^{\circ}$) molecular clouds have virial parameters that exceed 1, but whether these clouds can form stars has not been studied systematically. Using JCMT SCUBA-2 archival data, we surveyed 70 fields that target high-latitude Planck galactic cold clumps (HLPCs) to find dense cores with density of $10^{5}$-$10^{6}$ cm$^{-3}$ and size of $<0.1$ pc. The sample benefits from both the representativeness of the parent sample and covering densest clumps at the high column density end ($>1\times10^{21}$ cm$^{-2}$). At an average noise rms of 15 mJy/beam, we detected Galactic dense cores in only one field, G6.04+36.77 (L183), while also identifying 12 extragalactic objects and two young stellar objects. Compared to the low-latitude clumps, dense cores are scarce in HLPCs. With synthetic observations, the densities of cores are constrained to be $n_c\lesssim10^5$ cm$^{-3}$, should they exist in HLPCs. Low-latitude clumps, Taurus clumps, and HLPCs form a sequence where a higher virial parameter corresponds to a lower dense core detection rate. If HLPCs were affected by the Local Bubble, the scarcity should favor turbulence-inhibited rather than supernova-driven star formation. Studies of the formation mechanism of the L183 molecular cloud are warranted.
Submitted 22 February, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
GOAt: Explaining Graph Neural Networks via Graph Output Attribution
Authors:
Shengyao Lu,
Keith G. Mills,
Jiao He,
Bang Liu,
Di Niu
Abstract:
Understanding the decision-making process of Graph Neural Networks (GNNs) is crucial to their interpretability. Most existing methods for explaining GNNs rely on training auxiliary models, so the explanations themselves remain black boxes. This paper introduces Graph Output Attribution (GOAt), a novel method to attribute graph outputs to input graph features, creating GNN explanations that are faithful, discriminative, and stable across similar samples. By expanding the GNN as a sum of scalar products involving node features, edge features and activation patterns, we propose an efficient analytical method to compute the contribution of each node or edge feature to each scalar product and aggregate the contributions from all scalar products in the expansion form to derive the importance of each node and edge. Through extensive experiments on synthetic and real-world data, we show that our method not only outperforms various state-of-the-art GNN explainers in terms of the commonly used fidelity metric, but also exhibits stronger discriminability and stability by a remarkable margin.
Submitted 25 January, 2024;
originally announced January 2024.
-
Design Principles for Generative AI Applications
Authors:
Justin D. Weisz,
Jessica He,
Michael Muller,
Gabriela Hoefer,
Rachel Miles,
Werner Geyer
Abstract:
Generative AI applications present unique design challenges. As generative AI technologies are increasingly being incorporated into mainstream applications, there is an urgent need for guidance on how to design user experiences that foster effective and safe use. We present six principles for the design of generative AI applications that address unique characteristics of generative AI UX and offer new interpretations and extensions of known issues in the design of AI applications. Each principle is coupled with a set of design strategies for implementing that principle via UX capabilities or through the design process. The principles and strategies were developed through an iterative process involving literature review, feedback from design practitioners, validation against real-world generative AI applications, and incorporation into the design process of two generative AI applications. We anticipate the principles to usefully inform the design of generative AI applications by driving actionable design recommendations.
Submitted 25 January, 2024;
originally announced January 2024.
-
QCD analysis of the $P$-wave charmonium electromagnetic Dalitz decays $h_{c}\rightarrowη^{(\prime)}\ell^{+}\ell^{-}$
Authors:
Chao-Jie Fan,
Jun-Kang He
Abstract:
The $P$-wave charmonium electromagnetic Dalitz decays $h_{c}\rightarrowη^{(\prime)}\ell^{+}\ell^{-}$ $(\ell=e, μ)$ with large recoil momentum are investigated in the framework of perturbative QCD, and the contributions from the small recoil momentum region are described by the overlap of soft wave functions. The transition form factors $f_{h_{c}η^{(\prime)}}(q^{2})$ and the normalized transition form factors $F_{h_{c} η^{(\prime)}}(q^{2})$ in the full kinematic region are derived for the first time. It is noted that there are no IR divergences at the one-loop level, and the transition form factors with the relativistic corrections from the internal momentum of $h_{c}$ are insensitive to both the shapes of the $η^{(\prime)}$ distribution amplitudes and the invariant mass of the lepton pair in the large recoil momentum region. Intriguingly, unlike the situation in the $S$-wave charmonium decays $J/ψ\rightarrowη^{(\prime)}\ell^{+}\ell^{-}$, we find the contributions from the small recoil momentum region are comparable with those from the large recoil momentum region in the $P$-wave charmonium decays $h_{c}\rightarrowη^{(\prime)}\ell^{+}\ell^{-}$. By employing the obtained $F_{h_{c} η^{(\prime)}}(q^{2})$, we give the predictions of the branching ratios $\mathcal{B}(h_{c}\rightarrowη^{(\prime)}\ell^{+}\ell^{-})$, which may come within the range of measurement of present or near-future experiments.
Submitted 9 April, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Authors:
Fanghua Yu,
**** Gu,
Zheyuan Li,
**fan Hu,
Xiangtao Kong,
Xintao Wang,
**gwen He,
Yu Qiao,
Chao Dong
Abstract:
We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up. Leveraging multi-modal techniques and advanced generative prior, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR provides the capability to restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. We also develop a restoration-guided sampling method to suppress the fidelity issue encountered in generative-based restoration. Experiments demonstrate SUPIR's exceptional restoration effects and its novel capacity to manipulate restoration through textual prompts.
Submitted 3 April, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning
Authors:
Xinghao Wang,
Junliang He,
Pengyu Wang,
Yunhua Zhou,
Tianxiang Sun,
Xipeng Qiu
Abstract:
Contrastive-learning-based methods have dominated sentence representation learning. These methods regularize the representation space by pulling similar sentence representations closer and pushing away the dissimilar ones and have been proven effective in various NLP tasks, e.g., semantic textual similarity (STS) tasks. However, it is challenging for these methods to learn fine-grained semantics as they only learn from the inter-sentence perspective, i.e., their supervision signal comes from the relationship between data samples. In this work, we propose a novel denoising objective that inherits from another perspective, i.e., the intra-sentence perspective. By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form. Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks, standing up well in comparison to contrastive-learning-based methods. Notably, the proposed intra-sentence denoising objective complements existing inter-sentence contrastive methodologies and can be integrated with them to further enhance performance. Our code is available at https://github.com/xinghaow99/DenoSent.
Submitted 24 January, 2024;
originally announced January 2024.
-
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction
Authors:
Jiajun He,
Xiaohan Shi,
Xingfeng Li,
Tomoki Toda
Abstract:
The prevalent approach in speech emotion recognition (SER) involves integrating both audio and textual information to comprehensively identify the speaker's emotion, with the text generally obtained through automatic speech recognition (ASR). An essential issue of this approach is that ASR errors from the text modality can worsen the performance of SER. Previous studies have proposed using an auxiliary ASR error detection task to adaptively assign weights to each word in ASR hypotheses. However, this approach has limited improvement potential because it does not address the coherence of semantic information in the text. Additionally, the inherent heterogeneity of different modalities leads to distribution gaps between their representations, making their fusion challenging. Therefore, in this paper, we incorporate two auxiliary tasks, ASR error detection (AED) and ASR error correction (AEC), to enhance the semantic coherence of ASR text, and further introduce a novel multi-modal fusion (MF) method to learn shared representations across modalities. We refer to our method as MF-AED-AEC. Experimental results indicate that MF-AED-AEC significantly outperforms the baseline model by a margin of 4.1%.
Submitted 28 May, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Authors:
Chang Ma,
Junlei Zhang,
Zhihao Zhu,
Cheng Yang,
Yujiu Yang,
Yaohui **,
Zhenzhong Lan,
Lingpeng Kong,
Junxian He
Abstract:
Evaluating large language models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications. However, the evaluation process presents substantial challenges. A primary obstacle is the benchmarking of agent performance across diverse scenarios within a unified framework, especially in maintaining partially-observable environments and ensuring multi-round interactions. Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements as well as a comprehensive evaluation toolkit that features easy assessment of agents for multi-faceted analysis through interactive visualization. This not only sheds light on the capabilities and limitations of LLM agents but also propels the interpretability of their performance to the forefront. Ultimately, AgentBoard serves as a significant step towards demystifying agent behaviors and accelerating the development of stronger LLM agents.
Submitted 23 January, 2024;
originally announced January 2024.
-
Nonreciprocal charge transport in the titanium sesquioxide heterointerface superconductor
Authors:
Peng Dong,
Lijie Wang,
Guanqun Zhang,
Jiadian He,
Yiwen Zhang,
Yifan Ding,
Xiaohui Zeng,
**ghui Wang,
Xiang Zhou,
Yueshen Wu,
Wei Li,
Jun Li
Abstract:
Nonreciprocal charge transport in heterostructural superconductors exhibits appealing quantum physical phenomena and holds promising potential for superconducting circuit applications. Realizing nonreciprocity is, however, fundamentally and technologically challenging, as it requires a material structure without a centre of inversion, which is scarce among superconducting materials. Here, we report evidence of helical superconductivity, in which the Rashba spin-orbit coupling induces a momentum-dependent superconducting gap in the inversion-symmetry-breaking heterointerface superconductor consisting of Mott insulating Ti$_2$O$_3$ and polar semiconducting GaN. Remarkably, the nonlinear responses emerge in the superconducting transition regime when the magnetic field is precisely aligned in the in-plane orientation perpendicular to the applied current. In particular, the observed nonreciprocal supercurrent is extremely sensitive to the direction of the magnetic field, to within 0.5 degrees, suggestive of a crossover from a symmetry-breaking state to a symmetric one. Our finding not only unveils the underlying rich physical properties in heterointerface superconductors, but also provides an exciting opportunity for the development of novel mesoscopic superconducting devices.
Submitted 23 January, 2024;
originally announced January 2024.