Search | arXiv e-print repository

doi 10.1007/s11432-023-4030-y

Towards Imbalanced Motion: Part-Decoupling Network for Video Portrait Segmentation

Authors: Tianshu Yu, Changqun Xia, Jia Li

Abstract: Video portrait segmentation (VPS), aiming at segmenting prominent foreground portraits from video frames, has received much attention in recent years. However, the simplicity of existing VPS datasets limits extensive research on the task. In this work, we propose a new intricate large-scale Multi-scene Video Portrait Segmentation dataset MVPS consisting of 101 video clips in 7 scenario categories, in which 10,843 sampled frames are finely annotated at pixel level. The dataset has diverse scenes and complicated background environments, and is the most complex dataset in VPS to the best of our knowledge. Through the observation of a large number of videos with portraits during dataset construction, we find that due to the joint structure of the human body, motion of portraits is part-associated, so that different parts are relatively independent in motion. That is, motion of different parts of the portraits is imbalanced. Towards this imbalance, an intuitive and reasonable idea is that different motion states in portraits can be better exploited by decoupling the portraits into parts. To achieve this, we propose a Part-Decoupling Network (PDNet) for video portrait segmentation. Specifically, an Inter-frame Part-Discriminated Attention (IPDA) module is proposed which segments the portrait into parts without supervision and applies different attentiveness to the discriminative features specific to each part. In this way, appropriate attention can be imposed on portrait parts with imbalanced motion to extract part-discriminated correlations, so that the portraits can be segmented more accurately. Experimental results demonstrate that our method achieves leading performance in comparison with state-of-the-art methods.

Submitted 31 May, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

Journal ref: Science China Information Sciences 67.7 (2024) 172104

arXiv:2307.12820 [pdf, other]

Diurnal modulation of electron recoils from DM-nucleon scattering through the Migdal effect

Authors: Mai Qiao, Chen Xia, Yu-Feng Zhou

Abstract: Halo dark matter (DM) particles could lose energy due to the scattering off nuclei within the Earth before reaching the underground detectors of DM direct detection experiments. This Earth shielding effect can result in diurnal modulation of the DM-induced recoil event rates observed underground due to the self-rotation of the Earth. For electron recoil signals from DM-electron scatterings, the current experimental constraints are very stringent such that the diurnal modulation cannot be observed for halo DM. We propose a novel type of diurnal modulation effect: diurnal modulation in electron recoil signals induced by DM-nucleon scattering via the Migdal effect. We set so far the most stringent constraints on DM-nucleon scattering cross section via the Migdal effect for sub-GeV DM using the S2-only data of PandaX-II and PandaX-4T with improved simulations of the Earth shielding effect. Based on the updated constraints, we show that the Migdal effect induced diurnal modulation of electron events can still be significant in the low energy region, and can be probed by experiments such as PandaX-4T in the near future.

Submitted 1 November, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: Comments on the Migdal effects added, figures and text improved, version accepted by JCAP

arXiv:2307.12650 [pdf, other]

Active Flow Control for Bluff Body Drag Reduction Using Reinforcement Learning with Partial Measurements

Authors: Chengwei Xia, Junjie Zhang, Eric C. Kerrigan, Georgios Rigas

Abstract: Active flow control for drag reduction with reinforcement learning (RL) is performed in the wake of a 2D square bluff body at laminar regimes with vortex shedding. Controllers parameterised by neural networks are trained to drive two blowing and suction jets that manipulate the unsteady flow. RL with full observability (sensors in the wake) successfully discovers a control policy which reduces the drag by suppressing the vortex shedding in the wake. However, a non-negligible performance degradation (~50% less drag reduction) is observed when the controller is trained with partial measurements (sensors on the body). To mitigate this effect, we propose an energy-efficient, dynamic, maximum entropy RL control scheme. First, an energy-efficiency-based reward function is proposed to optimise the energy consumption of the controller while maximising drag reduction. Second, the controller is trained with an augmented state consisting of both current and past measurements and actions, which can be formulated as a nonlinear autoregressive exogenous model, to alleviate the partial observability problem. Third, maximum entropy RL algorithms (Soft Actor Critic and Truncated Quantile Critics) which promote exploration and exploitation in a sample efficient way are used and discover near-optimal policies in the challenging case of partial measurements. Stabilisation of the vortex shedding is achieved in the near wake using only surface pressure measurements on the rear of the body, resulting in similar drag reduction as in the case with wake sensors. The proposed approach opens new avenues for dynamic flow control using partial measurements for realistic configurations.
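The augmented state described in this abstract (current and past measurements stacked with past actions) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `HistoryAugmentedState`, the zero-padding at episode start, the oldest-first ordering, and the parameter `history_len` are all assumptions made for illustration.

```python
from collections import deque


class HistoryAugmentedState:
    """Hypothetical helper: stacks the last `history_len` (measurement, action) pairs."""

    def __init__(self, obs_dim, act_dim, history_len):
        # Assumption: the history is zero-padded before any data arrives.
        self.obs = deque([[0.0] * obs_dim for _ in range(history_len)],
                         maxlen=history_len)
        self.act = deque([[0.0] * act_dim for _ in range(history_len)],
                         maxlen=history_len)

    def update(self, observation, last_action):
        # Appending to a full deque silently drops the oldest entry.
        self.obs.append(list(observation))
        self.act.append(list(last_action))
        return self.vector()

    def vector(self):
        # Flatten oldest-to-newest as [obs_t-k, act_t-k, ..., obs_t, act_t].
        flat = []
        for o, a in zip(self.obs, self.act):
            flat.extend(o)
            flat.extend(a)
        return flat


state = HistoryAugmentedState(obs_dim=2, act_dim=1, history_len=2)
first = state.update([1.0, 2.0], [0.5])
second = state.update([3.0, 4.0], [-0.5])
```

The flattened vector would be fed to the policy in place of the instantaneous sensor reading; how many past steps are needed is problem-dependent and is not specified in the abstract.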

Submitted 16 January, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.10643 [pdf]

Study of (3He, t) charge exchange reactions to isobaric analog states in inverse kinematics

Authors: Zhixuan He, Wenjuan Bu, Chaoyuan Xiao, Meng Li, Herun Yang, Bitao Hu, Yi Zhang

Abstract: The transition between isobaric analog states (IAS) in the (3He, t) charge exchange reaction presents a unique opportunity to access the isospin structure of the nuclei. In this study not only the Fermi transition but also the Gamow-Teller (G-T) transition of the IAS reaction were investigated for the 13,14C(3He, t) and 17,18,19,20O(3He, t) reactions, in order to explore the neutron number dependence of the IAS reaction for the light neutron-rich nuclei. It was found that the G-T type IAS reaction also exhibited a significant dependence of the transition strength on the neutron number and the angular momentum configuration of the nuclei. Additionally, inverse kinematics was discussed for extracting the yields of the reaction channels of interest in the proposed experiments on radioactive beams. The calculated triton yields demonstrated the capability of the proposed experiments to obtain meaningful results.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.09942 [pdf, other]

TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network

Authors: Brandon Theodorou, Cao Xiao, Jimeng Sun

Abstract: Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment. In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials based on longitudinal patient electronic health records (EHR) data and eligibility criteria of clinical trials. However, they either depend on trial-specific expert rules that cannot expand to other trials or perform matching at a very general level with a black-box model where the lack of interpretability makes the model results difficult to adopt. To provide accurate and interpretable patient trial matching, we introduce a personalized dynamic tree-based memory network model named TREEMENT. It utilizes hierarchical clinical ontologies to expand the personalized patient representation learned from sequential EHR data, and then uses an attentional beam-search query learned from eligibility criteria embedding to offer a granular level of alignment for improved performance and interpretability. We evaluated TREEMENT against existing models on real-world datasets and demonstrated that TREEMENT outperforms the best baseline by 7% in terms of error reduction in criteria-level matching and achieves state-of-the-art results in its trial-level matching ability. Furthermore, we also show TREEMENT can offer good interpretability to make the model results easier to adopt.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.07176 [pdf, other]

SafeDreamer: Safe Reinforcement Learning with World Models

Authors: Weidong Huang, Jiaming Ji, Borong Zhang, Chunhe Xia, Yaodong Yang

Abstract: The deployment of Reinforcement Learning (RL) in real-world applications is constrained by its failure to satisfy safety criteria. Existing Safe Reinforcement Learning (SafeRL) methods, which rely on cost functions to enforce safety, often fail to achieve zero-cost performance in complex scenarios, especially vision-only tasks. These limitations are primarily due to model inaccuracies and inadequate sample efficiency. The integration of world models has proven effective in mitigating these shortcomings. In this work, we introduce SafeDreamer, a novel algorithm incorporating Lagrangian-based methods into world model planning processes within the superior Dreamer framework. Our method achieves nearly zero-cost performance on various tasks, spanning low-dimensional and vision-only input, within the Safety-Gymnasium benchmark, showcasing its efficacy in balancing performance and safety in RL tasks. Further details and resources are available on the project website: https://sites.google.com/view/safedreamer.

Submitted 7 October, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.03032 [pdf, other]

Quarkyonic matter and quarkyonic stars in an extended RMF model

Authors: Cheng-Jun Xia, Hao-Miao **, Ting-Ting Sun

Abstract: By combining RMF models and equivparticle models with density-dependent quark masses, we construct explicitly "a quark Fermi Sea" and "a baryonic Fermi surface" to model the quarkyonic phase, where baryons with momenta ranging from zero to the Fermi momenta are included. The properties of nuclear matter, quark matter, and quarkyonic matter are then investigated in a unified manner, where quarkyonic matter is more stable and energy minimization is still applicable to obtain the microscopic properties of dense matter. Three different covariant density functionals TW99, PKDD, and DD-ME2 are adopted in our work, where TW99 gives satisfactory predictions for the properties of nuclear matter both in neutron stars and heavy-ion collisions and quarkyonic transition is unfavorable. Nevertheless, if PKDD with larger slope of symmetry energy $L$ or DD-ME2 with larger skewness coefficient $J$ are adopted, the corresponding EOSs are too stiff according to both experimental and astrophysical constraints. The situation is improved if quarkyonic transition takes place, where the EOSs become softer and can accommodate various experimental and astrophysical constraints.

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.17194 [pdf, other]

On the Exploitability of Instruction Tuning

Authors: Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein

Abstract: Instruction tuning is an effective technique to align large language models (LLMs) with human intents. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose AutoPoison, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs. Code is available at https://github.com/azshue/AutoPoison.

Submitted 28 October, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 camera-ready (21 pages, 10 figures)

arXiv:2306.15742 [pdf, other]

Differentially Private Video Activity Recognition

Authors: Zelun Luo, Yuliang Zou, Yi** Yang, Zane Durante, De-An Huang, Zhiding Yu, Chaowei Xiao, Li Fei-Fei, Animashree Anandkumar

Abstract: In recent years, differential privacy has seen significant advancements in image classification; however, its application to video activity recognition remains under-explored. This paper addresses the challenges of applying differential privacy to video activity recognition, which primarily stem from: (1) a discrepancy between the desired privacy level for entire videos and the nature of input data processed by contemporary video architectures, which are typically short, segmented clips; and (2) the complexity and sheer size of video datasets relative to those in image classification, which render traditional differential privacy methods inadequate. To tackle these issues, we propose Multi-Clip DP-SGD, a novel framework for enforcing video-level differential privacy through clip-based classification models. This method samples multiple clips from each video, averages their gradients, and applies gradient clipping in DP-SGD without incurring additional privacy loss. Moreover, we incorporate a parameter-efficient transfer learning strategy to make the model scalable for large-scale video datasets. Through extensive evaluations on the UCF-101 and HMDB-51 datasets, our approach exhibits impressive performance, achieving 81% accuracy with a privacy budget of epsilon=5 on UCF-101, marking a 76% improvement compared to a direct application of DP-SGD. Furthermore, we demonstrate that our transfer learning strategy is versatile and can enhance differentially private image classification across an array of datasets including CheXpert, ImageNet, CIFAR-10, and CIFAR-100.
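The aggregation step described in this abstract (average the clip gradients of each video, clip the averaged gradient, then add noise) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `multi_clip_dp_sgd_step`, the nested-list gradient layout, and the standard Gaussian-mechanism noise scale `noise_multiplier * clip_norm` are all assumptions.

```python
import math
import random


def multi_clip_dp_sgd_step(per_clip_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient over videos.

    per_clip_grads[v][c] is the gradient (list of floats) of clip c of video v.
    All names and the data layout are hypothetical.
    """
    dim = len(per_clip_grads[0][0])
    total = [0.0] * dim
    for clips in per_clip_grads:
        # Average clip gradients so each video contributes a single gradient.
        video_grad = [sum(g[i] for g in clips) / len(clips) for i in range(dim)]
        # Clip the video-level gradient to L2 norm at most clip_norm.
        norm = math.sqrt(sum(x * x for x in video_grad))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i in range(dim):
            total[i] += video_grad[i] * scale
    # Assumption: Gaussian noise calibrated to the per-video bound clip_norm.
    sigma = noise_multiplier * clip_norm
    n = len(per_clip_grads)
    return [(total[i] + rng.gauss(0.0, sigma)) / n for i in range(dim)]


grads = [
    [[3.0, 4.0], [3.0, 4.0]],  # video 0: averaged norm 5, scaled down
    [[0.0, 0.5], [0.0, 1.5]],  # video 1: averaged norm 1, left unchanged
]
noiseless = multi_clip_dp_sgd_step(grads, 1.0, 0.0, random.Random(0))
```

Because clipping is applied after the per-video average, the bound on any single video's contribution does not grow with the number of clips sampled from it, which is consistent with the abstract's claim of no additional privacy loss.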

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.13971 [pdf, other]

Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations

Authors: Xinyu Liu, Yan Ding, Kaikai An, Chunyang Xiao, Pranava Madhyastha, Tong Xiao, Jingbo Zhu

Abstract: While state-of-the-art NLP models have demonstrated excellent performance for aspect based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness. This is especially manifested as significant degradation in performance when faced with out-of-distribution data. Recent solutions that rely on counterfactually augmented datasets show promising results, but they are inherently limited because of the lack of access to explicit causal structure. In this paper, we present an alternative approach that relies on non-counterfactual data augmentation. Our proposal instead relies on using noisy, cost-efficient data augmentations that preserve semantics associated with the target aspect. Our approach then relies on modelling invariances between different versions of the data to improve robustness. A comprehensive suite of experiments shows that our proposal significantly improves upon strong pre-trained baselines on both standard and robustness-specific datasets. Our approach further establishes a new state-of-the-art on the ABSA robustness benchmark and transfers well across domains.

Submitted 20 July, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

Comments: 10 pages, 1 figure, 10 tables

arXiv:2306.12646 [pdf, other]

Learnability and Algorithm for Continual Learning

Authors: Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Bing Liu

Abstract: This paper studies the challenging continual learning (CL) setting of Class Incremental Learning (CIL). CIL learns a sequence of tasks consisting of disjoint sets of concepts or classes. At any time, a single model is built that can be applied to predict/classify test instances of any classes learned thus far without providing any task related information for each test instance. Although many techniques have been proposed for CIL, they are mostly empirical. It has been shown recently that a strong CIL system needs a strong within-task prediction (WP) and a strong out-of-distribution (OOD) detection for each task. However, it is still not known whether CIL is actually learnable. This paper shows that CIL is learnable. Based on the theory, a new CIL algorithm is also proposed. Experimental results demonstrate its effectiveness.

Submitted 21 June, 2023; originally announced June 2023.

Comments: ICML 2023

arXiv:2306.12139 [pdf, other]

doi 10.1145/3580305.3599510

Spatial Heterophily Aware Graph Neural Networks

Authors: Congxi Xiao, Jingbo Zhou, Jizhou Huang, Tong Xu, Hui Xiong

Abstract: Graph Neural Networks (GNNs) have been broadly applied in many urban applications upon formulating a city as an urban graph whose nodes are urban objects like regions or points of interest. Recently, a few enhanced GNN architectures have been developed to tackle heterophily graphs where connected nodes are dissimilar. However, urban graphs usually can be observed to possess a unique spatial heterophily property; that is, the dissimilarity of neighbors at different spatial distances can exhibit great diversity. This property has not been explored, although it often exists. To this end, in this paper, we propose a metric, named Spatial Diversity Score, to quantitatively measure the spatial heterophily and show how it can influence the performance of GNNs. Indeed, our experimental investigation clearly shows that existing heterophilic GNNs are still deficient in handling urban graphs with a high spatial diversity score. This, in turn, may degrade their effectiveness in urban applications. Along this line, we propose a Spatial Heterophily Aware Graph Neural Network (SHGNN), to tackle the spatial diversity of heterophily of urban graphs. Based on the key observation that spatially close neighbors on the urban graph present a more similar mode of difference to the central node, we first design a rotation-scaling spatial aggregation module, whose core idea is to properly group the spatially close neighbors and separately process each group with less diversity inside. Then, a heterophily-sensitive spatial interaction module is designed to adaptively capture the commonality and diverse dissimilarity in different spatial groups. Extensive experiments on three real-world urban datasets demonstrate the superiority of our SHGNN over several of its competitors.

Submitted 21 June, 2023; originally announced June 2023.

Comments: Accepted by KDD 2023

arXiv:2306.11252 [pdf, other]

HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation

Authors: Cihan Xiao, Henry Li Xinyuan, **yi Yang, Dongji Gao, Matthew Wiesner, Kevin Duh, Sanjeev Khudanpur

Abstract: We introduce HK-LegiCoST, a new three-way parallel corpus of Cantonese-English translations, containing 600+ hours of Cantonese audio, its standard traditional Chinese transcript, and English translation, segmented and aligned at the sentence level. We describe the notable challenges in corpus preparation: segmentation, alignment of long audio recordings, and sentence-level alignment with non-verbatim transcripts. Such transcripts make the corpus suitable for speech translation research when there are significant differences between the spoken and written forms of the source language. Due to its large size, we are able to demonstrate competitive speech translation baselines on HK-LegiCoST and extend them to promising cross-corpus results on the FLEURS Cantonese subset. These results deliver insights into speech recognition and translation research in languages for which non-verbatim or "noisy" transcription is common due to various factors, including vernacular and dialectal speech.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.09622 [pdf, other]

Nonlinear current response of two-dimensional systems under in-plane magnetic field

Authors: Yue-Xin Huang, Yang Wang, Hui Wang, Cong Xiao, Xiao Li, Shengyuan A. Yang

Abstract: We theoretically investigate the nonlinear response current of a two-dimensional system under an in-plane magnetic field. Based on the extended semiclassical theory, we develop a unified theory including both longitudinal and transverse currents and classify contributions according to their scaling with the relaxation time. Besides time-reversal-even contributions, we reveal a previously unknown time-reversal-odd contribution to the Hall current, which occurs in magnetic systems, exhibits band geometric origin, and is linear in relaxation time. We show that the different contributions exhibit different symmetry characters, especially in their angular dependence on the field orientation, which can be used to distinguish them in experiment. The theory is explicitly demonstrated in the study of the Rashba model. Our work presents a deepened understanding of nonlinear planar transport, proposes approaches to distinguish different contributions, and sheds light on possible routes to enhance the effect in practice.

Submitted 22 June, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 10 pages, 6 figures

arXiv:2306.07824 [pdf, ps, other]

JCCS-PFGM: A Novel Circle-Supervision based Poisson Flow Generative Model for Multiphase CECT Progressive Low-Dose Reconstruction with Joint Condition

Authors: Rongjun Ge, Yuting He, Cong Xia, Yang Chen, Daoqiang Zhang, Ge Wang

Abstract: Multiphase contrast-enhanced computed tomography (CECT) scan is clinically significant to demonstrate the anatomy at different phases. In practice, such a multiphase CECT scan inherently takes a longer time and deposits a much higher radiation dose into a patient's body than a regular CT scan, and reduction of the radiation dose typically compromises the CECT image quality and its diagnostic value. With Joint Condition and Circle-Supervision, here we propose a novel Poisson Flow Generative Model (JCCS-PFGM) to promote the progressive low-dose reconstruction for multiphase CECT. JCCS-PFGM is characterized by the following three aspects: a progressive low-dose reconstruction scheme, a circle-supervision strategy, and a joint condition mechanism. Our extensive experiments are performed on a clinical dataset consisting of 11436 images. The results show that our JCCS-PFGM achieves promising PSNR up to 46.3dB, SSIM up to 98.5%, and MAE down to 9.67 HU on average on phases I, II and III in quantitative evaluations, and yields high-quality readable visualizations in qualitative assessments. All of these findings reveal the great potential of our method to be adapted for clinical CECT scans at a much-reduced radiation dose.

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2306.06440 [pdf, ps, other]

Epidemic spreading in wireless sensor networks with node sleep scheduling

Authors: Yanqing Wu, Cunlai Pu, Gongxuan Zhang, Lunbo Li, Yongxiang Xia, Chengyi Xia

Abstract: Wireless Sensor Networks (WSNs) have become widely used in various fields like environmental monitoring, smart agriculture, and health care. However, their extensive usage also introduces significant vulnerabilities to cyber viruses. Addressing this security issue in WSNs is very challenging due to their inherent limitations in energy and bandwidth to implement real-time security measures. To tackle the virus issue, it is crucial to first understand how it spreads in WSNs. In this brief, we propose a novel epidemic spreading model for WSNs, integrating the susceptible-infected-susceptible (SIS) epidemic spreading model and node probabilistic sleep scheduling--a critical mechanism for optimizing energy efficiency. Using the microscopic Markov chain (MMC) method, we derive the spreading equations and epidemic threshold of our model. We conduct numerical simulations to validate the theoretical results and investigate the impact of key factors on epidemic spreading in WSNs. Notably, we discover that the epidemic threshold is directly proportional to the ratio of node sleeping and node activation probabilities.

Submitted 10 June, 2023; originally announced June 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2306.06395 [pdf, other]

Are the $a_{0}(1710)$ or $a_{0}(1817)$ resonances in the $D_{s}^{+} \rightarrow K_{S}^{0}K^{+}\pi^{0}$ decay?

Authors: Zhong-Yu Wang, Yu-Wen Peng, **g-Yu Yi, W. C. Luo, C. W. Xiao

Abstract: The BESIII Collaboration claimed that a new $a_{0}(1817)$ resonance was found in the recent results of the $D_{s}^{+} \rightarrow K_{S}^{0}K^{+}\pi^{0}$ decay. For this decay process, we perform a unitary amplitude to analyze the contributions of the states $a_{0}(980)^{+}$ and $a_{0}(1710)^{+}$ with the final state interactions. Considering the Cabibbo-favored external and internal $W$-emission mechanisms at the quark level, and the contributions of the resonances $a_{0}(980)^{+}$, $a_{0}(1710)^{+}$ in the $S$-wave and $\bar{K}^{*}(892)^{0}$, ${K}^{*}(892)^{+}$ in the $P$-wave, the recent experimental data of the $K_{S}^{0}K^{+}$ invariant mass spectrum from the BESIII Collaboration can be described well. In our results, the states $a_{0}(980)$ and $a_{0}(1710)$ are dynamically generated from the final state interactions of $K\bar{K}$ and $K^{*}\bar{K}^{*}$, respectively, which support the molecular nature for them. Moreover, some obtained branching fractions are in agreement with the experimental measurements.

Submitted 10 June, 2023; originally announced June 2023.

arXiv:2306.05911 [pdf, other]

Sketch2Stress: Sketching with Structural Stress Awareness

Authors: Deng Yu, Chufeng Xiao, Manfred Lau, Hongbo Fu

Abstract: In the process of product design and digital fabrication, the structural analysis of a designed prototype is a fundamental and essential step. However, such a step is usually invisible or inaccessible to designers at the early sketching phase. This limits the user's ability to consider a shape's physical properties and structural soundness. To bridge this gap, we introduce a novel approach Sketch2Stress that allows users to perform structural analysis of desired objects at the sketching stage. This method takes as input a 2D freehand sketch and one or multiple locations of user-assigned external forces. With the specially-designed two-branch generative-adversarial framework, it automatically predicts a normal map and a corresponding structural stress map distributed over the user-sketched underlying object. In this way, our method empowers designers to easily examine the stress sustained everywhere and identify potential problematic regions of their sketched object. Furthermore, combined with the predicted normal map, users are able to conduct a region-wise structural analysis efficiently by aggregating the stress effects of multiple forces in the same direction. Finally, we demonstrate the effectiveness and practicality of our system with extensive experiments and user studies.

Submitted 11 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: Accepted by IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG)

arXiv:2306.05726 [pdf, other]

Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

Authors: Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

Abstract: One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its ability to learn an effective control policy that seamlessly aligns with the inherent distribution of offline data. Unfortunately, behavior regularization, a simple yet effective offline RL algorithm, tends to struggle in this regard. In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration. Our key observation is that by iteratively refining the reference policy used for behavior regularization, conservative policy update guarantees gradually improvement, while also implicitly avoiding querying out-of-sample actions to prevent catastrophic learning failures. We prove that in the tabular setting this algorithm is capable of learning the optimal policy covered by the offline dataset, commonly referred to as the in-sample optimal policy. We then explore several implementation details of the algorithm when function approximations are applied. The resulting algorithm is easy to implement, requiring only a few lines of code modification to existing methods. Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks, clearly demonstrate its superiority over behavior regularization.

Submitted 17 October, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.04018 [pdf, other]

PyTrial: Machine Learning Software and Benchmark for Clinical Trial Applications

Authors: Zifeng Wang, Brandon Theodorou, Tianfan Fu, Cao Xiao, Jimeng Sun

Abstract: Clinical trials are conducted to test the effectiveness and safety of potential drugs in humans for regulatory approval. Machine learning (ML) has recently emerged as a new tool to assist in clinical trials. Despite this progress, there have been few efforts to document and benchmark ML4Trial algorithms available to the ML research community. Additionally, the accessibility to clinical trial-related datasets is limited, and there is a lack of well-defined clinical tasks to facilitate the development of new algorithms. To fill this gap, we have developed PyTrial that provides benchmarks and open-source implementations of a series of ML algorithms for clinical trial design and operations. In this paper, we thoroughly investigate 34 ML algorithms for clinical trials across 6 different tasks, including patient outcome prediction, trial site selection, trial outcome prediction, patient-trial matching, trial similarity search, and synthetic data generation. We have also collected and prepared 23 ML-ready datasets as well as their working examples in Jupyter Notebooks for quick implementation and testing. PyTrial defines each task through a simple four-step process: data loading, model specification, model training, and model evaluation, all achievable with just a few lines of code. Furthermore, our modular API architecture empowers practitioners to expand the framework to incorporate new algorithms and tasks effortlessly. The code is available at https://github.com/RyanWangZf/PyTrial.

Submitted 5 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.02585 [pdf, other]

MotionTrack: Learning Motion Predictor for Multiple Object Tracking

Authors: Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, Dacheng Tao

Abstract: Significant progress has been achieved in multi-object tracking (MOT) through the evolution of detection and re-identification (ReID) techniques. Despite these advancements, accurately tracking objects in scenarios with homogeneous appearance and heterogeneous motion remains a challenge. This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant utilization of linear motion models in MOT. In this context, we introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor that relies solely on object trajectory information. This predictor comprehensively integrates two levels of granularity in motion features to enhance the modeling of temporal dynamics and facilitate precise future motion prediction for individual objects. Specifically, the proposed approach adopts a self-attention mechanism to capture token-level information and a Dynamic MLP layer to model channel-level features. MotionTrack is a simple, online tracking approach. Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as Dancetrack and SportsMOT, characterized by highly complex object motion.

Submitted 11 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.02013 [pdf, other]

Instabilities of longitudinal vortex rolls in katabatic Prandtl slope flows

Authors: Chengnian Xiao, Inanc Senocak

Abstract: Stationary counter-rotating longitudinal vortex pairs emerge from one-dimensional Prandtl slope flows under katabatic as well as anabatic conditions due to a linear instability when the imposed surface heat flux magnitude is sufficiently strong relative to the stable ambient stratification. For anabatic flows, these vortices have already been identified to exhibit an unique topology that bears a striking resemblance to speaker-wires since they stay coherent as a single unit without the presence of another vortex pair. Under katabatic conditions and at a constant Prandtl number, we find that the longitudinal vortices emerging at a range of different slope angles possess the similar topology as their anabatic counterparts. We determine the existence of both fundamental and subharmonic secondary instabilities depending on the slope angle for the most likely transverse base flow wavelength. Our results indicate that the most dominant instability shifts from a fundamental to subharmonic mode with increasing slope angle. At shallow slopes, this dynamic contrast with the speaker-wire vortices in anabatic slope flows at the same angle for which the subharmonic instability is clearly dominant. These modes are responsible for the bending and movement of single or multiple speaker-wire vortices, which may merge or reconnect to lead to dynamically more unstable states, eventually leading to transition towards turbulence. We demonstrate that at sufficiently steep slopes, the dynamics of these vortex pairs are dominated by long-wave reconnections or two-dimensional mergers between adjacent pairs.

Submitted 3 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:2203.15895

arXiv:2306.01631 [pdf, other]

Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations

Authors: Pengcheng Jiang, Cao Xiao, Tianfan Fu, Jimeng Sun

Abstract: Molecule representation learning is crucial for various downstream applications, such as understanding and predicting molecular properties and side effects. In this paper, we propose a novel method called GODE, which takes into account the two-level structure of individual molecules. We recognize that molecules have an intrinsic graph structure as well as being a node in a larger molecule knowledge graph. GODE integrates graph representations of individual molecules with multidomain biochemical data from knowledge graphs. By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures. This fusion results in a more robust and informative representation, which enhances molecular property prediction by harnessing both chemical and biological information. When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks. Impressively, it surpasses the current leading model in molecule property predictions with average advancements of 2.1% in classification and 6.4% in regression tasks.

Submitted 19 January, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

arXiv:2306.00744 [pdf, ps, other]

New monotonicity for $p$-capacitary functions in $3$-manifolds with nonnegative scalar curvature

Authors: Chao Xia, Jiabin Yin, Xingjian Zhou

Abstract: In this paper, we derive general monotone quantities and geometric inequalities associated with $p$-capacitary functions in asymptotically flat $3$-manifolds with simple topology and nonnegative scalar curvature. The inequalities become equalities on the spatial Schwarzschild manifolds outside rotationally symmetric spheres. This generalizes Miao's result \cite{M} from $p=2$ to $p\in (1, 3)$. As applications, we recover mass-to-$p$-capacity and $p$-capacity-to-area inequalities due to Bray-Miao \cite{BM} and Xiao \cite{Xiao}.

Submitted 18 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: In this version, we extended the range of $k$ from $[0, 1]$ to $(-1, 1]$ in Theorem 1.1. As a consequence, we removed the assumption of nonnegative Hawking mass in Theorem 1.3

arXiv:2306.00398 [pdf, other]

Preference-grounded Token-level Guidance for Language Model Fine-tuning

Authors: Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou

Abstract: Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the *sequence level* while LM training and generation both occur at the *token level*. There is, therefore, a *granularity mismatch* between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token-level training guidance, and improving the LM with the learned guidance. For guidance learning, we design a framework that extends the pairwise-preference learning in imitation learning to both variable-length LM generation and the utilization of the preference among multiple generations. For LM training, based on the amount of supervised data, we present two *minimalist* learning objectives that utilize the learned guidance. In experiments, our method performs competitively on two distinct representative LM tasks -- discrete-prompt generation and text summarization.

Submitted 9 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2306.00349 [pdf, other]

CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV Perception

Authors: Jiachen Sun, Haizhong Zheng, Qingzhao Zhang, Atul Prakash, Z. Morley Mao, Chaowei Xiao

Abstract: Perception is crucial in the realm of autonomous driving systems, where bird's eye view (BEV)-based architectures have recently reached state-of-the-art performance. The desirability of self-supervised representation learning stems from the expensive and laborious process of annotating 2D and 3D data. Although previous research has investigated pretraining methods for both LiDAR and camera-based 3D object detection, a unified pretraining framework for multimodal BEV perception is missing. In this study, we introduce CALICO, a novel framework that applies contrastive objectives to both LiDAR and camera backbones. Specifically, CALICO incorporates two stages: point-region contrast (PRC) and region-aware distillation (RAD). PRC better balances the region- and scene-level representation learning on the LiDAR modality and offers significant performance improvement compared to existing methods. RAD effectively achieves contrastive distillation on our self-trained teacher model. CALICO's efficacy is substantiated by extensive evaluations on 3D object detection and BEV map segmentation tasks, where it delivers significant performance improvements. Notably, CALICO outperforms the baseline method by 10.5% and 8.6% on NDS and mAP. Moreover, CALICO boosts the robustness of multimodal 3D object detection against adversarial attacks and corruption. Additionally, our framework can be tailored to different backbones and heads, positioning it as a promising approach for multimodal BEV perception.

Submitted 27 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2306.00107 [pdf, other]

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Authors: Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghao Xiao, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Zili Wang, Yike Guo, Jie Fu

Abstract: Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is partially due to the distinctive challenges associated with modelling musical knowledge, particularly tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training. In our exploration, we identified an effective combination of teacher models, which outperforms conventional speech and audio approaches in terms of performance. This combination includes an acoustic teacher based on Residual Vector Quantisation - Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). Furthermore, we explore a wide range of settings to overcome the instability in acoustic language model pre-training, which allows our designed paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.

Submitted 22 April, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

Comments: accepted by ICLR 2024

arXiv:2305.19759 [pdf, other]

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

Authors: Shuyue Stella Li, Cihan Xiao, Tianjian Li, Bismarck Odoom

Abstract: Code-switching, also called code-mixing, is the linguistics phenomenon where in casual settings, multilingual speakers mix words from different languages in one utterance. Due to its spontaneous nature, code-switching is extremely low-resource, which makes it a challenging problem for language and speech processing tasks. In such contexts, Code-Switching Language Identification (CSLID) becomes a difficult but necessary task if we want to maximally leverage existing monolingual tools for other tasks. In this work, we propose two novel approaches toward improving language identification accuracy on an English-Mandarin child-directed speech dataset. Our methods include a stacked Residual CNN+GRU model and a multitask pre-training approach to use Automatic Speech Recognition (ASR) as an auxiliary task for CSLID. Due to the low-resource nature of code-switching, we also employ careful silver data creation using monolingual corpora in both languages and up-sampling as data augmentation. We focus on English-Mandarin code-switched data, but our method works on any language pair. Our best model achieves a balanced accuracy of 0.781 on a real English-Mandarin code-switching child-directed speech corpus and outperforms the previous baseline by 55.3%.

Submitted 31 May, 2023; originally announced May 2023.

Comments: 8 pages, 3 figures, 7 tables

arXiv:2305.19407 [pdf, other]

FRAMM: Fair Ranking with Missing Modalities for Clinical Trial Site Selection

Authors: Brandon Theodorou, Lucas Glass, Cao Xiao, Jimeng Sun

Abstract: Despite many efforts to address the disparities, the underrepresentation of gender, racial, and ethnic minorities in clinical trials remains a problem and undermines the efficacy of treatments on minorities. This paper focuses on the trial site selection task and proposes FRAMM, a deep reinforcement learning framework for fair trial site selection. We focus on addressing two real-world challenges that affect fair trial sites selection: the data modalities are often not complete for many potential trial sites, and the site selection needs to simultaneously optimize for both enrollment and diversity since the problem is necessarily a trade-off between the two with the only possible way to increase diversity post-selection being through limiting enrollment via caps. To address the missing data challenge, FRAMM has a modality encoder with a masked cross-attention mechanism for handling missing data, bypassing data imputation and the need for complete data in training. To handle the need for making efficient trade-offs, FRAMM uses deep reinforcement learning with a specifically designed reward function that simultaneously optimizes for both enrollment and fairness. We evaluate FRAMM using 4,392 real-world clinical trials ranging from 2016 to 2021 and show that FRAMM outperforms the leading baseline in enrollment-only settings while also achieving large gains in diversity. Specifically, it is able to produce a 9% improvement in diversity with similar enrollment levels over the leading baselines. That improved diversity is further manifested in achieving up to a 14% increase in Hispanic enrollment, 27% increase in Black enrollment, and 60% increase in Asian enrollment compared to selecting sites with an enrollment-only model.

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.18390 [pdf, other]

Emergent Modularity in Pre-trained Transformers

Authors: Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

Abstract: This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes. (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, and each module works for its corresponding function. Given the enormous amount of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, where clustered are the neurons specialized in a certain function. Moreover, perturbing the activations of functional experts significantly affects the corresponding function. Finally, we study how modularity emerges during pre-training, and find that the modular structure is stabilized at the early stage, which is faster than neuron stabilization. It suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis.

Submitted 30 October, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: Findings of ACL 2023

arXiv:2305.18090 [pdf, other]

ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback

Authors: Shengchao Liu, Jiongxiao Wang, Yijin Yang, Chengpeng Wang, Ling Liu, Hongyu Guo, Chaowei Xiao

Abstract: Recent advancements in conversational large language models (LLMs), such as ChatGPT, have demonstrated remarkable promise in various domains, including drug discovery. However, existing works mainly focus on investigating the capabilities of conversational LLMs on chemical reaction and retrosynthesis. While drug editing, a critical task in the drug discovery pipeline, remains largely unexplored. To bridge this gap, we propose ChatDrug, a framework to facilitate the systematic investigation of drug editing using LLMs. ChatDrug jointly leverages a prompt module, a retrieval and domain feedback (ReDF) module, and a conversation module to streamline effective drug editing. We empirically show that ChatDrug reaches the best performance on 33 out of 39 drug editing tasks, encompassing small molecules, peptides, and proteins. We further demonstrate, through 10 case studies, that ChatDrug can successfully identify the key substructures (e.g., the molecule functional groups, peptide motifs, and protein structures) for manipulation, generating diverse and valid suggestions for drug editing. Promisingly, we also show that ChatDrug can offer insightful explanations from a domain-specific perspective, enhancing interpretability and enabling informed decision-making. This research sheds light on the potential of ChatGPT and conversational LLMs for drug editing. It paves the way for a more efficient and collaborative drug discovery pipeline, contributing to the advancement of pharmaceutical research and development.

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.17691 [pdf, other]

Plug-and-Play Knowledge Injection for Pre-trained Language Models

Authors: Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Abstract: Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks. However, massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks. In this work, we are the first to study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models. To this end, we explore a new paradigm plug-and-play knowledge injection, where knowledge bases are injected into frozen existing downstream models by a knowledge plugin. Correspondingly, we propose a plug-and-play injection method map-tuning, which trains a mapping of knowledge embeddings to enrich model inputs with mapped embeddings while keeping model parameters frozen. Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models. Moreover, we show that a frozen downstream model can be well adapted to different domains with different mapping networks of domain knowledge. Our code and models are available at https://github.com/THUNLP/Knowledge-Plugin.

Submitted 4 December, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2305.17660 [pdf, other]

Plug-and-Play Document Modules for Pre-trained Models

Authors: Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun

Abstract: Large-scale pre-trained models (PTMs) have been widely used in document-oriented NLP tasks, such as question answering. However, the encoding-task coupling requirement results in the repeated encoding of the same documents for different tasks and queries, which is highly computationally inefficient. To this end, we target to decouple document encoding from downstream tasks, and propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD). By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document one time to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders. Extensive experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios. Especially, PlugD can save $69\%$ computational costs while achieving comparable performance to state-of-the-art encoding-task coupling methods. Additionally, we show that PlugD can serve as an effective post-processing way to inject knowledge into task-specific models, improving model performance without any additional model training.

Submitted 28 May, 2023; originally announced May 2023.

Comments: Accepted by ACL 2023

arXiv:2305.16291 [pdf, other]

Voyager: An Open-Ended Embodied Agent with Large Language Models

Authors: Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar

Abstract: We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.

Submitted 19 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Project website and open-source codebase: https://voyager.minedojo.org/

arXiv:2305.14950 [pdf, other]

Adversarial Demonstration Attacks on Large Language Models

Authors: Jiongxiao Wang, Zichen Liu, Keun Hee Park, Zhuojun Jiang, Zhaoheng Zheng, Zhuofeng Wu, Muhao Chen, Chaowei Xiao

Abstract: With the emergence of more powerful large language models (LLMs), such as ChatGPT and GPT-4, in-context learning (ICL) has gained significant prominence in leveraging these models for specific tasks by utilizing data-label pairs as precondition prompts. While incorporating demonstrations can greatly enhance the performance of LLMs across various tasks, it may introduce a new security concern: attackers can manipulate only the demonstrations without changing the input to perform an attack. In this paper, we investigate the security concern of ICL from an adversarial perspective, focusing on the impact of demonstrations. We propose a novel attack method named advICL, which aims to manipulate only the demonstration without changing the input to mislead the models. Our results demonstrate that as the number of demonstrations increases, the robustness of in-context learning would decrease. Additionally, we also identify the intrinsic property of the demonstrations is that they can be used (prepended) with different inputs. As a result, it introduces a more practical threat model in which an attacker can attack the test input example even without knowing and manipulating it. To achieve it, we propose the transferable version of advICL, named Transferable-advICL. Our experiment shows that the adversarial demonstration generated by Transferable-advICL can successfully attack the unseen test input examples. We hope that our study reveals the critical security risks associated with ICL and underscores the need for extensive research on the robustness of ICL, particularly given its increasing significance in the advancement of LLMs.

Submitted 14 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.14910 [pdf, other]

From Shortcuts to Triggers: Backdoor Defense with Denoised PoE

Authors: Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen

Abstract: Language models are often at risk of diverse backdoor attacks, especially data poisoning. Thus, it is important to investigate defense solutions for addressing them. Existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers, leaving a universal defense against various backdoor attacks with diverse triggers largely unexplored. In this paper, we propose an end-to-end ensemble-based backdoor defense framework, DPoE (Denoised Product-of-Experts), which is inspired by the shortcut nature of backdoor attacks, to defend various backdoor attacks. DPoE consists of two models: a shallow model that captures the backdoor shortcuts and a main model that is prevented from learning the backdoor shortcuts. To address the label flip caused by backdoor attackers, DPoE incorporates a denoising design. Experiments on SST-2 dataset show that DPoE significantly improves the defense performance against various types of backdoor triggers including word-level, sentence-level, and syntactic triggers. Furthermore, DPoE is also effective under a more challenging but practical setting that mixes multiple types of trigger.

Submitted 2 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted by NAACL 2024 Main Conference

arXiv:2305.14710 [pdf, other]

Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models

Authors: Jiashu Xu, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, Muhao Chen

Abstract: We investigate security concerns of the emergent instruction tuning paradigm, that models are trained on crowdsourced datasets with task instructions to achieve superior performance. Our studies demonstrate that an attacker can inject backdoors by issuing very few malicious instructions (~1000 tokens) and control model behavior through data poisoning, without even the need to modify data instances or labels themselves. Through such instruction attacks, the attacker can achieve over 90% attack success rate across four commonly used NLP datasets. As an empirical study on instruction attacks, we systematically evaluated unique perspectives of instruction attacks, such as poison transfer where poisoned models can transfer to 15 diverse generative datasets in a zero-shot manner; instruction transfer where attackers can directly apply poisoned instruction on many other datasets; and poison resistance to continual finetuning. Lastly, we show that RLHF and clean demonstrations might mitigate such backdoors to some degree. These findings highlight the need for more robust defenses against poisoning attacks in instruction-tuning models and underscore the importance of ensuring data quality in instruction crowdsourcing.

Submitted 3 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: NAACL 2024

arXiv:2305.13753 [pdf, other]

A Graph-Based Collision Resolution Scheme for Asynchronous Unsourced Random Access

Authors: Tianya Li, Yongpeng Wu, Wenjun Zhang, Xiang-Gen Xia, Chengshan Xiao

Abstract: This paper investigates the multiple-input-multiple-output (MIMO) massive unsourced random access in an asynchronous orthogonal frequency division multiplexing (OFDM) system, with both timing and frequency offsets (TFO) and non-negligible user collisions. The proposed coding framework splits the data into two parts encoded by sparse regression code (SPARC) and low-density parity check (LDPC) code. Multistage orthogonal pilots are transmitted in the first part to reduce collision density. Unlike existing schemes requiring a quantization codebook with a large size for estimating TFO, we establish a \textit{graph-based channel reconstruction and collision resolution (GB-CR$^2$)} algorithm to iteratively reconstruct channels, resolve collisions, and compensate for TFO rotations on the formulated graph jointly among multiple stages. We further propose to leverage the geometric characteristics of signal constellations to correct TFO estimations. Exhaustive simulations demonstrate remarkable performance superiority in channel estimation and data recovery with substantial complexity reduction compared to state-of-the-art schemes.

Submitted 18 August, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 6 pages, 6 figures, accepted for the presentation at IEEE GLOBECOM 2023

arXiv:2305.13323 [pdf, other]

doi 10.1103/PhysRevD.108.063002

Rescaling strange-cluster stars and its implications on gravitational-wave echoes

Authors: Chen Zhang, Yong Gao, Cheng-Jun Xia, Renxin Xu

Abstract: Solid states of strange-cluster matter called strangeon matter can form strangeon stars that are highly compact. We show that strangeon matter and strangeon stars can be recast into dimensionless forms by a simple reparametrization and rescaling, through which we manage to maximally reduce the number of degrees of freedom. With this dimensionless scheme, we find that strangeon stars are generally compact enough to feature a photon sphere that is essential to foster gravitational-wave (GW) echoes. Rescaling the dimension back, we illustrate its implications on the expanded dimensional parameter space, and calculate the GW echo frequencies associated with strangeon stars, showing that the minimum echo frequency is $\sim 8$ kHz for empirical parameter space that satisfies the GW170817 constraint, and can reduce to $\mathcal O(100)$ Hertz at the extended limit.

Submitted 2 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: 8 pages, 6 figures. Published version. Typo fixed

Journal ref: Phys. Rev. D 108, 063002 (2023)

arXiv:2305.12788 [pdf, other]

GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs

Authors: Pengcheng Jiang, Cao Xiao, Adam Cross, Jimeng Sun

Abstract: Clinical predictive models often rely on patients' electronic health records (EHR), but integrating medical knowledge to enhance predictions and decision-making is challenging. This is because personalized predictions require personalized knowledge graphs (KGs), which are difficult to generate from patient EHR data. To address this, we propose \textsc{GraphCare}, an open-world framework that uses external KGs to improve EHR-based predictions. Our method extracts knowledge from large language models (LLMs) and external biomedical KGs to build patient-specific KGs, which are then used to train our proposed Bi-attention AugmenTed (BAT) graph neural network (GNN) for healthcare predictions. On two public datasets, MIMIC-III and MIMIC-IV, \textsc{GraphCare} surpasses baselines in four vital healthcare prediction tasks: mortality, readmission, length of stay (LOS), and drug recommendation. On MIMIC-III, it boosts AUROC by 17.6\% and 6.6\% for mortality and readmission, and F1-score by 7.9\% and 10.8\% for LOS and drug recommendation, respectively. Notably, \textsc{GraphCare} demonstrates a substantial edge in scenarios with limited data availability. Our findings highlight the potential of using external KGs in healthcare prediction tasks and demonstrate the promise of \textsc{GraphCare} in generating personalized KGs for promoting personalized medicine.

Submitted 17 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: ICLR 2024

arXiv:2305.12081 [pdf, other]

MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement

Authors: Zifeng Wang, Chufan Gao, Cao Xiao, Jimeng Sun

Abstract: Tabular data prediction has been employed in medical applications such as patient health risk prediction. However, existing methods usually revolve around the algorithm design while overlooking the significance of data engineering. Medical tabular datasets frequently exhibit significant heterogeneity across different sources, with limited sample sizes per source. As such, previous predictors are often trained on manually curated small datasets that struggle to generalize across different tabular datasets during inference. This paper proposes to scale medical tabular data predictors (MediTab) to various tabular inputs with varying features. The method uses a data engine that leverages large language models (LLMs) to consolidate tabular samples to overcome the barrier across tables with distinct schema. It also aligns out-domain data with the target task using a "learn, annotate, and refinement" pipeline. The expanded training data then enables the pre-trained MediTab to infer for arbitrary tabular input in the domain without fine-tuning, resulting in significant improvements over supervised baselines: it reaches an average ranking of 1.57 and 1.00 on 7 patient outcome prediction datasets and 3 trial outcome prediction datasets, respectively. In addition, MediTab exhibits impressive zero-shot performances: it outperforms supervised XGBoost models by 8.9% and 17.2% on average in two prediction tasks, respectively.

Submitted 30 April, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: IJCAI 2024

arXiv:2305.11366 [pdf, other]

AutoTrial: Prompting Language Models for Clinical Trial Design

Authors: Zifeng Wang, Cao Xiao, Jimeng Sun

Abstract: Clinical trials are critical for drug development. Constructing the appropriate eligibility criteria (i.e., the inclusion/exclusion criteria for patient recruitment) is essential for the trial's success. Proper design of clinical trial protocols should consider similar precedent trials and their eligibility criteria to ensure sufficient patient coverage. In this paper, we present a method named AutoTrial to aid the design of clinical eligibility criteria using language models. It allows (1) controllable generation under instructions via a hybrid of discrete and neural prompting, (2) scalable knowledge incorporation via in-context learning, and (3) explicit reasoning chains to provide rationales for understanding the outputs. Experiments on over 70K clinical trials verify that AutoTrial generates high-quality criteria texts that are fluent and coherent and with high accuracy in capturing the relevant clinical concepts to the target trial. It is noteworthy that our method, with a much smaller parameter size, gains around 60% winning rate against the GPT-3.5 baselines via human evaluations.

Submitted 7 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 Main

arXiv:2305.05847 [pdf, other]

doi 10.3389/fspas.2023.1334642

Ultra low-mass and small-radius white dwarfs made of heavy elements

Authors: Cheng-Jun Xia, Yong-Feng Huang, Hong-Bo Li, Lijing Shao, Ren-Xin Xu

Abstract: Seven ultra low-mass and small-radius white dwarfs (LSPM J0815+1633, LP 240-30, BD+20 5125B, LP 462-12, WD J1257+5428, 2MASS J13453297+4200437, and SDSS J085557.46+053524.5) have been recently identified with masses ranging from $\sim$0.02 $M_\odot$ to $\sim$0.08 $M_\odot$ and radii from $\sim$ 4270 km to 10670 km. The mass-radius measurements of these white dwarfs pose challenges to traditional white dwarf models assuming they are mostly made of nuclei lighter than $^{56}$Fe. In this work we consider the possibility that those white dwarfs are made of heavier elements. Due to the small charge-to-mass ratios in heavy elements, the electron number density in white dwarf matter is effectively reduced, which reduces the pressure with additional contributions of lattice energy and electron polarization corrections. This consequently leads to white dwarfs with much smaller masses and radii, which coincide with the seven ultra low-mass and small-radius white dwarfs. The corresponding equation of state and matter contents of dense stellar matter with and without reaching the cold-catalyzed ground state are presented, which are obtained using the latest Atomic Mass Evaluation (AME 2020). Further observations are necessary to unveil the actual matter contents in those white dwarfs via, e.g., spectroscopy, asteroseismology, and discoveries of other ultra low-mass and small-radius white dwarfs.

Submitted 9 May, 2023; originally announced May 2023.

Journal ref: Front. Astron. Space Sci. 10 (2023) 1334642

arXiv:2305.05687 [pdf, other]

doi 10.3847/1538-4357/accc89

Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies

Authors: James Paul Mason, Alexandra Werth, Colin G. West, Allison A. Youngblood, Donald L. Woodraska, Courtney Peck, Kevin Lacjak, Florian G. Frick, Moutamen Gabir, Reema A. Alsinan, Thomas Jacobsen, Mohammad Alrubaie, Kayla M. Chizmar, Benjamin P. Lau, Lizbeth Montoya Dominguez, David Price, Dylan R. Butler, Connor J. Biron, Nikita Feoktistov, Kai Dewey, N. E. Loomis, Michal Bodzianowski, Connor Kuybus, Henry Dietrick, Aubrey M. Wolfe , et al. (977 additional authors not shown)

Abstract: Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms that could explain it: nanoflares or Alfvén waves. To date, neither can be directly observed. Nanoflares are, by definition, extremely small, but their aggregate energy release could represent a substantial heating mechanism, presuming they are sufficiently abundant. One way to test this presumption is via the flare frequency distribution, which describes how often flares of various energies occur. If the slope of the power law fitting the flare frequency distribution is above a critical threshold, $α=2$ as established in prior literature, then there should be a sufficient abundance of nanoflares to explain coronal heating. We performed $>$600 case studies of solar flares, made possible by an unprecedented number of data analysts via three semesters of an undergraduate physics laboratory course. This allowed us to include two crucial, but nontrivial, analysis methods: pre-flare baseline subtraction and computation of the flare energy, which requires determining flare start and stop times. We aggregated the results of these analyses into a statistical study to determine that $α= 1.63 \pm 0.03$. This is below the critical threshold, suggesting that Alfvén waves are an important driver of coronal heating.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 1,002 authors, 14 pages, 4 figures, 3 tables, published by The Astrophysical Journal on 2023-05-09, volume 948, page 71

arXiv:2305.02911 [pdf, other]

UPDExplainer: an Interpretable Transformer-based Framework for Urban Physical Disorder Detection Using Street View Imagery

Authors: Chuanbo Hu, Shan Jia, Fan Zhang, Changjiang Xiao, Mindi Ruan, Jacob Thrasher, Xin Li

Abstract: Urban Physical Disorder (UPD), such as old or abandoned buildings, broken sidewalks, litter, and graffiti, has a negative impact on residents' quality of life. They can also increase crime rates, cause social disorder, and pose a public health risk. Currently, there is a lack of efficient and reliable methods for detecting and understanding UPD. To bridge this gap, we propose UPDExplainer, an interpretable transformer-based framework for UPD detection. We first develop a UPD detection model based on the Swin Transformer architecture, which leverages readily accessible street view images to learn discriminative representations. In order to provide clear and comprehensible evidence and analysis, we subsequently introduce a UPD factor identification and ranking module that combines visual explanation maps with semantic segmentation maps. This novel integrated approach enables us to identify the exact objects within street view images that are responsible for physical disorders and gain insights into the underlying causes. Experimental results on the re-annotated Place Pulse 2.0 dataset demonstrate promising detection performance of the proposed method, with an accuracy of 79.9%. For a comprehensive evaluation of the method's ranking performance, we report the mean Average Precision (mAP), R-Precision (RPrec), and Normalized Discounted Cumulative Gain (NDCG), with success rates of 75.51%, 80.61%, and 82.58%, respectively. We also present a case study of detecting and ranking physical disorders in the southern region of downtown Los Angeles, California, to demonstrate the practicality and effectiveness of our framework.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2305.02394 [pdf, other]

Defending against Insertion-based Textual Backdoor Attacks via Attribution

Authors: Jiazhao Li, Zhuofeng Wu, Wei Ping, Chaowei Xiao, V. G. Vinod Vydiswaran

Abstract: Textual backdoor attack, as a novel attack model, has been shown to be effective in adding a backdoor to the model during training. Defending against such backdoor attacks has become urgent and important. In this paper, we propose AttDef, an efficient attribution-based pipeline to defend against two insertion-based poisoning attacks, BadNL and InSent. Specifically, we regard the tokens with larger attribution scores as potential triggers since larger attribution words contribute more to the false prediction results and therefore are more likely to be poison triggers. Additionally, we further utilize an external pre-trained language model to distinguish whether input is poisoned or not. We show that our proposed method can generalize sufficiently well in two common attack scenarios (poisoning training data and testing data), which consistently improves previous methods. For instance, AttDef can successfully mitigate both attacks with an average accuracy of 79.97% (56.59% up) and 48.34% (3.99% up) under pre-training and post-training attack defense respectively, achieving the new state-of-the-art performance on prediction recovery over four benchmark datasets.

Submitted 6 August, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: Findings of ACL 2023. Camera-ready version

Report number: 15 pages

Journal ref: Findings of ACL 2023, July 2023, Page 8818-8833, Toronto, Canada

arXiv:2305.01855 [pdf, other]

doi 10.1145/3607827.3616839

Multimodal Data Augmentation for Image Captioning using Diffusion Models

Authors: Changrong Xiao, Sean Xin Xu, Kunpeng Zhang

Abstract: Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs for learning the underlying alignment between images and texts. In this paper, we proposed a multimodal data augmentation method, leveraging a recent text-to-image model called Stable Diffusion, to expand the training set via high-quality generation of image-caption pairs. Extensive experiments on the MS COCO dataset demonstrate the advantages of our approach over several benchmark methods, and particularly a significant boost when having fewer training instances. In addition, models trained on our augmented datasets also outperform prior unpaired image captioning methods by a large margin. Finally, further improvement regarding the training efficiency and effectiveness can be obtained after intentionally filtering the generated data based on quality assessment.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2305.01210 [pdf, other]

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

Authors: Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, Lingming Zhang

Abstract: Program synthesis has been long studied with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code. Programming benchmarks, with curated synthesis problems and test-cases, are used to measure the performance of various LLMs on code synthesis. However, these test-cases can be limited in both quantity and quality for fully assessing the functional correctness of the generated code. Such limitation in the existing benchmarks begs the following question: In the era of LLMs, is the code generated really correct? To answer this, we propose EvalPlus -- a code synthesis evaluation framework to rigorously benchmark the functional correctness of LLM-synthesized code. EvalPlus augments a given evaluation dataset with large amounts of test-cases newly produced by an automatic test input generator, powered by both LLM- and mutation-based strategies. While EvalPlus is general, we extend the test-cases of the popular HumanEval benchmark by 80x to build HumanEval+. Our extensive evaluation across 26 popular LLMs (e.g., GPT-4 and ChatGPT) demonstrates that HumanEval+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing the pass@k by up-to 19.3-28.9%. We also surprisingly found that test insufficiency can lead to mis-ranking. For example, both WizardCoder-CodeLlama and Phind-CodeLlama now outperform ChatGPT on HumanEval+, while none of them could on HumanEval. Our work not only indicates that prior popular code synthesis evaluation results do not accurately reflect the true performance of LLMs for code synthesis, but also opens up a new direction to improve such programming benchmarks through automated testing. We have open-sourced our tools, enhanced datasets as well as all LLM-generated code at https://github.com/evalplus/evalplus to facilitate and accelerate future LLM-for-code research.

Submitted 30 October, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

arXiv:2304.14475 [pdf, other]

ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger

Authors: Jiazhao Li, Yijin Yang, Zhuofeng Wu, V. G. Vinod Vydiswaran, Chaowei Xiao

Abstract: Textual backdoor attacks pose a practical threat to existing systems, as they can compromise the model by inserting imperceptible triggers into inputs and manipulating labels in the training dataset. With cutting-edge generative models such as GPT-4 pushing rewriting to extraordinary levels, such attacks are becoming even harder to detect. We conduct a comprehensive investigation of the role of black-box generative models as a backdoor attack tool, highlighting the importance of researching relative defense strategies. In this paper, we reveal that the proposed generative model-based attack, BGMAttack, could effectively deceive textual classifiers. Compared with the traditional attack methods, BGMAttack makes the backdoor trigger less conspicuous by leveraging state-of-the-art generative models. Our extensive evaluation of attack effectiveness across five datasets, complemented by three distinct human cognition assessments, reveals that BGMAttack achieves comparable attack performance while maintaining superior stealthiness relative to baseline methods.

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2304.14190 [pdf, other]

Quadric Representations for LiDAR Odometry, Mapping and Localization

Authors: Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

Abstract: Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks. However, the space-inefficiency of methods that use point-wise representations limits their development and usage in practical applications. In particular, scan-submap matching and global map representation methods are restricted by the inefficiency of nearest neighbor searching (NNS) for large-volume point clouds. To improve space-time efficiency, we propose a novel method of describing scenes using quadric surfaces, which are far more compact representations of 3D objects than conventional point clouds. In contrast to point cloud-based methods, our quadric representation-based method decomposes a 3D scene into a collection of sparse quadric patches, which improves storage efficiency and avoids the slow point-wise NNS process. Our method first segments a given point cloud into patches and fits each of them to a quadric implicit function. Each function is then coupled with other geometric descriptors of the patch, such as its center position and covariance matrix. Collectively, these patch representations fully describe a 3D scene, which can be used in place of the original point cloud and employed in LiDAR odometry, mapping and localization algorithms. We further design a novel incremental growing method for quadric representations, which eliminates the need to repeatedly re-fit quadric surfaces from the original point cloud. Extensive odometry, mapping and localization experiments on large-volume point clouds in the KITTI and UrbanLoco datasets demonstrate that our method maintains low latency and memory utility while achieving competitive, and even superior, accuracy.

Submitted 27 April, 2023; originally announced April 2023.

Showing 201–250 of 1,014 results for author: Xia, C