Search | arXiv e-print repository

Capillary hypersurfaces, Heintze-Karcher's inequality and Zermelo's navigation

Abstract: In this paper, we establish a Heintze-Karcher-type inequality for capillary hypersurfaces in a unit ball. To achieve this, we introduce a special Finsler metric given by Zermelo's navigation and study the geodesic normal flow with respect to this Finsler metric. Our results indicate that the relationship between capillary hypersurfaces and hypersurfaces with free boundary is similar to the one between Finsler geometry and Riemannian geometry.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 18 pages

arXiv:2401.06431 [pdf, other]

Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs

Authors: Changrong Xiao, Wenxing Ma, Qingping Song, Sean Xin Xu, Kunpeng Zhang, Yufang Wang, Qi Fu

Abstract: Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable consistency, generalizability, and explainability. We propose an open-source LLM-based AES system, inspired by the dual-process theory. Our system offers accurate grading and high-quality feedback, at least comparable to that of fine-tuned proprietary LLMs, in addition to its ability to alleviate misgrading. Furthermore, we conduct human-AI co-grading experiments with both novice and expert graders. We find that our system not only automates the grading process but also enhances the performance and efficiency of human graders, particularly for essays where the model has lower confidence. These results highlight the potential of LLMs to facilitate effective human-AI collaboration in the educational context, potentially transforming learning experiences through AI-generated feedback.

Submitted 14 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.06293 [pdf, other]

MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in Recommendation Systems

Authors: Qiang Charles Xiao, Ajith Muralidharan, Birjodh Tiwana, Johnson Jia, Fedor Borisyuk, Aman Gupta, Dawn Woodard

Abstract: In this paper, we propose a generic model-based re-ranking framework, MultiSlot ReRanker, which simultaneously optimizes relevance, diversity, and freshness. Specifically, our Sequential Greedy Algorithm (SGA) is efficient enough (linear time complexity) for large-scale production recommendation engines. It achieved a lift of $+6\%$ to $+10\%$ offline Area Under the receiver operating characteristic Curve (AUC) which is mainly due to explicitly modeling mutual influences among items of a list, and leveraging the second pass ranking scores of multiple objectives. In addition, we have generalized the offline replay theory to multi-slot re-ranking scenarios, with trade-offs among multiple objectives. The offline replay results can be further improved by Pareto Optimality. Moreover, we've built a multi-slot re-ranking simulator based on OpenAI Gym integrated with the Ray framework. It can be easily configured for different assumptions to quickly benchmark both reinforcement learning and supervised learning algorithms.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 10 pages

arXiv:2401.05561 [pdf, other]

TrustLLM: Trustworthiness in Large Language Models

Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, et al. (45 additional authors not shown)

Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: This work is still under work and we welcome your contribution

arXiv:2401.04897 [pdf]

HYPIC: A fast hybrid EM PIC-MCC code for ion cyclotron resonance energization in cylindrical coordinate system

Authors: Mingyang Wu, Andong Xu, Chijie Xiao

Abstract: Ion cyclotron resonance energization (ICRE) such as ion cyclotron resonance heating (ICRH) is widely applied to magnetic confinement fusion and high-power electric propulsion. Since ICRE involves cyclotron resonance processes, a kinetic model is required. Both conventional particle-in-cell (PIC) simulations and solving the Boltzmann equation require enormous computation and memory. The hybrid simulation incorporating adiabatic electrons and PIC ions allows both a substantial reduction in computation and the inclusion of cyclotron resonance effects. Under the adiabatic electron approximation, we have developed a two-dimensional (r,z) hybrid electromagnetic (EM) PIC-MCC (Monte-Carlo collision) simulation program, named HYPIC. The advantages of HYPIC are the inclusion of ion kinetic effects, electrostatic (ES) and EM effects, and collisional effects of ions and electrons, at a small computational cost. The HYPIC program can quickly simulate the antenna-plasma interactions and the ion cyclotron resonance energization and/or ion cyclotron resonance heating processes in linear devices, such as high-power electric propulsion, magnetic mirror, and field-reversed-configuration (FRC) devices.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.04587 [pdf, ps, other]

$\bar{B}_s^0 \to D_{s1}(2460)^+ K^-, D_{s1}(2536)^+ K^-$ and the nature of the two $D_{s1}$ resonances

Authors: Jia-Xin Lin, Hua-Xing Chen, Wei-Hong Liang, Chu-Wen Xiao, Eulogio Oset

Abstract: Starting from the molecular picture for the $D_{s1}(2460)$ and $D_{s1}(2536)$ resonances, which are dynamically generated by the interaction of coupled channels, the most important of which are the $D^*K$ for the $D_{s1}(2460)$ and $DK^*$ for the $D_{s1}(2536)$, we evaluate the ratio of decay widths for the $\bar{B}_s^0 \to D_{s1}(2460)^+ K^-$ and $\bar{B}_s^0 \to D_{s1}(2536)^+ K^-$ decays, the latter of which has been recently investigated by the LHCb collaboration, and we obtain a ratio of the order of unity. The present results should provide an incentive for the related decay into the $D_{s1}(2460)$ resonance to be performed, which would provide valuable information on the nature of these two resonances.

Submitted 22 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: 14 pages, 4 figures, 2 tables; V2: discussions added, references added, version to appear in Eur.Phys.J.C

arXiv:2401.04044 [pdf, other]

FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference

Authors: Zirui Liu, Qingquan Song, Qiang Charles Xiao, Sathiya Keerthi Selvaraj, Rahul Mazumder, Aman Gupta, Xia Hu

Abstract: The large number of parameters in Pretrained Language Models enhances their performance, but also makes them resource-intensive, making it challenging to deploy them on commodity hardware like a single GPU. Due to the memory and power limitations of these devices, model compression techniques are often used to decrease both the model's size and its inference latency. This usually results in a trade-off between model accuracy and efficiency. Therefore, optimizing this balance is essential for effectively deploying LLMs on commodity hardware. A significant portion of the efficiency challenge is the Feed-forward network (FFN) component, which accounts for roughly $\frac{2}{3}$ of total parameters and inference latency. In this paper, we first observe that only a few neurons of the FFN module have large output norm for any input tokens, a.k.a. heavy hitters, while the others are sparsely triggered by different tokens. Based on this observation, we explicitly split the FFN into two parts according to the heavy hitters. We improve the efficiency-accuracy trade-off of existing compression methods by allocating more resources to FFN parts with heavy hitters. In practice, our method can reduce model size by 43.1\% and bring $1.25\sim1.56\times$ wall clock time speedup on different hardware with negligible accuracy drop.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.03772 [pdf, other]

doi 10.1007/JHEP03(2024)076

Boosted Dark Matter From Centaurus A and Its Detection

Authors: Chen Xia, Chuan-Yang Xing, Yan-Hao Xu

Abstract: Dark matter can be boosted by high energy particles in astrophysical environments through elastic scattering. We study the production of boosted dark matter via scattering with electrons in the relativistic jet of the closest active galactic nucleus, Centaurus A, and its detection in the Super-Kamiokande experiment. Since there are a huge number of electrons in the jet and dark matter is extremely dense around the supermassive black hole that powers the jet, the number of boosted dark matter is tremendously large. Compared to boosted dark matter from blazars, the dark matter flux from Centaurus A is enhanced due to the proximity of Centaurus A. The constraint on dark matter-electron scattering cross section set by Super-Kamiokande is more stringent, down to $\sim 10^{-36} \, \mathrm{cm}^2$ for $\mathrm{MeV}$ dark matter.

Submitted 14 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: 18 pages, 5 figures, version appeared in JHEP

Journal ref: JHEP03(2024)076

arXiv:2401.02967 [pdf]

Large enhancement of spin-orbit torques under a MHz modulation due to phonon-magnon coupling

Authors: Hanying Zhang, Qianwen Zhao, Baiqing Jiang, Yuan Wang, Tunan Xie, Kaihua Lou, ChaoChao Xia, C. Bi

Abstract: The discovery of spin-orbit torques (SOTs) generated through the spin Hall or Rashba effects provides an alternative write approach for magnetic random-access memory (MRAM), igniting the development of spin-orbitronics in recent years. Quantitative characterization of SOTs highly relies on the SOT-driven ferromagnetic resonance (ST-FMR), where a modulated microwave current is used to generate ac SOTs and the modulation-frequency is usually less than 100 kHz (the limit of conventional lock-in amplifiers). Here we have investigated the SOT of typical SOT material/ferromagnet bilayers in an extended modulation-frequency range, up to MHz, by developing the ST-FMR measurement. Remarkably, we found that the measured SOTs are enhanced about three times in the MHz range, which cannot be explained according to present SOT theory. We attribute the enhancement of SOT to additional magnon excitations due to phonon-magnon coupling, which is also reflected in the slight changes of resonant field and linewidth in the acquired ST-FMR spectra, corresponding to the modifications of effective magnetization and damping constant, respectively. Our results indicate that the write current of SOT-MRAM may be reduced with the assistance of phonon-magnon coupling.

Submitted 1 December, 2023; originally announced January 2024.

arXiv:2401.01204 [pdf, other]

PPBFL: A Privacy Protected Blockchain-based Federated Learning Model

Authors: Yang Li, Chunhe Xia, Wanshuang Lin, Tianbo Wang

Abstract: With the rapid development of machine learning and a growing concern for data privacy, federated learning has become a focal point of attention. However, attacks on model parameters and a lack of incentive mechanisms hinder the effectiveness of federated learning. Therefore, we propose A Privacy Protected Blockchain-based Federated Learning Model (PPBFL) to enhance the security of federated learning and encourage active participation of nodes in model training. Blockchain technology ensures the integrity of model parameters stored in the InterPlanetary File System (IPFS), providing protection against tampering. Within the blockchain, we introduce a Proof of Training Work (PoTW) consensus algorithm tailored for federated learning, aiming to incentivize training nodes. This algorithm rewards nodes with greater computational power, promoting increased participation and effort in the federated learning process. A novel adaptive differential privacy algorithm is simultaneously applied to local and global models. This safeguards the privacy of local data at training clients, preventing malicious nodes from launching inference attacks. Additionally, it enhances the security of the global model, preventing potential security degradation resulting from the combination of numerous local models. The possibility of security degradation is derived from the composition theorem. By introducing reverse noise in the global model, a zero-bias estimate of differential privacy noise between local and global models is achieved. Furthermore, we propose a new mix transactions mechanism utilizing ring signature technology to better protect the identity privacy of local training clients. Security analysis and experimental results demonstrate that PPBFL, compared to baseline methods, not only exhibits superior model performance but also achieves higher security.

Submitted 8 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.00747 [pdf, other]

Polynomial-time Approximation Scheme for Equilibriums of Games

Authors: Hongbo Sun, Chongkun Xia, Junbo Tan, Bo Yuan, Xueqian Wang, Bin Liang

Abstract: Whether a PTAS (polynomial-time approximation scheme) exists for equilibriums of games has been an open question, which relates to questions in three fields, the practicality of methods in algorithmic game theory, the equation PPAD=FP about the two complexity classes in computational complexity theory, and non-stationarity and curse of multiagency in MARL (multi-agent reinforcement learning). This paper introduces our discovery of the sufficient and necessary conditions for iterations based on dynamic programming and line search to approximate perfect equilibriums of dynamic games, out of which we construct a method proved to be a FPTAS (fully PTAS) for non-singular perfect equilibriums of dynamic games, where for almost any given dynamic game, all its perfect equilibriums are non-singular, indicating that FP$\subseteq$PPAD$\subseteq$Almost-FP. Our discovery consists of cone interior dynamic programming and primal-dual unbiased regret minimization, which fit into existing theories by degeneration in a structure-preserving manner. The former enables a dynamic programming operator to iteratively converge to a perfect equilibrium based on a concept called policy cone. The latter enables an interior-point line search to approximate a Nash equilibrium based on two concepts called primal-dual bias and unbiased central variety, solving a subproblem of the former. Validity of our discovery is cross-corroborated by a combination of theorem proofs, graphs of the three main concepts, and experimental results.

Submitted 3 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: 23 pages, 7 figures, code and animation are available at https://github.com/shb20tsinghua/PTAS_Game/tree/main

MSC Class: 90C39; 90C51; 91A15

arXiv:2312.16498 [pdf, other]

A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss

Authors: Xiao Fang, Xin Gao, Baofeng Li, Feng Zhai, Yu Qin, Zhihang Meng, Jiansheng Lu, Chun Xiao

Abstract: Low-light image enhancement aims to improve the perception of images collected in dim environments and provide high-quality data support for image recognition tasks. When dealing with photos captured under non-uniform illumination, existing methods cannot adaptively extract the differentiated luminance information, which will easily cause over-exposure and under-exposure. From the perspective of unsupervised learning, we propose a multi-scale attention Transformer named MSATr, which sufficiently extracts local and global features for light balance to improve the visual quality. Specifically, we present a multi-scale window division scheme, which uses exponential sequences to adjust the window size of each layer. Within different-sized windows, the self-attention computation can be refined, ensuring the pixel-level feature processing capability of the model. For feature interaction across windows, a global transformer branch is constructed to provide comprehensive brightness perception and alleviate exposure problems. Furthermore, we propose a loop training strategy, using the diverse images generated by weighted mixing and a luminance consistency loss to improve the model's generalization ability effectively. Extensive experiments on several benchmark datasets quantitatively and qualitatively prove that our MSATr is superior to state-of-the-art low-light image enhancement methods, and the enhanced images have more natural brightness and outstanding details. The code is released at https://github.com/fang001021/MSATr.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.14887 [pdf, other]

Spontaneous onset of three-dimensional motion with subsequent spatial and temporal reduction in convective flow systems

Authors: Patrick J. Stofanak, Cheng-Nian Xiao, Inanc Senocak

Abstract: We study the spontaneous emergence of three-dimensional motion from a quiescent, pure conduction state in stably stratified, convective flow within a triangular enclosure, which eventually self-organizes into a two-dimensional steady state. This phenomenon demonstrates that the optimal disturbance path to reach the final state is more complex than the state itself, indicating the "fastest" route involves a higher-dimensional intermediate state. This provides a model for transient spatio-temporal chaos in nonlinear dynamical systems and a challenge for classical hydrodynamic stability theory.

Submitted 22 December, 2023; originally announced December 2023.

Comments: 8 pages, 9 figures

arXiv:2312.13303 [pdf, other]

RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios

Authors: Wenhao Ding, Yulong Cao, Ding Zhao, Chaowei Xiao, Marco Pavone

Abstract: Simulation plays a crucial role in the development of autonomous vehicles (AVs) due to the potential risks associated with real-world testing. Although significant progress has been made in the visual aspects of simulators, generating complex behavior among agents remains a formidable challenge. It is not only imperative to ensure realism in the scenarios generated but also essential to incorporate preferences and conditions to facilitate controllable generation for AV training and evaluation. Traditional methods, mainly relying on memorizing the distribution of training datasets, often fall short in generating unseen scenarios. Inspired by the success of retrieval augmented generation in large language models, we present RealGen, a novel retrieval-based in-context learning framework for traffic scenario generation. RealGen synthesizes new scenarios by combining behaviors from multiple retrieved examples in a gradient-free way, which may originate from templates or tagged scenarios. This in-context learning framework endows versatile generative capabilities, including the ability to edit scenarios, compose various behaviors, and produce critical scenarios. Evaluations show that RealGen offers considerable flexibility and controllability, marking a new direction in the field of controllable traffic scenario generation. Check our project website for more information: https://realgen.github.io.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.12096 [pdf, other]

DLCA-Recon: Dynamic Loose Clothing Avatar Reconstruction from Monocular Videos

Authors: Chunjie Luo, Fei Luo, Yusen Wang, Enxu Zhao, Chunxia Xiao

Abstract: Reconstructing a dynamic human with loose clothing is an important but difficult task. To address this challenge, we propose a method named DLCA-Recon to create human avatars from monocular videos. The distance from loose clothing to the underlying body rapidly changes in every frame when the human freely moves and acts. Previous methods lack effective geometric initialization and constraints for guiding the optimization of deformation to explain this dramatic change, resulting in the discontinuous and incomplete reconstruction surface. To model the deformation more accurately, we propose to initialize an estimated 3D clothed human in the canonical space, as it is easier for deformation fields to learn from the clothed human than from SMPL. With both representations of explicit mesh and implicit SDF, we utilize the physical connection information between consecutive frames and propose a dynamic deformation field (DDF) to optimize deformation fields. DDF accounts for contributive forces on loose clothing to enhance the interpretability of deformations and effectively capture the free movement of loose clothing. Moreover, we propagate SMPL skinning weights to each individual and refine pose and skinning weights during the optimization to improve skinning transformation. Based on more reasonable initialization and DDF, we can simulate real-world physics more accurately. Extensive experiments on public and our own datasets validate that our method can produce superior results for humans with loose clothing compared to the SOTA methods.

Submitted 20 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11662 [pdf, other]

doi 10.1103/PhysRevB.109.155114

Layer Hall counterflow as a model probe of magic-angle twisted bilayer graphene

Authors: Jihang Zhu, Dawei Zhai, Cong Xiao, Wang Yao

Abstract: The recent constructions of flat moiré minibands in specifically twisted multilayer graphene and twisted transition metal dichalcogenides (TMDs) have facilitated the observation of strong correlations with a convenient tunability. These correlations in flat bands result in the band dispersion heavily influenced by carrier densities, leading to filling-dependent quasiparticle band renormalizations. Particularly, in magic-angle twisted bilayer graphene (MATBG), the band structure--including the quasiparticle energy and wavefunction--is crucial in understanding the correlated properties. Previous theoretical studies have demonstrated the presence of a time-reversal-even charge Hall counterflow in response to a direct current (DC) electric field in twisted bilayers as chiral structures. In this study, we show that such layer Hall counterflow can serve as a sensitive probe for MATBG model parameters, which are currently ambiguous as a result of unavoidable structural relaxation and twist-angle disorder. We present the layer Hall counterflow and the associated in-plane magnetization for three different MATBG continuum models, based on which many-body interacting models have been widely applied to study strong correlations in MATBG. At the single-particle level, our findings indicate notable differences in layer-projected Hall conductivity, both in magnitude and sign, between different MATBG continuum models. Furthermore, our self-consistent Hartree calculations, performed on each of these single-particle continuum models, reveal renormalized layer-projected Hall conductivity by the self-consistent Hartree field.

Submitted 18 December, 2023; originally announced December 2023.

Journal ref: Phys. Rev. B 109, 155114 (2024)

arXiv:2312.06686 [pdf, other]

Robo360: A 3D Omnispective Multi-Material Robotic Manipulation Dataset

Authors: Litian Liang, Liuyu Bian, Caiwei Xiao, Jialin Zhang, Linghao Chen, Isabella Liu, Fanbo Xiang, Zhiao Huang, Hao Su

Abstract: Building robots that can automate labor-intensive tasks has long been the core motivation behind the advancements in computer vision and the robotics community. Recent interest in leveraging 3D algorithms, particularly neural fields, has led to advancements in robot perception and physical understanding in manipulation scenarios. However, the real world's complexity poses significant challenges. To tackle these challenges, we present Robo360, a dataset that features robotic manipulation with a dense view coverage, which enables high-quality 3D neural representation learning, and a diverse set of objects with various physical and optical properties and facilitates research in various object manipulation and physical world modeling tasks. We confirm the effectiveness of our dataset using existing dynamic NeRF and evaluate its potential in learning multi-view policies. We hope that Robo360 can open new research directions yet to be explored at the intersection of understanding the physical world in 3D and robot control.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.05964 [pdf, other]

ConSequence: Synthesizing Logically Constrained Sequences for Electronic Health Record Generation

Authors: Brandon Theodorou, Shrusti Jain, Cao Xiao, Jimeng Sun

Abstract: Generative models can produce synthetic patient records for analytical tasks when real data is unavailable or limited. However, current methods struggle with adhering to domain-specific knowledge and removing invalid data. We present ConSequence, an effective approach to integrating domain knowledge into sequential generative neural network outputs. Our rule-based formulation includes temporal aggregation and antecedent evaluation modules, ensured by an efficient matrix multiplication formulation, to satisfy hard and soft logical constraints across time steps. Existing constraint methods often fail to guarantee constraint satisfaction, lack the ability to handle temporal constraints, and hinder the learning and computational efficiency of the model. In contrast, our approach efficiently handles all types of constraints with guaranteed logical coherence. We demonstrate ConSequence's effectiveness in generating electronic health records, outperforming competitors in achieving complete temporal and spatial constraint satisfaction without compromising runtime performance or generative quality. Specifically, ConSequence successfully prevents all rule violations while improving the model quality in reducing its test perplexity by 5% and incurring less than a 13% slowdown in generation speed compared to an unconstrained model.

Submitted 20 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.05474 [pdf, ps, other]

The duals of narrow-sense BCH codes with length $\frac{q^m-1}λ$

Authors: Xiaoqiang Wang, Chengliang Xiao, Dabin Zheng

Abstract: BCH codes are an interesting class of cyclic codes due to their efficient encoding and decoding algorithms. In the past sixty years, a lot of progress on the study of BCH codes has been made, but little is known about the properties of their duals. Recently, in order to study the duals of BCH codes and the lower bounds on their minimum distances, a new concept called dually-BCH code was proposed by authors in \cite{GDL21}. In this paper, the lower bounds on the minimum distances of the duals of narrow-sense BCH codes with length $\frac{q^m-1}λ$ over $\mathbb{F}_q$ are developed, where $λ$ is a positive integer satisfying $λ\, |\, q-1$, or $λ=q^s-1$ and $s\, |\,m$. In addition, the sufficient and necessary conditions in terms of the designed distances for these codes being dually-BCH codes are presented. Many considered codes in \cite{GDL21} and \cite{Wang23} are the special cases of the codes showed in this paper. Our lower bounds on the minimum distances of the duals of BCH codes include the bounds stated in \cite{GDL21} as a special case. Several examples show that the lower bounds are good in some cases.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.05275 [pdf, other]

Exploring the Limits of ChatGPT in Software Security Applications

Authors: Fangzhou Wu, Qingzhao Zhang, Ati Priya Bajaj, Tiffany Bao, Ning Zhang, Ruoyu "Fish" Wang, Chaowei Xiao

Abstract: Large language models (LLMs) have undergone rapid evolution and achieved remarkable results in recent times. OpenAI's ChatGPT, backed by GPT-3.5 or GPT-4, has gained instant popularity due to its strong capability across a wide range of tasks, including natural language tasks, coding, mathematics, and engaging conversations. However, the impacts and limits of such LLMs in system security domain are less explored. In this paper, we delve into the limits of LLMs (i.e., ChatGPT) in seven software security applications including vulnerability detection/repair, debugging, debloating, decompilation, patching, root cause analysis, symbolic execution, and fuzzing. Our exploration reveals that ChatGPT not only excels at generating code, which is the conventional application of language models, but also demonstrates strong capability in understanding user-provided commands in natural languages, reasoning about control and data flows within programs, generating complex data structures, and even decompiling assembly code. Notably, GPT-4 showcases significant improvements over GPT-3.5 in most security tasks. Also, certain limitations of ChatGPT in security-related tasks are identified, such as its constrained ability to process long code contexts.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04730 [pdf, other]

DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions

Authors: Fangzhou Wu, Xiaogeng Liu, Chaowei Xiao

Abstract: With the advancement of Large Language Models (LLMs), significant progress has been made in code generation, enabling LLMs to transform natural language into programming code. These Code LLMs have been widely accepted by massive users and organizations. However, a dangerous nature is hidden in the code, which is the existence of fatal vulnerabilities. While some LLM providers have attempted to address these issues by aligning with human guidance, these efforts fall short of making Code LLMs practical and robust. Without a deep understanding of the performance of the LLMs under the practical worst cases, it would be concerning to apply them to various real-world applications. In this paper, we answer the critical issue: Are existing Code LLMs immune to generating vulnerable code? If not, what is the possible maximum severity of this issue in practical deployment scenarios? In this paper, we introduce DeceptPrompt, a novel algorithm that can generate adversarial natural language instructions that drive the Code LLMs to generate functionality correct code with vulnerabilities. DeceptPrompt is achieved through a systematic evolution-based algorithm with a fine grain loss design. The unique advantage of DeceptPrompt enables us to find natural prefix/suffix with totally benign and non-directional semantic meaning, meanwhile, having great power in inducing the Code LLMs to generate vulnerable code. This feature can enable us to conduct the almost-worstcase red-teaming on these LLMs in a real scenario, where users are using natural language. Our extensive experiments and analyses on DeceptPrompt not only validate the effectiveness of our approach but also shed light on the huge weakness of LLMs in the code generation task. When applying the optimized prefix/suffix, the attack success rate (ASR) will improve by average 50% compared with no prefix/suffix applying.

Submitted 12 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.03222 [pdf, other]

Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels

Authors: Heng Huang, Xin **, Yaqi Liu, Hao Lou, Chaoen Xiao, Shuai Cui, Xinning Li, Dongqing Zou

Abstract: Now many mobile phones embed deep-learning models for evaluation or guidance on photography. These models cannot provide detailed results like human pose scores or scene color scores because of the rare of corresponding aesthetic attribute data. However, the annotation of image aesthetic attribute scores requires experienced artists and professional photographers, which hinders the collection of large-scale fully-annotated datasets. In this paper, we propose to replace image attribute labels with feature extractors. First, a novel aesthetic attribute evaluation framework based on attribute features is proposed to predict attribute scores and overall scores. We call it the F2S (attribute features to attribute scores) model. We use networks from different tasks to provide attribute features to our F2S models. Then, we define an aesthetic attribute contribution to describe the role of aesthetic attributes throughout an image and use it with the attribute scores and the overall scores to train our F2S model. Sufficient experiments on publicly available datasets demonstrate that our F2S model achieves comparable performance with those trained on the datasets with fully-annotated aesthetic attribute score labels. Our method makes it feasible to learn meaningful attribute scores for various aesthetic attribute sets in different types of images with only overall aesthetic scores.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.01678 [pdf, other]

Jellyfish: A Large Language Model for Data Preprocessing

Authors: Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

Abstract: This paper explores the utilization of LLMs for data preprocessing (DP), a crucial step in the data mining pipeline that transforms raw data into a clean format conducive to easy processing. Whereas the use of LLMs has sparked interest in devising universal solutions to DP, recent initiatives in this domain typically rely on GPT APIs, raising inevitable data breach concerns. Unlike these approaches, we consider instruction-tuning local LLMs (7 -- 13B models) as universal DP task solvers that operate on a local, single, and low-priced GPU, ensuring data security and enabling further customization. We select a collection of datasets across four representative DP tasks and construct instruction tuning data using data configuration, knowledge injection, and reasoning data distillation techniques tailored to DP. By tuning Mistral-7B, Llama 3-8B, and OpenOrca-Platypus2-13B, our models, namely, Jellyfish-7B/8B/13B, deliver competitiveness compared to GPT-3.5/4 models and strong generalizability to unseen tasks while barely compromising the base models' abilities in NLP tasks. Meanwhile, Jellyfish offers enhanced reasoning capabilities compared to GPT-3.5. Our models are available at: https://huggingface.co/NECOUDBFM/Jellyfish . Our instruction dataset is available at: https://huggingface.co/datasets/NECOUDBFM/Jellyfish-Instruct .

Submitted 21 June, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: a.k.a. "Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing''

arXiv:2312.00438 [pdf, other]

Dolphins: Multimodal Language Model for Driving

Authors: Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, Chaowei Xiao

Abstract: The quest for fully autonomous vehicles (AVs) capable of navigating complex real-world scenarios with human-like understanding and responsiveness. In this paper, we introduce Dolphins, a novel vision-language model architected to imbibe human-like abilities as a conversational driving assistant. Dolphins is adept at processing multimodal inputs comprising video (or image) data, text instructions, and historical control signals to generate informed outputs corresponding to the provided instructions. Building upon the open-sourced pretrained Vision-Language Model, OpenFlamingo, we first enhance Dolphins's reasoning capabilities through an innovative Grounded Chain of Thought (GCoT) process. Then we tailored Dolphins to the driving domain by constructing driving-specific instruction data and conducting instruction tuning. Through the utilization of the BDD-X dataset, we designed and consolidated four distinct AV tasks into Dolphins to foster a holistic understanding of intricate driving scenarios. As a result, the distinctive features of Dolphins are characterized into two dimensions: (1) the ability to provide a comprehensive understanding of complex and long-tailed open-world driving scenarios and solve a spectrum of AV tasks, and (2) the emergence of human-like capabilities including gradient-free instant adaptation via in-context learning and error recovery via reflection.

Submitted 1 December, 2023; originally announced December 2023.

Comments: The project page is available at https://vlm-driver.github.io/

arXiv:2312.00035 [pdf, other]

FBChain: A Blockchain-based Federated Learning Model with Efficiency and Secure Communication

Authors: Yang Li, Chunhe Xia, Wei Liu, Weidong Zhou, Chen Chen, Tianbo Wang

Abstract: Privacy and security in the parameter transmission process of federated learning are currently among the most prominent concerns. However, there are two thorny problems caused by unprotected communication methods: "parameter-leakage" and "inefficient-communication". This article proposes Blockchain-based Federated Learning (FBChain) model for federated learning parameter communication to overcome the above two problems. First, we utilize the immutability of blockchain to store the global model and hash value of local model parameters in case of tampering during the communication process, protect data privacy by encrypting parameters, and verify data consistency by comparing the hash values of local parameters, thus addressing the "parameter-leakage" problem. Second, the Proof of Weighted Link Speed (PoWLS) consensus algorithm comprehensively selects nodes with the higher weighted link speed to aggregate global model and package blocks, thereby solving the "inefficient-communication" problem. Experimental results demonstrate the effectiveness of our proposed FBChain model and its ability to improve model communication efficiency in federated learning.

Submitted 20 November, 2023; originally announced December 2023.

arXiv:2311.18585 [pdf, ps, other]

doi 10.1007/s00526-024-02733-5

Rigidity and quantitative stability for partially overdetermined problems and capillary CMC hypersurfaces

Authors: Xiaohan Jia, Zheng Lu, Chao Xia, Xuwen Zhang

Abstract: In this paper, we first prove a rigidity result for a Serrin-type partially overdetermined problem in the half-space, which gives a characterization of capillary spherical caps by the overdetermined problem. In the second part, we prove quantitative stability results for the Serrin-type partially overdetermined problem, as well as capillary almost constant mean curvature hypersurfaces in the half-space.

Submitted 30 November, 2023; originally announced November 2023.

Journal ref: Calc. Var. Partial Differential Equations 63 (2024), no.5, Paper No. 125, 24 pp

arXiv:2311.18581 [pdf, ps, other]

A characterization of capillary spherical caps by a partially overdetermined problem in a half ball

Authors: Xiaohan Jia, Zheng Lu, Chao Xia, Xuwen Zhang

Abstract: In this note, we study a Serrin-type partially overdetermined problem proposed by Guo-Xia (Calc. Var. Partial Differential Equations 58: no. 160, 2019. https://doi.org/10.1007/s00526-019-1603-3), and prove a rigidity result that characterizes capillary spherical caps in a half ball.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.17736 [pdf, other]

Molecular-type $QQss\bar{s}$ pentaquarks predicted by an extended hidden gauge symmetry approach

Authors: Zhong-Yu Wang, Chu-Wen Xiao, Zhi-Feng Sun, Xiang Liu

Abstract: In this work, we investigate the double-heavy molecular pentaquark states with the quark contents $ccss\bar{s}$, $bbss\bar{s}$, and $bcss\bar{s}$ by using the coupled channel approach. The extended local hidden gauge Lagrangians are used to obtain the meson-baryon interactions by exchanging the vector mesons. We predict some candidates for the molecular states with the quantum numbers $I(J^{P}) = 0(1/2^{-}, 3/2^{-}, 5/2^{-})$, whose binding energies are of the order of $20-30$ MeV and whose widths are all less than $8$ MeV. These predicted exotic double-heavy molecular pentaquark states may be accessible in future experiments such as LHCb.

Submitted 8 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: Minor corrections are made

arXiv:2311.16173 [pdf, other]

Conditions for Length Generalization in Learning Reasoning Skills

Authors: Changnan Xiao, Bing Liu

Abstract: Reasoning is a fundamental capability of AI agents. Recently, large language models (LLMs) have shown remarkable abilities to perform reasoning tasks. However, numerous evaluations of the reasoning capabilities of LLMs have also showed some limitations. An outstanding limitation is length generalization, meaning that when trained on reasoning problems of smaller lengths or sizes, the resulting models struggle with problems of larger sizes or lengths. This potentially indicates some theoretical limitations of generalization in learning reasoning skills. These evaluations and their observations motivated us to perform a theoretical study of the length generalization problem. This work focuses on reasoning tasks that can be formulated as Markov dynamic processes (MDPs) and/or directed acyclic graphs (DAGs). It identifies and proves conditions that decide whether the length generalization problem can be solved or not for a reasoning task in a particular representation. Experiments are also conducted to verify the theoretical results.

Submitted 6 December, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.13947 [pdf]

High-Ratio Compression for Machine-Generated Data

Authors: Jiu**g Zhang, Zhitao Shen, Shiyu Yang, Lingkai Meng, Chuan Xiao, Wei Jia, Yue Li, Qinhui Sun, Wenjie Zhang, Xuemin Lin

Abstract: Machine-generated data is rapidly growing and poses challenges for data-intensive systems, especially as the growth of data outpaces the growth of storage space. To cope with the storage issue, compression plays a critical role in storage engines, particularly for data-intensive applications, where high compression ratios and efficient random access are essential. However, existing compression techniques tend to focus on general-purpose and data block approaches, but overlook the inherent structure of machine-generated data and hence result in low-compression ratios or limited lookup efficiency. To address these limitations, we introduce the Pattern-Based Compression (PBC) algorithm, which specifically targets patterns in machine-generated data to achieve Pareto-optimality in most cases. Unlike traditional data block-based methods, PBC compresses data on a per-record basis, facilitating rapid random access. Our experimental evaluation demonstrates that PBC, on average, achieves a compression ratio twice as high as state-of-the-art techniques while maintaining competitive compression and decompression speeds. We also integrate PBC to a production database system and achieve improvement on both comparison ratio and throughput.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.13263 [pdf, other]

CMFDFormer: Transformer-based Copy-Move Forgery Detection with Continual Learning

Authors: Yaqi Liu, Chao Xia, Song Xiao, Qingxiao Guan, Wenqian Dong, Yifan Zhang, Nenghai Yu

Abstract: Copy-move forgery detection aims at detecting duplicated regions in a suspected forged image, and deep learning based copy-move forgery detection methods are in the ascendant. These deep learning based methods heavily rely on synthetic training data, and the performance will degrade when facing new tasks. In this paper, we propose a Transformer-style copy-move forgery detection network named as CMFDFormer, and provide a novel PCSD (Pooled Cube and Strip Distillation) continual learning framework to help CMFDFormer handle new tasks. CMFDFormer consists of a MiT (Mix Transformer) backbone network and a PHD (Pluggable Hybrid Decoder) mask prediction network. The MiT backbone network is a Transformer-style network which is adopted on the basis of comprehensive analyses with CNN-style and MLP-style backbones. The PHD network is constructed based on self-correlation computation, hierarchical feature integration, a multi-scale cycle fully-connected block and a mask reconstruction block. The PHD network is applicable to feature extractors of different styles for hierarchical multi-scale information extraction, achieving comparable performance. Last but not least, we propose a PCSD continual learning framework to improve the forgery detectability and avoid catastrophic forgetting when handling new tasks. Our continual learning framework restricts intermediate features from the PHD network, and takes advantage of both cube pooling and strip pooling. Extensive experiments on publicly available datasets demonstrate the good performance of CMFDFormer and the effectiveness of the PCSD continual learning framework.

Submitted 10 March, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: 12 pages, 7 figures

arXiv:2311.12244 [pdf, other]

Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

Authors: Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

Abstract: In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounted for in learning, exploration and planning, but presents significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations. We provide a theoretical analysis for justifying the statistical efficiency of the proposed algorithm, and also empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks, advancing reliable reinforcement learning towards more practical applications.

Submitted 10 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: The first two authors contribute equally

arXiv:2311.11710 [pdf, other]

Interlayer electric multipoles induced by in-plane field from quantum geometric origins

Authors: Huiyuan Zheng, Dawei Zhai, Cong Xiao, Wang Yao

Abstract: We show that interlayer charge transfer in 2D materials can be driven by an in-plane electric field, giving rise to electrical multipole generation in linear and second order of in-plane field. The linear and nonlinear effects have quantum geometric origins in the Berry curvature and quantum metric respectively, defined in extended parameter spaces characteristic of layered materials. We elucidate their symmetry characters, and demonstrate sizable dipole and quadrupole polarizations respectively in twisted bilayers and trilayers of transition metal dichalcogenides. Furthermore, we show that the effect is strongly enhanced during the topological phase transition tuned by interlayer translation. The effects point to a new electric control on layer quantum degree of freedom.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 13 pages, 4 figures

arXiv:2311.11333 [pdf, other]

Stable $(r+1)$-th capillary hypersurfaces

Authors: **yu Guo, Haizhong Li, Chao Xia

Abstract: In this paper, we propose a new definition of stable $(r+1)$-th capillary hypersurfaces from variational perspective for any $1\leq r\leq n-1$. More precisely, we define stable $(r+1)$-th capillary hypersurfaces to be smooth local minimizers of a new energy functional under volume-preserving and contact angle-preserving variations. Using the new concept of the stable $(r+1)$-th capillary hypersurfaces, we generalize the stability results of Souam \cite{Souam} in an Euclidean half-space and Guo-Wang-Xia \cite{GWX} in a horoball in hyperbolic space for capillary hypersurface to $(r+1)$-th capillary hypersurface case.

Submitted 19 November, 2023; originally announced November 2023.

Comments: 31 pages, 2 figures

arXiv:2311.09827 [pdf, other]

Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking

Authors: Nan Xu, Fei Wang, Ben Zhou, Bang Zheng Li, Chaowei Xiao, Muhao Chen

Abstract: While large language models (LLMs) have demonstrated increasing power, they have also given rise to a wide range of harmful behaviors. As representatives, jailbreak attacks can provoke harmful or unethical responses from LLMs, even after safety alignment. In this paper, we investigate a novel category of jailbreak attacks specifically designed to target the cognitive structure and processes of LLMs. Specifically, we analyze the safety vulnerability of LLMs in the face of (1) multilingual cognitive overload, (2) veiled expression, and (3) effect-to-cause reasoning. Different from previous jailbreak attacks, our proposed cognitive overload is a black-box attack with no need for knowledge of model architecture or access to model weights. Experiments conducted on AdvBench and MasterKey reveal that various LLMs, including both popular open-source model Llama 2 and the proprietary model ChatGPT, can be compromised through cognitive overload. Motivated by cognitive psychology work on managing cognitive load, we further investigate defending cognitive overload attack from two perspectives. Empirical studies show that our cognitive overload from three perspectives can jailbreak all studied LLMs successfully, while existing defense strategies can hardly mitigate the caused malicious uses effectively.

Submitted 29 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09763 [pdf, other]

Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations

Authors: Wenjie Mo, Jiashu Xu, Qin Liu, Jiongxiao Wang, Jun Yan, Chaowei Xiao, Muhao Chen

Abstract: Existing studies in backdoor defense have predominantly focused on the training phase, overlooking the critical aspect of testing time defense. This gap becomes particularly pronounced in the context of Large Language Models (LLMs) deployed as Web Services, which typically offer only black-box access, rendering training-time defenses impractical. To bridge this gap, our work introduces defensive demonstrations, an innovative backdoor defense strategy for blackbox large language models. Our method involves identifying the task and retrieving task-relevant demonstrations from an uncontaminated pool. These demonstrations are then combined with user queries and presented to the model during testing, without requiring any modifications/tuning to the black-box model or insights into its internal mechanisms. Defensive demonstrations are designed to counteract the adverse effects of triggers, aiming to recalibrate and correct the behavior of poisoned models during test-time evaluations. Extensive experiments show that defensive demonstrations are effective in defending both instance-level and instruction-level backdoor attacks, not only rectifying the behavior of poisoned models but also surpassing existing baselines in most scenarios.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09641 [pdf, other]

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models

Authors: Jiongxiao Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao

Abstract: Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align Large Language Models (LLMs) with human preferences, playing an important role in LLMs alignment. Despite its advantages, RLHF relies on human annotators to rank the text, which can introduce potential security vulnerabilities if any adversarial annotator (i.e., attackers) manipulates the ranking score by up-ranking any malicious text to steer the LLM adversarially. To assess the red-teaming of RLHF against human preference data poisoning, we propose RankPoison, a poisoning attack method on candidates' selection of preference rank flipping to reach certain malicious behaviors (e.g., generating longer sequences, which can increase the computational cost). With poisoned dataset generated by RankPoison, we can perform poisoning attacks on LLMs to generate longer tokens without hurting the original safety alignment performance. Moreover, applying RankPoison, we also successfully implement a backdoor attack where LLMs can generate longer answers under questions with the trigger word. Our findings highlight critical security challenges in RLHF, underscoring the necessity for more robust alignment methods for LLMs.

Submitted 19 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.07316 [pdf, other]

Role of crystal-field-splitting and longe-range-hoppings on superconducting pairing symmetry of La$_3$Ni$_2$O$_7$

Authors: Hongquan Liu, Chengliang Xia, Shengjie Zhou, Hanghui Chen

Abstract: We study the bilayer two-orbital model for superconducting pairing symmetry of La$_3$Ni$_2$O$_7$ under pressure. By combining density-functional-theory (DFT), maximally-localized-Wannier-function, and linearized Eliashberg equation with random-phase-approximation, we find that the superconducting pairing symmetry of La$_3$Ni$_2$O$_7$ is robustly $d_{xy}$ if its DFT band structure is accurately reproduced in the downfolded model. We further show that fine-tuning of crystal-field-splitting between two Ni-$e_g$ orbitals qualitatively affects superconducting pairing symmetry of the bilayer two-orbital model, which changes from $d_{xy}$ to $s_{\pm}$ as the crystal-field-splitting exceeds a critical value. When the model only includes nearest-neighbor and second-nearest-neighbor hoppings, the crystal-field-splitting obtained by fitting to the DFT band structure is larger than the critical value and thus leads to $s_{\pm}$ superconducting pairing symmetry. When all nonzero long-range-hoppings are also included in the model, the fitted crystal-field-splitting is reduced and smaller than the critical value, which makes $d_{xy}$ superconducting pairing symmetry more favorable than $s_{\pm}$ symmetry. Our work demonstrates that in downfolded effective models, the details of band structure can play a crucial role in determining pairing symmetry in multi-orbital unconventional superconductors (such as La$_3$Ni$_2$O$_7$).

Submitted 13 November, 2023; originally announced November 2023.

Comments: 11 pages and 4 figures

arXiv:2311.06517 [pdf, other]

BClean: A Bayesian Data Cleaning System

Authors: Jianbin Qin, Sifan Huang, Yaoshu Wang, **g Zhu, Yifan Zhang, Yukai Miao, Rui Mao, Makoto Onizuka, Chuan Xiao

Abstract: There is a considerable body of work on data cleaning which employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One of prevalent approaches is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., Gaussian distribution), which is frequently underfitted in practice, or they necessitate experts to provide a complex prior distribution (e.g., via a programming language). This requirement is both labor-intensive and costly, rendering these methods less suitable for real-world applications. In this paper, we propose BClean, a Bayesian Cleaning system that features automatic Bayesian network construction and user interaction. We recast the data cleaning problem as a Bayesian inference that fully exploits the relationships between attributes in the observed dataset and any prior information provided by users. To this end, we present an automatic Bayesian network construction method that extends a structure learning-based functional dependency discovery method with similarity functions to capture the relationships between attributes. Furthermore, our system allows users to modify the generated Bayesian network in order to specify prior information or correct inaccuracies identified by the automatic generation process. We also design an effective scoring model (called the compensative scoring model) necessary for the Bayesian inference. To enhance the efficiency of data cleaning, we propose several approximation strategies for the Bayesian inference, including graph partitioning, domain pruning, and pre-detection. By evaluating on both real-world and synthetic datasets, we demonstrate that BClean is capable of achieving an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.

Submitted 11 November, 2023; originally announced November 2023.

Comments: Our source code is available at https://github.com/yyssl88/BClean

arXiv:2311.06330 [pdf, other]

Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations

Authors: Zengqing Wu, Run Peng, Xu Han, Shuyuan Zheng, Yixin Zhang, Chuan Xiao

Abstract: Computer simulations offer a robust toolset for exploring complex systems across various disciplines. A particularly impactful approach within this realm is Agent-Based Modeling (ABM), which harnesses the interactions of individual agents to emulate intricate system dynamics. ABM's strength lies in its bottom-up methodology, illuminating emergent phenomena by modeling the behaviors of individual components of a system. Yet, ABM has its own set of challenges, notably its struggle with modeling natural language instructions and common sense in mathematical equations or rules. This paper seeks to transcend these boundaries by integrating Large Language Models (LLMs) like GPT into ABM. This amalgamation gives birth to a novel framework, Smart Agent-Based Modeling (SABM). Building upon the concept of smart agents -- entities characterized by their intelligence, adaptability, and computation ability -- we explore in the direction of utilizing LLM-powered agents to simulate real-world scenarios with increased nuance and realism. In this comprehensive exploration, we elucidate the state of the art of ABM, introduce SABM's potential and methodology, and present three case studies (source codes available at https://github.com/Roihn/SABM), demonstrating the SABM methodology and validating its effectiveness in modeling real-world systems. Furthermore, we cast a vision towards several aspects of the future of SABM, anticipating a broader horizon for its applications. Through this endeavor, we aspire to redefine the boundaries of computer simulations, enabling a more profound understanding of complex systems.

Submitted 14 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: Source codes are available at https://github.com/Roihn/SABM

arXiv:2311.05856 [pdf, other]

Vortex-Antivortex Lattices in a Holographic Superconductor

Authors: Jia-Hao Su, Chuan-Yin Xia, Wei-Can Yang, Hua-Bi Zeng

Abstract: We employ the Einstein-Abelian-Higgs theory to investigate the structure of vortex-antivortex lattices within a superconductor driven by spatial periodic magnetic fields. By adjusting the parameters of the external magnetic field, including the period ($\mathcal{T}$) and the amplitude ($B_0$), various distinct vortex states emerge. These states encompass the Wigner crystallization state, the vortex cluster state, and the suppressed state. Additionally, we present a comprehensive phase diagram to demarcate the specific regions where these structures emerge, contributing to our understanding of superconductivity in complex magnetic environments.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.01219 [pdf, other]

Scaling Law for Time-Reversal-Odd Nonlinear Transport

Authors: Yue-Xin Huang, Cong Xiao, Shengyuan A. Yang, Xiao Li

Abstract: Time-reversal-odd ($\mathcal{T}$-odd) nonlinear current response has been theoretically proposed and experimentally confirmed recently. However, the role of disorder scattering in the response, especially whether it contributes to the $\sigma_{xx}$-independent term, has not been clarified. In this work, we derive a general scaling law for this effect, which accounts for multiple scattering sources. We show that the nonlinear conductivity is generally a quartic function in $\sigma_{xx}$. Besides intrinsic contribution, extrinsic contributions from scattering also enter the zeroth order term, and their values can be comparable to or even larger than the intrinsic one. Terms beyond zeroth order are all extrinsic. Cubic and quartic terms must involve skew scattering and they signal competition between at least two scattering sources. The behavior of zeroth order extrinsic terms is explicitly demonstrated in a Dirac model. Our finding reveals the significant role of disorder scattering in $\mathcal{T}$-odd nonlinear transport, and establishes a foundation for analyzing experimental result.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 5 pages, 1 figure

arXiv:2311.01162 [pdf, other]

Heintze-Karcher inequality for anisotropic free boundary hypersurfaces in convex domains

Authors: Xiaohan Jia, Guofang Wang, Chao Xia, Xuwen Zhang

Abstract: In this paper, we prove an optimal Heintze-Karcher-type inequality for anisotropic free boundary hypersurfaces in general convex domains. The equality is achieved for anisotropic free boundary Wulff shapes in a convex cone. As applications, we prove various Alexandrov-type theorems.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 15 pages, 1 figure

MSC Class: 53C45; 53A10; 53C42

arXiv:2311.00267 [pdf, other]

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

Authors: Yi Ma, Chenjun Xiao, Hebin Liang, Jianye Hao

Abstract: Decision Transformer (DT) is an innovative algorithm leveraging recent advances of the transformer architecture in reinforcement learning (RL). However, a notable limitation of DT is its reliance on recalling trajectories from datasets, losing the capability to seamlessly stitch sub-optimal trajectories together. In this work we introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL. At the time of making decisions, a high-level policy first proposes an ideal prompt for the current state, a low-level policy subsequently generates an action conditioned on the given prompt. We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices. Inspired by these observations, we study how to jointly optimize the high-level and low-level policies to enable the stitching ability, which further leads to the development of new offline RL algorithms. Our empirical results clearly show that the proposed algorithms significantly surpass DT on several control and navigation benchmarks. We hope our contributions can inspire the integration of transformer architectures within the field of RL.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.19617 [pdf, other]

Data-driven Modeling of a Coronal Magnetic Flux Rope: from Birth to Death

Authors: J. H. Guo, Y. W. Ni, Y. Guo, C. Xia, B. Schmieder, S. Poedts, Z. Zhong, Y. H. Zhou, F. Yu, P. F. Chen

Abstract: Magnetic flux ropes are a bundle of twisted magnetic field lines produced by internal electric currents, which are responsible for solar eruptions and are the major drivers of geomagnetic storms. As such, it is crucial to develop a numerical model that can capture the entire evolution of a flux rope, from its birth to death, in order to predict whether adverse space weather events might occur or not. In this paper, we develop a data-driven modeling that combines a time-dependent magneto-frictional approach with a thermodynamic magnetohydrodynamic model. Our numerical modeling successfully reproduces the formation and confined eruption of an observed flux rope, and unveils the physical details behind the observations. Regarding the long-term evolution of the active region, our simulation results indicate that the flux cancellation due to collisional shearing plays a critical role in the formation of the flux rope, corresponding to a substantial increase in magnetic free energy and helicity. Regarding the eruption stage, the deformation of the flux rope during its eruption can cause an increase in the downward tension force, which suppresses it from further rising. This finding may shed light on why some torus-unstable flux ropes lead to failed eruptions after large-angle rotations. Moreover, we find that twisted fluxes can accumulate during the confined eruptions, which would breed the subsequent eruptive flares.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 30 pages, 10 figures, Accepted for ApJ

arXiv:2310.16193 [pdf, other]

Length is a Curse and a Blessing for Document-level Semantics

Authors: Chenghao Xiao, Yizhi Li, G Thomas Hudson, Chenghua Lin, Noura Al Moubayed

Abstract: In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but we can devise unsupervised CL methods solely depending on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document would intensify the high intra-document similarity that is already brought by CL. Moreover, we found that isotropy promised by CL is highly dependent on the length range of text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark.

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP 2023. Our code is publicly available at https://github.com/gowitheflow-1998/LA-SER-cubed

arXiv:2310.15724 [pdf, other]

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

Authors: Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

Abstract: Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length via compressing multiple hidden vectors into one and trained with original PLMs frozen. Different from traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% computational costs using only 0.9% additional parameters with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs.

Submitted 20 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted by Findings of EMNLP

arXiv:2310.15602 [pdf, other]

doi 10.1145/3583780.3615125

MUSER: A Multi-View Similar Case Retrieval Dataset

Authors: Qingquan Li, Yiran Hu, Feng Yao, Chaojun Xiao, Zhiyuan Liu, Maosong Sun, Weixing Shen

Abstract: Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets only focus on the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that can provide insightful reasoning process behind. Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statutory) and build a comprehensive and structured label schema of legal elements for each of them, to enable accurate and knowledgeable evaluation of case similarities. The constructed dataset originates from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for retrieving similar cases on MUSER. The experimental results indicate that incorporating legal elements can benefit the performance of SCR models, but further efforts are still required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted by CIKM 2023 Resource Track

Journal ref: CIKM 2023

arXiv:2310.15460 [pdf, other]

HL-DPoS: An Enhanced Anti-Long-Range Attack DPoS Algorithm

Authors: Yang Li, Chunhe Xia, Chunyan Li, Yuan Zhao, Chen Chen, Tianbo Wang

Abstract: The consensus algorithm is crucial in blockchain for ensuring the validity and security of transactions across the decentralized network. However, achieving consensus among nodes and packaging blocks in blockchain networks is a complex task that requires efficient and secure consensus algorithms. The DPoS consensus algorithm has emerged as a popular choice due to its fast transaction processing and high throughput. Despite these advantages, the algorithm still suffers from weaknesses such as centralization and vulnerability to long-range attacks, which can compromise the integrity of the blockchain network. To combat these problems, we developed an Enhanced Anti-Long-Range Attack DPoS algorithm (HL-DPoS). First, we split nodes into pieces to reduce centralization issues while giving witness nodes the power to report and benefit from malicious node's reports, maintaining high efficiency and high security. Second, we propose a validation method in HL-DPoS that compares consensuses transactions with the longest chain to detect long-range attacks. Algorithm analysis and simulation experiment results demonstrate that our HL-DPoS consensus algorithm improves security while achieving better consensus performance.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.14323 [pdf, other]

Strangelets at finite temperature: nucleon emission rates, interface and shell effects

Authors: Hao-Song You, Huai-Min Chen, Jian-Feng Xu, Cheng-Jun Xia, Guang-Xiong Peng, Ren-Xin Xu

Abstract: We investigate the properties of strangelets at finite temperature $T$, where an equivparticle model is adopted with both the linear confinement and leading-order perturbative interactions accounted for using density-dependent quark masses. The shell effects are examined by solving the Dirac equations for quarks in the mean-field approximation, which diminish with temperature as the occupation probability of each single-particle levels fixed by the Fermi-Dirac statistics, i.e., shell dampening. Consequently, instead of decreasing with temperature, the surface tension extracted from a liquid-drop formula increases with $T$ until reaching its peak at $T\approx 20$-40 MeV with vanishing shell corrections, where the formula roughly reproduces the free energy per baryon of all strangelets. The curvature term, nevertheless, decreases with $T$ despite the presence of shell effects. The neutron and proton emission rates are fixed microscopically according to the external nucleon gas densities that are in equilibrium with strangelets, which generally increase with $T$ ($\lesssim 50$ MeV) for stable strangelets but decrease for those that are unstable against nucleon emission at $T=0$. The energy, free energy, entropy, charge-to-mass ratio, strangeness per baryon, and root-mean-square radius of $\beta$-stable strangelets obtained with various parameter sets are presented as well. The results indicated in this work are useful for understanding the products of binary compact star mergers and heavy-ion collisions.

Submitted 22 October, 2023; originally announced October 2023.

Showing 101–150 of 1,014 results for author: Xia, C