-
Capillary hypersurfaces, Heintze-Karcher's inequality and Zermelo's navigation
Authors:
Guofang Wang,
Chao Xia
Abstract:
In this paper, we establish a Heintze-Karcher-type inequality for capillary hypersurfaces in a unit ball. To achieve this, we introduce a special Finsler metric given by Zermelo's navigation and study the geodesic normal flow with respect to this Finsler metric. Our results indicate that the relationship between capillary hypersurfaces and hypersurfaces with free boundary is similar to the one between Finsler geometry and Riemannian geometry.
Submitted 16 January, 2024;
originally announced January 2024.
-
Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs
Authors:
Changrong Xiao,
Wenxing Ma,
Qingping Song,
Sean Xin Xu,
Kunpeng Zhang,
Yufang Wang,
Qi Fu
Abstract:
Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable consistency, generalizability, and explainability. We propose an open-source LLM-based AES system, inspired by the dual-process theory. Our system offers accurate grading and high-quality feedback, at least comparable to that of fine-tuned proprietary LLMs, in addition to its ability to alleviate misgrading. Furthermore, we conduct human-AI co-grading experiments with both novice and expert graders. We find that our system not only automates the grading process but also enhances the performance and efficiency of human graders, particularly for essays where the model has lower confidence. These results highlight the potential of LLMs to facilitate effective human-AI collaboration in the educational context, potentially transforming learning experiences through AI-generated feedback.
Submitted 14 June, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in Recommendation Systems
Authors:
Qiang Charles Xiao,
Ajith Muralidharan,
Birjodh Tiwana,
Johnson Jia,
Fedor Borisyuk,
Aman Gupta,
Dawn Woodard
Abstract:
In this paper, we propose a generic model-based re-ranking framework, MultiSlot ReRanker, which simultaneously optimizes relevance, diversity, and freshness. Specifically, our Sequential Greedy Algorithm (SGA) is efficient enough (linear time complexity) for large-scale production recommendation engines. It achieved a lift of $+6\%$ to $+10\%$ in offline Area Under the receiver operating characteristic Curve (AUC), which is mainly due to explicitly modeling mutual influences among items of a list and leveraging the second-pass ranking scores of multiple objectives. In addition, we have generalized the offline replay theory to multi-slot re-ranking scenarios, with trade-offs among multiple objectives. The offline replay results can be further improved by Pareto Optimality. Moreover, we've built a multi-slot re-ranking simulator based on OpenAI Gym integrated with the Ray framework. It can be easily configured for different assumptions to quickly benchmark both reinforcement learning and supervised learning algorithms.
Submitted 11 January, 2024;
originally announced January 2024.
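The slot-by-slot idea behind a sequential greedy re-ranker can be illustrated with a minimal sketch. This is not the authors' SGA: the scoring rule (relevance minus a penalty per already-placed item of the same category), the `diversity_weight` parameter, and the item fields are all hypothetical, chosen only to show how earlier slots can influence later picks.

```python
from typing import Dict, List, Sequence


def greedy_rerank(
    candidates: Sequence[Dict],
    num_slots: int,
    diversity_weight: float = 0.5,
) -> List[Dict]:
    """Fill slots one at a time with the best remaining item for each slot.

    Hypothetical scoring: relevance minus a penalty for every already-chosen
    item that shares the candidate's category.
    """
    remaining = list(candidates)
    chosen: List[Dict] = []
    category_counts: Dict[str, int] = {}
    while remaining and len(chosen) < num_slots:
        # Score each remaining item given what has already been placed.
        best = max(
            remaining,
            key=lambda item: item["relevance"]
            - diversity_weight * category_counts.get(item["category"], 0),
        )
        chosen.append(best)
        remaining.remove(best)
        category = best["category"]
        category_counts[category] = category_counts.get(category, 0) + 1
    return chosen


# Illustrative data only; not taken from the paper.
items = [
    {"id": "a", "relevance": 0.9, "category": "news"},
    {"id": "b", "relevance": 0.8, "category": "news"},
    {"id": "c", "relevance": 0.7, "category": "jobs"},
]
ranked_ids = [item["id"] for item in greedy_rerank(items, num_slots=3)]
```

With these toy inputs the second slot goes to the lower-relevance item from a new category, because the item sharing a category with the first pick is penalized. Each slot costs one pass over the remaining candidates in this sketch.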
-
TrustLLM: Trustworthiness in Large Language Models
Authors:
Lichao Sun,
Yue Huang,
Haoran Wang,
Siyuan Wu,
Qihui Zhang,
Yuan Li,
Chujie Gao,
Yixin Huang,
Wenhan Lyu,
Yixuan Zhang,
Xiner Li,
Zhengliang Liu,
Yixin Liu,
Yijue Wang,
Zhikun Zhang,
Bertie Vidgen,
Bhavya Kailkhura,
Caiming Xiong,
Chaowei Xiao,
Chunyuan Li,
Eric Xing,
Furong Huang,
Hao Liu,
Heng Ji,
Hongyi Wang, et al. (45 additional authors not shown)
Abstract:
Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.
Submitted 17 March, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
HYPIC: A fast hybrid EM PIC-MCC code for ion cyclotron resonance energization in cylindrical coordinate system
Authors:
Mingyang Wu,
Andong Xu,
Chijie Xiao
Abstract:
Ion cyclotron resonance energization (ICRE), such as ion cyclotron resonance heating (ICRH), is widely applied to magnetic confinement fusion and high-power electric propulsion. Since ICRE involves cyclotron resonance processes, a kinetic model is required. Both conventional particle-in-cell (PIC) simulations and solving the Boltzmann equation require enormous computation and memory. A hybrid simulation incorporating adiabatic electrons and PIC ions allows both a substantial reduction in computation and the inclusion of cyclotron resonance effects. Under the adiabatic electron approximation, we have developed a two-dimensional (r,z) hybrid electromagnetic (EM) PIC-MCC (Monte-Carlo collision) simulation program, named HYPIC. The advantages of HYPIC are the inclusion of ion kinetic effects, electrostatic (ES) and EM effects, and collisional effects of ions and electrons, at a low computational cost. The HYPIC program can rapidly simulate antenna-plasma interactions and the ion cyclotron resonance energization and/or ion cyclotron resonance heating processes in linear devices, such as high-power electric propulsion, magnetic mirror, and field-reversed-configuration (FRC) devices.
Submitted 9 January, 2024;
originally announced January 2024.
-
$\bar{B}_s^0 \to D_{s1}(2460)^+ K^-, D_{s1}(2536)^+ K^-$ and the nature of the two $D_{s1}$ resonance
Authors:
Jia-Xin Lin,
Hua-Xing Chen,
Wei-Hong Liang,
Chu-Wen Xiao,
Eulogio Oset
Abstract:
Starting from the molecular picture for the $D_{s1}(2460)$ and $D_{s1}(2536)$ resonances, which are dynamically generated by the interaction of coupled channels, the most important of which are the $D^*K$ for the $D_{s1}(2460)$ and $DK^*$ for the $D_{s1}(2536)$, we evaluate the ratio of decay widths for the $\bar{B}_s^0 \to D_{s1}(2460)^+ K^-$ and $\bar{B}_s^0 \to D_{s1}(2536)^+ K^-$ decays, the latter of which has been recently investigated by the LHCb collaboration, and we obtain a ratio of the order of unity. The present results should provide an incentive for the related decay into the $D_{s1}(2460)$ resonance to be performed, which would provide valuable information on the nature of these two resonances.
Submitted 22 April, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference
Authors:
Zirui Liu,
Qingquan Song,
Qiang Charles Xiao,
Sathiya Keerthi Selvaraj,
Rahul Mazumder,
Aman Gupta,
Xia Hu
Abstract:
The large number of parameters in Pretrained Language Models enhances their performance, but also makes them resource-intensive and challenging to deploy on commodity hardware like a single GPU. Due to the memory and power limitations of these devices, model compression techniques are often used to decrease both the model's size and its inference latency. This usually results in a trade-off between model accuracy and efficiency. Therefore, optimizing this balance is essential for effectively deploying LLMs on commodity hardware. A significant portion of the efficiency challenge is the Feed-forward network (FFN) component, which accounts for roughly $\frac{2}{3}$ of the total parameters and inference latency. In this paper, we first observe that only a few neurons of the FFN module have a large output norm for any input token, a.k.a. heavy hitters, while the others are sparsely triggered by different tokens. Based on this observation, we explicitly split the FFN into two parts according to the heavy hitters. We improve the efficiency-accuracy trade-off of existing compression methods by allocating more resources to the FFN parts with heavy hitters. In practice, our method can reduce model size by 43.1\% and bring a $1.25\sim1.56\times$ wall-clock time speedup on different hardware with a negligible accuracy drop.
Submitted 8 January, 2024;
originally announced January 2024.
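The heavy-hitter split described in the abstract can be pictured with a small sketch. It is an assumption-laden illustration, not the FFSplit implementation: neurons are ranked by the L2 norm of their outputs over a sample of tokens, and `top_fraction` is a hypothetical knob for how many neurons count as heavy hitters.

```python
import math
from typing import List, Tuple


def split_heavy_hitters(
    activations: List[List[float]], top_fraction: float = 0.25
) -> Tuple[List[int], List[int]]:
    """Split neuron indices into (heavy, rest) by output norm across tokens.

    activations[t][j] is the output of FFN neuron j for token t.
    The norm-over-sampled-tokens criterion is an assumption for illustration.
    """
    num_neurons = len(activations[0])
    # L2 norm of each neuron's outputs over all sampled tokens.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in activations))
        for j in range(num_neurons)
    ]
    num_heavy = max(1, int(num_neurons * top_fraction))
    order = sorted(range(num_neurons), key=lambda j: norms[j], reverse=True)
    heavy = sorted(order[:num_heavy])
    rest = sorted(order[num_heavy:])
    return heavy, rest


# Toy activations: neuron 0 fires strongly for every token.
acts = [
    [5.0, 0.1, 0.0, 0.2],
    [4.0, 0.0, 0.3, 0.0],
    [6.0, 0.2, 0.0, 0.1],
]
heavy, rest = split_heavy_hitters(acts, top_fraction=0.25)
```

A compression method could then treat the two index sets differently, for example by keeping the `heavy` part at higher precision; that allocation policy is outside this sketch.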
-
Boosted Dark Matter From Centaurus A and Its Detection
Authors:
Chen Xia,
Chuan-Yang Xing,
Yan-Hao Xu
Abstract:
Dark matter can be boosted by high-energy particles in astrophysical environments through elastic scattering. We study the production of boosted dark matter via scattering with electrons in the relativistic jet of the closest active galactic nucleus, Centaurus A, and its detection in the Super-Kamiokande experiment. Since there are a huge number of electrons in the jet and dark matter is extremely dense around the supermassive black hole that powers the jet, the number of boosted dark matter particles is tremendously large. Compared to boosted dark matter from blazars, the dark matter flux from Centaurus A is enhanced due to the proximity of Centaurus A. The constraint on the dark matter-electron scattering cross section set by Super-Kamiokande is more stringent, down to $\sim 10^{-36} \, \mathrm{cm}^2$ for $\mathrm{MeV}$ dark matter.
Submitted 14 March, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Large enhancement of spin-orbit torques under a MHz modulation due to phonon-magnon coupling
Authors:
Hanying Zhang,
Qianwen Zhao,
Baiqing Jiang,
Yuan Wang,
Tunan Xie,
Kaihua Lou,
ChaoChao Xia,
C. Bi
Abstract:
The discovery of spin-orbit torques (SOTs) generated through the spin Hall or Rashba effects provides an alternative write approach for magnetic random-access memory (MRAM), igniting the development of spin-orbitronics in recent years. Quantitative characterization of SOTs relies heavily on SOT-driven ferromagnetic resonance (ST-FMR), where a modulated microwave current is used to generate ac SOTs and the modulation frequency is usually less than 100 kHz (the limit of conventional lock-in amplifiers). Here we have investigated the SOT of typical SOT material/ferromagnet bilayers in an extended modulation-frequency range, up to MHz, by developing the ST-FMR measurement. Remarkably, we found that the measured SOTs are enhanced about three times in the MHz range, which cannot be explained according to present SOT theory. We attribute the enhancement of SOT to additional magnon excitations due to phonon-magnon coupling, which is also reflected in the slight changes of resonant field and linewidth in the acquired ST-FMR spectra, corresponding to the modifications of effective magnetization and damping constant, respectively. Our results indicate that the write current of SOT-MRAM may be reduced with the assistance of phonon-magnon coupling.
Submitted 1 December, 2023;
originally announced January 2024.
-
PPBFL: A Privacy Protected Blockchain-based Federated Learning Model
Authors:
Yang Li,
Chunhe Xia,
Wanshuang Lin,
Tianbo Wang
Abstract:
With the rapid development of machine learning and a growing concern for data privacy, federated learning has become a focal point of attention. However, attacks on model parameters and a lack of incentive mechanisms hinder the effectiveness of federated learning. Therefore, we propose a Privacy Protected Blockchain-based Federated Learning Model (PPBFL) to enhance the security of federated learning and encourage active participation of nodes in model training. Blockchain technology ensures the integrity of model parameters stored in the InterPlanetary File System (IPFS), providing protection against tampering. Within the blockchain, we introduce a Proof of Training Work (PoTW) consensus algorithm tailored for federated learning, aiming to incentivize training nodes. This algorithm rewards nodes with greater computational power, promoting increased participation and effort in the federated learning process. A novel adaptive differential privacy algorithm is simultaneously applied to local and global models. This safeguards the privacy of local data at training clients, preventing malicious nodes from launching inference attacks. Additionally, it enhances the security of the global model, preventing potential security degradation resulting from the combination of numerous local models. The possibility of security degradation is derived from the composition theorem. By introducing reverse noise in the global model, a zero-bias estimate of differential privacy noise between local and global models is achieved. Furthermore, we propose a new mix transactions mechanism utilizing ring signature technology to better protect the identity privacy of local training clients. Security analysis and experimental results demonstrate that PPBFL, compared to baseline methods, not only exhibits superior model performance but also achieves higher security.
Submitted 8 January, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Polynomial-time Approximation Scheme for Equilibriums of Games
Authors:
Hongbo Sun,
Chongkun Xia,
Junbo Tan,
Bo Yuan,
Xueqian Wang,
Bin Liang
Abstract:
Whether a PTAS (polynomial-time approximation scheme) exists for equilibriums of games has been an open question, which relates to questions in three fields: the practicality of methods in algorithmic game theory, the equation PPAD=FP about the two complexity classes in computational complexity theory, and non-stationarity and the curse of multiagency in MARL (multi-agent reinforcement learning). This paper introduces our discovery of the sufficient and necessary conditions for iterations based on dynamic programming and line search to approximate perfect equilibriums of dynamic games, out of which we construct a method proved to be an FPTAS (fully PTAS) for non-singular perfect equilibriums of dynamic games, where for almost any given dynamic game, all its perfect equilibriums are non-singular, indicating that FP$\subseteq$PPAD$\subseteq$Almost-FP. Our discovery consists of cone interior dynamic programming and primal-dual unbiased regret minimization, which fit into existing theories by degeneration in a structure-preserving manner. The former enables a dynamic programming operator to iteratively converge to a perfect equilibrium based on a concept called policy cone. The latter enables an interior-point line search to approximate a Nash equilibrium based on two concepts called primal-dual bias and unbiased central variety, solving a subproblem of the former. Validity of our discovery is cross-corroborated by a combination of theorem proofs, graphs of the three main concepts, and experimental results.
Submitted 3 June, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
-
A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss
Authors:
Xiao Fang,
Xin Gao,
Baofeng Li,
Feng Zhai,
Yu Qin,
Zhihang Meng,
Jiansheng Lu,
Chun Xiao
Abstract:
Low-light image enhancement aims to improve the perception of images collected in dim environments and provide high-quality data support for image recognition tasks. When dealing with photos captured under non-uniform illumination, existing methods cannot adaptively extract the differentiated luminance information, which will easily cause over-exposure and under-exposure. From the perspective of unsupervised learning, we propose a multi-scale attention Transformer named MSATr, which sufficiently extracts local and global features for light balance to improve the visual quality. Specifically, we present a multi-scale window division scheme, which uses exponential sequences to adjust the window size of each layer. Within different-sized windows, the self-attention computation can be refined, ensuring the pixel-level feature processing capability of the model. For feature interaction across windows, a global transformer branch is constructed to provide comprehensive brightness perception and alleviate exposure problems. Furthermore, we propose a loop training strategy, using the diverse images generated by weighted mixing and a luminance consistency loss to improve the model's generalization ability effectively. Extensive experiments on several benchmark datasets quantitatively and qualitatively prove that our MSATr is superior to state-of-the-art low-light image enhancement methods, and the enhanced images have more natural brightness and outstanding details. The code is released at https://github.com/fang001021/MSATr.
Submitted 27 December, 2023;
originally announced December 2023.
-
Spontaneous onset of three-dimensional motion with subsequent spatial and temporal reduction in convective flow systems
Authors:
Patrick J. Stofanak,
Cheng-Nian Xiao,
Inanc Senocak
Abstract:
We study the spontaneous emergence of three-dimensional motion from a quiescent, pure conduction state in stably stratified, convective flow within a triangular enclosure, which eventually self-organizes into a two-dimensional steady state. This phenomenon demonstrates that the optimal disturbance path to reach the final state is more complex than the state itself, indicating the "fastest" route involves a higher-dimensional intermediate state. This provides a model for transient spatio-temporal chaos in nonlinear dynamical systems and a challenge for classical hydrodynamic stability theory.
Submitted 22 December, 2023;
originally announced December 2023.
-
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Authors:
Wenhao Ding,
Yulong Cao,
Ding Zhao,
Chaowei Xiao,
Marco Pavone
Abstract:
Simulation plays a crucial role in the development of autonomous vehicles (AVs) due to the potential risks associated with real-world testing. Although significant progress has been made in the visual aspects of simulators, generating complex behavior among agents remains a formidable challenge. It is not only imperative to ensure realism in the scenarios generated but also essential to incorporate preferences and conditions to facilitate controllable generation for AV training and evaluation. Traditional methods, mainly relying on memorizing the distribution of training datasets, often fall short in generating unseen scenarios. Inspired by the success of retrieval augmented generation in large language models, we present RealGen, a novel retrieval-based in-context learning framework for traffic scenario generation. RealGen synthesizes new scenarios by combining behaviors from multiple retrieved examples in a gradient-free way, which may originate from templates or tagged scenarios. This in-context learning framework endows versatile generative capabilities, including the ability to edit scenarios, compose various behaviors, and produce critical scenarios. Evaluations show that RealGen offers considerable flexibility and controllability, marking a new direction in the field of controllable traffic scenario generation. Check our project website for more information: https://realgen.github.io.
Submitted 19 December, 2023;
originally announced December 2023.
-
DLCA-Recon: Dynamic Loose Clothing Avatar Reconstruction from Monocular Videos
Authors:
Chunjie Luo,
Fei Luo,
Yusen Wang,
Enxu Zhao,
Chunxia Xiao
Abstract:
Reconstructing a dynamic human with loose clothing is an important but difficult task. To address this challenge, we propose a method named DLCA-Recon to create human avatars from monocular videos. The distance from loose clothing to the underlying body changes rapidly in every frame when the human freely moves and acts. Previous methods lack effective geometric initialization and constraints for guiding the optimization of deformation to explain this dramatic change, resulting in discontinuous and incomplete reconstructed surfaces. To model the deformation more accurately, we propose to initialize an estimated 3D clothed human in the canonical space, as it is easier for deformation fields to learn from the clothed human than from SMPL. With both representations of explicit mesh and implicit SDF, we utilize the physical connection information between consecutive frames and propose a dynamic deformation field (DDF) to optimize deformation fields. DDF accounts for contributive forces on loose clothing to enhance the interpretability of deformations and effectively capture the free movement of loose clothing. Moreover, we propagate SMPL skinning weights to each individual and refine pose and skinning weights during the optimization to improve skinning transformation. Based on a more reasonable initialization and DDF, we can simulate real-world physics more accurately. Extensive experiments on public datasets and our own validate that our method can produce superior results for humans with loose clothing compared to the SOTA methods.
Submitted 20 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Layer Hall counterflow as a model probe of magic-angle twisted bilayer graphene
Authors:
Jihang Zhu,
Dawei Zhai,
Cong Xiao,
Wang Yao
Abstract:
The recent constructions of flat moiré minibands in specifically twisted multilayer graphene and twisted transition metal dichalcogenides (TMDs) have facilitated the observation of strong correlations with a convenient tunability. These correlations in flat bands result in the band dispersion heavily influenced by carrier densities, leading to filling-dependent quasiparticle band renormalizations. Particularly, in magic-angle twisted bilayer graphene (MATBG), the band structure--including the quasiparticle energy and wavefunction--is crucial in understanding the correlated properties. Previous theoretical studies have demonstrated the presence of a time-reversal-even charge Hall counterflow in response to a direct current (DC) electric field in twisted bilayers as chiral structures. In this study, we show that such layer Hall counterflow can serve as a sensitive probe for MATBG model parameters, which are currently ambiguous as a result of unavoidable structural relaxation and twist-angle disorder. We present the layer Hall counterflow and the associated in-plane magnetization for three different MATBG continuum models, based on which many-body interacting models have been widely applied to study strong correlations in MATBG. At the single-particle level, our findings indicate notable differences in layer-projected Hall conductivity, both in magnitude and sign, between different MATBG continuum models. Furthermore, our self-consistent Hartree calculations, performed on each of these single-particle continuum models, reveal renormalized layer-projected Hall conductivity by the self-consistent Hartree field.
Submitted 18 December, 2023;
originally announced December 2023.
-
Robo360: A 3D Omnispective Multi-Material Robotic Manipulation Dataset
Authors:
Litian Liang,
Liuyu Bian,
Caiwei Xiao,
Jialin Zhang,
Linghao Chen,
Isabella Liu,
Fanbo Xiang,
Zhiao Huang,
Hao Su
Abstract:
Building robots that can automate labor-intensive tasks has long been the core motivation behind the advancements in computer vision and the robotics community. Recent interest in leveraging 3D algorithms, particularly neural fields, has led to advancements in robot perception and physical understanding in manipulation scenarios. However, the real world's complexity poses significant challenges. To tackle these challenges, we present Robo360, a dataset that features robotic manipulation with a dense view coverage, which enables high-quality 3D neural representation learning, and a diverse set of objects with various physical and optical properties and facilitates research in various object manipulation and physical world modeling tasks. We confirm the effectiveness of our dataset using existing dynamic NeRF and evaluate its potential in learning multi-view policies. We hope that Robo360 can open new research directions yet to be explored at the intersection of understanding the physical world in 3D and robot control.
Submitted 9 December, 2023;
originally announced December 2023.
-
ConSequence: Synthesizing Logically Constrained Sequences for Electronic Health Record Generation
Authors:
Brandon Theodorou,
Shrusti Jain,
Cao Xiao,
Jimeng Sun
Abstract:
Generative models can produce synthetic patient records for analytical tasks when real data is unavailable or limited. However, current methods struggle with adhering to domain-specific knowledge and removing invalid data. We present ConSequence, an effective approach to integrating domain knowledge into sequential generative neural network outputs. Our rule-based formulation includes temporal aggregation and antecedent evaluation modules, ensured by an efficient matrix multiplication formulation, to satisfy hard and soft logical constraints across time steps. Existing constraint methods often fail to guarantee constraint satisfaction, lack the ability to handle temporal constraints, and hinder the learning and computational efficiency of the model. In contrast, our approach efficiently handles all types of constraints with guaranteed logical coherence. We demonstrate ConSequence's effectiveness in generating electronic health records, outperforming competitors in achieving complete temporal and spatial constraint satisfaction without compromising runtime performance or generative quality. Specifically, ConSequence successfully prevents all rule violations while improving model quality by reducing test perplexity by 5% and incurring less than a 13% slowdown in generation speed compared to an unconstrained model.
Submitted 20 December, 2023; v1 submitted 10 December, 2023;
originally announced December 2023.
-
The duals of narrow-sense BCH codes with length $\frac{q^m-1}λ$
Authors:
Xiaoqiang Wang,
Chengliang Xiao,
Dabin Zheng
Abstract:
BCH codes are an interesting class of cyclic codes due to their efficient encoding and decoding algorithms. In the past sixty years, much progress has been made in the study of BCH codes, but little is known about the properties of their duals. Recently, in order to study the duals of BCH codes and the lower bounds on their minimum distances, a new concept called the dually-BCH code was proposed by the authors of \cite{GDL21}. In this paper, lower bounds on the minimum distances of the duals of narrow-sense BCH codes with length $\frac{q^m-1}λ$ over $\mathbb{F}_q$ are developed, where $λ$ is a positive integer satisfying $λ\, |\, q-1$, or $λ=q^s-1$ and $s\, |\,m$. In addition, necessary and sufficient conditions, in terms of the designed distances, for these codes to be dually-BCH codes are presented. Many codes considered in \cite{GDL21} and \cite{Wang23} are special cases of the codes studied in this paper.
Our lower bounds on the minimum distances of the duals of BCH codes include the bounds stated in \cite{GDL21} as a special case. Several examples show that the lower bounds are good in some cases.
Submitted 9 December, 2023;
originally announced December 2023.
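The length condition stated in the abstract above can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name `bch_length` is hypothetical, and it only computes the code length $n = \frac{q^m-1}{λ}$ after verifying that $λ$ satisfies one of the two stated conditions ($λ \mid q-1$, or $λ = q^s-1$ with $s \mid m$). It says nothing about minimum distances or the dually-BCH property.

```python
def bch_length(q: int, m: int, lam: int) -> int:
    """Return n = (q^m - 1) / lam if lam meets one of the two conditions
    quoted in the abstract; raise ValueError otherwise.

    Hypothetical helper for illustration only.
    """
    if lam <= 0:
        raise ValueError("lam must be a positive integer")

    # Condition 1: lam divides q - 1 (and hence divides q^m - 1).
    divides_q_minus_1 = (q - 1) % lam == 0

    # Condition 2: lam = q^s - 1 for some s dividing m (so lam divides q^m - 1).
    has_subfield_form = any(
        lam == q**s - 1 for s in range(1, m + 1) if m % s == 0
    )

    if not (divides_q_minus_1 or has_subfield_form):
        raise ValueError("lam satisfies neither condition")

    return (q**m - 1) // lam
```

For instance, `bch_length(3, 4, 2)` uses the first condition (2 divides 3 - 1), while `bch_length(2, 4, 3)` uses the second (3 = 2^2 - 1 and 2 divides 4).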
-
Exploring the Limits of ChatGPT in Software Security Applications
Authors:
Fangzhou Wu,
Qingzhao Zhang,
Ati Priya Bajaj,
Tiffany Bao,
Ning Zhang,
Ruoyu "Fish" Wang,
Chaowei Xiao
Abstract:
Large language models (LLMs) have undergone rapid evolution and achieved remarkable results in recent times. OpenAI's ChatGPT, backed by GPT-3.5 or GPT-4, has gained instant popularity due to its strong capability across a wide range of tasks, including natural language tasks, coding, mathematics, and engaging conversations. However, the impacts and limits of such LLMs in the system security domain are less explored. In this paper, we delve into the limits of LLMs (i.e., ChatGPT) in seven software security applications including vulnerability detection/repair, debugging, debloating, decompilation, patching, root cause analysis, symbolic execution, and fuzzing. Our exploration reveals that ChatGPT not only excels at generating code, which is the conventional application of language models, but also demonstrates strong capability in understanding user-provided commands in natural language, reasoning about control and data flows within programs, generating complex data structures, and even decompiling assembly code. Notably, GPT-4 shows significant improvements over GPT-3.5 in most security tasks. We also identify certain limitations of ChatGPT in security-related tasks, such as its constrained ability to process long code contexts.
Submitted 7 December, 2023;
originally announced December 2023.
-
DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions
Authors:
Fangzhou Wu,
Xiaogeng Liu,
Chaowei Xiao
Abstract:
With the advancement of Large Language Models (LLMs), significant progress has been made in code generation, enabling LLMs to transform natural language into programming code. These Code LLMs have been widely adopted by a large number of users and organizations. However, a danger is hidden in the generated code: the existence of fatal vulnerabilities. While some LLM providers have attempted to address these issues by aligning with human guidance, these efforts fall short of making Code LLMs practical and robust. Without a deep understanding of the performance of the LLMs under practical worst cases, it would be concerning to apply them to various real-world applications. In this paper, we address two critical questions: Are existing Code LLMs immune to generating vulnerable code? If not, what is the possible maximum severity of this issue in practical deployment scenarios? We introduce DeceptPrompt, a novel algorithm that can generate adversarial natural language instructions that drive Code LLMs to generate functionally correct code with vulnerabilities. DeceptPrompt is achieved through a systematic evolution-based algorithm with a fine-grained loss design. The unique advantage of DeceptPrompt enables us to find natural prefixes/suffixes with totally benign and non-directional semantic meaning that are nonetheless highly effective at inducing Code LLMs to generate vulnerable code. This feature enables us to conduct almost-worst-case red-teaming on these LLMs in a real scenario, where users are using natural language. Our extensive experiments and analyses on DeceptPrompt not only validate the effectiveness of our approach but also shed light on the huge weakness of LLMs in the code generation task. When the optimized prefix/suffix is applied, the attack success rate (ASR) improves by 50% on average compared with applying no prefix/suffix.
Submitted 12 December, 2023; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels
Authors:
Heng Huang,
Xin **,
Yaqi Liu,
Hao Lou,
Chaoen Xiao,
Shuai Cui,
Xinning Li,
Dongqing Zou
Abstract:
Many mobile phones now embed deep-learning models for evaluation or guidance on photography. These models cannot provide detailed results such as human pose scores or scene color scores because of the scarcity of corresponding aesthetic attribute data. Moreover, the annotation of image aesthetic attribute scores requires experienced artists and professional photographers, which hinders the collection of large-scale fully-annotated datasets. In this paper, we propose to replace image attribute labels with feature extractors. First, a novel aesthetic attribute evaluation framework based on attribute features is proposed to predict attribute scores and overall scores. We call it the F2S (attribute features to attribute scores) model. We use networks from different tasks to provide attribute features to our F2S models. Then, we define an aesthetic attribute contribution to describe the role of aesthetic attributes throughout an image and use it with the attribute scores and the overall scores to train our F2S model. Extensive experiments on publicly available datasets demonstrate that our F2S model achieves performance comparable to that of models trained on datasets with fully-annotated aesthetic attribute score labels. Our method makes it feasible to learn meaningful attribute scores for various aesthetic attribute sets in different types of images with only overall aesthetic scores.
Submitted 5 December, 2023;
originally announced December 2023.
-
Jellyfish: A Large Language Model for Data Preprocessing
Authors:
Haochen Zhang,
Yuyang Dong,
Chuan Xiao,
Masafumi Oyamada
Abstract:
This paper explores the utilization of LLMs for data preprocessing (DP), a crucial step in the data mining pipeline that transforms raw data into a clean format conducive to easy processing. Whereas the use of LLMs has sparked interest in devising universal solutions to DP, recent initiatives in this domain typically rely on GPT APIs, raising inevitable data breach concerns. Unlike these approaches, we consider instruction-tuning local LLMs (7-13B models) as universal DP task solvers that operate on a local, single, and low-priced GPU, ensuring data security and enabling further customization. We select a collection of datasets across four representative DP tasks and construct instruction tuning data using data configuration, knowledge injection, and reasoning data distillation techniques tailored to DP. By tuning Mistral-7B, Llama 3-8B, and OpenOrca-Platypus2-13B, our models, namely Jellyfish-7B/8B/13B, are competitive with GPT-3.5/4 models and generalize strongly to unseen tasks while barely compromising the base models' abilities in NLP tasks. Meanwhile, Jellyfish offers enhanced reasoning capabilities compared to GPT-3.5.
Our models are available at: https://huggingface.co/NECOUDBFM/Jellyfish .
Our instruction dataset is available at: https://huggingface.co/datasets/NECOUDBFM/Jellyfish-Instruct .
Submitted 21 June, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Dolphins: Multimodal Language Model for Driving
Authors:
Yingzi Ma,
Yulong Cao,
Jiachen Sun,
Marco Pavone,
Chaowei Xiao
Abstract:
The quest for fully autonomous vehicles (AVs) capable of navigating complex real-world scenarios with human-like understanding and responsiveness is ongoing. In this paper, we introduce Dolphins, a novel vision-language model architected to imbibe human-like abilities as a conversational driving assistant. Dolphins is adept at processing multimodal inputs comprising video (or image) data, text instructions, and historical control signals to generate informed outputs corresponding to the provided instructions. Building upon the open-sourced pretrained Vision-Language Model, OpenFlamingo, we first enhance Dolphins's reasoning capabilities through an innovative Grounded Chain of Thought (GCoT) process. We then tailor Dolphins to the driving domain by constructing driving-specific instruction data and conducting instruction tuning. Using the BDD-X dataset, we design and consolidate four distinct AV tasks into Dolphins to foster a holistic understanding of intricate driving scenarios. As a result, the distinctive features of Dolphins fall along two dimensions: (1) the ability to provide a comprehensive understanding of complex and long-tailed open-world driving scenarios and solve a spectrum of AV tasks, and (2) the emergence of human-like capabilities including gradient-free instant adaptation via in-context learning and error recovery via reflection.
Submitted 1 December, 2023;
originally announced December 2023.
-
FBChain: A Blockchain-based Federated Learning Model with Efficiency and Secure Communication
Authors:
Yang Li,
Chunhe Xia,
Wei Liu,
Weidong Zhou,
Chen Chen,
Tianbo Wang
Abstract:
Privacy and security in the parameter transmission process of federated learning are currently among the most prominent concerns. However, unprotected communication methods cause two thorny problems: "parameter-leakage" and "inefficient-communication". This article proposes a Blockchain-based Federated Learning (FBChain) model for federated learning parameter communication to overcome these two problems. First, we utilize the immutability of blockchain to store the global model and the hash values of local model parameters to guard against tampering during the communication process, protect data privacy by encrypting parameters, and verify data consistency by comparing the hash values of local parameters, thus addressing the "parameter-leakage" problem. Second, the Proof of Weighted Link Speed (PoWLS) consensus algorithm selects nodes with higher weighted link speed to aggregate the global model and package blocks, thereby solving the "inefficient-communication" problem. Experimental results demonstrate the effectiveness of our proposed FBChain model and its ability to improve model communication efficiency in federated learning.
Submitted 20 November, 2023;
originally announced December 2023.
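The hash-comparison step mentioned in the abstract above (verifying data consistency by comparing hash values of local parameters) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not specify the hash function, the serialization, or any function names, so SHA-256 over a canonical JSON encoding and the names `parameter_digest` and `is_consistent` are hypothetical choices, not the FBChain implementation.

```python
import hashlib
import json


def parameter_digest(params: dict) -> str:
    """Hash a parameter dictionary (hypothetical helper).

    Keys are sorted so that the same parameters always serialize to the
    same bytes, and therefore to the same SHA-256 digest.
    """
    canonical = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def is_consistent(received_params: dict, recorded_digest: str) -> bool:
    """Compare the digest of received parameters with the recorded digest.

    A mismatch indicates the parameters changed after the digest was recorded.
    """
    return parameter_digest(received_params) == recorded_digest


# Usage: a sender records the digest, a receiver re-checks it.
local_params = {"layer1.weight": [0.1, -0.2], "layer1.bias": [0.05]}
recorded = parameter_digest(local_params)

tampered = {"layer1.weight": [0.1, -0.2], "layer1.bias": [0.5]}
```

In this sketch `is_consistent(local_params, recorded)` holds, while `is_consistent(tampered, recorded)` does not; encryption of the parameters and the consensus step are outside its scope.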
-
Rigidity and quantitative stability for partially overdetermined problems and capillary CMC hypersurfaces
Authors:
Xiaohan Jia,
Zheng Lu,
Chao Xia,
Xuwen Zhang
Abstract:
In this paper, we first prove a rigidity result for a Serrin-type partially overdetermined problem in the half-space, which gives a characterization of capillary spherical caps by the overdetermined problem. In the second part, we prove quantitative stability results for the Serrin-type partially overdetermined problem, as well as capillary almost constant mean curvature hypersurfaces in the half-space.
Submitted 30 November, 2023;
originally announced November 2023.
-
A characterization of capillary spherical caps by a partially overdetermined problem in a half ball
Authors:
Xiaohan Jia,
Zheng Lu,
Chao Xia,
Xuwen Zhang
Abstract:
In this note, we study a Serrin-type partially overdetermined problem proposed by Guo-Xia (Calc. Var. Partial Differential Equations 58: no. 160, 2019. https://doi.org/10.1007/s00526-019-1603-3), and prove a rigidity result that characterizes capillary spherical caps in a half ball.
Submitted 30 November, 2023;
originally announced November 2023.
-
Molecular-type $QQss\bar{s}$ pentaquarks predicted by an extended hidden gauge symmetry approach
Authors:
Zhong-Yu Wang,
Chu-Wen Xiao,
Zhi-Feng Sun,
Xiang Liu
Abstract:
In this work, we investigate the double-heavy molecular pentaquark states with the quark contents $ccss\bar{s}$, $bbss\bar{s}$, and $bcss\bar{s}$ by using the coupled channel approach. The extended local hidden gauge Lagrangians are used to obtain the meson-baryon interactions by exchanging the vector mesons. We predict some candidates for the molecular states with the quantum numbers $I(J^{P}) = 0(1/2^{-}, 3/2^{-}, 5/2^{-})$, whose binding energies are of the order of $20-30$ MeV and whose widths are all less than $8$ MeV. These predicted exotic double-heavy molecular pentaquark states may be accessible in future experiments such as LHCb.
Submitted 8 February, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Conditions for Length Generalization in Learning Reasoning Skills
Authors:
Changnan Xiao,
Bing Liu
Abstract:
Reasoning is a fundamental capability of AI agents. Recently, large language models (LLMs) have shown remarkable abilities to perform reasoning tasks. However, numerous evaluations of the reasoning capabilities of LLMs have also shown some limitations. An outstanding limitation is length generalization, meaning that when trained on reasoning problems of smaller lengths or sizes, the resulting models struggle with problems of larger sizes or lengths. This potentially indicates some theoretical limitations of generalization in learning reasoning skills. These evaluations and their observations motivated us to perform a theoretical study of the length generalization problem. This work focuses on reasoning tasks that can be formulated as Markov dynamic processes (MDPs) and/or directed acyclic graphs (DAGs). It identifies and proves conditions that decide whether the length generalization problem can be solved or not for a reasoning task in a particular representation. Experiments are also conducted to verify the theoretical results.
Submitted 6 December, 2023; v1 submitted 21 November, 2023;
originally announced November 2023.
-
High-Ratio Compression for Machine-Generated Data
Authors:
Jiu**g Zhang,
Zhitao Shen,
Shiyu Yang,
Lingkai Meng,
Chuan Xiao,
Wei Jia,
Yue Li,
Qinhui Sun,
Wenjie Zhang,
Xuemin Lin
Abstract:
Machine-generated data is rapidly growing and poses challenges for data-intensive systems, especially as the growth of data outpaces the growth of storage space. To cope with the storage issue, compression plays a critical role in storage engines, particularly for data-intensive applications, where high compression ratios and efficient random access are essential. However, existing compression techniques tend to focus on general-purpose and data block approaches, but overlook the inherent structure of machine-generated data and hence result in low compression ratios or limited lookup efficiency. To address these limitations, we introduce the Pattern-Based Compression (PBC) algorithm, which specifically targets patterns in machine-generated data to achieve Pareto-optimality in most cases. Unlike traditional data block-based methods, PBC compresses data on a per-record basis, facilitating rapid random access. Our experimental evaluation demonstrates that PBC, on average, achieves a compression ratio twice as high as state-of-the-art techniques while maintaining competitive compression and decompression speeds. We also integrate PBC into a production database system and achieve improvements in both compression ratio and throughput.
Submitted 23 November, 2023;
originally announced November 2023.
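The idea of compressing each record against a shared pattern, as described in the abstract above, can be sketched as follows. This is an illustrative toy and not the PBC algorithm: the pattern list is hand-written here (the abstract does not say how patterns are found), and the names `PATTERNS`, `compress`, and `decompress` are hypothetical. The sketch only shows why per-record encoding allows any single record to be decoded without touching its neighbors.

```python
import re

# Hypothetical, hand-written patterns; "{}" marks a variable field.
PATTERNS = [
    "GET /api/user/{} status={} time={}ms",
    "disk {} usage {}%",
]


def _to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a pattern with "{}" placeholders into a regex with capture groups."""
    literal_parts = [re.escape(part) for part in pattern.split("{}")]
    return re.compile("(.*?)".join(literal_parts))


COMPILED = [_to_regex(p) for p in PATTERNS]


def compress(record: str):
    """Encode one record as (pattern id, variable fields).

    Records matching no pattern are stored verbatim under id -1.
    """
    for pattern_id, regex in enumerate(COMPILED):
        match = regex.fullmatch(record)
        if match:
            return (pattern_id, list(match.groups()))
    return (-1, [record])


def decompress(encoded) -> str:
    """Rebuild a single record from its encoding, independently of other records."""
    pattern_id, fields = encoded
    if pattern_id == -1:
        return fields[0]
    pieces = PATTERNS[pattern_id].split("{}")
    rebuilt = pieces[0]
    for field, piece in zip(fields, pieces[1:]):
        rebuilt += field + piece
    return rebuilt
```

Because every record carries its own pattern id, `decompress(compress(r))` recovers `r` for one record at a time, which is the property that makes random access cheap in a per-record scheme.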
-
CMFDFormer: Transformer-based Copy-Move Forgery Detection with Continual Learning
Authors:
Yaqi Liu,
Chao Xia,
Song Xiao,
Qingxiao Guan,
Wenqian Dong,
Yifan Zhang,
Nenghai Yu
Abstract:
Copy-move forgery detection aims at detecting duplicated regions in a suspected forged image, and deep learning based copy-move forgery detection methods are in the ascendant. These deep learning based methods rely heavily on synthetic training data, and their performance degrades when facing new tasks. In this paper, we propose a Transformer-style copy-move forgery detection network named CMFDFormer, and provide a novel PCSD (Pooled Cube and Strip Distillation) continual learning framework to help CMFDFormer handle new tasks. CMFDFormer consists of a MiT (Mix Transformer) backbone network and a PHD (Pluggable Hybrid Decoder) mask prediction network. The MiT backbone network is a Transformer-style network which is adopted on the basis of comprehensive analyses with CNN-style and MLP-style backbones. The PHD network is constructed based on self-correlation computation, hierarchical feature integration, a multi-scale cycle fully-connected block and a mask reconstruction block. The PHD network is applicable to feature extractors of different styles for hierarchical multi-scale information extraction, achieving comparable performance. Last but not least, we propose a PCSD continual learning framework to improve the forgery detectability and avoid catastrophic forgetting when handling new tasks. Our continual learning framework restricts intermediate features from the PHD network, and takes advantage of both cube pooling and strip pooling. Extensive experiments on publicly available datasets demonstrate the good performance of CMFDFormer and the effectiveness of the PCSD continual learning framework.
Submitted 10 March, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning
Authors:
Hongming Zhang,
Tongzheng Ren,
Chenjun Xiao,
Dale Schuurmans,
Bo Dai
Abstract:
In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounted for in learning, exploration and planning, but present significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations. We provide a theoretical analysis justifying the statistical efficiency of the proposed algorithm, and also empirically demonstrate that the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks, advancing reliable reinforcement learning towards more practical applications.
Submitted 10 June, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Interlayer electric multipoles induced by in-plane field from quantum geometric origins
Authors:
Huiyuan Zheng,
Dawei Zhai,
Cong Xiao,
Wang Yao
Abstract:
We show that interlayer charge transfer in 2D materials can be driven by an in-plane electric field, giving rise to electrical multipole generation at linear and second order in the in-plane field. The linear and nonlinear effects have quantum geometric origins in the Berry curvature and quantum metric respectively, defined in extended parameter spaces characteristic of layered materials. We elucidate their symmetry characters, and demonstrate sizable dipole and quadrupole polarizations respectively in twisted bilayers and trilayers of transition metal dichalcogenides. Furthermore, we show that the effect is strongly enhanced during the topological phase transition tuned by interlayer translation. The effects point to a new electric control of the layer quantum degree of freedom.
Submitted 20 November, 2023;
originally announced November 2023.
-
Stable $(r+1)$-th capillary hypersurfaces
Authors:
**yu Guo,
Haizhong Li,
Chao Xia
Abstract:
In this paper, we propose a new definition of stable $(r+1)$-th capillary hypersurfaces from a variational perspective for any $1\leq r\leq n-1$. More precisely, we define stable $(r+1)$-th capillary hypersurfaces to be smooth local minimizers of a new energy functional under volume-preserving and contact angle-preserving variations. Using the new concept of stable $(r+1)$-th capillary hypersurfaces, we generalize the stability results of Souam \cite{Souam} in a Euclidean half-space and Guo-Wang-Xia \cite{GWX} in a horoball in hyperbolic space from capillary hypersurfaces to the $(r+1)$-th capillary hypersurface case.
Submitted 19 November, 2023;
originally announced November 2023.
-
Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking
Authors:
Nan Xu,
Fei Wang,
Ben Zhou,
Bang Zheng Li,
Chaowei Xiao,
Muhao Chen
Abstract:
While large language models (LLMs) have demonstrated increasing power, they have also given rise to a wide range of harmful behaviors. As a representative example, jailbreak attacks can provoke harmful or unethical responses from LLMs, even after safety alignment. In this paper, we investigate a novel category of jailbreak attacks specifically designed to target the cognitive structure and processes of LLMs. Specifically, we analyze the safety vulnerability of LLMs in the face of (1) multilingual cognitive overload, (2) veiled expression, and (3) effect-to-cause reasoning. Unlike previous jailbreak attacks, our proposed cognitive overload is a black-box attack with no need for knowledge of model architecture or access to model weights. Experiments conducted on AdvBench and MasterKey reveal that various LLMs, including both the popular open-source model Llama 2 and the proprietary model ChatGPT, can be compromised through cognitive overload. Motivated by cognitive psychology work on managing cognitive load, we further investigate defending against cognitive overload attacks from two perspectives. Empirical studies show that our cognitive overload from three perspectives can jailbreak all studied LLMs successfully, while existing defense strategies can hardly mitigate the resulting malicious uses effectively.
Submitted 29 February, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
Authors:
Wenjie Mo,
Jiashu Xu,
Qin Liu,
Jiongxiao Wang,
Jun Yan,
Chaowei Xiao,
Muhao Chen
Abstract:
Existing studies in backdoor defense have predominantly focused on the training phase, overlooking the critical aspect of test-time defense. This gap becomes particularly pronounced in the context of Large Language Models (LLMs) deployed as Web Services, which typically offer only black-box access, rendering training-time defenses impractical. To bridge this gap, our work introduces defensive demonstrations, an innovative backdoor defense strategy for black-box large language models. Our method involves identifying the task and retrieving task-relevant demonstrations from an uncontaminated pool. These demonstrations are then combined with user queries and presented to the model during testing, without requiring any modifications/tuning to the black-box model or insights into its internal mechanisms. Defensive demonstrations are designed to counteract the adverse effects of triggers, aiming to recalibrate and correct the behavior of poisoned models during test-time evaluations. Extensive experiments show that defensive demonstrations are effective in defending against both instance-level and instruction-level backdoor attacks, not only rectifying the behavior of poisoned models but also surpassing existing baselines in most scenarios.
Submitted 16 November, 2023;
originally announced November 2023.
-
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Authors:
Jiongxiao Wang,
Junlin Wu,
Muhao Chen,
Yevgeniy Vorobeychik,
Chaowei Xiao
Abstract:
Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align Large Language Models (LLMs) with human preferences, playing an important role in LLM alignment. Despite its advantages, RLHF relies on human annotators to rank the text, which can introduce potential security vulnerabilities if any adversarial annotator (i.e., attacker) manipulates the ranking score by up-ranking any malicious text to steer the LLM adversarially. To assess the red-teaming of RLHF against human preference data poisoning, we propose RankPoison, a poisoning attack method on candidates' selection of preference rank flipping to reach certain malicious behaviors (e.g., generating longer sequences, which can increase the computational cost). With the poisoned dataset generated by RankPoison, we can perform poisoning attacks on LLMs to generate longer tokens without hurting the original safety alignment performance. Moreover, applying RankPoison, we also successfully implement a backdoor attack where LLMs can generate longer answers under questions with the trigger word. Our findings highlight critical security challenges in RLHF, underscoring the necessity for more robust alignment methods for LLMs.
Submitted 19 June, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Role of crystal-field-splitting and longe-range-hoppings on superconducting pairing symmetry of La$_3$Ni$_2$O$_7$
Authors:
Hongquan Liu,
Chengliang Xia,
Shengjie Zhou,
Hanghui Chen
Abstract:
We study the bilayer two-orbital model for superconducting pairing symmetry of La$_3$Ni$_2$O$_7$ under pressure. By combining density-functional-theory (DFT), maximally-localized-Wannier-function, and linearized Eliashberg equation with random-phase-approximation, we find that the superconducting pairing symmetry of La$_3$Ni$_2$O$_7$ is robustly $d_{xy}$ if its DFT band structure is accurately reproduced in the downfolded model. We further show that fine-tuning of crystal-field-splitting between two Ni-$e_g$ orbitals qualitatively affects superconducting pairing symmetry of the bilayer two-orbital model, which changes from $d_{xy}$ to $s_{\pm}$ as the crystal-field-splitting exceeds a critical value. When the model only includes nearest-neighbor and second-nearest-neighbor hoppings, the crystal-field-splitting obtained by fitting to the DFT band structure is larger than the critical value and thus leads to $s_{\pm}$ superconducting pairing symmetry. When all nonzero long-range-hoppings are also included in the model, the fitted crystal-field-splitting is reduced and smaller than the critical value, which makes $d_{xy}$ superconducting pairing symmetry more favorable than $s_{\pm}$ symmetry. Our work demonstrates that in downfolded effective models, the details of band structure can play a crucial role in determining pairing symmetry in multi-orbital unconventional superconductors (such as La$_3$Ni$_2$O$_7$).
Submitted 13 November, 2023;
originally announced November 2023.
-
BClean: A Bayesian Data Cleaning System
Authors:
Jianbin Qin,
Sifan Huang,
Yaoshu Wang,
Jing Zhu,
Yifan Zhang,
Yukai Miao,
Rui Mao,
Makoto Onizuka,
Chuan Xiao
Abstract:
There is a considerable body of work on data cleaning which employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One prevalent approach is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., Gaussian distribution), which frequently underfits in practice, or they require experts to provide a complex prior distribution (e.g., via a programming language). This requirement is both labor-intensive and costly, rendering these methods less suitable for real-world applications. In this paper, we propose BClean, a Bayesian Cleaning system that features automatic Bayesian network construction and user interaction. We recast the data cleaning problem as a Bayesian inference that fully exploits the relationships between attributes in the observed dataset and any prior information provided by users. To this end, we present an automatic Bayesian network construction method that extends a structure learning-based functional dependency discovery method with similarity functions to capture the relationships between attributes. Furthermore, our system allows users to modify the generated Bayesian network in order to specify prior information or correct inaccuracies identified by the automatic generation process. We also design an effective scoring model (called the compensative scoring model) necessary for the Bayesian inference. To enhance the efficiency of data cleaning, we propose several approximation strategies for the Bayesian inference, including graph partitioning, domain pruning, and pre-detection. By evaluating on both real-world and synthetic datasets, we demonstrate that BClean is capable of achieving an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.
Submitted 11 November, 2023;
originally announced November 2023.
-
Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations
Authors:
Zengqing Wu,
Run Peng,
Xu Han,
Shuyuan Zheng,
Yixin Zhang,
Chuan Xiao
Abstract:
Computer simulations offer a robust toolset for exploring complex systems across various disciplines. A particularly impactful approach within this realm is Agent-Based Modeling (ABM), which harnesses the interactions of individual agents to emulate intricate system dynamics. ABM's strength lies in its bottom-up methodology, illuminating emergent phenomena by modeling the behaviors of individual components of a system. Yet, ABM has its own set of challenges, notably its struggle with modeling natural language instructions and common sense in mathematical equations or rules. This paper seeks to transcend these boundaries by integrating Large Language Models (LLMs) like GPT into ABM. This amalgamation gives birth to a novel framework, Smart Agent-Based Modeling (SABM). Building upon the concept of smart agents -- entities characterized by their intelligence, adaptability, and computation ability -- we explore in the direction of utilizing LLM-powered agents to simulate real-world scenarios with increased nuance and realism. In this comprehensive exploration, we elucidate the state of the art of ABM, introduce SABM's potential and methodology, and present three case studies (source codes available at https://github.com/Roihn/SABM), demonstrating the SABM methodology and validating its effectiveness in modeling real-world systems. Furthermore, we cast a vision towards several aspects of the future of SABM, anticipating a broader horizon for its applications. Through this endeavor, we aspire to redefine the boundaries of computer simulations, enabling a more profound understanding of complex systems.
Submitted 14 December, 2023; v1 submitted 10 November, 2023;
originally announced November 2023.
-
Vortex-Antivortex Lattices in a Holographic Superconductor
Authors:
Jia-Hao Su,
Chuan-Yin Xia,
Wei-Can Yang,
Hua-Bi Zeng
Abstract:
We employ the Einstein-Abelian-Higgs theory to investigate the structure of vortex-antivortex lattices within a superconductor driven by spatially periodic magnetic fields. By adjusting the parameters of the external magnetic field, including the period ($\mathcal{T}$) and the amplitude ($B_0$), various distinct vortex states emerge. These states encompass the Wigner crystallization state, the vortex cluster state, and the suppressed state. Additionally, we present a comprehensive phase diagram to demarcate the specific regions where these structures emerge, contributing to our understanding of superconductivity in complex magnetic environments.
Submitted 9 November, 2023;
originally announced November 2023.
-
Scaling Law for Time-Reversal-Odd Nonlinear Transport
Authors:
Yue-Xin Huang,
Cong Xiao,
Shengyuan A. Yang,
Xiao Li
Abstract:
Time-reversal-odd ($\mathcal{T}$-odd) nonlinear current response has been theoretically proposed and experimentally confirmed recently. However, the role of disorder scattering in the response, especially whether it contributes to the $σ_{xx}$-independent term, has not been clarified. In this work, we derive a general scaling law for this effect, which accounts for multiple scattering sources. We show that the nonlinear conductivity is generally a quartic function in $σ_{xx}$. Besides the intrinsic contribution, extrinsic contributions from scattering also enter the zeroth-order term, and their values can be comparable to or even larger than the intrinsic one. Terms beyond zeroth order are all extrinsic. Cubic and quartic terms must involve skew scattering, and they signal competition between at least two scattering sources. The behavior of zeroth-order extrinsic terms is explicitly demonstrated in a Dirac model. Our finding reveals the significant role of disorder scattering in $\mathcal{T}$-odd nonlinear transport, and establishes a foundation for analyzing experimental results.
Submitted 2 November, 2023;
originally announced November 2023.
-
Heintze-Karcher inequality for anisotropic free boundary hypersurfaces in convex domains
Authors:
Xiaohan Jia,
Guofang Wang,
Chao Xia,
Xuwen Zhang
Abstract:
In this paper, we prove an optimal Heintze-Karcher-type inequality for anisotropic free boundary hypersurfaces in general convex domains. The equality is achieved for anisotropic free boundary Wulff shapes in a convex cone. As applications, we prove various Alexandrov-type theorems.
Submitted 2 November, 2023;
originally announced November 2023.
-
Rethinking Decision Transformer via Hierarchical Reinforcement Learning
Authors:
Yi Ma,
Chenjun Xiao,
Hebin Liang,
Jianye Hao
Abstract:
Decision Transformer (DT) is an innovative algorithm leveraging recent advances of the transformer architecture in reinforcement learning (RL). However, a notable limitation of DT is its reliance on recalling trajectories from datasets, losing the capability to seamlessly stitch sub-optimal trajectories together. In this work we introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL. At the time of making decisions, a high-level policy first proposes an ideal prompt for the current state, and a low-level policy subsequently generates an action conditioned on the given prompt. We show that DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices. Inspired by these observations, we study how to jointly optimize the high-level and low-level policies to enable the stitching ability, which further leads to the development of new offline RL algorithms. Our empirical results clearly show that the proposed algorithms significantly surpass DT on several control and navigation benchmarks. We hope our contributions can inspire the integration of transformer architectures within the field of RL.
Submitted 31 October, 2023;
originally announced November 2023.
-
Data-driven Modeling of a Coronal Magnetic Flux Rope: from Birth to Death
Authors:
J. H. Guo,
Y. W. Ni,
Y. Guo,
C. Xia,
B. Schmieder,
S. Poedts,
Z. Zhong,
Y. H. Zhou,
F. Yu,
P. F. Chen
Abstract:
Magnetic flux ropes are bundles of twisted magnetic field lines produced by internal electric currents, which are responsible for solar eruptions and are the major drivers of geomagnetic storms. As such, it is crucial to develop a numerical model that can capture the entire evolution of a flux rope, from its birth to death, in order to predict whether adverse space weather events might occur or not. In this paper, we develop a data-driven model that combines a time-dependent magneto-frictional approach with a thermodynamic magnetohydrodynamic model. Our numerical modeling successfully reproduces the formation and confined eruption of an observed flux rope, and unveils the physical details behind the observations. Regarding the long-term evolution of the active region, our simulation results indicate that the flux cancellation due to collisional shearing plays a critical role in the formation of the flux rope, corresponding to a substantial increase in magnetic free energy and helicity. Regarding the eruption stage, the deformation of the flux rope during its eruption can cause an increase in the downward tension force, which prevents it from rising further. This finding may shed light on why some torus-unstable flux ropes lead to failed eruptions after large-angle rotations. Moreover, we find that twisted fluxes can accumulate during the confined eruptions, which would breed the subsequent eruptive flares.
Submitted 30 October, 2023;
originally announced October 2023.
-
Length is a Curse and a Blessing for Document-level Semantics
Authors:
Chenghao Xiao,
Yizhi Li,
G Thomas Hudson,
Chenghua Lin,
Noura Al Moubayed
Abstract:
In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but also that we can devise unsupervised CL methods solely depending on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document would intensify the high intra-document similarity that is already brought by CL. Moreover, we find that the isotropy promised by CL is highly dependent on the length range of text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark.
Submitted 24 October, 2023;
originally announced October 2023.
-
Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
Authors:
Chaojun Xiao,
Yuqi Luo,
Wenbin Zhang,
Pengle Zhang,
Xu Han,
Yankai Lin,
Zhengyan Zhang,
Ruobing Xie,
Zhiyuan Liu,
Maosong Sun,
Jie Zhou
Abstract:
Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length via compressing multiple hidden vectors into one and trained with original PLMs frozen. Different from traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% computational costs using only 0.9% additional parameters with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs.
Submitted 20 February, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
MUSER: A Multi-View Similar Case Retrieval Dataset
Authors:
Qingquan Li,
Yiran Hu,
Feng Yao,
Chaojun Xiao,
Zhiyuan Liu,
Maosong Sun,
Weixing Shen
Abstract:
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets only focus on the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that can provide insightful reasoning process behind. Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statutory) and build a comprehensive and structured label schema of legal elements for each of them, to enable accurate and knowledgeable evaluation of case similarities. The constructed dataset originates from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for retrieving similar cases on MUSER. The experimental results indicate that incorporating legal elements can benefit the performance of SCR models, but further efforts are still required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.
Submitted 24 October, 2023;
originally announced October 2023.
-
HL-DPoS: An Enhanced Anti-Long-Range Attack DPoS Algorithm
Authors:
Yang Li,
Chunhe Xia,
Chunyan Li,
Yuan Zhao,
Chen Chen,
Tianbo Wang
Abstract:
The consensus algorithm is crucial in blockchain for ensuring the validity and security of transactions across the decentralized network. However, achieving consensus among nodes and packaging blocks in blockchain networks is a complex task that requires efficient and secure consensus algorithms. The DPoS consensus algorithm has emerged as a popular choice due to its fast transaction processing and high throughput. Despite these advantages, the algorithm still suffers from weaknesses such as centralization and vulnerability to long-range attacks, which can compromise the integrity of the blockchain network.
To combat these problems, we developed an Enhanced Anti-Long-Range Attack DPoS algorithm (HL-DPoS). First, we split nodes into pieces to reduce centralization issues while giving witness nodes the power to report malicious nodes and benefit from those reports, maintaining high efficiency and high security. Second, we propose a validation method in HL-DPoS that compares consensus transactions with the longest chain to detect long-range attacks. Algorithm analysis and simulation experiment results demonstrate that our HL-DPoS consensus algorithm improves security while achieving better consensus performance.
Submitted 23 October, 2023;
originally announced October 2023.
-
Strangelets at finite temperature: nucleon emission rates, interface and shell effects
Authors:
Hao-Song You,
Huai-Min Chen,
Jian-Feng Xu,
Cheng-Jun Xia,
Guang-Xiong Peng,
Ren-Xin Xu
Abstract:
We investigate the properties of strangelets at finite temperature $T$, where an equivparticle model is adopted with both the linear confinement and leading-order perturbative interactions accounted for using density-dependent quark masses. The shell effects are examined by solving the Dirac equations for quarks in the mean-field approximation; they diminish with temperature because the occupation probability of each single-particle level is fixed by Fermi-Dirac statistics, i.e., shell dampening. Consequently, instead of decreasing with temperature, the surface tension extracted from a liquid-drop formula increases with $T$ until reaching its peak at $T\approx 20$-40 MeV with vanishing shell corrections, where the formula roughly reproduces the free energy per baryon of all strangelets. The curvature term, nevertheless, decreases with $T$ despite the presence of shell effects. The neutron and proton emission rates are fixed microscopically according to the external nucleon gas densities that are in equilibrium with strangelets, which generally increase with $T$ ($\lesssim 50$ MeV) for stable strangelets but decrease for those that are unstable against nucleon emission at $T=0$. The energy, free energy, entropy, charge-to-mass ratio, strangeness per baryon, and root-mean-square radius of $β$-stable strangelets obtained with various parameter sets are presented as well. The results presented in this work are useful for understanding the products of binary compact star mergers and heavy-ion collisions.
Submitted 22 October, 2023;
originally announced October 2023.