-
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting
Authors:
Tong Ye,
Yangkai Du,
Tengfei Ma,
Lingfei Wu,
Xuhong Zhang,
Shouling Ji,
Wenhai Wang
Abstract:
Large Language Models (LLMs) have exhibited remarkable proficiency in generating code. However, the misuse of LLM-generated (Synthetic) code has prompted concerns within both educational and industrial domains, highlighting the imperative need for the development of synthetic code detectors. Existing methods for detecting LLM-generated content are primarily tailored for general text and often struggle with code content due to the distinct grammatical structure of programming languages and massive "low-entropy" tokens. Building upon this, our work proposes a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants. Our method relies on the intuition that the differences between the LLM-rewritten and original codes tend to be smaller when the original code is synthetic. We utilize self-supervised contrastive learning to train a code similarity model and assess our approach on two synthetic code detection benchmarks. Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts, with an improvement of 20.5% in the APPS benchmark and 29.1% in the MBPP benchmark.
Submitted 29 May, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
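The rewriting-based detection idea in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `similarity` uses `difflib.SequenceMatcher` as a stand-in for the contrastively trained code similarity model, `rewrite` is a hypothetical callable that would wrap an LLM rewriting call, and the `threshold` and `n_rewrites` values are illustrative only.

```python
from difflib import SequenceMatcher
from typing import Callable


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] (assumption: the paper uses a trained model)."""
    return SequenceMatcher(None, a, b).ratio()


def detection_score(code: str, rewrite: Callable[[str], str], n_rewrites: int = 4) -> float:
    """Average similarity between the code and n_rewrites rewritten variants of it."""
    rewrites = [rewrite(code) for _ in range(n_rewrites)]
    return sum(similarity(code, r) for r in rewrites) / n_rewrites


def is_synthetic(code: str, rewrite: Callable[[str], str],
                 threshold: float = 0.8, n_rewrites: int = 4) -> bool:
    """Flag code as synthetic when its rewrites stay close to the original.

    The threshold is a hypothetical value chosen for illustration.
    """
    return detection_score(code, rewrite, n_rewrites) >= threshold
```

A higher score means the rewrites changed the code less, which under the abstract's intuition points toward synthetic code. In practice `rewrite` would be sampled several times from an LLM, so the score averages over varied rewrites.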
-
Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration
Authors:
Junjie Gao,
Chongjian Wang,
Zhongjun Ding,
Shuangmin Chen,
Shiqing Xin,
Changhe Tu,
Wenping Wang
Abstract:
In the realm of point cloud registration, the most prevalent pose evaluation approaches are statistics-based, identifying the optimal transformation by maximizing the number of consistent correspondences. However, registration recall decreases significantly when point clouds exhibit a low overlap rate, despite efforts in designing feature descriptors and establishing correspondences. In this paper, we introduce Deep-PE, a lightweight, learning-based pose evaluator designed to enhance the accuracy of pose selection, especially in challenging point cloud scenarios with low overlap. Our network incorporates a Pose-Aware Attention (PAA) module to simulate and learn the alignment status of point clouds under various candidate poses, alongside a Pose Confidence Prediction (PCP) module that predicts the likelihood of successful registration. These two modules facilitate the learning of both local and global alignment priors. Extensive tests across multiple benchmarks confirm the effectiveness of Deep-PE. Notably, on 3DLoMatch with a low overlap rate, Deep-PE significantly outperforms state-of-the-art methods by at least 8% and 11% in registration recall under handcrafted FPFH and learning-based FCGF descriptors, respectively. To the best of our knowledge, this is the first study to utilize deep learning to select the optimal pose without the explicit need for input correspondences.
Submitted 25 May, 2024;
originally announced May 2024.
-
Symmetry breaking of three self-organization rules: A general theory for the origin of complexity
Authors:
Wen-Hao Wu,
Ze-Zheng Li,
Wen-Xu Wang
Abstract:
Complex spatiotemporal patterns in nature significantly challenge reductionism-based modern science. The lack of a paradigm beyond reductionism hinders our understanding of the emergence of complexity. The diversity of countless patterns undermines any notion of universal mechanisms. Here, however, we show that breaking the symmetry of three simple self-organization rules gives rise to nearly all patterns in nature, such as a wide variety of Turing patterns, fractals, spiral, target and plane waves, as well as chaotic patterns. The symmetry breaking is rooted in basic physical quantities, such as positive and negative forces, space, time and bounds. Besides reproducing the hallmarks of complexity, we discover some novel phenomena, such as abrupt percolation of Turing patterns, phase transition between fractals and chaos, chaotic edge in travelling waves, etc. Our asymmetric self-organization theory established a simple and unified framework for the origin of complexity in all fields, and unveiled a deep relationship between the first principles of physics and the complex world.
Submitted 24 May, 2024;
originally announced May 2024.
-
The Radical Solution and Computational Complexity
Authors:
Boping Zheng,
Weiwu Wang
Abstract:
The radical solution of polynomials with rational coefficients is a famous solved problem. This paper finds that it is an $\mathbb{NP}$ problem. Furthermore, this paper finds that an arbitrary $\mathscr{P} \in \mathbb{P}$ has a one-way running graph $G$ and a corresponding $\mathscr{Q} \in \mathbb{NP}$ that has a two-way running graph $G'$, where $G$ and $G'$ are isomorphic, i.e., $G'$ is combined by $G$ and its reverse $G^{-1}$. When $\mathscr{P}$ is an algorithm for solving polynomials, $G^{-1}$ is the radical formula. According to Galois theory, a general radical formula does not exist. Therefore, there exists an $\mathbb{NP}$ problem that does not have a general, deterministic, polynomial time-complexity algorithm, i.e., $\mathbb{P} \neq \mathbb{NP}$. Moreover, this paper points out that this theorem is actually an impossible trinity.
Submitted 4 May, 2024;
originally announced May 2024.
-
Correlated Charge Density Wave Insulators in Chirally Twisted Triple Bilayer Graphene
Authors:
Wenxuan Wang,
Gengdong Zhou,
Wenlu Lin,
Zuo Feng,
Yijie Wang,
Miao Liang,
Zaizhe Zhang,
Min Wu,
Le Liu,
Kenji Watanabe,
Takashi Taniguchi,
Wei Yang,
Guangyu Zhang,
Kaihui Liu,
Jinhua Gao,
Yang Liu,
X. C. Xie,
Zhida Song,
Xiaobo Lu
Abstract:
Electrons residing in a flat-band system can play a vital role in triggering spectacular phenomenology due to relatively large interactions and spontaneous breaking of different degeneracies. In this work we demonstrate chirally twisted triple bilayer graphene, a new moiré structure formed by three pieces of helically stacked Bernal bilayer graphene, as a highly tunable flat-band system. In addition to the correlated insulators showing at integer moiré fillings, commonly attributed to interaction induced symmetry broken isospin flavors in graphene, we observe abundant insulating states at half-integer moiré fillings, suggesting a longer-range interaction and the formation of charge density wave insulators which spontaneously break the moiré translation symmetry. With a weak out-of-plane magnetic field applied, the observed half-integer filling states are enhanced and more quarter-integer filling states appear, pointing towards further quadrupling of the moiré unit cells. The insulating states at fractional fillings combined with Hartree-Fock calculations demonstrate the observation of a new type of correlated charge density wave insulators in graphene and point to a new accessible twist manner for engineering correlated moiré electronics.
Submitted 22 May, 2024;
originally announced May 2024.
-
Dual opposing quadrature-PT symmetry
Authors:
Wencong Wang,
Jacob Kokinda,
Jiazhen Li,
Qing Gu,
Dongmei Liu,
Jianming Wen
Abstract:
Our recent research on type-I quadrature parity-time (PT) symmetry, utilizing an open twin-beam system, not only enables observing genuine quantum photonic PT symmetry amid phase-sensitive amplification (PSA) and loss in the presence of Langevin noise but also reveals additional classical-to-quantum (C2Q) transitions in quadrature and relative-intensity noise fluctuations. In contrast to the previous setup, our exploration of an alternative system assuming no loss involves a type-II PSA-only scheme. This scheme facilitates dual opposing quadrature PT symmetry, offering a comprehensive and complementary comprehension of C2Q transitions and anti-Hermiticity-enhanced quantum sensing. Furthermore, our investigation into the correlation with the Einstein-Podolsky-Rosen criteria uncovers previously unexplored connections between PT symmetry and nonclassicality, as well as quantum entanglement within the continuous-variable framework.
Submitted 24 May, 2024;
originally announced May 2024.
-
Trajectory-Based Multi-Objective Hyperparameter Optimization for Model Retraining
Authors:
Wenyu Wang,
Zheyi Fan,
Szu Hui Ng
Abstract:
Training machine learning models inherently involves a resource-intensive and noisy iterative learning procedure that allows epoch-wise monitoring of the model performance. However, in multi-objective hyperparameter optimization scenarios, the insights gained from the iterative learning procedure typically remain underutilized. We notice that tracking the model performance across multiple epochs under a hyperparameter setting creates a trajectory in the objective space and that trade-offs along the trajectories are often overlooked despite their potential to offer valuable insights to decision-making for model retraining. Therefore, in this study, we propose to enhance the multi-objective hyperparameter optimization problem by having training epochs as an additional decision variable to incorporate trajectory information. Correspondingly, we present a novel trajectory-based multi-objective Bayesian optimization algorithm characterized by two features: 1) an acquisition function that captures the improvement made by the predictive trajectory of any hyperparameter setting and 2) a multi-objective early stopping mechanism that determines when to terminate the trajectory to maximize epoch efficiency. Numerical experiments on diverse synthetic simulations and hyperparameter tuning benchmarks indicate that our algorithm outperforms the state-of-the-art multi-objective optimizers in both locating better trade-offs and tuning efficiency.
Submitted 24 May, 2024;
originally announced May 2024.
-
Minimizing UCB: a Better Local Search Strategy in Local Bayesian Optimization
Authors:
Zheyi Fan,
Wenyu Wang,
Szu Hui Ng,
Qingpei Hu
Abstract:
Local Bayesian optimization is a promising practical approach to solve the high dimensional black-box function optimization problem. Among them is the approximated gradient class of methods, which implements a strategy similar to gradient descent. These methods have achieved good experimental results and theoretical guarantees. However, given the distributional properties of the Gaussian processes applied on these methods, there may be potential to further exploit the information of the Gaussian processes to facilitate the BO search. In this work, we develop the relationship between the steps of the gradient descent method and one that minimizes the Upper Confidence Bound (UCB), and show that the latter can be a better strategy than direct gradient descent when a Gaussian process is applied as a surrogate. Through this insight, we propose a new local Bayesian optimization algorithm, MinUCB, which replaces the gradient descent step with minimizing UCB in GIBO. We further show that MinUCB maintains a similar convergence rate with GIBO. We then improve the acquisition function of MinUCB further through a look-ahead strategy, and obtain a more efficient algorithm, LA-MinUCB. We apply our algorithms on different synthetic and real-world functions, and the results show the effectiveness of our method. Our algorithms also illustrate improvements on local search strategies from an upper bound perspective in Bayesian optimization, and provide a new direction for future algorithm design.
Submitted 24 May, 2024;
originally announced May 2024.
-
Spin chirality engineering induced giant topological Hall effect in a kagome magnet
Authors:
Wei Xia,
Shihao Zhang,
Jian Yuan,
Yurui Wei,
Haonan Wang,
Hong Du,
Xiangqi Liu,
Jiangteng Guo,
Zicheng Tao,
Ke Qu,
Xia Wang,
Xuerong Liu,
Wenbo Wang,
Jinguang Cheng,
Yulin Chen,
Jianpeng Liu,
Ruidan Zhong,
Xuewen Fu,
Zhenzhong Yang,
Yanfeng Guo
Abstract:
The ferrimagnet TbMn6Sn6 has attracted vast attention, because its pristine Mn kagome lattice with strong spin-orbit coupling and out-of-plane Tb-Mn exchange supports quantum-limit Chern topological magnetism which can be described by the simple spinless Haldane model. We unveil herein that engineering the pristine kagome lattice through partial replacement of Mn by nonmagnetic Cr, which tends to concentrate into the single Mn1 layer in a unit cell, breaks the collinear configuration of Mn spins and reduces the D6h point group symmetry to the C2 one. The nearly isolated Tb networks result in easily polarized Tb spins even under a weak magnetic field, and simultaneously, different spin chirality of the Tb-Mn1-Mn1 and Mn1-Mn1-Mn1. Such a peculiar spin structure leads to a plateau-like topological Hall effect with a record resistivity of 19.1 μOhm cm among bulk systems. Our direct visualization of the domain-wall structure and its evolution under external magnetic field fully supports the picture, thus highlighting the pivotal role of broken kagome lattice symmetry in generating the peculiar spin chirality in real space. Our results set a paradigm for exploration of exotic properties in kagome topological magnets and would be a proof-of-principle strategy for investigating the correlation between magnetism and exotic topological properties in kagome lattice.
Submitted 24 May, 2024;
originally announced May 2024.
-
Diffusion Actor-Critic with Entropy Regulator
Authors:
Yinuo Wang,
Likun Wang,
Yuxuan Jiang,
Wenjun Zou,
Tong Liu,
Xujie Song,
Wenxuan Wang,
Liming Xiao,
Jiang Wu,
Jingliang Duan,
Shengbo Eben Li
Abstract:
Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains their capability to acquire complex policies. In response to this problem, we propose an online RL algorithm termed diffusion actor-critic with entropy regulator (DACER). This algorithm conceptualizes the reverse process of the diffusion model as a novel policy function and leverages the capability of the diffusion model to fit multimodal distributions, thereby enhancing the representational capacity of the policy. Since the distribution of the diffusion policy lacks an analytical expression, its entropy cannot be determined analytically. To mitigate this, we propose a method to estimate the entropy of the diffusion policy using a Gaussian mixture model. Building on the estimated entropy, we can learn a parameter $α$ that modulates the degree of exploration and exploitation. Parameter $α$ will be employed to adaptively regulate the variance of the added noise, which is applied to the action output by the diffusion model. Experimental trials on MuJoCo benchmarks and a multimodal task demonstrate that the DACER algorithm achieves state-of-the-art (SOTA) performance in most MuJoCo control tasks while exhibiting a stronger representational capacity of the diffusion policy.
Submitted 15 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
NeCGS: Neural Compression for 3D Geometry Sets
Authors:
Siyu Ren,
Junhui Hou,
Wenping Wang
Abstract:
This paper explores the problem of effectively compressing 3D geometry sets containing diverse categories. We make the first attempt to tackle this fundamental and challenging problem and propose NeCGS, a neural compression paradigm, which can compress hundreds of detailed and diverse 3D mesh models (~684 MB) by about 900 times (0.76 MB) with high accuracy and preservation of geometric details. Specifically, we first represent each irregular mesh model/shape in a regular representation that implicitly describes the geometry structure of the model using a 4D regular volume, called TSDF-Def volume. Such a regular representation can not only capture local surfaces more effectively but also facilitate the subsequent process. Then we construct a quantization-aware auto-decoder network architecture to regress these 4D volumes, which can summarize the similarity of local geometric structures within a model and across different models for redundancy elimination, resulting in more compact representations, including an embedded feature of a smaller size associated with each model and a network parameter set shared by all models. We finally encode the resulting features and network parameters into bitstreams through entropy coding. After decompressing the features and network parameters, we can reconstruct the TSDF-Def volumes, where the 3D surfaces can be extracted through the deformable marching cubes. Extensive experiments and ablation studies demonstrate the significant advantages of our NeCGS over state-of-the-art methods both quantitatively and qualitatively.
Submitted 23 May, 2024;
originally announced May 2024.
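As a quick check of the compression figures quoted in the abstract above, the ratio follows directly from the two sizes. The variable names below are illustrative; only the 684 MB and 0.76 MB values come from the abstract.

```python
# Sizes as quoted in the abstract (the "~" there marks 684 MB as approximate).
original_mb = 684.0
compressed_mb = 0.76

# Compression ratio = original size / compressed size.
ratio = original_mb / compressed_mb
print(f"compression ratio: about {round(ratio)}x")
```

This agrees with the "about 900 times" figure stated in the abstract.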
-
Flatten Anything: Unsupervised Neural Surface Parameterization
Authors:
Qijian Zhang,
Junhui Hou,
Wenping Wang,
Ying He
Abstract:
Surface parameterization plays an essential role in numerous computer graphics and geometry processing applications. Traditional parameterization approaches are designed for high-quality meshes laboriously created by specialized 3D modelers, thus unable to meet the processing demand for the current explosion of ordinary 3D data. Moreover, their working mechanisms are typically restricted to certain simple topologies, thus relying on cumbersome manual efforts (e.g., surface cutting, part segmentation) for pre-processing. In this paper, we introduce the Flatten Anything Model (FAM), an unsupervised neural architecture to achieve global free-boundary surface parameterization via learning point-wise mappings between 3D points on the target geometric surface and adaptively-deformed UV coordinates within the 2D parameter domain. To mimic the actual physical procedures, we ingeniously construct geometrically-interpretable sub-networks with specific functionalities of surface cutting, UV deforming, unwrapping, and wrapping, which are assembled into a bi-directional cycle mapping framework. Compared with previous methods, our FAM directly operates on discrete surface points without utilizing connectivity information, thus significantly reducing the strict requirements for mesh quality and even applicable to unstructured point cloud data. More importantly, our FAM is fully-automated without the need for pre-cutting and can deal with highly-complex topologies, since its learning process adaptively finds reasonable cutting seams and UV boundaries. Extensive experiments demonstrate the universality, superiority, and inspiring potential of our proposed neural surface parameterization paradigm. The code will be publicly available.
Submitted 23 May, 2024;
originally announced May 2024.
-
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Authors:
Shuaipeng Li,
Penghao Zhao,
Hailin Zhang,
Xingwu Sun,
Hao Wu,
Dian Jiao,
Weiyan Wang,
Chengjun Liu,
Zheng Fang,
Jinbao Xue,
Yangyu Tao,
Bin Cui,
Di Wang
Abstract:
In current deep learning tasks, Adam style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require careful tuning to enable effective convergence. Previous research has shown that the optimal learning rate increases linearly or follows similar rules with batch size for SGD style optimizers. However, this conclusion is not applicable to Adam style optimizers. In this paper, we elucidate the connection between optimal learning rates and batch sizes for Adam style optimizers through both theoretical analysis and extensive experiments. First, we raise the scaling law between batch sizes and optimal learning rates in the sign of gradient case, in which we prove that the optimal learning rate first rises and then falls as the batch size increases. Moreover, the peak value of the surge will gradually move toward the larger batch size as training progresses. Second, we conducted experiments on various CV and NLP tasks and verified the correctness of the scaling law.
Submitted 4 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
Authors:
Yujie Lu,
Xiujun Li,
Tsu-Jui Fu,
Miguel Eckstein,
William Yang Wang
Abstract:
The rapid progress in Multimodal Large Language Models (MLLMs) has significantly advanced their ability to process and understand complex visual and textual information. However, the integration of multiple images and extensive textual contexts remains a challenge due to the inherent limitation of the models' capacity to handle long input sequences efficiently. In this paper, we introduce SEEKER, a multimodal large language model designed to tackle this issue. SEEKER aims to optimize the compact encoding of long text by compressing the text sequence into the visual pixel space via images, enabling the model to handle long text within a fixed token-length budget efficiently. Our empirical experiments on six long-context multimodal tasks demonstrate that SEEKER can leverage fewer image tokens to convey the same amount of textual information compared with the OCR-based approach, and is more efficient in understanding long-form multimodal input and generating long-form textual output, outperforming all existing proprietary and open-source MLLMs by large margins.
Submitted 23 May, 2024;
originally announced May 2024.
-
Giant Acoustic Geometric Spin and Orbital Hall Effect
Authors:
Wei Wang,
Yang Tan,
Jingjing Liu,
Bin Liang,
Jianchun Cheng
Abstract:
Acoustic waves in fluid with spin-0 nature have been long believed not to support spin Hall effect and strong orbital Hall effect that enables experimental observation. Here we report the first theoretical explication and experimental demonstration of giant acoustic geometric spin and orbital Hall effect characterized by a large transverse shift. We reveal that this effect occurs when a vortex beam is observed from a tilted reference frame free of wave-interface interactions or gradient-index media needed for observing conventional ones, and can be amplified by simply binding the beam tightly. Thanks to this mechanism, large transverse shifts proportional to angular momentum are observed in a compact system. Our work provides deeper insights into the physics of angular momentum of classic waves.
Submitted 23 May, 2024;
originally announced May 2024.
-
S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models
Authors:
Xiaohan Yuan,
Jinfeng Li,
Dongxia Wang,
Yuefeng Chen,
Xiaofeng Mao,
Longtao Huang,
Hui Xue,
Wenhai Wang,
Kui Ren,
Jingyi Wang
Abstract:
Large Language Models have gained considerable attention for their revolutionary capabilities. However, there is also growing concern over their safety implications, making a comprehensive safety evaluation for LLMs urgently needed before model deployment. In this work, we propose S-Eval, a new comprehensive, multi-dimensional and open-ended safety evaluation benchmark. At the core of S-Eval is a novel LLM-based automatic test prompt generation and selection framework, which trains an expert testing LLM Mt combined with a range of test selection strategies to automatically construct a high-quality test suite for the safety evaluation. The key to the automation of this process is a novel expert safety-critique LLM Mc able to quantify the riskiness score of an LLM's response, and additionally produce risk tags and explanations. Besides, the generation process is also guided by a carefully designed risk taxonomy with four different levels, covering comprehensive and multi-dimensional safety risks of concern. Based on these, we systematically construct a new and large-scale safety evaluation benchmark for LLMs consisting of 220,000 evaluation prompts, including 20,000 base risk prompts (10,000 in Chinese and 10,000 in English) and 200,000 corresponding attack prompts derived from 10 popular adversarial instruction attacks against LLMs. Moreover, considering the rapid evolution of LLMs and accompanying safety threats, S-Eval can be flexibly configured and adapted to include new risks, attacks and models. S-Eval is extensively evaluated on 20 popular and representative LLMs. The results confirm that S-Eval can better reflect and inform the safety risks of LLMs compared to existing benchmarks. We also explore the impacts of parameter scales, language environments, and decoding parameters on the evaluation, providing a systematic methodology for evaluating the safety of LLMs.
Submitted 28 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Impact of gauge fixing precision on the continuum limit of non-local quark-bilinear lattice operators
Authors:
Kuan Zhang,
Yi-Kai Huo,
Xiangdong Ji,
Andreas Schaefer,
Chun-Jiang Shi,
Peng Sun,
Wei Wang,
Yi-Bo Yang,
Jian-Hui Zhang
Abstract:
We analyze the gauge fixing precision dependence of some non-local quark-bilinear lattice operators of interest in computing parton physics for several measurements, using 5 lattice spacings ranging from 0.032 fm to 0.121 fm. Our results show that gauge dependent non-local measurements are significantly more sensitive to the precision of gauge fixing than anticipated. The impact of imprecise gauge fixing is significant for fine lattices and long distances. For instance, even with the typically defined precision of Landau gauge fixing of $10^{-8}$, the deviation caused by imprecise gauge fixing can reach 12 percent, when calculating the trace of Wilson lines at 1.2 fm with a lattice spacing of approximately 0.03 fm. Similar behavior has been observed in $ξ$ gauge and Coulomb gauge as well. For both quasi PDFs and quasi TMD-PDFs operators renormalized using the RI/MOM scheme, convergence for different lattice spacings at long distance is only observed when the precision of Landau gauge fixing is sufficiently high. To describe these findings quantitatively, we propose an empirical formula to estimate the required precision.
Submitted 22 May, 2024;
originally announced May 2024.
-
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Authors:
Ganggui Ding,
Canyu Zhao,
Wen Wang,
Zhen Yang,
Zide Liu,
Hao Chen,
Chunhua Shen
Abstract:
Benefiting from large-scale pre-trained text-to-image (T2I) generative models, impressive progress has been achieved in customized image generation, which aims to generate user-specified concepts. Existing approaches have extensively focused on single-concept customization and still encounter challenges when it comes to complex scenarios that involve combining multiple concepts. These approaches often require retraining/fine-tuning using a few images, leading to time-consuming training processes and impeding their swift implementation. Furthermore, the reliance on multiple images to represent a singular concept increases the difficulty of customization. To this end, we propose FreeCustom, a novel tuning-free method to generate customized images of multi-concept composition based on reference concepts, using only one image per concept as input. Specifically, we introduce a new multi-reference self-attention (MRSA) mechanism and a weighted mask strategy that enables the generated image to access and focus more on the reference concepts. In addition, MRSA leverages our key finding that input concepts are better preserved when providing images with context interactions. Experiments show that our method's produced images are consistent with the given concepts and better aligned with the input text. Our method outperforms or performs on par with other training-based methods in terms of multi-concept composition and single-concept customization, but is simpler. Codes can be found at https://github.com/aim-uofa/FreeCustom.
Submitted 22 May, 2024;
originally announced May 2024.
-
NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation
Authors:
Qiujie Dong,
Huibiao Wen,
Rui Xu,
Xiaokang Yu,
Jiaran Zhou,
Shuangmin Chen,
Shiqing Xin,
Changhe Tu,
Wenping Wang
Abstract:
Quadrilateral mesh generation plays a crucial role in numerical simulations within Computer-Aided Design and Engineering (CAD/E). The quality of the cross field is essential for generating a quadrilateral mesh. In this paper, we propose a self-supervised neural representation of the cross field, named NeurCross, comprising two modules: one to fit the signed distance function (SDF) and another to predict the cross field. Unlike most existing approaches that operate directly on the given polygonal surface, NeurCross takes the SDF as a bridge to allow for SDF overfitting and the prediction of the cross field to proceed simultaneously. By utilizing a neural SDF, we achieve a smooth representation of the base surface, minimizing the impact of piecewise planar discretization and minor surface variations. Moreover, the principal curvatures and directions are fully encoded by the Hessian of the SDF, enabling the regularization of the overall cross field through minor adjustments to the SDF. Compared to state-of-the-art methods, NeurCross significantly improves the placement of singular points and the approximation accuracy between the input triangular surface and the output quad mesh, as demonstrated in the teaser figure.
Submitted 22 May, 2024;
originally announced May 2024.
-
ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models
Authors:
Rui Xu,
Jiepeng Wang,
Hao Pan,
Yang Liu,
Xin Tong,
Shiqing Xin,
Changhe Tu,
Taku Komura,
Wenping Wang
Abstract:
In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, there are additional attributes which are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training schemes of diffusion generative models, causing degraded test-time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test-time generation which uses unsynchronized time steps for different dimensions and attributes, thus allowing for varying degrees of control over them.
Submitted 24 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Exotic d-wave Bose Metal in two dimensions
Authors:
Zhangkai Cao,
Jiahao Su,
Jianyu Li,
Tao Ying,
WanSheng Wang,
Jin-Hua Sun,
Ho-Kin Tang,
Haiqing Lin
Abstract:
The Landau Fermi liquid theory, a cornerstone in condensed matter physics, encounters limitations in explaining certain phenomena, like the peculiar behavior of strange metals in high-temperature superconductors. Non-Fermi liquids, like Bose metals with an uncondensed bosonic ground state, offer potential explanations, yet constructing an elusive Bose metal phase in two dimensions (2D) remains a formidable challenge. Utilizing constrained-path quantum Monte Carlo and functional renormalization group methods on a fermionic system with spin anisotropy in a 2D lattice, we reveal the emergence of a Cooper pair Bose metal in a highly anisotropic regime (a < 0.30) over a wide range of fillings, most notably at a filling fraction of n~0.8. Our findings exhibit a visible nonzero momentum Bose surface in the Cooper-pair distribution function, accompanied by a distinct signal of dxy correlation between pairs. Our results highlight that spin-dependent anisotropy in the Fermi surface leads to versatile pairing forms. Platforms such as ultracold atoms in optical lattices and recently proposed altermagnets hold promise for realizing this intriguing phase.
△ Less
Submitted 24 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
The Illusion of Anonymity: Uncovering the Impact of User Actions on Privacy in Web3 Social Ecosystems
Authors:
Bin Wang,
Tianjian Liu,
Wenqi Wang,
Yuan Weng,
Chao Li,
Guangquan Xu,
Meng Shen,
Sencun Zhu,
Wei Wang
Abstract:
The rise of Web3 social ecosystems signifies the dawn of a new chapter in digital interaction, offering significant prospects for user engagement and financial advancement. Nonetheless, this progress is shadowed by potential privacy concessions, especially as these platforms frequently merge with existing Web2.0 social media accounts, amplifying data privacy risks for users.
In this study, we investigate the nuanced dynamics between user engagement on Web3 social platforms and the consequent privacy concerns. We scrutinize the widespread phenomenon of fabricated activities, which encompasses the establishment of bogus accounts aimed at mimicking popularity and the deliberate distortion of social interactions by some individuals to gain financial rewards. Such deceptive maneuvers not only distort the true measure of the active user base but also amplify privacy threats for all members of the user community. We also find that, notwithstanding their attempts to limit social exposure, users remain entangled in privacy vulnerabilities. The actions of those highly engaged users, albeit often a minority group, can inadvertently breach the privacy of the larger collective.
By casting light on the delicate interplay between user engagement, financial motives, and privacy issues, we offer a comprehensive examination of the intrinsic challenges and hazards present in the Web3 social milieu. We highlight the urgent need for more stringent privacy measures and ethical protocols to navigate the complex web of social exchanges and financial ambitions in the rapidly evolving Web3.
Submitted 22 May, 2024;
originally announced May 2024.
-
Study of the decays $χ_{cJ}\toΛ\barΛω$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$, $\mathcal{B}(χ_{c1}\toΛ\barΛω)=({1.01 \pm 0.10 \pm 0.11}) \times 10^{-4}$, and $\mathcal{B}(χ_{c2}\toΛ\barΛω)=({1.40 \pm 0.13 \pm 0.17}) \times 10^{-4}$, where the first uncertainties are statistical and the second are systematic. We observe no clear intermediate structures.
Submitted 21 May, 2024;
originally announced May 2024.
-
Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation
Authors:
Haoteng Tang,
Guodong Liu,
Siyuan Dai,
Kai Ye,
Kun Zhao,
Wenlu Wang,
Carl Yang,
Lifang He,
Alex Leow,
Paul Thompson,
Heng Huang,
Liang Zhan
Abstract:
The MRI-derived brain network serves as a pivotal instrument in elucidating both the structural and functional aspects of the brain, encompassing the ramifications of diseases and developmental processes. However, prevailing methodologies, often focusing on synchronous BOLD signals from functional MRI (fMRI), may not capture directional influences among brain regions and rarely tackle temporal functional dynamics. In this study, we first construct the brain-effective network via the dynamic causal model. Subsequently, we introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE). This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks via an ordinary differential equation (ODE) model, which characterizes spatial-temporal brain dynamics. Our framework is validated on several clinical phenotype prediction tasks using two independent publicly available datasets (HCP and OASIS). The experimental results clearly demonstrate the advantages of our model compared to several state-of-the-art methods.
Submitted 21 May, 2024;
originally announced May 2024.
-
MAMMOTH-Subaru. II. Diverse Populations of Circumgalactic Ly$α$ Nebulae at Cosmic Noon
Authors:
Mingyu Li,
Haibin Zhang,
Zheng Cai,
Yongming Liang,
Nobunari Kashikawa,
Ke Ma,
Xiaohui Fan,
J. Xavier Prochaska,
Bjorn H. C. Emonts,
Xin Wang,
Yunjing Wu,
Shiwu Zhang,
Qiong Li,
Sean D. Johnson,
Minghao Yue,
Fabrizio Arrigoni Battaia,
Sebastiano Cantalupo,
Joseph F. Hennawi,
Satoshi Kikuta,
Yuanhang Ning,
Masami Ouchi,
Rhythm Shimakawa,
Ben Wang,
Weichen Wang,
Zheng Zheng
, et al. (1 additional author not shown)
Abstract:
Circumgalactic Lyman-alpha (Ly$α$) nebulae are gaseous halos around galaxies exhibiting luminous extended Ly$α$ emission. This work investigates Ly$α$ nebulae from deep imaging of $\sim12~\mathrm{deg}^2$ sky, targeted by the MAMMOTH-Subaru survey. Utilizing the wide-field capability of Hyper Suprime-Cam (HSC), we present one of the largest blind Ly$α$ nebula selections, including QSO nebulae, Ly$α$ blobs, and radio galaxy nebulae down to typical $2σ$ Ly$α$ surface brightness of $(5-10)\times10^{-18}\mathrm{~erg~s^{-1}~cm^{-2}~arcsec^{-2}}$. The sample contains 117 nebulae with Ly$α$ sizes of 40 - 400 kpc, and the most gigantic one spans about 365 kpc, referred to as the Ivory Nebula. Combining with multiwavelength data, we investigate diverse nebula populations and associated galaxies. We find a small fraction of Ly$α$ nebulae have QSOs ($\sim7\%$), luminous infrared galaxies ($\sim1\%$), and radio galaxies ($\sim 2\%$). Remarkably, among the 28 enormous Ly$α$ nebulae (ELANe) exceeding 100 kpc, about $80\%$ are associated with UV-faint galaxies ($M_\mathrm{UV} > -22$), categorized as Type II ELANe. We underscore that Type II ELANe constitute the majority but remain largely hidden in current galaxy and QSO surveys. Dusty starburst and obscured AGN activity are proposed to explain the nature of Type II ELANe. The SED of stacking all Ly$α$ nebulae also reveals signs of massive dusty star-forming galaxies with obscured AGNs. We propose a model to explain the dusty nature where the diverse populations of Ly$α$ nebula capture massive galaxies at different evolutionary stages undergoing violent assembling. Ly$α$ nebulae provide critical insights into the formation and evolution of today's massive cluster galaxies at cosmic noon.
Submitted 21 May, 2024;
originally announced May 2024.
-
Search for the lepton-flavor violating decay $B^0_s\toφμ^\pmτ^\mp$
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A search for the lepton-flavor violating decays $B^0_s\toφμ^\pmτ^\mp$ is presented, using a sample of proton-proton collisions at center-of-mass energies of 7, 8, and 13 TeV, collected with the LHCb detector and corresponding to a total integrated luminosity of $9\,\text{fb}^{-1}$. The $τ$ leptons are selected using decays with three charged pions. No significant excess is observed, and an upper limit on the branching fraction is determined to be ${\cal B}( B^0_s\toφμ^\pmτ^\mp) < 1.0\times 10^{-5}$ at 90% confidence level.
Submitted 21 May, 2024;
originally announced May 2024.
-
QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models
Authors:
Wei Wang,
Zhaowei Li,
Qi Xu,
Yiqing Cai,
Hang Song,
Qi Qi,
Ran Zhou,
Zhida Huang,
Tao Wang,
Li Xiao
Abstract:
Deploying large language models (LLMs) poses challenges in terms of resource limitations and inference efficiency. To address these challenges, recent research has focused on using smaller task-specific language models, which are enhanced by distilling the knowledge rationales generated by LLMs. However, previous works mostly emphasize the effectiveness of positive knowledge, while overlooking the knowledge noise and the exploration of negative knowledge. In this paper, we first propose a general approach called quality-guided contrastive rationale distillation for reasoning capacity learning, considering contrastive learning perspectives. For the learning of positive knowledge, we collect positive rationales through self-consistency to denoise the LLM rationales generated by temperature sampling. For the negative knowledge distillation, we generate negative rationales using temperature sampling for the iteration-before smaller language models themselves. Finally, a contrastive loss is designed to better distill the positive and negative rationales into the smaller language model, where an online-update discriminator is used to judge the qualities of rationales and assign weights for better optimizing the training process. Through extensive experiments on multiple reasoning tasks, we demonstrate that our method consistently outperforms the previous distillation methods and produces higher-quality rationales.
Submitted 14 May, 2024;
originally announced May 2024.
-
Precision measurement of the branching fraction of $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (604 additional authors not shown)
Abstract:
Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$.
The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with significantly improved precision.
Submitted 21 May, 2024;
originally announced May 2024.
-
Study of $b$-hadron decays to $Λ_c^+ h^- h^{\prime -}$ final states
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1072 additional authors not shown)
Abstract:
Decays of $Ξ_b^-$ and $Ω_b^-$ baryons to $Λ_c^+ h^- h^{\prime -}$ final states, with $h^- h^{\prime -}$ being $π^-π^-$, $K^-π^-$ and $K^-K^-$ meson pairs, are searched for using data collected with the LHCb detector. The data sample studied corresponds to an integrated luminosity of $8.7\,\mathrm{fb}^{-1}$ of $pp$ collisions collected at centre-of-mass energies $\sqrt{s} = 7$, $8$ and $13\,\mathrm{TeV}$. The products of the relative branching fractions and fragmentation fractions for each signal mode, relative to the $B^- \to Λ_c^+ \overline{p} π^-$ mode, are measured, with $Ξ_{b}^- \toΛ_{c}^+ K^- π^-$, $Ξ_{b}^- \toΛ_{c}^+ K^- K^-$ and $Ω_{b}^- \toΛ_{c}^+ K^- K^-$ decays being observed at over $5\,σ$ significance. The $Ξ_{b}^- \toΛ_{c}^+ K^- π^-$ mode is also used to measure the $Ξ_{b}^-$ production asymmetry, which is found to be consistent with zero. In addition, the $B^- \to Λ_{c}^+ \overline{p} K^-$ decay is observed for the first time, and its branching fraction is measured relative to that of the $B^- \to Λ_{c}^+ \overline{p} π^-$ mode.
Submitted 22 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Quantifying Emergence in Large Language Models
Authors:
Hang Chen,
Xinyu Yang,
Jiaying Zhu,
Wenya Wang
Abstract:
Emergence, broadly conceptualized as the ``intelligent'' behaviors of LLMs, has recently been studied and proved challenging to quantify due to the lack of a measurable definition. Most commonly, it has been estimated statistically through model performances across extensive datasets and tasks, which consumes significant resources. In addition, such estimation is difficult to interpret and may not accurately reflect the models' intrinsic emergence. In this work, we propose a quantifiable solution for estimating emergence. Inspired by emergentism in dynamics, we quantify the strength of emergence by comparing the entropy reduction of the macroscopic (semantic) level with that of the microscopic (token) level, both of which are derived from the representations within the transformer block. Using a low-cost estimator, our quantification method demonstrates consistent behaviors across a suite of LMs (GPT-2, GEMMA, etc.) under both in-context learning (ICL) and natural sentences. Empirical results show that (1) our method gives consistent measurements which align with existing observations based on performance metrics, validating the effectiveness of our emergence quantification; (2) our proposed metric uncovers novel emergence patterns such as the correlations between the variance of our metric and the number of ``shots'' in ICL, which further suggests a new way of interpreting hallucinations in LLMs; (3) we offer a potential solution towards estimating the emergence of larger and closed-resource LMs via smaller LMs like GPT-2. Our codes are available at: https://github.com/Zodiark-ch/Emergence-of-LLMs/.
Submitted 21 May, 2024;
originally announced May 2024.
-
Correlated insulators and charge density wave states in chirally twisted triple bilayer graphene
Authors:
Geng-Dong Zhou,
Yi-Jie Wang,
Wen-Xuan Wang,
Xiao-Bo Lu,
Zhi-Da Song
Abstract:
Motivated by recent experimental observations of displacement-field-tuned correlated insulators at integer and half-integer fillings in chirally twisted triple bilayer graphene (CTTBG), we study the single-particle and interacting physics of CTTBG. We find that there are two inequivalent stacking orders, i.e., ABABBC and ABABAB, and both exhibit flat bands with nontrivial topology. We then use the Hartree-Fock approximation to calculate the rich phase diagram of CTTBG at all integer and half-integer fillings in both stacking orders and under the vertical displacement field. Under a small displacement field, the groundstates are flavor polarized states for ABABBC stacking order and intervalley coherent states for ABABAB stacking order at all integer and half-integer fillings. A larger displacement field will turn them into layer-polarized states. At half-integer fillings, the groundstates also exhibit charge density wave (CDW) order. For ABABAB stacking, the groundstates are always $2\times1$ stripe states over a range of displacement fields. For ABABBC stacking, the groundstates are also $2\times1$ stripe states under a small displacement field, and a larger displacement field may favor further translation-symmetry-breaking, depending on filling and the direction of the displacement field. We demonstrate that the CDW states observed in the experiment can originate from the strong Coulomb interaction of the flat band electrons.
Submitted 21 May, 2024;
originally announced May 2024.
-
Multi-dimension Transformer with Attention-based Filtering for Medical Image Segmentation
Authors:
Wentao Wang,
Xi Xiao,
Mingjie Liu,
Qing Tian,
Xuanyao Huang,
Qizhen Lan,
Swalpa Kumar Roy,
Tianyang Wang
Abstract:
The accurate segmentation of medical images is crucial for diagnosing and treating diseases. Recent studies demonstrate that vision transformer-based methods have significantly improved performance in medical image segmentation, primarily due to their superior ability to establish global relationships among features and adaptability to various inputs. However, these methods struggle with the low signal-to-noise ratio inherent to medical images. Additionally, the effective utilization of channel and spatial information, which are essential for medical image segmentation, is limited by the representation capacity of self-attention. To address these challenges, we propose a multi-dimension transformer with attention-based filtering (MDT-AF), which redesigns the patch embedding and self-attention mechanism for medical image segmentation. MDT-AF incorporates an attention-based feature filtering mechanism into the patch embedding blocks and employs a coarse-to-fine process to mitigate the impact of low signal-to-noise ratio. To better capture complex structures in medical images, MDT-AF extends the self-attention mechanism to incorporate spatial and channel dimensions, enriching feature representation. Moreover, we introduce an interaction mechanism to improve the feature aggregation between spatial and channel dimensions. Experimental results on three public medical image segmentation benchmarks show that MDT-AF achieves state-of-the-art (SOTA) performance.
Submitted 20 May, 2024;
originally announced May 2024.
-
Real topological phonons in 3D carbon allotropes
Authors:
Xiaotian Wang,
Jingbo Bai,
Jianhua Wang,
Zhenxiang Cheng,
Shifeng Qian,
Wenhong Wang,
Gang Zhang,
Zhi-Ming Yu,
Yugui Yao
Abstract:
There has been a significant focus on real topological systems that enjoy space-time inversion symmetry (PT) and lack spin-orbit coupling. While the theoretical classification of the real topology has been established, more progress has yet to be made in the materials realization of such real topological systems in three dimensions (3D). To address this crucial issue, by selecting the carbon-based material candidates as targets, we perform high-throughput computing to inspect the real topology in the phonon spectra of the 3D carbon allotropes in the Samara Carbon Allotrope Database (SACADA). Among 1192 kinds of 3D carbon allotropes, we find 65 real topological systems with a phononic real Chern insulating (PRCI) state, 2 real topological systems with a phononic real nodal line (PRNL) state, 10 real topological systems with a phononic real Dirac point (PRDP) state, and 8 real topological systems with a phononic real triple-point pair (PRTPP) state. This greatly expands the material candidates with real topology, especially for the gapless topological phonons. We exhibit the PRCI, PRNL, PRTPP, and PRDP states of 27-SG. 166-pcu-h, 1081-SG. 194-42T13-CA, 52-SG. 141-gis, and 132-SG. 191-3,4T157 as illustrative examples, and explore the second-order boundary mode, i.e., phononic hinge mode. Among the four examples, the materials 1081-SG. 194-42T13-CA and 52-SG. 141-gis are so ideal that the PRNL and PRTPP in them are well separated from other bands, and the phononic hinge mode can be clearly observed. This study aims to broaden the understanding of 3D topological phonons, and emphasizes the potential of 3D carbon allotropes as a valuable framework for exploring the fascinating physics related to phononic hinge modes and phononic real topology.
Submitted 20 May, 2024;
originally announced May 2024.
-
Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins
Authors:
Yanlei Yin,
Lihua Wang,
Wenbo Wang,
Dinh Thai Hoang
Abstract:
In the process industry, optimizing production lines for long-term efficiency requires real-time monitoring and analysis of operation states to fine-tune production line parameters. However, the complexity in operational logic and the intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment of efficient optimization mechanisms. In view of these difficulties, we propose to deploy a digital twin of the production line by digitally abstracting its physical layout and operational logic. By iteratively mapping the real-world data reflecting equipment operation status and product quality inspection in the digital twin, we adopt a quality prediction model for production process based on self-attention-enabled temporal convolutional neural networks. This model enables the data-driven state evolution of the digital twin. The digital twin takes a role of aggregating the information of actual operating conditions and the results of quality-sensitive analysis, which facilitates the optimization of process production quality with virtual-reality evolution under multi-dimensional constraints. Leveraging the digital twin model as an information-flow carrier, we extract temporal features from key process indicators and establish a production process quality prediction model based on the proposed composite neural network. Our operation experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines. This integration achieves an average operating status prediction accuracy of over 98\% and near-optimal production process control.
Submitted 20 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Simulating a Chern Insulator with C = $\pm$2 on Synthetic Floquet Lattice
Authors:
Lingxiao Lei,
Weichen Wang,
Guangyao Huang,
Shun Hu,
Xi Cao,
Xinfang Zhang,
Mingtang Deng,
Pingxing Chen
Abstract:
The synthetic Floquet lattice, generated by multiple strong drives with mutually incommensurate frequencies, provides a powerful platform for the quantum simulation of topological phenomena. In this study, we propose a 4-band tight-binding model of the Chern insulator with a Chern number C = $\pm$2 by coupling two layers of the half-BHZ lattice and subsequently mapping it onto the Floquet lattice to simulate its topological properties. To determine the Chern number of our Floquet-version model, we extend the energy pumping method proposed by Martin et al. [Phys. Rev. X 7, 041008 (2017)] and the topological oscillation method introduced by Boyers et al. [Phys. Rev. Lett. 125, 160505 (2020)], followed by numerical simulations for both methodologies. The simulation results demonstrate the successful extraction of the Chern number using either of these methods, providing an excellent prediction of the phase diagram that closely aligns with the theoretical one derived from the original bilayer half-BHZ model. Finally, we briefly discuss a potential experimental implementation for our model. Our work demonstrates significant potential for simulating complex topological matter using quantum computing platforms, thereby paving the way for constructing a more universal simulator for non-interacting topological quantum states and advancing our understanding of these intriguing phenomena.
Submitted 19 May, 2024;
originally announced May 2024.
-
Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention
Authors:
Peng Li,
Yuan Liu,
Xiaoxiao Long,
Feihu Zhang,
Cheng Lin,
Mengfei Li,
Xingqun Qi,
Shanghang Zhang,
Wenhan Luo,
Ping Tan,
Wenping Wang,
Qifeng Liu,
Yike Guo
Abstract:
In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images should comply with a predefined camera type, e.g. a perspective camera with a fixed focal length, leading to distorted shapes when the assumption fails. Moreover, the full-image or dense multiview attention they employ leads to an exponential explosion of computational complexity as image resolution increases, resulting in prohibitively expensive training costs. To bridge the gap between assumption and reality, Era3D first proposes a diffusion-based camera prediction module to estimate the focal length and elevation of the input image, which allows our method to generate images without shape distortions. Furthermore, a simple but efficient attention layer, named row-wise attention, is used to enforce epipolar priors in the multiview diffusion, facilitating efficient cross-view information fusion. Consequently, compared with state-of-the-art methods, Era3D generates high-quality multiview images with up to a 512*512 resolution while reducing computation complexity by 12x times. Comprehensive experiments demonstrate that Era3D can reconstruct high-quality and detailed 3D meshes from diverse single-view input images, significantly outperforming baseline multiview diffusion methods. Project page: https://penghtyx.github.io/Era3D/.
Submitted 29 May, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Improved measurement of the branching fraction of $h_{c}\rightarrowγη^\prime/η$ and search for $h_{c}\rightarrowγπ^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
The processes $h_c\rightarrowγP~(P = η^\prime,~η,~π^{0})$ are studied with a sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. The branching fractions of $h_c\rightarrowγη^\prime$ and $h_c\rightarrowγη$ are measured to be $(1.40\pm0.11\pm0.04\pm0.10)\times10^{-3}$ and $(3.77\pm0.55\pm0.13\pm0.26)\times10^{-4}$, respectively, where the first uncertainties are statistical, the second systematic, and the third from the branching fraction of $ψ(3686)\rightarrowπ^{0}h_c$. The ratio $R_{h_c}=\frac{\mathscr{B}(h_c\rightarrowγη)}{\mathscr{B}(h_c\rightarrowγη^\prime)}$ is calculated to be $(27.0\pm4.4\pm1.0)\%$. The measurements are consistent with the previous results with improved precision by a factor of 2. The results are valuable for gaining a deeper understanding of $η-η^\prime$ mixing, and its manifestation within quantum chromodynamics. No significant signal is found for the decay $h_c\rightarrowγπ^{0}$, and an upper limit is placed on its branching fraction of $\mathscr{B}(h_c\rightarrowγπ^{0})<5.0\times10^{-5}$, at the 90\% confidence level.
Submitted 19 May, 2024;
originally announced May 2024.
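The ratio quoted in the BESIII $h_c$ abstract above can be cross-checked from the two branching fractions it lists. The sketch below is only a reader-side sanity check, not the collaboration's procedure: it assumes the two statistical uncertainties are uncorrelated and combines them in quadrature, and the function name `ratio_with_stat` is hypothetical.

```python
import math


def ratio_with_stat(num, num_stat, den, den_stat):
    """Return (ratio, stat. uncertainty) for num/den.

    Assumes uncorrelated statistical uncertainties (an assumption of this
    sketch; the published analysis may treat correlations differently).
    """
    ratio = num / den
    rel = math.sqrt((num_stat / num) ** 2 + (den_stat / den) ** 2)
    return ratio, ratio * rel


# Central values and statistical uncertainties as quoted in the abstract.
b_eta, b_eta_stat = 3.77e-4, 0.55e-4            # B(h_c -> gamma eta)
b_eta_prime, b_eta_prime_stat = 1.40e-3, 0.11e-3  # B(h_c -> gamma eta')

r, r_stat = ratio_with_stat(b_eta, b_eta_stat, b_eta_prime, b_eta_prime_stat)
print(f"R = ({100 * r:.1f} +/- {100 * r_stat:.1f}) %")
```

Under these assumptions the result lands close to the quoted $(27.0\pm4.4\pm1.0)\%$; small differences are expected because the inputs here are rounded values and correlations are ignored.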
-
NetMamba: Efficient Network Traffic Classification via Pre-training Unidirectional Mamba
Authors:
Tongze Wang,
Xiaohui Xie,
Wenduo Wang,
Chuyi Wang,
Youjian Zhao,
Yong Cui
Abstract:
Network traffic classification is a crucial research area aiming to enhance service quality, streamline network management, and bolster cybersecurity. To address the growing complexity of transmission encryption techniques, various machine learning and deep learning methods have been proposed. However, existing approaches face two main challenges. Firstly, they struggle with model inefficiency due to the quadratic complexity of the widely used Transformer architecture. Secondly, they suffer from inadequate traffic representation because of discarding important byte information while retaining unwanted biases. To address these challenges, we propose NetMamba, an efficient linear-time state space model equipped with a comprehensive traffic representation scheme. We adopt a specially selected and improved unidirectional Mamba architecture for the networking field, instead of the Transformer, to address efficiency issues. In addition, we design a traffic representation scheme to extract valid information from massive traffic data while removing biased information. Evaluation experiments on six public datasets encompassing three main classification tasks showcase NetMamba's superior classification performance compared to state-of-the-art baselines. It achieves an accuracy rate of nearly 99% (some over 99%) in all tasks. Additionally, NetMamba demonstrates excellent efficiency, improving inference speed by up to 60 times while maintaining comparably low memory usage. Furthermore, NetMamba exhibits superior few-shot learning abilities, achieving better classification performance with fewer labeled data. To the best of our knowledge, NetMamba is the first model to tailor the Mamba architecture for networking.
Submitted 25 May, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Transverse polarization measurement of $Λ$ hyperons in $p$Ne collisions at $\sqrt{s_{NN}}$ = 68.4 GeV with the $\mbox{LHCb}$ detector
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1065 additional authors not shown)
Abstract:
A measurement of the transverse polarization of the $Λ$ and $\barΛ$ hyperons in $p$Ne fixed-target collisions at $\sqrt{s_{NN}}$ = 68.4 GeV is presented using data collected by the LHCb detector. The polarization is studied using the decay $Λ\rightarrow p π^-$ together with its charge conjugated process, the integrated values measured are
$$ P_Λ = 0.029 \pm 0.019 \, (\rm{stat}) \pm 0.012 \, (\rm{syst}) \, , $$ $$ P_{\barΛ} = 0.003 \pm 0.023 \, (\rm{stat}) \pm 0.014 \,(\rm{syst}) \,. $$
Furthermore, the results are shown as a function of the Feynman~$x$~variable, transverse momentum, pseudorapidity and rapidity of the hyperons, and are compared with previous measurements.
Submitted 24 May, 2024; v1 submitted 18 May, 2024;
originally announced May 2024.
-
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Authors:
Jian Hu,
Xibin Wu,
Weixun Wang,
Xianyu,
Dehao Zhang,
Yu Cao
Abstract:
As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling reinforcement learning from human feedback (RLHF) for training large language models poses coordination challenges across four models. We present OpenRLHF, an open-source framework enabling efficient RLHF scaling. Unlike existing RLHF frameworks that co-locate four models on the same GPUs, OpenRLHF re-designs scheduling for the models beyond 70B parameters using Ray, vLLM, and DeepSpeed, leveraging improved resource utilization and diverse training approaches. Integrating seamlessly with Hugging Face, OpenRLHF provides an out-of-the-box solution with optimized algorithms and launch scripts, which ensures user-friendliness. OpenRLHF implements RLHF, DPO, rejection sampling, and other alignment techniques. Empowering state-of-the-art LLM development, OpenRLHF's code is available at https://github.com/OpenLLMAI/OpenRLHF.
Submitted 3 June, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Research on the Quantum confinement of Carriers in the Type-I Quantum Wells Structure
Authors:
Xinxin Li,
Zhen Deng,
Yang Jiang,
Chunhua Du,
Haiqiang Jia,
Wenxin Wang,
Hong Chen
Abstract:
Quantum confinement is recognized to be an inherent property in low-dimensional structures. Traditionally it is believed that the carriers trapped within the well cannot escape due to the discrete energy levels. However, our previous research has revealed efficient carrier escape in low-dimensional structures, contradicting this conventional understanding. In this study, we review the energy band structure of quantum wells considering it as a superposition of the bulk material dispersion and quantization energy dispersion resulting from the quantum confinement across the whole Brillouin zone. By accounting for all wave vectors, we obtain a certain distribution of carrier energy at each quantization energy level, giving rise to the energy subbands. These results enable carriers to escape from the well under the influence of an electric field. Additionally, we have compiled a comprehensive summary of various energy band scenarios in quantum well structures, relevant to carrier transport. Such a new interpretation holds significant value in deepening our comprehension of low-dimensional energy bands, discovering new physical phenomena, and designing novel devices with superior performance.
Submitted 14 May, 2024;
originally announced May 2024.
-
3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning
Authors:
Zhentao Liu,
Huangxuan Zhao,
Wenhui Qin,
Zhenghong Zhou,
Xinggang Wang,
Wenping Wang,
Xiaochun Lai,
Chuansheng Zheng,
Dinggang Shen,
Zhiming Cui
Abstract:
Digital Subtraction Angiography (DSA) is one of the gold standards in vascular disease diagnosing. With the help of contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures. Current commercial DSA systems typically demand hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. However, sparse-view DSA reconstruction, aimed at reducing radiation dosage, is still underexplored in the research community. The dynamic blood flow and insufficient input of sparse-view DSA images present significant challenges to the 3D vessel reconstruction task. In this study, we propose to use a time-agnostic vessel probability field to solve this problem effectively. Our approach, termed as vessel probability guided attenuation learning, represents the DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the vessel probability field. Functioning as a dynamic mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism facilitates a self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves the reconstruction quality. Our model is trained by minimizing the disparity between synthesized projections and real captured DSA images. We further employ two training strategies to improve our reconstruction quality: (1) coarse-to-fine progressive training to achieve better geometry and (2) temporal perturbed rendering loss to enforce temporal consistency. Experimental results have demonstrated superior quality on both 3D vessel reconstruction and 2D view synthesis.
Submitted 17 May, 2024;
originally announced May 2024.
-
Large Fermi surface in pristine kagome metal CsV$_3$Sb$_5$ and enhanced quasiparticle effective masses
Authors:
Wei Zhang,
Tsz Fung Poon,
Chun Wai Tsang,
Wenyan Wang,
X. Liu,
J. Xie,
S. T. Lam,
Shanmin Wang,
Kwing To Lai,
A. Pourret,
G. Seyfarth,
G. Knebel,
Wing Chi Yu,
Swee K. Goh
Abstract:
The kagome metal CsV$_3$Sb$_5$ is an ideal platform to study the interplay between topology and electron correlation. To understand the fermiology of CsV$_3$Sb$_5$, intensive quantum oscillation (QO) studies at ambient pressure have been conducted. However, due to the Fermi surface reconstruction by the complicated charge density wave (CDW) order, the QO spectrum is exceedingly complex, hindering a complete understanding of the fermiology. Here, we directly map the Fermi surface of the pristine CsV$_3$Sb$_5$ by measuring Shubnikov-de Haas QOs up to 29 T under pressure, where the CDW order is completely suppressed. The QO spectrum of the pristine CsV$_3$Sb$_5$ is significantly simpler than the one in the CDW phase, and the detected oscillation frequencies agree well with our density functional theory calculations. In particular, a frequency as large as 8,200 T is detected. Pressure-dependent QO studies further reveal a weak but noticeable enhancement of the quasiparticle effective masses on approaching the critical pressure where the CDW order disappears, hinting at the presence of quantum fluctuations. Our high-pressure QO results reveal the large, unreconstructed Fermi surface of CsV$_3$Sb$_5$, paving the way to understanding the parent state of this intriguing metal in which the electrons can be organized into different ordered states.
Submitted 17 May, 2024;
originally announced May 2024.
-
Suppression of blow-up in Patlak-Keller-Segel system coupled with linearized Navier-Stokes equations via the 3D Couette flow
Authors:
Shikun Cui,
Lili Wang,
Wendong Wang
Abstract:
It is known that finite-time blow-up in the 3D Patlak-Keller-Segel system may occur for arbitrarily small values of the initial mass. It's interesting whether one can prevent the finite-time blow-up via the stabilizing effect of the moving fluid. Consider the three-dimensional Patlak-Keller-Segel system coupled with the linearized Navier-Stokes equations near the Couette flow $(\ Ay, 0, 0 \ )$ in a finite channel $\mathbb{T}\times\mathbb{I}\times\mathbb{T}$ with $ \mathbb{T}=[0,2π) $ and $ \mathbb{I}=[-1,1] $, with the non-slip boundary condition, and we show that if the shear flow is sufficiently strong (A is large enough), then the solutions to Patlak-Keller-Segel-Navier-Stokes system are global in time as long as the initial cell mass is sufficiently small (for example, $M<\frac49$) and $ A\left(\|u_{2,0}(0)\|_{L^{2}}+\|u_{3,0}(0)\|_{L^{2}} \right)\leq C_{0} $, which seems to be the first result of considering the suppression effect of Couette flow in the 3D Patlak-Keller-Segel-Navier-Stokes model, and also the first time considering the non-slip boundary condition.
Submitted 13 May, 2024;
originally announced May 2024.
-
Energy dependence of the low-frequency quasi-periodic oscillations in Swift J1727.8-1613
Authors:
Haifan Zhu,
Wei Wang
Abstract:
Based on observations from the Insight-Hard X-ray Modulation Telescope (Insight-HXMT), an analysis of Type-C quasi-periodic oscillations (QPOs) observed during the outburst of the new black hole candidate Swift J1727.8-1613 in 2023 was conducted. This analysis scrutinized the QPO's evolution throughout the outburst, particularly noting its rapid frequency escalation during two flare events. Utilizing the energy range covered by Insight-HXMT, a dependency of the QPO frequency on energy was observed. Below approximately 3 Hz, minimal variations in frequency with energy were noted, whereas clear variations with photon energy were observed when it exceeded approximately 3 Hz. Additionally, a sharp drop in the rate of change was observed when the frequency exceeded approximately 8 Hz. This behavior, similar to several previously reported sources, suggests the presence of a common underlying physical mechanism. Moreover, the QPO rms-frequency relationship can be explained by the Lense-Thirring precession model. The relationship between rms-energy and phase lag with frequency suggests the black hole system as a high-inclination source.
Submitted 15 May, 2024;
originally announced May 2024.
-
Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
V. Batozskaya,
D. Becker,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
J. Bloms,
A. Bortone,
I. Boyko
, et al. (559 additional authors not shown)
Abstract:
We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90\% confidence level, respectively.
Submitted 14 May, 2024;
originally announced May 2024.
-
Function based sim-to-real learning for shape control of deformable free-form surfaces
Authors:
Yingjun Tian,
Guoxin Fang,
Renbo Su,
Weiming Wang,
Simeon Gill,
Andrew Weightman,
Charlie C. L. Wang
Abstract:
For the shape control of deformable free-form surfaces, simulation plays a crucial role in establishing the mapping between the actuation parameters and the deformed shapes. The differentiation of this forward kinematic mapping is usually employed to solve the inverse kinematic problem for determining the actuation parameters that can realize a target shape. However, the free-form surfaces obtained from simulators are always different from the physically deformed shapes due to the errors introduced by hardware and the simplification adopted in physical simulation. To fill the gap, we propose a novel deformation function based sim-to-real learning method that can map the geometric shape of a simulated model into its corresponding shape of the physical model. Unlike the existing sim-to-real learning methods that rely on completely acquired dense markers, our method accommodates sparsely distributed markers and can resiliently use all captured frames -- even for those in the presence of missing markers. To demonstrate its effectiveness, our sim-to-real method has been integrated into a neural network-based computational pipeline designed to tackle the inverse kinematic problem on a pneumatically actuated deformable mannequin.
Submitted 14 May, 2024;
originally announced May 2024.
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Authors:
Zhimin Li,
Jianwei Zhang,
Qin Lin,
Jiangfeng Xiong,
Yanxin Long,
Xinchi Deng,
Yingfang Zhang,
Xingchao Liu,
Minbin Huang,
Zedong Xiao,
Dayou Chen,
Jiajun He,
Jiahao Li,
Wenyue Li,
Chen Zhang,
Rongwei Quan,
Jianxiang Lu,
Jiabin Huang,
Xiaoyan Yuan,
Xiaoxiao Zheng,
Yixuan Li,
Jihong Zhang,
Chao Zhang,
Meng Chen,
Jie Liu
, et al. (20 additional authors not shown)
Abstract:
We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT
Submitted 14 May, 2024;
originally announced May 2024.
-
Treatment Effect Estimation for User Interest Exploration on Recommender Systems
Authors:
Jiaju Chen,
Wenjie Wang,
Chongming Gao,
Peng Wu,
Jianxiong Wei,
Qingsong Hua
Abstract:
Recommender systems learn personalized user preferences from user feedback like clicks. However, user feedback is usually biased towards partially observed interests, leaving many users' hidden interests unexplored. Existing approaches typically mitigate the bias, increase recommendation diversity, or use bandit algorithms to balance exploration-exploitation trade-offs. Nevertheless, they fail to consider the potential rewards of recommending different categories of items and lack the global scheduling of allocating top-N recommendations to categories, leading to suboptimal exploration. In this work, we propose an Uplift model-based Recommender (UpliftRec) framework, which regards top-N recommendation as a treatment optimization problem. UpliftRec estimates the treatment effects, i.e., the click-through rate (CTR) under different category exposure ratios, by using observational user feedback. UpliftRec calculates group-level treatment effects to discover users' hidden interests with high CTR rewards and leverages inverse propensity weighting to alleviate confounder bias. Thereafter, UpliftRec adopts a dynamic programming method to calculate the optimal treatment for overall CTR maximization. We implement UpliftRec on different backend models and conduct extensive experiments on three datasets. The empirical results validate the effectiveness of UpliftRec in discovering users' hidden interests while achieving superior recommendation accuracy.
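As a rough illustration of the two ingredients named in this abstract, the sketch below pairs a self-normalized inverse-propensity-weighted CTR estimate with a small dynamic program that splits N recommendation slots across categories. It is not the authors' implementation: the function names, the log format (treatment, propensity, click), and the per-category reward table are all assumptions made for this example, and the paper's treatments (category exposure ratios) are simplified here to discrete labels.

```python
from collections import defaultdict


def ipw_ctr(logs):
    """Self-normalized IPW estimate of CTR per treatment.

    logs: iterable of (treatment, propensity, click) tuples, where
    propensity is the (assumed known) probability that the logged
    treatment was shown. This log format is hypothetical.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for treatment, propensity, click in logs:
        weight = 1.0 / propensity  # rarer exposures count for more
        num[treatment] += weight * click
        den[treatment] += weight
    return {t: num[t] / den[t] for t in num}


def allocate_slots(reward, n_slots):
    """Split n_slots across categories to maximize total estimated reward.

    reward[c][k] is the estimated reward of giving k slots to category c,
    for k in 0..n_slots (a hypothetical table, e.g. built from ipw_ctr).
    Returns (best_total, slots_per_category).
    """
    neg_inf = float("-inf")
    # best[j]: max reward using exactly j slots over categories seen so far
    best = [0.0] + [neg_inf] * n_slots
    picks = []
    for r in reward:
        new = [neg_inf] * (n_slots + 1)
        pick = [0] * (n_slots + 1)
        for j in range(n_slots + 1):
            for k in range(j + 1):
                if best[j - k] == neg_inf:
                    continue
                value = best[j - k] + r[k]
                if value > new[j]:
                    new[j], pick[j] = value, k
        best = new
        picks.append(pick)
    # Backtrack from the last category to recover the allocation.
    alloc = []
    j = n_slots
    for pick in reversed(picks):
        alloc.append(pick[j])
        j -= pick[j]
    alloc.reverse()
    return best[n_slots], alloc
```

A call such as `allocate_slots([[0, 0.5, 0.6], [0, 0.3, 0.7]], 2)` splits two slots between two categories. The dynamic program runs in time proportional to the number of categories times the square of the slot count, which is small for typical top-N list sizes.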
Submitted 14 May, 2024;
originally announced May 2024.