-
Single-Atom Verification of the Optimal Trade-Off between Speed and Cost in Shortcuts to Adiabaticity
Authors:
J. -W. Zhang,
J. -T. Bu,
J. C. Li,
Weiquan Meng,
W. -Q. Ding,
B. Wang,
W. -F. Yuan,
H. -J. Du,
G. -Y. Ding,
W. -J. Chen,
L. Chen,
F. Zhou,
Zhenyu Xu,
M. Feng
Abstract:
The approach of shortcuts to adiabaticity enables the effective execution of adiabatic dynamics in quantum information processing with enhanced speed. Owing to the inherent trade-off between dynamical speed and the cost associated with the transitionless driving field, executing arbitrarily fast operations becomes impractical. To understand the accurate interplay between speed and energetic cost in this process, we propose theoretically and verify experimentally a new trade-off, which is characterized by a tightly optimized bound within $s$-parameterized phase spaces. Our experiment is carried out in a single ultracold $^{40}$Ca$^{+}$ ion trapped in a harmonic potential. By precisely manipulating the quantum states of the ion, we execute the Landau-Zener model as an example, where the quantum speed limit as well as the cost are governed by the spectral gap. We witness that our proposed trade-off is indeed tight in scenarios involving both initial eigenstates and initial thermal equilibrium states. Our work helps in understanding the fundamental constraints in shortcuts to adiabaticity and illuminates the potential of under-utilized phase spaces that have been traditionally overlooked.
Submitted 6 June, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering
Authors:
Shujian Jiao,
Bingxuan Li,
Lei Wang,
Xiao** Zhang,
Wei Chen,
Jiajie Peng,
Zhongyu Wei
Abstract:
Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting amino acid representations with notable biochemical accuracy. Yet, it falls short in delivering functional protein insights, signaling an opportunity for enhancing representation quality. Our study addresses this gap by incorporating protein family classification into ESM2's training. This approach, augmented with a Community Propagation-Based Clustering Algorithm, improves global protein representations, while a contextual prediction task fine-tunes local amino acid accuracy. Significantly, our model achieved state-of-the-art results in several downstream experiments, demonstrating the power of combining global and local methodologies to substantially boost protein representation quality.
Submitted 24 April, 2024;
originally announced April 2024.
-
Curvature, diameter and signs of graphs
Authors:
Wei Chen,
Shi** Liu
Abstract:
We prove a Li-Yau type eigenvalue-diameter estimate for signed graphs. That is, the nonzero eigenvalues of the Laplacian of a non-negatively curved signed graph are lower bounded by $1/D^2$ up to a constant, where $D$ stands for the diameter. This leads to several interesting applications, including a volume estimate for non-negatively curved signed graphs in terms of frustration index and diameter, and a two-sided Li-Yau estimate for triangle-free graphs. Our proof is built upon a combination of Chung-Lin-Yau type gradient estimate and a new trick involving strong nodal domain walks of signed graphs. We further discuss extensions of part of our results to nonlinear Laplacians on signed graphs.
Submitted 23 April, 2024;
originally announced April 2024.
-
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
Authors:
Weifeng Chen,
Jiacheng Zhang,
Jie Wu,
Hefeng Wu,
Xuefeng Xiao,
Liang Lin
Abstract:
The rapid development of diffusion models has triggered diverse applications. Identity-preserving text-to-image generation (ID-T2I) in particular has received significant attention due to its wide range of application scenarios, such as AI portraits and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) it is hard to maintain the identity characteristics of reference portraits accurately, (2) the generated images lack aesthetic appeal, especially while enforcing identity retention, and (3) existing methods cannot be compatible with LoRA-based and Adapter-based methods simultaneously. To address these issues, we present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance. To address the loss of identity features, we introduce identity consistency reward fine-tuning, which utilizes feedback from face detection and recognition models to improve generated identity preservation. Furthermore, we propose identity aesthetic reward fine-tuning, leveraging rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach. Project Page: https://idaligner.github.io/
Submitted 23 April, 2024;
originally announced April 2024.
-
ControlTraj: Controllable Trajectory Generation with Topology-Constrained Diffusion Model
Authors:
Yuanshao Zhu,
James Jianqiao Yu,
Xiangyu Zhao,
Qidong Liu,
Yongchao Ye,
Wei Chen,
Zijian Zhang,
Xuetao Wei,
Yuxuan Liang
Abstract:
Generating trajectory data is among promising solutions to addressing privacy concerns, collection costs, and proprietary restrictions usually associated with human mobility analyses. However, existing trajectory generation methods are still in their infancy due to the inherent diversity and unpredictability of human activities, grappling with issues such as fidelity, flexibility, and generalizability. To overcome these obstacles, we propose ControlTraj, a Controllable Trajectory generation framework with the topology-constrained diffusion model. Distinct from prior approaches, ControlTraj utilizes a diffusion model to generate high-fidelity trajectories while integrating the structural constraints of road network topology to guide the geographical outcomes. Specifically, we develop a novel road segment autoencoder to extract fine-grained road segment embedding. The encoded features, along with trip attributes, are subsequently merged into the proposed geographic denoising UNet architecture, named GeoUNet, to synthesize geographic trajectories from white noise. Through experimentation across three real-world data settings, ControlTraj demonstrates its ability to produce human-directed, high-fidelity trajectory generation with adaptability to unexplored geographical contexts.
Submitted 23 April, 2024;
originally announced April 2024.
-
Simulation-Free Determination of Microstructure Representative Volume Element Size via Fisher Scores
Authors:
Wei Liu,
Satyajit Mojumder,
Wing Kam Liu,
Wei Chen,
Daniel W. Apley
Abstract:
A representative volume element (RVE) is a reasonably small unit of microstructure that can be simulated to obtain the same effective properties as the entire microstructure sample. Finite element (FE) simulation of RVEs, as opposed to much larger samples, saves computational expense, especially in multiscale modeling. Therefore, it is desirable to have a framework that determines RVE size prior to FE simulations. Existing methods select the RVE size based on when the FE-simulated properties of samples of increasing size converge with insignificant statistical variations, with the drawback that many samples must be simulated. We propose a simulation-free alternative that determines RVE size based only on a micrograph. The approach utilizes a machine learning model trained to implicitly characterize the stochastic nature of the input micrograph. The underlying rationale is to view RVE size as the smallest moving window size for which the stochastic nature of the microstructure within the window is stationary as the window moves across a large micrograph. For this purpose, we adapt a recently developed Fisher score-based framework for microstructure nonstationarity monitoring. Because the resulting RVE size is based solely on the micrograph and does not involve any FE simulation of specific properties, it constitutes an RVE for any property of interest that solely depends on the microstructure characteristics. Through numerical experiments of simple and complex microstructures, we validate our approach and show that our selected RVE sizes are consistent with when the chosen FE-simulated properties converge.
Submitted 7 April, 2024;
originally announced April 2024.
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Sam Ade Jacobs,
Ammar Ahmad Awan,
Jyoti Aneja,
Ahmed Awadallah,
Hany Awadalla,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Qin Cai,
Martin Cai,
Caio César Teodoro Mendes,
Weizhu Chen,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Parul Chopra
, et al. (90 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.
Submitted 23 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning
Authors:
Kang Luo,
Yuanshao Zhu,
Wei Chen,
Kun Wang,
Zhengyang Zhou,
Sijie Ruan,
Yuxuan Liang
Abstract:
Trajectory modeling refers to characterizing human movement behavior, serving as a pivotal step in understanding mobility patterns. Nevertheless, existing studies typically ignore the confounding effects of geospatial context, leading to the acquisition of spurious correlations and limited generalization capabilities. To bridge this gap, we initially formulate a Structural Causal Model (SCM) to decipher the trajectory representation learning process from a causal perspective. Building upon the SCM, we further present a Trajectory modeling framework (TrajCL) based on Causal Learning, which leverages the backdoor adjustment theory as an intervention tool to eliminate the spurious correlations between geospatial context and trajectories. Extensive experiments on two real-world datasets verify that TrajCL markedly enhances performance in trajectory classification tasks while showcasing superior generalization and interpretability.
Submitted 22 April, 2024;
originally announced April 2024.
-
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Authors:
Fengyi Fu,
Shancheng Fang,
Weidong Chen,
Zhendong Mao
Abstract:
Automatic live video commenting is receiving increasing attention due to its significance in narration generation, topic explanation, etc. However, current methods do not consider the diverse sentiments of the generated comments. Sentimental factors are critical in interactive commenting, yet they have received little research attention so far. Thus, in this paper, we propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, which consists of a sentiment-oriented diversity encoder module and a batch attention module, to achieve diverse video commenting with multiple sentiments and multiple semantics. Specifically, our sentiment-oriented diversity encoder combines a VAE and a random mask mechanism to achieve semantic diversity under sentiment guidance, which is then fused with cross-modal features to generate live video comments. Furthermore, a batch attention module is also proposed in this paper to alleviate the problem of missing sentimental samples caused by data imbalance, which is common in live videos as the popularity of videos varies. Extensive experiments on the Livebot and VideoIC datasets demonstrate that the proposed So-TVAE outperforms the state-of-the-art methods in terms of the quality and diversity of generated comments. Related code is available at https://github.com/fufy1024/So-TVAE.
Submitted 19 April, 2024;
originally announced April 2024.
-
decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points
Authors:
Yi Guo,
Fanliu Kong,
Xiaoyang Li,
Hui Li,
Wei Chen,
Xiaogang Tian,
**** Cai,
Yang Zhang,
Shouda Liu
Abstract:
Quantization has emerged in recent years as one of the most promising compression technologies for deploying efficient large models in various real-time applications. Considering that the storage and IO of weights take up the vast majority of the overhead inside a large model, weight-only quantization can lead to large gains. However, existing quantization schemes suffer from significant accuracy degradation at very low bits, or require some additional computational overhead when deployed, making them difficult to apply to large-scale applications in industry. In this paper, we propose decoupleQ, achieving a substantial increase in model accuracy, especially at very low bits. decoupleQ abandons the traditional heuristic quantization paradigm and decouples the model parameters into integer and floating-point parts, thus transforming the quantization problem into a traditional mathematical optimization problem with constraints, which is then solved alternatively by off-the-shelf optimization methods.
Quantization via decoupleQ is linear and uniform, making it hardware-friendlier than its non-uniform counterparts, and enabling the idea to be migrated to high-bit quantization to enhance its robustness. Our method has achieved online accuracy close to fp16/bf16 on the 2-bit quantization of large speech models at ByteDance. The code is available at https://github.com/bytedance/decoupleQ
Submitted 19 April, 2024;
originally announced April 2024.
-
Adaptive Catalyst Discovery Using Multicriteria Bayesian Optimization with Representation Learning
Authors:
Jie Chen,
Pengfei Ou,
Yuxin Chang,
Hengrui Zhang,
Xiao-Yan Li,
Edward H. Sargent,
Wei Chen
Abstract:
High-performance catalysts are crucial for sustainable energy conversion and human health. However, the discovery of catalysts faces challenges due to the absence of efficient approaches to navigating vast and high-dimensional structure and composition spaces. In this study, we propose a high-throughput computational catalyst screening approach integrating density functional theory (DFT) and Bayesian Optimization (BO). Within the BO framework, we propose an uncertainty-aware atomistic machine learning model, UPNet, which enables automated representation learning directly from high-dimensional catalyst structures and achieves principled uncertainty quantification. Utilizing a constrained expected improvement acquisition function, our BO framework simultaneously considers multiple evaluation criteria. Using the proposed methods, we explore catalyst discovery for the CO2 reduction reaction. The results demonstrate that our approach achieves high prediction accuracy, facilitates interpretable feature extraction, and enables multicriteria design optimization, leading to significant reduction of computing power and time (10x reduction of required DFT calculations) in high-performance catalyst discovery.
Submitted 18 April, 2024;
originally announced April 2024.
-
AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration
Authors:
Bo Pan,
Jiaying Lu,
Ke Wang,
Li Zheng,
Zhen Wen,
Yingchaojie Feng,
Minfeng Zhu,
Wei Chen
Abstract:
The potential of automatic task-solving through Large Language Model (LLM)-based multi-agent collaboration has recently garnered widespread attention from both the research community and industry. While utilizing natural language to coordinate multiple agents presents a promising avenue for democratizing agent technology for general users, designing coordination strategies remains challenging with existing coordination frameworks. This difficulty stems from the inherent ambiguity of natural language for specifying the collaboration process and the significant cognitive effort required to extract crucial information (e.g. agent relationship, task dependency, result correspondence) from a vast amount of text-form content during exploration. In this work, we present a visual exploration framework to facilitate the design of coordination strategies in multi-agent collaboration. We first establish a structured representation for LLM-based multi-agent coordination strategy to regularize the ambiguity of natural language. Based on this structure, we devise a three-stage generation method that leverages LLMs to convert a user's general goal into an executable initial coordination strategy. Users can further intervene at any stage of the generation process, utilizing LLMs and a set of interactions to explore alternative strategies. Whenever a satisfactory strategy is identified, users can commence the collaboration and examine the visually enhanced execution result. We develop AgentCoord, a prototype interactive system, and conduct a formal user study to demonstrate the feasibility and effectiveness of our approach.
Submitted 18 April, 2024;
originally announced April 2024.
-
Joint Transmitter and Receiver Design for Movable Antenna Enhanced Multicast Communications
Authors:
Ying Gao,
Qingqing Wu,
Wen Chen
Abstract:
Movable antenna (MA) is an emerging technology that utilizes localized antenna movement to pursue better channel conditions for enhancing communication performance. In this paper, we study the MA-enhanced multicast transmission from a base station equipped with multiple MAs to multiple groups of single-MA users. Our goal is to maximize the minimum weighted signal-to-interference-plus-noise ratio (SINR) among all the users by jointly optimizing the position of each transmit/receive MA and the transmit beamforming. To tackle this challenging problem, we first consider the single-group scenario and propose an efficient algorithm based on the techniques of alternating optimization and successive convex approximation. Particularly, when optimizing transmit or receive MA positions, we construct a concave lower bound for the signal-to-noise ratio (SNR) of each user by applying only the second-order Taylor expansion, which is more effective than existing works utilizing two-step approximations. The proposed design is then extended to the general multi-group scenario. Simulation results demonstrate that significant performance gains in terms of achievable max-min SNR/SINR can be obtained by our proposed algorithm over benchmark schemes. Additionally, the proposed algorithm can notably reduce the required amount of transmit power or antennas for achieving a target level of max-min SNR/SINR performance compared to benchmark schemes.
Submitted 9 May, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Graph Neural Networks for Wireless Networks: Graph Representation, Architecture and Evaluation
Authors:
Yang Lu,
Yuhang Li,
Ruichen Zhang,
Wei Chen,
Bo Ai,
Dusit Niyato
Abstract:
Graph neural networks (GNNs) have been regarded as the basic model to facilitate deep learning (DL) to revolutionize resource allocation in wireless networks. GNN-based models are shown to be able to learn the structural information about graphs representing the wireless networks to adapt to the time-varying channel state information and dynamics of network topology. This article aims to provide a comprehensive overview of applying GNNs to optimize wireless networks via answering three fundamental questions, i.e., how to input the wireless network data into GNNs, how to improve the performance of GNNs, and how to evaluate GNNs. Particularly, two graph representations are given to transform wireless network parameters into graph-structured data. Then, we focus on the architecture design of the GNN-based models via introducing the basic message passing as well as model improvement methods including multi-head attention mechanism and residual structure. At last, we give task-oriented evaluation metrics for DL-enabled wireless resource allocation. We also highlight certain challenges and potential research directions for the application of GNNs in wireless networks.
Submitted 17 April, 2024;
originally announced April 2024.
-
When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery
Authors:
Yiqun Xie,
Zhihao Wang,
Weiye Chen,
Zhili Li,
Xiaowei Jia,
Yanhua Li,
Ruichen Wang,
Kangyang Chai,
Ruohan Li,
Sergii Skakun
Abstract:
Foundation models, i.e., very large deep learning models, have demonstrated impressive performance in various language and vision tasks that is otherwise difficult to reach using smaller-size models. The major success of GPT-type language models is particularly exciting and raises expectations on the potential of foundation models in other domains, including satellite remote sensing. In this context, great efforts have been made to build foundation models to test their capabilities in broader applications, and examples include Prithvi by NASA-IBM, Segment-Anything-Model, ViT, etc. This leads to an important question: Are foundation models always a suitable choice for different remote sensing tasks, and when or when not? This work aims to enhance the understanding of the status and suitability of foundation models for pixel-level classification using multispectral imagery at moderate resolution, through comparisons with traditional machine learning (ML) and regular-size deep learning models. Interestingly, the results reveal that in many scenarios traditional ML models still have similar or better performance compared to foundation models, especially for tasks where texture is less useful for classification. On the other hand, deep learning models did show more promising results for tasks where labels partially depend on texture (e.g., burn scar), while the difference in performance between foundation models and deep learning models is not obvious. The results conform with our analysis: The suitability of foundation models depends on the alignment between the self-supervised learning tasks and the real downstream tasks, and the typical masked autoencoder paradigm is not necessarily suitable for many remote sensing problems.
Submitted 17 April, 2024;
originally announced April 2024.
-
Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
Authors:
Wei Chen,
Zhiyuan Li
Abstract:
A multimodal AI agent is characterized by its ability to process and learn from various types of data, including natural language, visual, and audio inputs, to inform its actions. Despite advancements in large language models that incorporate visual data, such as GPT-4V, effectively translating image-based data into actionable outcomes for AI agents continues to be challenging. In this paper, we introduce a multimodal model that incorporates the concept of functional token specifically designed for AI agent applications. To ensure compatibility with edge devices, our model is optimized to a compact size of less than 1B parameters. Like GPT-4, our model can process both English and Chinese. We demonstrate that this model is capable of operating efficiently on a wide range of edge devices, including devices as constrained as a Raspberry Pi.
Submitted 18 April, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Density estimation for ordinal biological sequences and its applications
Authors:
Wei-Chia Chen,
Juannan Zhou,
David M. McCandlish
Abstract:
Biological sequences do not come at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a new method for inferring the probability distribution from which a sample of biological sequences were drawn for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. One of them is to investigate the associations among the sequence sites. This provides us a way to infer the governing biological grammar. The other is to study the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. It can be seen that this methodology enables us to learn from a sample of sequences about how a biological system or phenomenon in the real world works.
Submitted 17 April, 2024;
originally announced April 2024.
-
Modulation of the Octahedral Structure and Potential Superconductivity of La3Ni2O7 at Ambient Pressure by Compressive Strain
Authors:
Zihao Huo,
Peng Zhang,
Aiqin Yang,
Zhengtao Liu,
Xiangru Tao,
Zihan Zhang,
Qiwen Jiang,
Wenxuan Chen,
Defang Duan,
Tian Cui
Abstract:
Superconductivity at Tc = 80 K has recently been reported above 14 GPa in La3Ni2O7, which thus introduces a new family of high-temperature superconductors. Using a first-principles calculation with Coulomb repulsion, we unveil a surprising new route to obtain superconductivity in La3Ni2O7 at ambient pressure by introducing compressive strain along the [001] direction. The shape of the NiO6 octahedra affects the Ni-3dz2 density of states (DOS) at the Fermi level, and it can be modulated by applying compressive strain instead of hydrostatic pressure. Notably, when the octahedral regularity parameter defined herein is R ~ 4%, La3Ni2O7 acquires a high Ni-3dz2 DOS and a hole Fermi pocket. Our study thus indicates a path for obtaining superconductivity in La3Ni2O7 at ambient pressure and elucidates the relationship between structural properties and superconductivity.
Submitted 16 April, 2024;
originally announced April 2024.
-
Increasing Binary Trees and the $(α,β)$-Eulerian Polynomials
Authors:
William Y. C. Chen,
Amy M. Fu
Abstract:
In light of the grammar given by Ji for the $(α,β)$-Eulerian polynomials introduced by Carlitz and Scoville, we provide a labeling scheme for increasing binary trees. In this setting, we obtain a combinatorial interpretation of the $γ$-coefficients of the $α$-Eulerian polynomials in terms of forests of planted 0-1-2-plane trees, which specializes to a combinatorial interpretation of the $γ$-coefficients of the derangement polynomials in the same vein. By means of a decomposition of an increasing binary tree into a forest, we find combinatorial interpretations of the sums involving two identities of Ji, one of which can be viewed as $(α,β)$-extensions of the formulas of Petersen and Stembridge.
Submitted 16 April, 2024;
originally announced April 2024.
-
Engineering software 2.0 by interpolating neural networks: unifying training, solving, and calibration
Authors:
Chanwook Park,
Sourav Saha,
Jiachen Guo,
Xiaoyu Xie,
Satyajit Mojumder,
Miguel A. Bessa,
Dong Qian,
Wei Chen,
Gregory J. Wagner,
Jian Cao,
Wing Kam Liu
Abstract:
The evolution of artificial intelligence (AI) and neural network theories has revolutionized the way software is programmed, shifting from a hard-coded series of codes to a vast neural network. However, this transition in engineering software has faced challenges such as data scarcity, multi-modality of data, low model accuracy, and slow inference. Here, we propose a new network based on interpolation theories and tensor decomposition, the interpolating neural network (INN). Instead of interpolating training data, a common notion in computer science, INN interpolates interpolation points in the physical space whose coordinates and values are trainable. It can also extrapolate if the interpolation points reside outside of the range of training data and the interpolation functions have a larger support domain. INN features orders of magnitude fewer trainable parameters, faster training, a smaller memory footprint, and higher model accuracy compared to feed-forward neural networks (FFNN) or physics-informed neural networks (PINN). INN is poised to usher in Engineering Software 2.0, a unified neural network that spans various domains of space, time, parameters, and initial/boundary conditions. This has previously been computationally prohibitive due to the exponentially growing number of trainable parameters, easily exceeding the parameter size of ChatGPT, which is over 1 trillion. INN addresses this challenge by leveraging tensor decomposition and tensor product, with adaptable network architecture.
Submitted 22 April, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Building Semantic Communication System via Molecules: An End-to-End Training Approach
Authors:
Yukun Cheng,
Wei Chen,
Bo Ai
Abstract:
The concept of semantic communication provides a novel approach for applications in scenarios with limited communication resources. In this paper, we propose an end-to-end (E2E) semantic molecular communication system, aiming to enhance the efficiency of molecular communication systems by reducing the transmitted information. Specifically, following the joint source channel coding paradigm, the network is designed to encode the task-relevant information into the concentration of the information molecules, which is robust to the degradation of the molecular communication channel. Furthermore, we propose a channel network to enable the E2E learning over the non-differentiable molecular channel. Experimental results demonstrate the superior performance of the semantic molecular communication system over the conventional methods in classification tasks.
Submitted 15 April, 2024;
originally announced April 2024.
-
Magic Clothing: Controllable Garment-Driven Image Synthesis
Authors:
Weifeng Chen,
Tao Gu,
Yuhao Xu,
Chengcai Chen
Abstract:
We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task. Aiming at generating customized characters wearing the target garments with diverse text prompts, the image controllability is the most critical issue, i.e., to preserve the garment details and maintain faithfulness to the text prompts. To this end, we introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs, ensuring that the garment details remain unchanged on the target character. Then, we leverage the joint classifier-free guidance to balance the control of garment features and text prompts over the generated results. Meanwhile, the proposed garment extractor is a plug-in module applicable to various finetuned LDMs, and it can be combined with other extensions like ControlNet and IP-Adapter to enhance the diversity and controllability of the generated characters. Furthermore, we design Matched-Points-LPIPS (MP-LPIPS), a robust metric for evaluating the consistency of the target image to the source garment. Extensive experiments demonstrate that our Magic Clothing achieves state-of-the-art results under various conditional controls for garment-driven image synthesis. Our source code is available at https://github.com/ShineChen1024/MagicClothing.
Submitted 15 April, 2024;
originally announced April 2024.
-
Disorder Chaos in Short-Range, Diluted, and Lévy Spin Glasses
Authors:
Wei-Kuo Chen,
Heejune Kim,
Arnab Sen
Abstract:
In a recent breakthrough [arXiv:2301.04112], Chatterjee proved site disorder chaos in the Edwards-Anderson (EA) short-range spin glass model utilizing the Hermite spectral method. In this paper, we demonstrate the further usefulness of this Hermite spectral approach by extending the validity of site disorder chaos to three related spin glass models.
The first, called the mixed even $p$-spin short-range model, is a generalization of the EA model where the underlying graph is a deterministic bounded-degree hypergraph consisting of hyperedges with an even number of vertices. The second model is the diluted mixed $p$-spin model, which is allowed to have hyperedges with both odd and even numbers of vertices. For both models, our results hold under general symmetric disorder distributions. The main novelty of our argument is an elementary algebraic equation for the Fourier-Hermite series coefficients of the two-spin correlation functions. It allows us to deduce necessary geometric conditions to determine the contributing coefficients in the overlap function, which is in the same spirit as the crucial Lemma 1 in [arXiv:2301.04112]. Finally, we also establish disorder chaos in the Lévy model with stable index $α\in (1, 2)$.
Submitted 13 June, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
-
Bridging Data Islands: Geographic Heterogeneity-Aware Federated Learning for Collaborative Remote Sensing Semantic Segmentation
Authors:
Jieyi Tan,
Yansheng Li,
Sergey A. Bartalev,
Bo Dang,
Wei Chen,
Yongjun Zhang,
Liangqi Yuan
Abstract:
Remote sensing semantic segmentation (RSS) is an essential task in Earth Observation missions. Due to data privacy concerns, high-quality remote sensing images with annotations cannot be well shared among institutions, making it difficult to fully utilize RSS data to train a generalized model. Federated Learning (FL), a privacy-preserving collaborative learning technology, is a potential solution. However, the current research on how to effectively apply FL in RSS is still scarce and requires further investigation. Remote sensing images in various institutions often exhibit strong geographical heterogeneity. More specifically, it is reflected in terms of class-distribution heterogeneity and object-appearance heterogeneity. Unfortunately, most existing FL studies show inadequate focus on geographical heterogeneity, thus leading to performance degradation in the global model. Considering the aforementioned issues, we propose a novel Geographic Heterogeneity-Aware Federated Learning (GeoFed) framework to address privacy-preserving RSS. Through Global Feature Extension and Tail Regeneration modules, class-distribution heterogeneity is alleviated. Additionally, we design an Essential Feature Mining strategy to alleviate object-appearance heterogeneity by constructing essential features. Extensive experiments on three datasets (i.e., FBP, CASID, Inria) show that our GeoFed consistently outperforms the current state-of-the-art methods. The code will be available publicly.
Submitted 14 April, 2024;
originally announced April 2024.
-
Observation of the Josephson effect in superhydrides: DC SQUID based on (La,Ce)H$_{10}$ with operating temperature of 179 K
Authors:
Dmitrii V. Semenok,
Ivan A. Troyan,
Di Zhou,
Wuhao Chen,
Ho-kwang Mao,
Viktor V. Struzhkin
Abstract:
Among known materials, hydride superconductors have the highest critical temperatures and are very promising as a basis for electronic sensors. The superconducting quantum interference device (SQUID), due to its unique sensitivity to magnetic fields, is the most important application of superconductors in microelectronics. In this work, we describe a direct current SQUID made of lanthanum-cerium superhydride (La,Ce)H$_{10}$ at a pressure of 148 GPa, with an operating temperature of 179 K and a bias current of about 2 mA. When placing (La,Ce)H$_{10}$ in a modulated magnetic field (0.1-0.005 Hz, 5 G), we observed the generation of higher harmonics up to $18ν_0$ and a periodic dependence of the sample resistance on the magnetic flux density, $R \propto \cos(πΦ/Φ_0)$. We demonstrate that the (La,Ce)H$_{10}$ SQUID, with a size of about 6 $μ$m, operates in the mode of low thermal fluctuations and can be used to detect magnetic fields below 0.1 G. Our findings pave the way to more advanced applications of the Josephson effect and SQUIDs made of hydride superconductors.
Submitted 14 April, 2024;
originally announced April 2024.
-
Stable Acceleration of a LHe-Free Nb3Sn demo SRF e-linac Based on Conduction Cooling
Authors:
Ziqin Yang,
Yuan He,
Tiancai Jiang,
Feng Bai,
Fengfeng Wang,
Weilong Chen,
Guangze Jiang,
Yimeng Chu,
Hangxu Li,
Bo Zhao,
Guozhen Sun,
Zongheng Xue,
Yugang Zhao,
Zheng Gao,
Yaguang Li,
**ran Xiong,
Hao Guo,
Liepeng Sun,
Guirong Huang,
Zhijun Wang,
Junhui Zhang,
Teng Tan,
Hongwei Zhao,
Wenlong Zhan
Abstract:
The design, construction, and commissioning of a conduction-cooled Nb3Sn demonstration superconducting radio frequency (SRF) electron accelerator at the Institute of Modern Physics of the Chinese Academy of Sciences (IMP, CAS) will be presented. In the context of engineering application planning for Nb3Sn thin-film SRF cavities within the CiADS project, a 650 MHz 5-cell elliptical cavity was coated using the vapor diffusion method for electron beam acceleration. Through high-precision collaborative control of 10 GM cryocoolers, a slow cooldown of the cavity across 18 K is achieved, accompanied by clearly characteristic magnetic flux expulsion. The horizontal test results of the liquid helium-free (LHe-free) cryomodule show that the cavity can operate steadily at Epk = 6.02 MV/m in continuous wave (CW) mode, and at Epk = 14.90 MV/m in 40% duty cycle pulse mode. The beam acceleration experiment indicates that the maximum average current of the electron beam in the macropulse after acceleration exceeds 200 mA, with a maximum energy gain of 4.6 MeV. The results provide a proof of principle for the engineering application of Nb3Sn thin-film SRF cavities, highlighting the promising industrial application prospects of a small-scale compact Nb3Sn SRF accelerator driven by commercial cryocoolers.
Submitted 14 April, 2024;
originally announced April 2024.
-
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Authors:
Yingchaojie Feng,
Zhizhang Chen,
Zhining Kang,
Sijia Wang,
Minfeng Zhu,
Wei Zhang,
Wei Chen
Abstract:
The proliferation of large language models (LLMs) has underscored concerns regarding their security vulnerabilities, notably against jailbreak attacks, where adversaries design jailbreak prompts to circumvent safety mechanisms for potential misuse. Addressing these concerns necessitates a comprehensive analysis of jailbreak prompts to evaluate LLMs' defensive capabilities and identify potential weaknesses. However, the complexity of evaluating jailbreak performance and understanding prompt characteristics makes this analysis laborious. We collaborate with domain experts to characterize problems and propose an LLM-assisted framework to streamline the analysis process. It provides automatic jailbreak assessment to facilitate performance evaluation and support analysis of components and keywords in prompts. Based on the framework, we design JailbreakLens, a visual analysis system that enables users to explore the jailbreak performance against the target model, conduct multi-level analysis of prompt characteristics, and refine prompt instances to verify findings. Through a case study, technical evaluations, and expert interviews, we demonstrate our system's effectiveness in helping users evaluate model security and identify model weaknesses.
Submitted 12 April, 2024;
originally announced April 2024.
-
Flashlights: Microlensing vs Stellar Variability of Transients in the Star Clusters of the Dragon Arc
Authors:
Sung Kei Li,
Patrick L. Kelly,
Jose M. Diego,
Jeremy Lim,
WenLei Chen,
Amruth Alfred,
Liliya L. R. Williams,
Thomas J. Broadhurst,
Ashish. K. Meena,
Adi Zitrin,
Alex Chow
Abstract:
We study the nature of transient events detected in the "Dragon Arc", a star-forming galaxy at a redshift of $0.7251$ that is gravitationally lensed by the galaxy cluster Abell 370. In particular, we focus on a subset of ten transients that are identified as unresolved young star clusters in the deep broadband, F200LP, taken as part of the "Flashlights" Hubble Space Telescope program, showing flux variations of $\sim 10-20\%$ over a period of about a year. Here we develop several methods to address whether stellar microlensing alone is capable of explaining the transients, or whether intrinsic stellar outbursts or variability are required to explain them. We first present a lens model that has new constraints in the Dragon Arc itself to understand the properties of the lensed young star clusters. Using our improved galaxy-cluster lens model, we simulate the effect of microlensing on the flux variation for unresolved stars within lensed young star clusters. We find good agreement between the observed and the expected detection rates of microlensing events by intracluster stars of young star clusters within $1σ$. However, we cannot fully exclude the possibility that a minority of these transients are caused by intrinsic stellar variability such as outbursts of Luminous Blue Variables (LBVs). With JWST observations taken recently or coming in the near future, the color information will be able to break the degeneracy and definitively test whether or not these lensed young star cluster transients are caused by stellar microlensing.
Submitted 12 April, 2024;
originally announced April 2024.
-
MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking
Authors:
Tianwen Tang,
Tong Zhu,
Haodong Liu,
Yin Bai,
Jia Cheng,
Wenliang Chen
Abstract:
Zero-shot dialogue state tracking (DST) transfers knowledge to unseen domains, reducing the cost of annotating new datasets. Previous zero-shot DST models mainly suffer from domain transferring and partial prediction problems. To address these challenges, we propose Mixture of Prefix Experts (MoPE) to establish connections between similar slots in different domains, which strengthens the model transfer performance in unseen domains. Empirical results demonstrate that MoPE-DST achieves the joint goal accuracy of 57.13% on MultiWOZ2.1 and 55.40% on SGD.
Submitted 12 April, 2024;
originally announced April 2024.
-
Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning
Authors:
Liwei Wang,
Jun Li,
Wen Chen,
Qingqing Wu,
Ming Ding
Abstract:
Federated Learning (FL) facilitates collaborative machine learning by training models on local datasets, and subsequently aggregating these local models at a central server. However, the frequent exchange of model parameters between clients and the central server can result in significant communication overhead during the FL training process. To solve this problem, this paper proposes a novel FL framework, the Model Aggregation with Layer Divergence Feedback mechanism (FedLDF). Specifically, we calculate model divergence between the local model and the global model from the previous round. Then through model layer divergence feedback, the distinct layers of each client are uploaded and the amount of data transferred is reduced effectively. Moreover, the convergence bound reveals that the access ratio of clients has a positive correlation with model performance. Simulation results show that our algorithm uploads local models with reduced communication overhead while upholding a superior global model performance.
Submitted 12 April, 2024;
originally announced April 2024.
-
JWST Discovery of $40+$ Microlensed Stars in a Magnified Galaxy, the "Dragon" behind Abell 370
Authors:
Yoshinobu Fudamoto,
Fengwu Sun,
Jose M. Diego,
Liang Dai,
Masamune Oguri,
Adi Zitrin,
Erik Zackrisson,
Mathilde Jauzac,
David J. Lagattuta,
Eiichi Egami,
Edoardo Iani,
Rogier A. Windhorst,
Katsuya T. Abe,
Franz Erik Bauer,
Fuyan Bian,
Rachana Bhatawdekar,
Thomas J. Broadhurst,
Zheng Cai,
Chian-Chou Chen,
Wenlei Chen,
Seth H. Cohen,
Christopher J. Conselice,
Daniel Espada,
Nicholas Foo,
Brenda L. Frye
, et al. (21 additional authors not shown)
Abstract:
Strong gravitational magnification by massive galaxy clusters enables us to detect faint background sources, resolve their detailed internal structures, and in the most extreme cases identify and study individual stars in distant galaxies. Highly magnified individual stars allow for a wide range of applications, including studies of stellar populations in distant galaxies and constraining small-scale dark matter structures. However, these applications have been hampered by the small number of events observed, as typically one or a few stars are identified from each distant galaxy. Here, we report the discovery of 46 significant microlensed stars in a single strongly-lensed high-redshift galaxy behind the Abell 370 cluster at a redshift of 0.725, when the Universe was half of its current age (dubbed the "Dragon arc"), based on two observations separated by one year with the James Webb Space Telescope (JWST). These events are mostly found near the expected lensing critical curves, suggesting that these are magnified individual stars that appear as transients from intracluster stellar microlenses. Through multi-wavelength photometry and colors, we constrain stellar types and find that many of them are consistent with red giants/supergiants magnified by factors of thousands. This finding reveals an unprecedentedly high occurrence of microlensing events in the Dragon arc, and proves that JWST's time-domain observations open up the possibility of conducting statistical studies of high-redshift stars and subgalactic scale perturbations in the lensing dark matter field.
Submitted 11 April, 2024;
originally announced April 2024.
-
Rho-1: Not All Tokens Are What You Need
Authors:
Zhenghao Lin,
Zhibin Gou,
Yeyun Gong,
Xiao Liu,
Yelong Shen,
Ruochen Xu,
Chen Lin,
Yujiu Yang,
Jian Jiao,
Nan Duan,
Weizhu Chen
Abstract:
Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that align with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on tokens with higher scores. With continual pretraining on the 15B OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% in 9 math tasks. After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens. Furthermore, when pretraining on 80B general tokens, Rho-1 achieves 6.8% average enhancement across 15 diverse tasks, increasing both efficiency and performance of the language model pre-training.
Submitted 23 May, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Low-rank Adaptation for Spatio-Temporal Forecasting
Authors:
Weilin Ruan,
Wei Chen,
Xilin Dang,
Jianxiang Zhou,
Weichuang Li,
Xu Liu,
Yuxuan Liang
Abstract:
Spatio-temporal forecasting is crucial in real-world dynamic systems, predicting future changes using historical data from diverse locations. Existing methods often prioritize the development of intricate neural networks to capture the complex dependencies of the data, yet their accuracy fails to show sustained improvement. In addition, these methods overlook node heterogeneity, hindering customized prediction modules from handling diverse regional nodes effectively. In this paper, our goal is not to propose a new model but to present a novel low-rank adaptation framework as an off-the-shelf plugin for existing spatial-temporal prediction models, termed ST-LoRA, which alleviates the aforementioned problems through node-level adjustments. Specifically, we first tailor a node adaptive low-rank layer comprising multiple trainable low-rank matrices. Additionally, we devise a multi-layer residual fusion stacking module, injecting the low-rank adapters into predictor modules of various models. Across six real-world traffic datasets and six different types of spatio-temporal prediction models, our approach increases the parameters and training time of the original models by less than 4% while still achieving consistent and sustained performance enhancement.
Submitted 11 April, 2024;
originally announced April 2024.
-
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Authors:
Jingxuan Xu,
Wuyang Chen,
Yao Zhao,
Yunchao Wei
Abstract:
Recent success of pre-trained foundation vision-language models makes Open-Vocabulary Segmentation (OVS) possible. Despite the promising performance, this approach introduces heavy computational overheads for two challenges: 1) large model sizes of the backbone; 2) expensive costs during the fine-tuning. These challenges hinder this OVS strategy from being widely applicable and affordable in real-world scenarios. Although traditional methods such as model compression and efficient fine-tuning can address these challenges, they often rely on heuristics. This means that their solutions cannot be easily transferred and necessitate re-training on different models, which comes at a cost. In the context of efficient OVS, we target achieving performance that is comparable to or even better than prior OVS works based on large vision-language foundation models, by utilizing smaller models that incur lower training costs. The core strategy is to make our efficiency principled and thus seamlessly transferable from one OVS framework to others without further customization. Comprehensive experiments on diverse OVS benchmarks demonstrate our superior trade-off between segmentation accuracy and computation costs over previous works. Our code is available on https://github.com/Xujxyang/OpenTrans
Submitted 3 June, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Joint Active And Passive IRS Aided Wireless Communication: Elements Allocation and Achievable Rate
Authors:
Chaoying Huang,
Wen Chen,
Qingqing Wu
Abstract:
Equipping reflecting elements at the active intelligent reflecting surface (AIRS) enhances signal amplification capability but meanwhile incurs non-negligible amplification noise, which thus challenges the determination of elements allocation for maximizing achievable rate in multi-cooperative AIRS and passive IRS (PIRS) jointly aided wireless communication system. To tackle this issue, we consider the downlink communication from a single-antenna transmitter (Tx) to a single-antenna receiver (Rx), which is aided by a pair of AIRS and PIRS with two different deployment orders. Specifically, we aim to determine the number of AIRS/PIRS elements over both transmission orders under a given deployment budget for achievable rate maximization. Our analysis illustrates that the PIRS should be allocated more elements than the AIRS to achieve an optimized rate, and that linear signal-to-noise ratio (SNR) scaling orders are attained in both schemes. Simulation results are provided to evaluate the proposed algorithm and compare the rate performance of the AIRS and PIRS jointly aided wireless system with various benchmark systems.
Submitted 10 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge
Authors:
Li Zhou,
Taelin Karidi,
Nicolas Garneau,
Yong Cao,
Wanlong Liu,
Wenyu Chen,
Daniel Hershcovich
Abstract:
Recent studies have highlighted the presence of cultural biases in Large Language Models (LLMs), yet often lack a robust methodology to dissect these phenomena comprehensively. Our work aims to bridge this gap by delving into the Food domain, a universally relevant yet culturally diverse aspect of human life. We introduce FmLAMA, a multilingual dataset centered on food-related cultural facts and variations in food practices. We analyze LLMs across various architectures and configurations, evaluating their performance in both monolingual and multilingual settings. By leveraging templates in six different languages, we investigate how LLMs interact with language-specific and cultural knowledge. Our findings reveal that (1) LLMs demonstrate a pronounced bias towards food knowledge prevalent in the United States; (2) Incorporating relevant cultural context significantly improves LLMs' ability to access cultural knowledge; (3) The efficacy of LLMs in capturing cultural nuances is highly dependent on the interplay between the probing language, the specific model architecture, and the cultural context in question. This research underscores the complexity of integrating cultural understanding into LLMs and emphasizes the importance of culturally diverse datasets to mitigate biases and enhance model performance across different cultural domains.
Submitted 10 April, 2024;
originally announced April 2024.
-
DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space
Authors:
Jianxiang Xiang,
Zhenhua Liu,
Haodong Liu,
Yin Bai,
Jia Cheng,
Wenliang Chen
Abstract:
In real-life conversations, the content is diverse, and there exists the one-to-many problem that requires diverse generation. Previous studies attempted to introduce discrete or Gaussian-based continuous latent variables to address the one-to-many problem, but the diversity is limited. Recently, diffusion models have made breakthroughs in computer vision, and some attempts have been made in natural language processing. In this paper, we propose DiffusionDialog, a novel approach to enhance the diversity of dialogue generation with the help of diffusion model. In our approach, we introduce continuous latent variables into the diffusion model. The problem of using latent variables in the dialog task is how to build both an effective prior of the latent space and an inferring process to obtain the proper latent given the context. By combining the encoder and latent-based diffusion model, we encode the response's latent representation in a continuous space as the prior, instead of fixed Gaussian distribution or simply discrete ones. We then infer the latent by denoising step by step with the diffusion model. The experimental results show that our model greatly enhances the diversity of dialog responses while maintaining coherence. Furthermore, in further analysis, we find that our diffusion model achieves high inference efficiency, which is the main challenge of applying diffusion models in natural language processing.
Submitted 10 April, 2024;
originally announced April 2024.
-
SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera
Authors:
Gaole Dai,
Zhenyu Wang,
Qinwen Xu,
Ming Lu,
Wen Chen,
Boxin Shi,
Shanghang Zhang,
Tiejun Huang
Abstract:
One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. The event-RGB approaches have some limitations, such as high training costs and the inability to work effectively in the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By considering texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It handles foreground objects with backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research endeavors. We conduct extensive experiments using synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. The code and dataset will be made available for public access.
Submitted 12 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Spiral Scanning and Self-Supervised Image Reconstruction Enable Ultra-Sparse Sampling Multispectral Photoacoustic Tomography
Authors:
Yutian Zhong,
Xiaoming Zhang,
Zongxin Mo,
Shuangyang Zhang,
Wufan Chen,
Li Qi
Abstract:
Multispectral photoacoustic tomography (PAT) is an imaging modality that utilizes the photoacoustic effect to achieve non-invasive and high-contrast imaging of internal tissues. However, the hardware cost and computational demand of a multispectral PAT system consisting of up to thousands of detectors are huge. To address this challenge, we propose an ultra-sparse spiral sampling strategy for multispectral PAT, which we named U3S-PAT. Our strategy employs a sparse ring-shaped transducer that, when switching excitation wavelengths, simultaneously rotates and translates. This creates a spiral scanning pattern with multispectral angle-interlaced sampling. To solve the highly ill-conditioned image reconstruction problem, we propose a self-supervised learning method that is able to introduce structural information shared during spiral scanning. We simulate the proposed U3S-PAT method on a commercial PAT system and conduct in vivo animal experiments to verify its performance. The results show that even with a sparse sampling rate as low as 1/30, our U3S-PAT strategy achieves similar reconstruction and spectral unmixing accuracy as non-spiral dense sampling. Given its ability to dramatically reduce the time required for three-dimensional multispectral scanning, our U3S-PAT strategy has the potential to perform volumetric molecular imaging of dynamic biological activities.
Submitted 9 April, 2024;
originally announced April 2024.
-
MuPT: A Generative Symbolic Music Pretrained Transformer
Authors:
Xingwei Qu,
Yuelin Bai,
Yinghao Ma,
Ziya Zhou,
Ka Man Lo,
Jiaheng Liu,
Ruibin Yuan,
Lejun Min,
Xueling Liu,
Tianyu Zhang,
Xinrun Du,
Shuyue Guo,
Yiming Liang,
Yizhi Li,
Shangda Wu,
Junting Zhou,
Tianyu Zheng,
Ziyang Ma,
Fengze Han,
Wei Xue,
Gus Xia,
Emmanouil Benetos,
Xiang Yue,
Chenghua Lin,
Xu Tan
, et al. (4 additional authors not shown)
Abstract:
In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.
Submitted 10 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes
Authors:
Tianchen Deng,
Nailin Wang,
Chongdi Wang,
Shenghai Yuan,
Jingchuan Wang,
Danwei Wang,
Weidong Chen
Abstract:
Dense scene reconstruction for photo-realistic view synthesis has various applications, such as VR/AR, autonomous vehicles. However, most existing methods have difficulties in large-scale scenes due to three core challenges: \textit{(a) inaccurate depth input.} Accurate depth input is impossible to get in real-world large-scale scenes. \textit{(b) inaccurate pose estimation.} Most existing approaches rely on accurate pre-estimated camera poses. \textit{(c) insufficient scene representation capability.} A single global radiance field lacks the capacity to effectively scale to large-scale scenes. To this end, we propose an incremental joint learning framework, which can achieve accurate depth, pose estimation, and large-scale scene reconstruction. A vision transformer-based network is adopted as the backbone to enhance performance in scale information estimation. For pose estimation, a feature-metric bundle adjustment (FBA) method is designed for accurate and robust camera tracking in large-scale scenes. In terms of implicit scene representation, we propose an incremental scene representation method to construct the entire large-scale scene as multiple local radiance fields to enhance the scalability of 3D scene representation. Extended experiments have been conducted to demonstrate the effectiveness and accuracy of our method in depth estimation, pose estimation, and large-scale scene reconstruction.
Submitted 9 April, 2024;
originally announced April 2024.
-
Cymatics Cup: Shape-Changing Drinks by Leveraging Cymatics
Authors:
Weijen Chen,
Yang Yang,
Kao-Hua Liu,
Yun Suen Pai,
Junichi Yamaoka,
Kouta Minamizawa
Abstract:
To enhance the dining experience, prior studies in Human-Computer Interaction (HCI) and gastrophysics have demonstrated that modifying the static shape of solid foods can amplify taste perception. However, the exploration of dynamic shape-changing mechanisms in liquid foods remains largely untapped. In the present study, we employ cymatics, a scientific discipline focused on utilizing sound frequencies to generate patterns in liquids and particles to augment the drinking experience. Utilizing speakers, we dynamically reshaped liquids exhibiting five distinct taste profiles and evaluated resultant changes in taste perception and drinking experience. Our research objectives extend beyond merely augmenting taste from visual to tactile sensations; we also prioritize the experiential aspects of drinking. Through a series of experiments and workshops, we revealed a significant impact on taste perception and overall drinking experience when mediated by cymatics effects. Building upon these findings, we designed and developed tableware to integrate cymatics principles into gastronomic experiences.
Submitted 9 April, 2024;
originally announced April 2024.
-
Interplay of Machine Translation, Diacritics, and Diacritization
Authors:
Wei-Rui Chen,
Ife Adebara,
Muhammad Abdul-Mageed
Abstract:
We investigate two research questions: (1) how do machine translation (MT) and diacritization influence the performance of each other in a multi-task learning setting, and (2) the effect of keeping (vs. removing) diacritics on MT performance. We examine these two questions in both high-resource (HR) and low-resource (LR) settings across 55 different languages (36 African languages and 19 European languages). For (1), results show that diacritization significantly benefits MT in the LR scenario, doubling or even tripling performance for some languages, but harms MT in the HR scenario. We find that MT harms diacritization in LR but benefits significantly in HR for some languages. For (2), MT performance is similar regardless of diacritics being kept or removed. In addition, we propose two classes of metrics to measure the complexity of a diacritical system, finding these metrics to correlate positively with the performance of our diacritization models. Overall, our work provides insights for developing MT and diacritization systems under different data size conditions and may have implications that generalize beyond the 55 languages we investigate.
Submitted 8 April, 2024;
originally announced April 2024.
-
Observation of dichotomic field-tunable electronic structure in twisted monolayer-bilayer graphene
Authors:
Hongyun Zhang,
Qian Li,
Youngju Park,
Yuping Jia,
Wanying Chen,
Jiaheng Li,
Qinxin Liu,
Changhua Bao,
Nicolas Leconte,
Shaohua Zhou,
Yuan Wang,
Kenji Watanabe,
Takashi Taniguchi,
Jose Avila,
Pavel Dudin,
Pu Yu,
Hongming Weng,
Wenhui Duan,
Quansheng Wu,
Jeil Jung,
Shuyun Zhou
Abstract:
Twisted bilayer graphene (tBLG) provides a fascinating platform for engineering flat bands and inducing correlated phenomena. By designing the stacking architecture of graphene layers, twisted multilayer graphene can exhibit different symmetries with rich tunability. For example, in twisted monolayer-bilayer graphene (tMBG) which breaks the C2z symmetry, transport measurements reveal an asymmetric phase diagram under an out-of-plane electric field, exhibiting correlated insulating state and ferromagnetic state respectively when reversing the field direction. Revealing how the electronic structure evolves with electric field is critical for providing a better understanding of such asymmetric field-tunable properties. Here we report the experimental observation of field-tunable dichotomic electronic structure of tMBG by nanospot angle-resolved photoemission spectroscopy (NanoARPES) with operando gating. Interestingly, selective enhancement of the relative spectral weight contributions from monolayer and bilayer graphene is observed when switching the polarity of the bias voltage. Combining experimental results with theoretical calculations, the origin of such field-tunable electronic structure, resembling either tBLG or twisted double-bilayer graphene (tDBG), is attributed to the selectively enhanced contribution from different stacking graphene layers with a strong electron-hole asymmetry. Our work provides electronic structure insights for understanding the rich field-tunable physics of tMBG.
Submitted 8 April, 2024;
originally announced April 2024.
-
Decision Transformer for Wireless Communications: A New Paradigm of Resource Management
Authors:
Jie Zhang,
Jun Li,
Long Shi,
Zhe Wang,
Shi Jin,
Wen Chen,
H. Vincent Poor
Abstract:
As the next generation of mobile systems evolves, artificial intelligence (AI) is expected to deeply integrate with wireless communications for resource management in variable environments. In particular, deep reinforcement learning (DRL) is an important tool for addressing stochastic optimization issues of resource allocation. However, DRL has to start each new training process from the beginning once the state and action spaces change, causing low sample efficiency and poor generalization ability. Moreover, each DRL training process may take a large number of epochs to converge, which is unacceptable for time-sensitive scenarios. In this paper, we adopt an alternative AI technology, namely, the Decision Transformer (DT), and propose a DT-based adaptive decision architecture for wireless resource management. This architecture innovates through constructing pre-trained models in the cloud and then fine-tuning personalized models at the edges. By leveraging the power of DT models learned over extensive datasets, the proposed architecture is expected to achieve rapid convergence with many fewer training epochs and higher performance in a new context, e.g., similar tasks with different state and action spaces, compared with DRL. We then design DT frameworks for two typical communication scenarios: Intelligent reflecting surfaces-aided communications and unmanned aerial vehicle-aided edge computing. Simulations demonstrate that the proposed DT frameworks achieve over $3$-$6$ times speedup in convergence and better performance relative to the classic DRL method, namely, proximal policy optimization.
Submitted 8 April, 2024;
originally announced April 2024.
-
A Note on LoRA
Authors:
Vlad Fomenko,
Han Yu,
Jongho Lee,
Stanley Hsieh,
Weizhu Chen
Abstract:
LoRA (Low-Rank Adaptation) has emerged as a preferred method for efficiently adapting Large Language Models (LLMs) with remarkable simplicity and efficacy. This note extends the original LoRA paper by offering new perspectives that were not initially discussed and presents a series of insights for deploying LoRA at scale. Without introducing new experiments, we aim to improve the understanding and application of LoRA.
Submitted 7 April, 2024;
originally announced April 2024.
-
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Authors:
Xinrun Du,
Zhouliang Yu,
Songyang Gao,
Ding Pan,
Yuyang Cheng,
Ziyang Ma,
Ruibin Yuan,
Xingwei Qu,
Jiaheng Liu,
Tianyu Zheng,
Xinchen Luo,
Guorui Zhou,
Binhang Yuan,
Wenhu Chen,
Jie Fu,
Ge Zhang
Abstract:
In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion English tokens, and 100 billion code tokens. This strategic composition facilitates the model's exceptional proficiency in understanding and processing Chinese, a capability further enhanced through alignment techniques. Demonstrating remarkable performance on the CHC-Bench, CT-LLM excels in Chinese language tasks, and showcases its adeptness in English through SFT. This research challenges the prevailing paradigm of training LLMs predominantly on English corpora and then adapting them to other languages, broadening the horizons for LLM training methodologies. By open-sourcing the full process of training a Chinese LLM, including a detailed data processing procedure with the obtained Massive Appropriate Pretraining Chinese Corpus (MAP-CC), a well-chosen multidisciplinary Chinese Hard Case Benchmark (CHC-Bench), and the 2B-size Chinese Tiny LLM (CT-LLM), we aim to foster further exploration and innovation in both academia and industry, paving the way for more inclusive and versatile language models.
Submitted 9 April, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
Evolutionary Origin of Ultra-long Period Radio Transients
Authors:
Yun-Ning Fan,
Kun Xu,
Wen-Cong Chen
Abstract:
Recently, two ultra-long period radio transients, GLEAM-X J162759.5-523504.3 (J1627) and GPM J1839$-$10 (J1839), were discovered with spin periods longer than 1000 s. The origin of these two ultra-long period radio transients is intriguing in understanding the spin evolution of neutron stars (NSs). In this work, we diagnose whether the interaction between strongly magnetized NSs and fallback disks can spin NSs down to the observed ultra-long period. Our simulations found that the magnetar+fallback disk model can account for the observed period, period derivative, and X-ray luminosity of J1627 in the quasi-spin-equilibrium stage. To evolve to the current state of J1627, the initial mass-accretion rate of the fallback disk and the magnetic field of the NS are in the range of $(1.1-30)\times10^{24}~\rm g\,s^{-1}$ and $(2-5)\times10^{14}~\rm G$, respectively. Within the active lifetime of a fallback disk, J1839 cannot reach the observed upper limit of the period derivative. Therefore, we propose that J1839 may be in the second ejector phase after the fallback disk becomes inactive. Those NSs with a magnetic field of $(2-6)\times10^{14}~\rm G$ and a fallback disk with an initial mass-accretion rate of $\sim10^{24}-10^{26}~\rm g\,s^{-1}$ are the possible progenitors of J1839.
Submitted 5 April, 2024;
originally announced April 2024.
-
No Panacea in Planning: Algorithm Selection for Suboptimal Multi-Agent Path Finding
Authors:
Weizhe Chen,
Zhihan Wang,
Jiaoyang Li,
Sven Koenig,
Bistra Dilkina
Abstract:
Since more and more algorithms are proposed for multi-agent path finding (MAPF) and each of them has its strengths, choosing the correct one for a specific scenario that fulfills some specified requirements is an important task. Previous research in algorithm selection for MAPF built a standard workflow and showed that machine learning can help. In this paper, we study general solvers for MAPF, which further include suboptimal algorithms. We propose different groups of optimization objectives and learning tasks to handle the new tradeoff between runtime and solution quality. We conduct extensive experiments to show that the same loss can not be used for different groups of optimization objectives, and that standard computer vision models are no worse than customized architecture. We also provide insightful discussions on how feature-sensitive pre-processing is needed for learning for MAPF, and how different learning metrics are correlated to different learning tasks.
Submitted 4 April, 2024;
originally announced April 2024.
-
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Authors:
Jiawei Guo,
Ziming Li,
Xueling Liu,
Kaijing Ma,
Tianyu Zheng,
Zhouliang Yu,
Ding Pan,
Yizhi LI,
Ruibo Liu,
Yue Wang,
Shuyue Guo,
Xingwei Qu,
Xiang Yue,
Ge Zhang,
Wenhu Chen,
Jie Fu
Abstract:
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks, including debugging, translating, polishing, and requirement switching. Unlike existing benchmarks focusing solely on code generation, CodeEditorBench emphasizes real-world scenarios and practical aspects of software development. We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks. Evaluation of 19 LLMs reveals that closed-source models (particularly Gemini-Ultra and GPT-4) outperform open-source models in CodeEditorBench, highlighting differences in model performance based on problem types and prompt sensitivities. CodeEditorBench aims to catalyze advancements in LLMs by providing a robust platform for assessing code editing capabilities. We will release all prompts and datasets to enable the community to expand the dataset and benchmark emerging LLMs. By introducing CodeEditorBench, we contribute to the advancement of LLMs in code editing and provide a valuable resource for researchers and practitioners.
Submitted 6 April, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.