-
Synthetic high angular momentum spin dynamics in a microwave oscillator
Authors:
Saswata Roy,
Alen Senanian,
Christopher S. Wang,
Owen C. Wetherbee,
Luojia Zhang,
B. Cole,
C. P. Larson,
E. Yelton,
Kartikeya Arora,
Peter L. McMahon,
B. L. T. Plourde,
Baptiste Royer,
Valla Fatemi
Abstract:
Spins and oscillators are foundational to much of physics and applied sciences. For quantum information, a spin 1/2 exemplifies the most basic unit, a qubit. High angular momentum spins and harmonic oscillators provide multi-level manifolds (e.g., qudits) which have the potential for hardware-efficient protected encodings of quantum information and simulation of many-body quantum systems. In this work, we demonstrate a new quantum control protocol that conceptually merges these disparate hardware platforms. Namely, we show how to modify a harmonic oscillator on-demand to implement a continuous range of generators associated with resonant driving of a harmonic qudit, and then specifically design a harmonic multi-level spin degree of freedom. The synthetic spin is verified by demonstration of spin coherent (SU(2)) rotations and comparison to other manifolds like simply-truncated oscillators. Our scheme allows universal control of the qudit, and, for the first time, we use linear, harmonic operations to accomplish four logical gates on a harmonic qudit encoding. Our results show how motion on a closed Hilbert space can be useful for quantum information processing and open the door to superconducting circuit simulations of higher angular momentum quantum magnetism.
Submitted 24 May, 2024;
originally announced May 2024.
-
Pseudo-easy-axis anisotropy in antiferromagnetic $S=1$ diamond-lattice systems
Authors:
S. Vaidya,
A. Hernández-Melián,
J. P. Tidey,
S. P. M. Curley,
S. Sharma,
P. Manuel,
C. Wang,
G. L. Hannaford,
S. J. Blundell,
Z. E. Manson,
J. L. Manson,
J. Singleton,
T. Lancaster,
R. D. Johnson,
P. A. Goddard
Abstract:
We investigate the magnetic properties of $S=1$ antiferromagnetic diamond lattice, Ni$X_{2}$(pyrimidine)$_{2}$ ($X$ = Cl, Br), hosting a single-ion anisotropy (SIA) orientation which alternates between neighbouring sites. Through neutron diffraction measurements of the $X$ = Cl compound, the ordered state spins are found to align collinearly along a pseudo-easy-axis, a unique direction created by the intersection of two easy planes. Similarities in the magnetization, exhibiting spin-flop transitions, and the magnetic susceptibility in the two compounds imply that the same magnetic structure and a pseudo-easy-axis is also present for $X$ = Br. We estimate the Hamiltonian parameters by combining analytical calculations and Monte-Carlo (MC) simulations of the spin-flop and saturation field. The MC simulations also reveal that the spin-flop transition occurs when the applied field is parallel to the pseudo-easy-axis. Contrary to conventional easy-axis systems, there exist field directions perpendicular to the pseudo-easy-axis for which the magnetic saturation is approached asymptotically and no symmetry-breaking phase transition is observed at finite fields.
Submitted 24 May, 2024;
originally announced May 2024.
-
Editable Concept Bottleneck Models
Authors:
Lijie Hu,
Chenyang Ren,
Zhengyu Hu,
Cheng-Long Wang,
Di Wang
Abstract:
Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on cases where the data, including concepts, are clean. In many scenarios, we always need to remove/insert some training data or new concepts from trained CBMs due to different reasons, such as privacy concerns, data mislabelling, spurious concepts, and concept annotation errors. Thus, the challenge of deriving efficient editable CBMs without retraining from scratch persists, particularly in large-scale applications. To address these challenges, we propose Editable Concept Bottleneck Models (ECBMs). Specifically, ECBMs support three different levels of data removal: concept-label-level, concept-level, and data-level. ECBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for re-training. Experimental results demonstrate the efficiency and effectiveness of our ECBMs, affirming their adaptability within the realm of CBMs.
Submitted 24 May, 2024;
originally announced May 2024.
-
Deterministic interconversion of GHZ state and KLM state via Lie-transform-based pulse design in Rydberg atoms
Authors:
J. P. Wang,
Y. Q. Ji,
L. P. Yang,
C. Q. Wang,
L. Dong,
X. M. Xiu
Abstract:
Conversion between different types of entangled states is an interesting problem in quantum mechanics, but research on conversion between the Greenberger-Horne-Zeilinger (GHZ) state and the Knill-Laflamme-Milburn (KLM) state in atomic systems is absent. In this paper, we propose a scheme to realize one-step interconversion between GHZ and KLM states with Rydberg atoms. By utilizing Rydberg-mediated interactions, we simplify the system. Using Lie-transform-based pulse design, we construct an evolution path that realizes the interconversion of the GHZ and KLM states. Numerical simulations show that the scheme is robust against decoherence and operational imperfections, and the analysis shows that it is feasible with current experimental technology.
Submitted 24 May, 2024;
originally announced May 2024.
-
Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
Authors:
Lichuan Ji,
Yingqi Lin,
Zhenhua Huang,
Yan Han,
Xiaogang Xu,
Jiafei Wu,
Chong Wang,
Zhe Liu
Abstract:
The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those involving Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video authenticity. The main challenges lie in the dataset and neural classifier for training. Current datasets lack a varied and comprehensive repository of real and generated content for effective discrimination. In this paper, we first introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet). It includes over 2.66 M instances of both real and generated videos, varying in categories, frames per second, resolutions, and lengths. The comprehensiveness of GenVidDet enables the training of a generalizable video detector. We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos, enhanced by incorporating motion information alongside visual appearance. DuB3D utilizes a dual-branch architecture that adaptively leverages and fuses raw spatio-temporal data and optical flow. We systematically explore the critical factors affecting detection performance, achieving the optimal configuration for DuB3D. Trained on GenVidDet, DuB3D can distinguish between real and generated video content with 96.77% accuracy, and strong generalization capability even for unseen types.
Submitted 24 May, 2024;
originally announced May 2024.
-
Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data
Authors:
Lan Tao,
Shirong Xu,
Chi-Hua Wang,
Namjoon Suh,
Guang Cheng
Abstract:
With the proliferation of generative AI and the increasing volume of generative data (also called synthetic data), assessing the fidelity of generative data has become a critical concern. In this paper, we propose a discriminative approach to estimate the total variation (TV) distance between two distributions as an effective measure of generative data fidelity. Our method quantitatively characterizes the relation between the Bayes risk in classifying two distributions and their TV distance. Therefore, the estimation of total variation distance reduces to that of the Bayes risk. In particular, this paper establishes theoretical results regarding the convergence rate of the estimation error of TV distance between two Gaussian distributions. We demonstrate that, with a specific choice of hypothesis class in classification, a fast convergence rate in estimating the TV distance can be achieved. Specifically, the estimation accuracy of the TV distance is proven to inherently depend on the separation of two Gaussian distributions: smaller estimation errors are achieved when the two Gaussian distributions are farther apart. This phenomenon is also validated empirically through extensive simulations. Finally, we apply this discriminative estimation method to rank the fidelity of synthetic image data using the MNIST dataset.
Submitted 24 May, 2024;
originally announced May 2024.
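Illustrative note on the abstract above: with equal class priors, the Bayes risk R* of classifying samples from P versus Q satisfies TV(P, Q) = 1 - 2 R*, so an estimate of the classification error yields an estimate of the TV distance. The sketch below is not the authors' estimator. It assumes two one-dimensional Gaussians with a shared, known standard deviation, for which the Bayes-optimal rule is a midpoint threshold, and all function names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tv_equal_variance_gaussians(mu_p, mu_q, sigma):
    """Closed-form TV distance between N(mu_p, sigma^2) and N(mu_q, sigma^2)."""
    return 2.0 * normal_cdf(abs(mu_p - mu_q) / (2.0 * sigma)) - 1.0


def tv_from_classifier(mu_p, mu_q, sigma, n, seed=0):
    """Estimate TV as 1 - 2 * (empirical error of the midpoint-threshold rule).

    Assumption: equal priors and equal variances, so the midpoint threshold
    is the Bayes-optimal classifier for this toy setting.
    """
    rng = random.Random(seed)
    threshold = 0.5 * (mu_p + mu_q)
    p_is_left = mu_p < mu_q
    errors = 0
    for _ in range(n):
        if rng.random() < 0.5:
            # Sample drawn from P; correct when it falls on P's side.
            x = rng.gauss(mu_p, sigma)
            correct = (x < threshold) == p_is_left
        else:
            # Sample drawn from Q; correct when it falls on Q's side.
            x = rng.gauss(mu_q, sigma)
            correct = (x >= threshold) == p_is_left
        errors += not correct
    return 1.0 - 2.0 * errors / n


exact_tv = tv_equal_variance_gaussians(0.0, 2.0, 1.0)
estimated_tv = tv_from_classifier(0.0, 2.0, 1.0, n=20000)
```

In this toy setting the error of the estimate shrinks as the means move apart, because the misclassification rate itself becomes small; that is consistent with the separation dependence the abstract describes, though the paper's rates are not reproduced here.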
-
StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models
Authors:
Chengming Xu,
Kai Hu,
Donghao Luo,
Jiangning Zhang,
Wei Li,
Yanhao Ge,
Chengjie Wang
Abstract:
Stylized Text-to-Image Generation (STIG) aims to generate images based on text prompts and style reference images. In this paper, we propose a novel framework, dubbed StyleMaster, for this task by leveraging pretrained Stable Diffusion (SD), which aims to solve previous problems such as insufficient style and inconsistent semantics. The enhancement lies in two novel modules, namely a multi-source style embedder and a dynamic attention adapter. To provide SD with better style embeddings, we propose the multi-source style embedder, which considers both global- and local-level visual information along with textual information, providing complementary style-related and semantic-related knowledge. Additionally, aiming for a better balance between adapter capacity and semantic control, the proposed dynamic attention adapter is applied to the diffusion UNet, in which adaptation weights are dynamically calculated based on the style embeddings. Two objective functions are introduced to optimize the model together with the denoising loss, which can further enhance semantic and style consistency. Extensive experiments demonstrate the superiority of StyleMaster over existing methods, rendering images with variable target styles while successfully maintaining the semantic information from the text prompts.
Submitted 24 May, 2024;
originally announced May 2024.
-
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Authors:
Qingdong He,
Jiangning Zhang,
Jinlong Peng,
Haoyang He,
Yabiao Wang,
Chengjie Wang
Abstract:
Transformers have revolutionized the point cloud learning task, but their quadratic complexity hinders extension to long sequences and burdens limited computational resources. The recent advent of RWKV, a fresh breed of deep sequence models, has shown immense potential for sequence modeling in NLP tasks. In this paper, we present PointRWKV, a model of linear complexity derived from the RWKV model in the NLP field with necessary modifications for point cloud learning tasks. Specifically, taking the embedded point patches as input, we first propose to explore the global processing capabilities within PointRWKV blocks using modified multi-headed matrix-valued states and a dynamic attention recurrence mechanism. To extract local geometric features simultaneously, we design a parallel branch to encode the point cloud efficiently in a fixed radius near-neighbors graph with a graph stabilizer. Furthermore, we design PointRWKV as a multi-scale framework for hierarchical feature learning of 3D point clouds, facilitating various downstream tasks. Extensive experiments on different point cloud learning tasks show that our proposed PointRWKV outperforms the transformer- and mamba-based counterparts while saving about 46% of FLOPs, demonstrating its potential as an option for constructing foundational 3D models.
Submitted 24 May, 2024;
originally announced May 2024.
-
FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models
Authors:
Hongyang Yang,
Boyu Zhang,
Neng Wang,
Cheng Guo,
Xiaoli Zhang,
Likun Lin,
Junlin Wang,
Tianyu Zhou,
Mao Guan,
Runjia Zhang,
Christina Dan Wang
Abstract:
As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges impede the AI community's ability to enhance financial tasks effectively. Acknowledging financial analysis's critical role, we aim to devise financial-specialized LLM-based toolchains and democratize access to them through open-source initiatives, promoting wider AI adoption in financial decision-making. In this paper, we introduce FinRobot, a novel open-source AI agent platform supporting multiple financially specialized AI agents, each powered by an LLM. Specifically, the platform consists of four major layers: 1) the Financial AI Agents layer that formulates Financial Chain-of-Thought (CoT) by breaking sophisticated financial problems down into logical sequences; 2) the Financial LLM Algorithms layer that dynamically configures appropriate model application strategies for specific tasks; 3) the LLMOps and DataOps layer that produces accurate models by applying training/fine-tuning techniques and using task-relevant data; 4) the Multi-source LLM Foundation Models layer that integrates various LLMs and enables the above layers to access them directly. Finally, FinRobot provides hands-on for both professional-grade analysts and laypersons to utilize powerful AI techniques for advanced financial analysis. We open-source FinRobot at https://github.com/AI4Finance-Foundation/FinRobot.
Submitted 27 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Online robust estimation and bootstrap inference for function-on-scalar regression
Authors:
Guanghui Cheng,
Wenjuan Hu,
Ruitao Lin,
Chen Wang
Abstract:
We propose a novel and robust online function-on-scalar regression technique via geometric median to learn associations between functional responses and scalar covariates based on massive or streaming datasets. The online estimation procedure, developed using the average stochastic gradient descent algorithm, offers an efficient and cost-effective method for analyzing sequentially augmented datasets, eliminating the need to store large volumes of data in memory. We establish the almost sure consistency, $L_p$ convergence, and asymptotic normality of the online estimator. To enable efficient and fast inference of the parameters of interest, including the derivation of confidence intervals, we also develop an innovative two-step online bootstrap procedure to approximate the limiting error distribution of the robust online estimator. Numerical studies under a variety of scenarios demonstrate the effectiveness and efficiency of the proposed online learning method. A real application analyzing PM$_{2.5}$ air-quality data is also included to exemplify the proposed online approach.
Submitted 23 May, 2024;
originally announced May 2024.
-
AI-Olympics: Exploring the Generalization of Agents through Open Competitions
Authors:
Chen Wang,
Yan Song,
Shuai Wu,
Sa Wu,
Ruizhi Zhang,
Shu Lin,
Haifeng Zhang
Abstract:
Between 2021 and 2023, AI-Olympics, a series of online AI competitions, was hosted by the online evaluation platform Jidi in collaboration with the IJCAI committee. In these competitions, an agent is required to accomplish diverse sports tasks in a two-dimensional continuous world while competing against an opponent. This paper provides a brief overview of the competition series and highlights notable findings. We aim to contribute insights to the field of multi-agent decision-making and explore the generalization of agents through engineering efforts.
Submitted 23 May, 2024;
originally announced May 2024.
-
Similarity-Navigated Conformal Prediction for Graph Neural Networks
Authors:
Jianqing Song,
Jianguo Huang,
Wenyu Jiang,
Baoming Zhang,
Shuangjie Li,
Chongjun Wang
Abstract:
Graph Neural Networks have achieved remarkable accuracy in semi-supervised node classification tasks. However, these results lack reliable uncertainty estimates. Conformal prediction methods provide a theoretical guarantee for node classification tasks, ensuring that the conformal prediction set contains the ground-truth label with a desired probability (e.g., 95%). In this paper, we empirically show that for each node, aggregating the non-conformity scores of nodes with the same label can improve the efficiency of conformal prediction sets. This observation motivates us to propose a novel algorithm named Similarity-Navigated Adaptive Prediction Sets (SNAPS), which aggregates the non-conformity scores based on feature similarity and structural neighborhood. The key idea behind SNAPS is that nodes with high feature similarity or direct connections tend to have the same label. By incorporating adaptive similar nodes information, SNAPS can generate compact prediction sets and increase the singleton hit ratio (correct prediction sets of size one). Moreover, we theoretically provide a finite-sample coverage guarantee of SNAPS. Extensive experiments demonstrate the superiority of SNAPS, improving the efficiency of prediction sets and singleton hit ratio while maintaining valid coverage.
Submitted 23 May, 2024;
originally announced May 2024.
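Illustrative note on the abstract above: the coverage guarantee it mentions comes from conformal prediction, whose basic split-conformal form can be sketched in a few lines. This is plain split conformal prediction, not SNAPS; the similarity- and neighborhood-based aggregation of non-conformity scores is not reproduced. The non-conformity score (one minus the predicted probability of a label) and the function names are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    cal_scores holds the non-conformity score of the true label for each
    held-out calibration example.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Labels whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration scores (hypothetical values).
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.90]
threshold = conformal_threshold(cal_scores, alpha=0.25)

confident_set = prediction_set([0.6, 0.3, 0.1], threshold)
ambiguous_set = prediction_set([0.5, 0.5, 0.0], threshold)
```

A confident prediction yields a singleton set, while an ambiguous one yields a larger set; smaller sets at the same coverage level are what the abstract calls improved efficiency.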
-
Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs
Authors:
Yihao Huang,
Chong Wang,
Xiaojun Jia,
Qing Guo,
Felix Juefei-Xu,
Jian Zhang,
Geguang Pu,
Yang Liu
Abstract:
With the rising popularity of Large Language Models (LLMs), assessing their trustworthiness through security tasks has gained critical importance. Regarding the new task of universal goal hijacking, previous efforts have concentrated solely on optimization algorithms, overlooking the crucial role of the prompt. To fill this gap, we propose a universal goal hijacking method called POUGH that incorporates semantic-guided prompt processing strategies. Specifically, the method starts with a sampling strategy to select representative prompts from a candidate pool, followed by a ranking strategy that prioritizes the prompts. Once the prompts are organized sequentially, the method employs an iterative optimization algorithm to generate the universal fixed suffix for the prompts. Experiments conducted on four popular LLMs and ten types of target responses verified the effectiveness of our method.
Submitted 23 May, 2024;
originally announced May 2024.
-
Leader Reward for POMO-Based Neural Combinatorial Optimization
Authors:
Chaoyang Wang,
Pengzhi Cheng,
Jingze Li,
Weiwei Sun
Abstract:
Deep neural networks based on reinforcement learning (RL) for solving combinatorial optimization (CO) problems are developing rapidly and have shown a tendency to approach or even outperform traditional solvers. However, existing methods overlook an important distinction: CO problems differ from other traditional problems in that they focus solely on the optimal solution provided by the model within a specific length of time, rather than considering the overall quality of all solutions generated by the model. In this paper, we propose Leader Reward and apply it during two different training phases of the Policy Optimization with Multiple Optima (POMO) model to enhance the model's ability to generate optimal solutions. This approach is applicable to a variety of CO problems, such as the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Flexible Flow Shop Problem (FFSP), and also works well with other POMO-based models or inference-phase strategies. We demonstrate that Leader Reward greatly improves the quality of the optimal solutions generated by the model. Specifically, we reduce the POMO's gap to the optimum by more than 100 times on TSP100 with almost no additional computational overhead.
Submitted 22 May, 2024;
originally announced May 2024.
-
EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views
Authors:
Yuhang Yang,
Wei Zhai,
Chengfeng Wang,
Chengjun Yu,
Yang Cao,
Zheng-Jun Zha
Abstract:
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception, facilitating applications like AR/VR and embodied AI. For the egocentric HOI, in addition to perceiving semantics, e.g., "what" interaction is occurring, capturing "where" the interaction specifically manifests in 3D space is also crucial, which links the perception and operation. Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view. However, incomplete observations of interacting parties in the egocentric view introduce ambiguity between visual observations and interaction contents, impairing their efficacy. From the egocentric view, humans integrate the visual cortex, cerebellum, and brain to internalize their intentions and interaction concepts of objects, allowing for the pre-formulation of interactions and making behaviors even when interaction regions are out of sight. In light of this, we propose harmonizing the visual appearance, head motion, and 3D object to excavate the object interaction concept and subject intention, jointly inferring 3D human contact and object affordance from egocentric videos. To achieve this, we present EgoChoir, which links object structures with interaction contexts inherent in appearance and head motion to reveal object affordance, further utilizing it to model human contact. Additionally, a gradient modulation is employed to adopt appropriate clues for capturing interaction regions across various egocentric scenarios. Moreover, 3D contact and affordance are annotated for egocentric videos collected from Ego-Exo4D and GIMO to support the task. Extensive experiments on them demonstrate the effectiveness and superiority of EgoChoir. Code and data will be open.
Submitted 22 May, 2024;
originally announced May 2024.
-
Safety Alignment for Vision Language Models
Authors:
Zhendong Liu,
Yuanbi Nie,
Yingshui Tan,
Xiangyu Yue,
Qiushi Cui,
Chongjun Wang,
Xiaoyong Zhu,
Bo Zheng
Abstract:
Benefiting from the powerful capabilities of Large Language Models (LLMs), pre-trained visual encoder models connected to an LLM can realize Vision Language Models (VLMs). However, existing research shows that the visual modality of VLMs is vulnerable, with attackers easily bypassing LLMs' safety alignment through visual modality features to launch attacks. To address this issue, we enhance the existing VLMs' visual modality safety alignment by adding safety modules, including a safety projector, safety tokens, and a safety head, through a two-stage training process, effectively improving the model's defense against risky images. For example, building upon the LLaVA-v1.5 model, we achieve a safety score of 8.26, surpassing GPT-4V on the Red Teaming Visual Language Models (RTVLM) benchmark. Our method boasts ease of use, high flexibility, and strong controllability, and it enhances safety while having minimal impact on the model's general performance. Moreover, our alignment strategy also uncovers some possible risky content within commonly used open-source multimodal datasets. Our code will be open sourced after the anonymous review.
Submitted 22 May, 2024;
originally announced May 2024.
-
Reinforcement Learning for Adaptive MCMC
Authors:
Congye Wang,
Wilson Chen,
Heishiro Kanagawa,
Chris. J. Oates
Abstract:
An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to-date it has remained unclear how to actually exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called Reinforcement Learning Metropolis--Hastings, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis--Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that out-performs a popular gradient-free adaptive Metropolis--Hastings algorithm on $\approx 90 \%$ of tasks in the PosteriorDB benchmark.
Submitted 22 May, 2024;
originally announced May 2024.
-
Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning
Authors:
Yuanhao Yue,
Chengyu Wang,
Jun Huang,
Peng Wang
Abstract:
The process of instruction tuning aligns pre-trained large language models (LLMs) with open-domain instructions and human-preferred responses. While several studies have explored autonomous approaches to distilling and annotating instructions from more powerful proprietary LLMs, such as ChatGPT, they often neglect the impact of task distributions and the varying difficulty of instructions of the training sets. This oversight can lead to imbalanced knowledge capabilities and poor generalization powers of small student LLMs. To address this challenge, we introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR), a multi-round distillation framework with balanced task distributions and dynamic difficulty adjustment. This approach utilizes an oracle LLM to select instructions that are difficult for a student LLM to follow and distill instructions with balanced task distributions. By incorporating curriculum planning, our approach systematically escalates the difficulty levels, progressively enhancing the student LLM's capabilities. We rigorously evaluate TAPIR using two widely recognized benchmarks, including AlpacaEval 2.0 and MT-Bench. The empirical results demonstrate that the student LLMs, trained with our method and less training data, outperform larger instruction-tuned models and strong distillation baselines. The improvement is particularly notable in complex tasks, such as logical reasoning and code generation.
Submitted 22 May, 2024;
originally announced May 2024.
-
Mosaic IT: Enhancing Instruction Tuning with Data Mosaics
Authors:
Ming Li,
Pei Chen,
Chenguang Wang,
Hongyu Zhao,
Yijun Liang,
Yupeng Hou,
Fuxiao Liu,
Tianyi Zhou
Abstract:
Finetuning large language models with a variety of instruction-response pairs has enhanced their capability to understand and follow instructions. Current instruction tuning primarily relies on teacher models or human intervention to generate and refine the instructions and responses, which are costly, non-sustainable, and may lack diversity. In this paper, we introduce Mosaic Instruction Tuning (Mosaic-IT), a human/model-free method that can efficiently create rich and diverse augmentations from existing instruction tuning data to enhance the finetuned LLM. Mosaic-IT randomly concatenates multiple instruction data into one and trains the model to produce the corresponding responses with predefined higher-level meta-instructions to strengthen its multi-step instruction-following and format-following skills. Our extensive evaluations demonstrate the superior performance and training efficiency of Mosaic-IT, which achieves consistent performance improvements over various benchmarks and an 80% reduction in training costs compared with original instruction tuning. Our code and data are available at https://github.com/tianyi-lab/Mosaic-IT.
Submitted 22 May, 2024;
originally announced May 2024.
-
Theoretical Analysis of Meta Reinforcement Learning: Generalization Bounds and Convergence Guarantees
Authors:
Cangqing Wang,
Mingxiu Sui,
Dan Sun,
Zecheng Zhang,
Yan Zhou
Abstract:
This research examines Meta Reinforcement Learning (Meta RL), focusing on defining generalization limits and ensuring convergence. The article introduces an innovative theoretical framework to assess the effectiveness and performance of Meta RL algorithms. We present an explanation of generalization limits, measuring how well these algorithms can adapt to learning tasks while maintaining consistent results. Our analysis examines the factors that impact the adaptability of Meta RL, revealing the relationship between algorithm design and task complexity. Additionally, we establish convergence assurances by proving conditions under which Meta RL strategies are guaranteed to converge towards solutions. We examine the convergence behaviors of Meta RL algorithms across scenarios, providing a comprehensive understanding of the driving forces behind their long-term performance. This exploration covers both convergence and real-time efficiency, offering a perspective on the capabilities of these algorithms.
Submitted 21 May, 2024;
originally announced May 2024.
-
Search for the lepton-flavor violating decay $B^0_s\toφμ^\pmτ^\mp$
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A search for the lepton-flavor violating decays $B^0_s\toφμ^\pmτ^\mp$ is presented, using a sample of proton-proton collisions at center-of-mass energies of 7, 8, and 13 TeV, collected with the LHCb detector and corresponding to a total integrated luminosity of $9\,\text{fb}^{-1}$. The $τ$ leptons are selected using decays with three charged pions. No significant excess is observed, and an upper limit on the branching fraction is determined to be ${\cal B}( B^0_s\toφμ^\pmτ^\mp) < 1.0\times 10^{-5}$ at 90% confidence level.
Submitted 21 May, 2024;
originally announced May 2024.
-
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Authors:
Yue Han,
Junwei Zhu,
Keke He,
Xu Chen,
Yanhao Ge,
Wei Li,
Xiangtai Li,
Jiangning Zhang,
Chengjie Wang,
Yong Liu
Abstract:
Current face reenactment and swapping methods mainly rely on GAN frameworks, but recent focus has shifted to pre-trained diffusion models for their superior generation capabilities. However, training these models is resource-intensive, and the results have not yet achieved satisfactory performance levels. To address this issue, we introduce Face-Adapter, an efficient and effective adapter designed for high-precision and high-fidelity face editing for pre-trained diffusion models. We observe that both face reenactment/swapping tasks essentially involve combinations of target structure, ID and attribute. We aim to sufficiently decouple the control of these factors to achieve both tasks in one model. Specifically, our method contains: 1) A Spatial Condition Generator that provides precise landmarks and background; 2) A Plug-and-play Identity Encoder that transfers face embeddings to the text space by a transformer decoder; 3) An Attribute Controller that integrates spatial conditions and detailed attributes. Face-Adapter achieves comparable or even superior performance in terms of motion control precision, ID retention capability, and generation quality compared to fully fine-tuned face reenactment/swapping models. Additionally, Face-Adapter seamlessly integrates with various StableDiffusion models.
Submitted 21 May, 2024;
originally announced May 2024.
-
AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection
Authors:
Zizhao Chen,
Yeqiang Qian,
Xiaoxiao Yang,
Chunxiang Wang,
Ming Yang
Abstract:
Multispectral pedestrian detection has been shown to be effective in improving performance within complex illumination scenarios. However, prevalent double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data, leading to nearly double the inference time compared to single-stream networks utilizing only one feature extraction branch. This increased inference time has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems. To address this limitation, various knowledge distillation methods have been proposed. However, traditional distillation methods focus only on the fusion features and ignore the large amount of information in the original multi-modal features, thereby restricting the student network's performance. To tackle the challenge, we introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network. Specifically, a Modal Extraction Alignment (MEA) module is utilized to derive learning weights for student networks, integrating focal and global attention mechanisms. This methodology enables the student network to acquire optimal fusion strategies independent from that of teacher network without necessitating an additional feature fusion module. Furthermore, we present the SMOD dataset, a well-aligned challenging multispectral dataset for detection. Extensive experiments on the challenging KAIST, LLVIP and SMOD datasets are conducted to validate the effectiveness of AMFD. The results demonstrate that our method outperforms existing state-of-the-art methods in both reducing log-average Miss Rate and improving mean Average Precision. The code is available at https://github.com/bigD233/AMFD.git.
Submitted 21 May, 2024;
originally announced May 2024.
-
Precision measurement of the branching fraction of $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (604 additional authors not shown)
Abstract:
Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$.
The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with significantly improved precision.
Submitted 21 May, 2024;
originally announced May 2024.
-
Generalized Strauss conjecture for semilinear wave equations on $\mathbb{R}^3$
Authors:
Chengbo Wang,
Xiaoran Zhang
Abstract:
In this manuscript, we focus on the more delicate nonlinearity of the semilinear wave equation $$\partial_{t}^2 u-Δ_{\mathbb{R}^3}u=|u|^{p_S}μ(|u|)\ ,u(0,x)=\varepsilon u_0,\ u_t(0,x)=\varepsilon u_1\ ,$$ where $p_S=1+\sqrt{2}$ is the Strauss critical index in $n=3$, and $μ$ is a modulus of continuity. Inspired by Chen, Reissig\cite{Chen_2024} and Ebert, Girardi, Reissig\cite{MR4163528}, we investigate the sharp condition of $μ$ as the threshold between the global existence and blow up with small data. We obtain the almost sharp results in this paper, which in particular disproves the conjecture in \cite{Chen_2024}.
Submitted 21 May, 2024;
originally announced May 2024.
-
Study of $b$-hadron decays to $Λ_c^+ h^- h^{\prime -}$ final states
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1072 additional authors not shown)
Abstract:
Decays of $Ξ_b^-$ and $Ω_b^-$ baryons to $Λ_c^+ h^- h^{\prime -}$ final states, with $h^- h^{\prime -}$ being $π^-π^-$, $K^-π^-$ and $K^-K^-$ meson pairs, are searched for using data collected with the LHCb detector. The data sample studied corresponds to an integrated luminosity of $8.7\,\mathrm{fb}^{-1}$ of $pp$ collisions collected at centre-of-mass energies $\sqrt{s} = 7$, $8$ and $13\,\mathrm{Te\kern -0.1em V}$. The products of the relative branching fractions and fragmentation fractions for each signal mode, relative to the $B^- \to Λ_c^+ \overline{p} π^-$ mode, are measured, with $Ξ_{b}^- \toΛ_{c}^+ K^- π^-$, $Ξ_{b}^- \toΛ_{c}^+ K^- K^-$ and $Ω_{b}^- \toΛ_{c}^+ K^- K^-$ decays being observed at over $5\,σ$ significance. The $Ξ_{b}^- \toΛ_{c}^+ K^- π^-$ mode is also used to measure the $Ξ_{b}^-$ production asymmetry, which is found to be consistent with zero. In addition, the $B^- \to Λ_{c}^+ \overline{p} K^-$ decay is observed for the first time, and its branching fraction is measured relative to that of the $B^- \to Λ_{c}^+ \overline{p} π^-$ mode.
Submitted 22 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Stabilizing fractional Chern insulators via exchange interaction in moiré systems
Authors:
Xiaoyang Shen,
Chonghao Wang,
Ruiping Guo,
Zhiming Xu,
Wenhui Duan,
Yong Xu
Abstract:
The recent experimental discovery of fractional Chern insulators in the moiré Chern band of twisted transition metal dichalcogenide homobilayers has sparked intensive interest in exploring ways of engineering band topology and correlated states in moiré systems. In this letter, we demonstrate that, with an additional exchange interaction induced by the proximity effect, the topology and bandwidth of the moiré minibands of twisted $\mathrm{MoTe_2}$ homobilayers can be easily tuned. Fractional Chern insulators at -2/3 filling are found to appear at enlarged twist angles over a large range of twist angles with enhanced many-body gaps. We further discover a topological phase transition between the fractional Chern insulator, quantum anomalous Hall crystal, and charge density wave. Our results shed light on the interplay between topology and correlation physics.
Submitted 20 May, 2024;
originally announced May 2024.
-
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models
Authors:
Haoxiang Shi,
Jiaan Wang,
Jiarong Xu,
Cen Wang,
Tetsuya Sakai
Abstract:
Text-to-Table aims to generate structured tables to convey the key information from unstructured documents. Existing text-to-table datasets are typically English-oriented, limiting research in non-English languages. Meanwhile, the emergence of large language models (LLMs) has shown great success as general task solvers in multi-lingual settings (e.g., ChatGPT), theoretically enabling text-to-table in other languages. In this paper, we propose a Chinese text-to-table dataset, CT-Eval, to benchmark LLMs on this task. Our preliminary analysis of English text-to-table datasets highlights two key factors for dataset construction: data diversity and data hallucination. Inspired by this, the CT-Eval dataset selects a popular Chinese multidisciplinary online encyclopedia as the source and covers 28 domains to ensure data diversity. To minimize data hallucination, we first train an LLM to judge and filter out the task samples with hallucination, then employ human annotators to clean the hallucinations in the validation and testing sets. After this process, CT-Eval contains 88.6K task samples. Using CT-Eval, we evaluate the performance of open-source and closed-source LLMs. Our results reveal that zero-shot LLMs (including GPT-4) still have a significant performance gap compared with human judgment. Furthermore, after fine-tuning, open-source LLMs can significantly improve their text-to-table ability, outperforming GPT-4 by a large margin. In short, CT-Eval not only helps researchers evaluate and quickly understand the Chinese text-to-table ability of existing LLMs but also serves as a valuable resource to significantly improve the text-to-table performance of LLMs.
Submitted 20 May, 2024;
originally announced May 2024.
-
Alkaline earth metal mediated inter-molecular magnetism in perfluorocubane dimers and chains
Authors:
Zhuohang Li,
Cong Wang,
Linwei Zhou,
Yurou Guan,
Linlu Wu,
Jiaqi Dai,
Wei Ji
Abstract:
Perfluorocubane ($C_8F_8$) was successfully synthesized and found to accept and store electrons in its internal cubic cavity to form magnetic moments. However, their inter-molecule spin-exchange coupling mechanism is yet to be revealed. In this study, we found the inter-molecule magnetic groundstates of $C_8F_8$ dimer and one-dimensional (1D) chain are tunable from antiferromagnetic (AFM) to ferromagnetic (FM) by stacking orders and alkaline earth metals intercalation using first-principles calculations. The inter-molecule couplings are dominated by noncovalent halogen $C-F...C_4$ interactions. Stacking orders of dimers can regulate the relative position of the lone pairs and $σ$-holes at the molecular interface and thus the magnetic groundstates. Alkaline earth metals M (M = Na, Mg) intercalations could form $C_4-M-C_4$ bonds and lead to FM direct exchange at the inter-molecule region. An unpaired electron donated by the intercalated atoms or electron doping can result in a local magnetic moment in dimers, exhibiting an on-off switching by the odd-even number of electron filling. Novel electronic properties such as spin gapless semiconductor and charge density wave (CDW) states emerge when $C_8F_8$ molecules self-assemble with intercalated atoms to form 1D chains. These findings manifest the roles of stacking and intercalation in modifying intermolecular magnetism and the revealed halogen bond-dominated exchange mechanisms are paramount additions to those previously established non-covalent couplings.
Submitted 20 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
NetMamba: Efficient Network Traffic Classification via Pre-training Unidirectional Mamba
Authors:
Tongze Wang,
Xiaohui Xie,
Wenduo Wang,
Chuyi Wang,
Youjian Zhao,
Yong Cui
Abstract:
Network traffic classification is a crucial research area aiming to enhance service quality, streamline network management, and bolster cybersecurity. To address the growing complexity of transmission encryption techniques, various machine learning and deep learning methods have been proposed. However, existing approaches face two main challenges. Firstly, they struggle with model inefficiency due to the quadratic complexity of the widely used Transformer architecture. Secondly, they suffer from inadequate traffic representation because of discarding important byte information while retaining unwanted biases. To address these challenges, we propose NetMamba, an efficient linear-time state space model equipped with a comprehensive traffic representation scheme. We adopt a specially selected and improved unidirectional Mamba architecture for the networking field, instead of the Transformer, to address efficiency issues. In addition, we design a traffic representation scheme to extract valid information from massive traffic data while removing biased information. Evaluation experiments on six public datasets encompassing three main classification tasks showcase NetMamba's superior classification performance compared to state-of-the-art baselines. It achieves an accuracy rate of nearly 99% (some over 99%) in all tasks. Additionally, NetMamba demonstrates excellent efficiency, improving inference speed by up to 60 times while maintaining comparably low memory usage. Furthermore, NetMamba exhibits superior few-shot learning abilities, achieving better classification performance with fewer labeled data. To the best of our knowledge, NetMamba is the first model to tailor the Mamba architecture for networking.
Submitted 25 May, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
On the Trajectory Regularity of ODE-based Diffusion Sampling
Authors:
Defang Chen,
Zhenyu Zhou,
Can Wang,
Chunhua Shen,
Siwei Lyu
Abstract:
Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoising trajectory and discuss its vital role in forming the coupled sampling trajectory with a strong shape regularity, regardless of the generated content. We also describe a dynamic programming-based scheme to make the time schedule in sampling better fit the underlying trajectory structure. This simple strategy requires minimal modification to any given ODE-based numerical solvers and incurs negligible computational cost, while delivering superior performance in image generation, especially in $5\sim 10$ function evaluations.
Submitted 18 May, 2024;
originally announced May 2024.
-
Transverse polarization measurement of $Λ$ hyperons in $p$Ne collisions at $\sqrt{s_{NN}}$ = 68.4 GeV with the $\mbox{LHCb}$ detector
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1065 additional authors not shown)
Abstract:
A measurement of the transverse polarization of the $Λ$ and $\barΛ$ hyperons in $p$Ne fixed-target collisions at $\sqrt{s_{NN}}$ = 68.4 GeV is presented using data collected by the LHCb detector. The polarization is studied using the decay $Λ\rightarrow p π^-$ together with its charge-conjugated process; the integrated values measured are
$$ P_Λ = 0.029 \pm 0.019 \, (\rm{stat}) \pm 0.012 \, (\rm{syst}) \, , $$ $$ P_{\barΛ} = 0.003 \pm 0.023 \, (\rm{stat}) \pm 0.014 \,(\rm{syst}) \,. $$
Furthermore, the results are shown as a function of the Feynman $x$ variable, transverse momentum, pseudorapidity and rapidity of the hyperons, and are compared with previous measurements.
Submitted 24 May, 2024; v1 submitted 18 May, 2024;
originally announced May 2024.
-
MicroBundlePillarTrack, A Python package for automated segmentation, tracking, and analysis of pillar deflection in cardiac microbundles
Authors:
Hiba Kobeissi,
Xining Gao,
Samuel J. DePalma,
Jourdan K. Ewoldt,
Miranda C. Wang,
Shoshana L. Das,
Javiera Jilberto,
David Nordsletten,
Brendon M. Baker,
Christopher S. Chen,
Emma Lejeune
Abstract:
Movies of human induced pluripotent stem cell (hiPSC)-derived engineered cardiac tissue (microbundles) contain abundant information about structural and functional maturity. However, extracting these data in a reproducible and high-throughput manner remains a major challenge. Furthermore, it is not straightforward to make direct quantitative comparisons across the multiple in vitro experimental platforms employed to fabricate these tissues. Here, we present "MicroBundlePillarTrack," an open-source optical flow-based package developed in Python to track the deflection of pillars in cardiac microbundles grown on experimental platforms with two different pillar designs ("Type 1" and "Type 2" design). Our software is able to automatically segment both pillars, track their displacements, and output time-dependent metrics for contractility analysis, including beating amplitude and rate, contractile force, and tissue stress. Because this software is fully automated, it will allow for both faster and more reproducible analyses of larger datasets and it will enable more reliable cross-platform comparisons as compared to existing approaches that require manual steps and are tailored to one specific experimental platform. To complement this open-source software, we share a dataset of 1,540 brightfield example movies on which we have tested our software. Through sharing this data and software, our goal is to directly enable quantitative comparisons across labs, and facilitate future collective progress via the biomedical engineering open-source data and software ecosystem.
Submitted 17 May, 2024;
originally announced May 2024.
-
Morse index of minimal products of minimal submanifolds in spheres
Authors:
Changping Wang,
Peng Wang
Abstract:
Tang-Zhang and Choe-Hoppe independently showed that one can produce minimal submanifolds in spheres via the Clifford-type minimal product of minimal submanifolds. In this note, we show that the minimal product is immersed by its first eigenfunctions (of its Laplacian) if and only if the two initial minimal submanifolds are immersed by their first eigenfunctions. Moreover, we give estimates of the Morse index and nullity of the minimal product. In particular, we show that the Clifford minimal submanifold $\left(\sqrt{\frac{n_1}{n}}S^{n_1},\cdots,\sqrt{\frac{n_k}{n}}S^{n_k}\right)\subset S^{n+k-1}$ has index $(k-1)(n+k+1)$ and nullity $(k-1)\sum_{1\leq i<j\leq k}(n_i+1)(n_j+1)$ (with $n=\sum n_j$).
Submitted 17 May, 2024;
originally announced May 2024.
-
Efficient Multimodal Large Language Models: A Survey
Authors:
Yizhang **,
Jian Li,
Yexin Liu,
Tianjun Gu,
Kai Wu,
Zhengkai Jiang,
Muyang He,
Bo Zhao,
Xin Tan,
Zhenye Gan,
Yabiao Wang,
Chengjie Wang,
Lizhuang Ma
Abstract:
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, especially in edge computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, we summarize the timeline of representative efficient MLLMs, research state of efficient structures and strategies, and the applications. Finally, we discuss the limitations of current efficient MLLM research and promising future directions. Please refer to our GitHub repository for more details: https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey.
Submitted 17 May, 2024;
originally announced May 2024.
-
COMET: NFT Price Prediction with Wallet Profiling
Authors:
Tianfu Wang,
Liwei Deng,
Chao Wang,
Jianxun Lian,
Yue Yan,
Nicholas **g Yuan,
Qi Zhang,
Hui Xiong
Abstract:
As the non-fungible token (NFT) market flourishes, price prediction emerges as a pivotal direction for investors seeking valuable insight to maximize returns. However, existing works suffer from a lack of practical definitions and standardized evaluations, limiting their practical application. Moreover, the influence of users' publicly accessible multi-behaviour transactions on NFT price remains unexplored and poses challenges. In this paper, we address these gaps by presenting a practical and hierarchical problem definition. This approach unifies both collection-level and token-level tasks and evaluation methods, which cater to the varied practical requirements of investors. To further understand the impact of user behaviours on the variation of NFT price, we propose a general wallet profiling framework and develop a COmmunity enhanced Multi-bEhavior Transaction graph model, named COMET. COMET profiles wallets with a comprehensive view and considers the impact of diverse relations and interactions within the NFT ecosystem on NFT price variations, thereby improving prediction performance. Extensive experiments conducted in our deployed system demonstrate the superiority of COMET, underscoring its potential in the insight toolkit for NFT investors.
Submitted 29 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI
Authors:
Yirong Zhou,
Chengyan Wang,
Mengtian Lu,
Kunyuan Guo,
Zi Wang,
Dan Ruan,
Rui Guo,
Peijun Zhao,
Jianhua Wang,
Naiming Wu,
Jianzhong Lin,
Yinyin Chen,
Hang **,
Lianxin Xie,
Lilan Wu,
Liuhong Zhu,
Jianjun Zhou,
Congbo Cai,
He Wang,
Xiaobo Qu
Abstract:
In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing the diagnosis of cardiac diseases such as AMI.
Submitted 29 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Infrared Image Super-Resolution via Lightweight Information Split Network
Authors:
Shijie Liu,
Kang Yan,
Feiwei Qin,
Changmiao Wang,
Ruiquan Ge,
Kai Zhang,
Jie Huang,
Yong Peng,
** Cao
Abstract:
Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory demands. These issues become particularly pronounced in the context of infrared image SR, where infrared devices often have stringent storage and computational constraints. To mitigate these challenges, we introduce a novel, efficient, and precise single infrared image SR model, termed the Lightweight Information Split Network (LISN). The LISN comprises four main components: shallow feature extraction, deep feature extraction, dense feature fusion, and high-resolution infrared image reconstruction. A key innovation within this model is the introduction of the Lightweight Information Split Block (LISB) for deep feature extraction. The LISB employs a sequential process to extract hierarchical features, which are then aggregated based on the relevance of the features under consideration. By integrating channel splitting and shift operations, the LISB successfully strikes an optimal balance between enhanced SR performance and a lightweight framework. Comprehensive experimental evaluations reveal that the proposed LISN achieves superior performance over contemporary state-of-the-art methods in terms of both SR quality and model complexity, affirming its efficacy for practical deployment in resource-constrained infrared imaging applications.
Submitted 27 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
Authors:
Jiachen Sun,
Changsheng Wang,
Jiongxiao Wang,
Yiwei Zhang,
Chaowei Xiao
Abstract:
Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications, as demonstrated in many existing works. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, specifically tailored to protect VLMs from the threat of patched visual prompt injectors. Our framework significantly lowers the attack success rate to a range between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery of the benign images, demonstrating a balance between security and usability.
Submitted 17 May, 2024;
originally announced May 2024.
-
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
Authors:
Yunfan Jiang,
Chen Wang,
Ruohan Zhang,
Jiajun Wu,
Li Fei-Fei
Abstract:
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain-specific knowledge a priori. We argue that a straightforward way to obtain such knowledge is by asking humans to observe and assist robot policy execution in the real world. The robots can then learn from humans to close various sim-to-real gaps. We propose TRANSIC, a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework. TRANSIC allows humans to augment simulation policies to overcome various unmodeled sim-to-real gaps holistically through intervention and online correction. Residual policies can be learned from human corrections and integrated with simulation policies for autonomous execution. We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly. Through synergistic integration of policies learned in simulation and from humans, TRANSIC is effective as a holistic approach to addressing various, often coexisting sim-to-real gaps. It displays attractive properties such as scaling with human effort. Videos and code are available at https://transic-robot.github.io/
Submitted 16 May, 2024;
originally announced May 2024.
-
In-situ optical vector analysis based on integrated lithium niobate single-sideband modulators
Authors:
Hanke Feng,
Tong Ge,
Yaowen Hu,
Zhenzheng Wang,
Yiwen Zhang,
Zhaoxi Chen,
Ke Zhang,
Wenzhao Sun,
Cheng Wang
Abstract:
Optical vector analysis (OVA) is an enabling technology for comprehensively characterizing both amplitude and phase responses of optical devices or systems. Conventional OVA technologies are mostly based on discrete optoelectronic components, leading to unsatisfactory system sizes, complexity, and stability. They also encounter challenges in revealing the on-chip characteristics of integrated photonic devices, which are often overwhelmed by the substantial coupling loss and extra spectral response at chip facets. In this work, we demonstrate a miniaturized OVA system for integrated photonics devices based on broadband single sideband (SSB) modulators on a thin-film lithium niobate (LN) platform. The OVA could provide a direct probe of both amplitude and phase responses of photonic devices with kHz-level resolution and tens of terahertz measurement bandwidth. We perform in-situ characterizations of single and coupled microring resonators fabricated on the same chip as the OVA, unfolding their intrinsic loss and coupling states unambiguously. Furthermore, we achieve the direct measurement of collective phase dynamics and density of states of the Bloch modes in a synthetic frequency crystal, by in-situ OVA of a dynamically modulated microring resonator. Our in-situ OVA system provides a compact, high-precision, and broadband solution for characterizing future integrated photonic devices and circuits, with potential applications ranging from optical communications, biosensing, neuromorphic computing, to quantum information processing.
Submitted 16 May, 2024;
originally announced May 2024.
-
Semantic Communication via Rate Distortion Perception Bottleneck
Authors:
Zihe Zhao,
Chunyue Wang
Abstract:
With the advancement of Artificial Intelligence (AI) technology, next-generation wireless communication networks are facing unprecedented challenges. Semantic communication has emerged as a novel solution to such challenges, enhancing the efficiency of bandwidth utilization by transmitting meaningful information and filtering out superfluous data. Unfortunately, recent studies have shown that classical Shannon information theory primarily focuses on bit-level distortion, which cannot adequately address the perceptual quality issues of data reconstruction at the receiver end. In this work, we consider the impact of semantic-level distortion on semantic communication. We develop an image inference network based on the Information Bottleneck (IB) framework and concurrently establish an image reconstruction network. This network is designed to achieve joint optimization of perception and bit-level distortion, as well as image inference, associated with compressing information. To maintain consistency with the principles of IB for handling high-dimensional data, we employ variational approximation methods to simplify the optimization problem. Finally, we confirm the existence of the rate distortion perception tradeoff within the IB framework through experimental analysis conducted on the MNIST dataset.
Submitted 16 May, 2024;
originally announced May 2024.
-
PillarNeXt: Improving the 3D detector by introducing Voxel2Pillar feature encoding and extracting multi-scale features
Authors:
Xusheng Li,
Chengliang Wang,
Shumao Wang,
Zhuo Zeng,
Ji Liu
Abstract:
The multi-line LiDAR is widely used in autonomous vehicles, so point cloud-based 3D detectors are essential for autonomous driving. Extracting rich multi-scale features is crucial for point cloud-based 3D detectors in autonomous driving due to significant differences in the size of different types of objects. However, because of the real-time requirements, large-size convolution kernels are rarely used to extract large-scale features in the backbone. Current 3D detectors commonly use feature pyramid networks to obtain large-scale features; however, some objects containing fewer point clouds are further lost during down-sampling, resulting in degraded performance. Since pillar-based schemes require much less computation than voxel-based schemes, they are more suitable for constructing real-time 3D detectors. Hence, we propose the PillarNeXt, a pillar-based scheme. We redesigned the feature encoding, the backbone, and the neck of the 3D detector. We propose the Voxel2Pillar feature encoding, which uses a sparse convolution constructor to construct pillars with richer point cloud features, especially height features. The Voxel2Pillar adds more learnable parameters to the feature encoding, enabling the initial pillars to have higher performance ability. We extract multi-scale and large-scale features in the proposed fully sparse backbone, which does not utilize large-size convolutional kernels; the backbone consists of the proposed multi-scale feature extraction module. The neck consists of the proposed sparse ConvNeXt, whose simple structure significantly improves the performance. We validate the effectiveness of the proposed PillarNeXt on the Waymo Open Dataset, and the object detection accuracy for vehicles, pedestrians, and cyclists is improved. We also verify the effectiveness of each proposed module in detail through ablation studies.
Submitted 19 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge
Authors:
Dominic LaBella,
Ujjwal Baid,
Omaditya Khanna,
Shan McBurney-Lin,
Ryan McLean,
Pierre Nedelec,
Arif Rashid,
Nourel Hoda Tahon,
Talissa Altes,
Radhika Bhalerao,
Yaseen Dhemesh,
Devon Godfrey,
Fathi Hilal,
Scott Floyd,
Anastasia Janas,
Anahita Fathi Kazerooni,
John Kirkpatrick,
Collin Kent,
Florian Kofler,
Kevin Leu,
Nazanin Maleki,
Bjoern Menze,
Maxence Pajot,
Zachary J. Reitman,
Jeffrey D. Rudie
, et al. (96 additional authors not shown)
Abstract:
We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning automated segmentation models using image data from the largest multi-institutional systematically expert annotated multilabel multi-sequence meningioma MRI dataset to date, which included 1000 training set cases, 141 validation set cases, and 283 hidden test set cases. Each case included T2, T2/FLAIR, T1, and T1Gd brain MRI sequences with associated tumor compartment labels delineating enhancing tumor, non-enhancing tumor, and surrounding non-enhancing T2/FLAIR hyperintensity. Participant automated segmentation models were evaluated and ranked based on a scoring system evaluating lesion-wise metrics including dice similarity coefficient (DSC) and 95% Hausdorff Distance. The top ranked team had a lesion-wise median DSC of 0.976, 0.976, and 0.964 for enhancing tumor, tumor core, and whole tumor, respectively, and a corresponding average DSC of 0.899, 0.904, and 0.871, respectively. These results serve as state-of-the-art benchmarks for future pre-operative meningioma automated segmentation algorithms. Additionally, we found that 1286 of 1424 cases (90.3%) had at least 1 compartment voxel abutting the edge of the skull-stripped image, which requires further investigation into optimal pre-processing face anonymization steps.
Submitted 15 May, 2024;
originally announced May 2024.
-
MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer
Authors:
Chengyu Wu,
Chengkai Wang,
Yaqi Wang,
Huiyu Zhou,
Yatao Zhang,
Qifeng Wang,
Shuai Wang
Abstract:
Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians effectively customize personalized treatment plans. Currently, CT-based cancer diagnosis methods have received much attention for their comprehensive ability to examine patients' conditions. However, multi-modal methods may introduce information redundancy, leading to underperformance. In addition, efficient and effective interactions between multi-modal representations need further exploration, and prognostic correlations among multi-modality features have not been examined in depth. In this work, we introduce a multi-modal heterogeneous graph-based conditional feature-guided diffusion model for lymph node metastasis diagnosis based on CT images as well as clinical measurements and radiomics data. To explore the intricate relationships between multi-modal features, we construct a heterogeneous graph. Following this, a conditional feature-guided diffusion approach is applied to eliminate information redundancy. Moreover, we propose a masked relational representation learning strategy, aiming to uncover the latent prognostic correlations and priorities of primary tumor and lymph node image representations. Various experimental results validate the effectiveness of our proposed method. The code is available at https://github.com/wuchengyu123/MMFusion.
Submitted 16 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Laser Printing of Silver and Silver Oxide
Authors:
Jordan M. Adams,
Daniel Heligman,
Ryan O'Dell,
Christine Y. Wang,
Daniel Young
Abstract:
We show that direct laser writing (DLW) in aqueous silver nitrate with a 1030 nm femtosecond (fs) laser results in deposition of a mixture of silver oxide and silver, in contrast to the pure silver deposition previously reported with 780 nm fs DLW. However, adding photoinitiator prevents silver oxide formation in a concentration-dependent manner. As a result, the resistivity of the material can also be controlled by photoinitiator concentration, with resistivity being reduced from approximately $9\times10^{-3}$ $\Omega\,\mathrm{m}$ to $3\times10^{-7}$ $\Omega\,\mathrm{m}$. Silver oxide peaks dominate the X-ray diffraction spectra when no photoinitiator is present, while the peaks disappear with photoinitiator concentrations above 0.05 wt%. While femtosecond pulses are needed to initiate deposition, a continuous-wave laser, when well overlapped with the previously written material and supplying enough average power, can lead to further printing, suggesting that thermal deposition can also occur, in which the photoinitiator molecule also acts as a general reducing agent that prevents oxide formation. We also compare the surface quality of printed lines for different photoinitiator concentrations and laser printing conditions. A THz polarizer and metamaterial are printed as a demonstration of silver oxide printing.
Submitted 3 June, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study
Authors:
Chi Ma,
Mincong Huang,
Chao Wang,
Yujie Wang,
Lei Yu
Abstract:
In this work, we systematically investigate the efficacy of dynamic activation mechanisms within the LLaMA family of language models. Despite the potential of dynamic activation methods to reduce computation and increase speed in models using the ReLU activation function, our empirical findings have uncovered several inherent pitfalls in the current dynamic activation schemes. Through extensive experiments across various dynamic activation strategies, we demonstrate that LLaMA models usually underperform when compared to their ReLU counterparts, particularly in scenarios demanding a high sparsity ratio. We attribute these deficiencies to a combination of factors: 1) the inherent complexity of dynamically predicting activation heads and neurons; 2) the inadequate sparsity resulting from activation functions; 3) the insufficient preservation of information resulting from KV cache skipping. Our analysis not only sheds light on the limitations of dynamic activation in the context of large-scale LLaMA models but also proposes roadmaps for enhancing the design of future sparsity schemes.
Submitted 15 May, 2024;
originally announced May 2024.
-
Integrated and DC-powered superconducting microcomb
Authors:
Chen-Guang Wang,
Wuyue Xu,
Chong Li,
Lili Shi,
Junliang Jiang,
Tingting Guo,
Wen-Cheng Yue,
Tianyu Li,
** Zhang,
Yang-Yang Lyu,
Jiazheng Pan,
Xiuhao Deng,
Ying Dong,
Xuecou Tu,
Sining Dong,
Chunhai Cao,
Labao Zhang,
Xiaoqing Jia,
Guozhu Sun,
Lin Kang,
Jian Chen,
Yong-Lei Wang,
Huabing Wang,
Peiheng Wu
Abstract:
Frequency combs, specialized laser sources emitting multiple equidistant frequency lines, have revolutionized science and technology with unprecedented precision and versatility. Recently, integrated frequency combs are emerging as scalable solutions for on-chip photonics. Here, we demonstrate a fully integrated superconducting microcomb that is easy to manufacture, simple to operate, and consumes ultra-low power. Our turnkey apparatus comprises a basic nonlinear superconducting device, a Josephson junction, directly coupled to a superconducting microstrip resonator. We showcase coherent comb generation through self-started mode-locking. Therefore, comb emission is initiated solely by activating a DC bias source, with power consumption as low as tens of picowatts. The resulting comb spectrum resides in the microwave domain and spans multiple octaves. The linewidths of all comb lines can be narrowed down to 1 Hz through a unique coherent injection-locking technique. Our work represents a critical step towards fully integrated microwave photonics and offers the potential for integrated quantum processors.
Submitted 15 May, 2024;
originally announced May 2024.
-
Tunable superconducting resonators via on-chip control of local magnetic field
Authors:
Chen-Guang Wang,
Wen-Cheng Yue,
Xuecou Tu,
Tianyuan Chi,
Tingting Guo,
Yang-Yang Lyu,
Sining Dong,
Chunhai Cao,
Labao Zhang,
Xiaoqing Jia,
Guozhu Sun,
Lin Kang,
Jian Chen,
Yong-Lei Wang,
Huabing Wang,
Peiheng Wu
Abstract:
Superconducting microwave resonators play a pivotal role in superconducting quantum circuits. The ability to fine-tune their resonant frequencies provides enhanced control and flexibility. Here, we introduce a frequency-tunable superconducting coplanar waveguide resonator. By applying electrical currents through specifically designed ground wires, we achieve the generation and control of a localized magnetic field on the central line of the resonator, enabling continuous tuning of its resonant frequency. We demonstrate a frequency tuning range of 54.85 MHz in a 6.21 GHz resonator. This integrated and tunable resonator holds great potential as a dynamically tunable filter and as a key component of communication buses and memory elements in superconducting quantum computing.
Submitted 15 May, 2024;
originally announced May 2024.