-
3D StreetUnveiler with Semantic-Aware 2DGS
Authors:
**gwei Xu,
Yikai Wang,
Yiqun Zhao,
Yanwei Fu,
Shenghua Gao
Abstract:
Unveiling an empty street from crowded observations captured by in-car cameras is crucial for autonomous driving. However, removing all temporarily static objects, such as stopped vehicles and standing pedestrians, presents a significant challenge. Unlike object-centric 3D inpainting, which relies on thorough observation in a small scene, street scenes involve long trajectories that differ from previous 3D inpainting tasks. The camera-centric moving environment of captured videos further complicates the task due to the limited degree and duration of object observation. To address these obstacles, we introduce StreetUnveiler to reconstruct an empty street. StreetUnveiler learns a 3D representation of the empty street from crowded observations. Our representation is based on hard-label semantic 2D Gaussian Splatting (2DGS) for its scalability and its ability to identify the Gaussians to be removed. We inpaint the rendered image after removing unwanted Gaussians to provide pseudo-labels and subsequently re-optimize the 2DGS. Given the temporally continuous camera movement, we divide the empty street scene into observed, partially observed, and unobserved regions, which we propose to locate through a rendered alpha map. This decomposition helps us minimize the regions that need to be inpainted. To enhance the temporal consistency of the inpainting, we introduce a novel time-reversal framework that inpaints frames in reverse order and uses later frames as references for earlier frames to fully utilize the long-trajectory observations. Our experiments on the street scene dataset successfully reconstructed a 3D representation of the empty street. The mesh representation of the empty street can be extracted for further applications. The project page and more visualizations can be found at: https://streetunveiler.github.io
Submitted 30 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
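The StreetUnveiler abstract says the observed, partially observed, and unobserved regions are located through a rendered alpha map. A minimal sketch of such a per-pixel split follows. It assumes the alpha map has already been rendered after removing the unwanted Gaussians; the thresholds `lo`/`hi` and the helper names `classify_regions`/`inpaint_mask` are hypothetical, not taken from the paper.

```python
import numpy as np

OBSERVED, PARTIAL, UNOBSERVED = 2, 1, 0

def classify_regions(alpha, lo=0.1, hi=0.9):
    """Split a rendered alpha map (values in [0, 1]) into three region labels.

    Assumption for illustration: high accumulated opacity means the background
    behind a removed object was seen in other frames; near-zero opacity means
    it was never seen. The thresholds lo/hi are hypothetical.
    """
    alpha = np.asarray(alpha, dtype=float)
    labels = np.full(alpha.shape, PARTIAL, dtype=np.int8)
    labels[alpha >= hi] = OBSERVED
    labels[alpha <= lo] = UNOBSERVED
    return labels

def inpaint_mask(labels):
    """Pixels an inpainter would need to fill: everything not fully observed."""
    return labels != OBSERVED

alpha = np.array([[1.0, 0.5], [0.05, 0.95]])
labels = classify_regions(alpha)
print(labels.tolist())                  # [[2, 1], [0, 2]]
print(int(inpaint_mask(labels).sum()))  # 2
```

Keeping the observed region out of the inpainting mask is what shrinks the area that has to be hallucinated, which is the motivation the abstract gives for the decomposition.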
-
VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation
Authors:
Qilin Wang,
Zhengkai Jiang,
Chengming Xu,
Jiangning Zhang,
Yabiao Wang,
Xinyi Zhang,
Yun Cao,
Weijian Cao,
Chengjie Wang,
Yanwei Fu
Abstract:
Human image animation involves generating a video from a static image by following a specified pose sequence. Current approaches typically adopt a multi-stage pipeline that separately learns appearance and motion, which often leads to appearance degradation and temporal inconsistencies. To address these issues, we propose VividPose, an innovative end-to-end pipeline based on Stable Video Diffusion (SVD) that ensures superior temporal stability. To enhance the retention of human identity, we propose an identity-aware appearance controller that integrates additional facial information without compromising other appearance details such as clothing texture and background. This approach ensures that the generated videos maintain high fidelity to the identity of the human subject, preserving key facial features across various poses. To accommodate diverse human body shapes and hand movements, we introduce a geometry-aware pose controller that utilizes both dense rendering maps from SMPL-X and sparse skeleton maps. This enables accurate alignment of pose and shape in the generated videos, providing a robust framework capable of handling a wide range of body shapes and dynamic hand movements. Extensive qualitative and quantitative experiments on the UBCFashion and TikTok benchmarks demonstrate that our method achieves state-of-the-art performance. Furthermore, VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset. Code and models will be made available.
Submitted 28 May, 2024;
originally announced May 2024.
-
Towards a Generalist and Blind RGB-X Tracker
Authors:
Yuedong Tan,
Zongwei Wu,
Yuqian Fu,
Zhuyun Zhou,
Guolei Sun,
Chao Ma,
Danda Pani Paudel,
Luc Van Gool,
Radu Timofte
Abstract:
With the emergence of a single large model capable of successfully solving a multitude of tasks in NLP, there has been growing research interest in achieving similar goals in computer vision. On the one hand, most of these generic models, referred to as generalist vision models, aim at producing unified outputs serving different tasks. On the other hand, some existing models aim to combine different input types (aka data modalities), which are then processed by a single large model. Yet this combination step remains specialized, which falls short of the initial ambition. In this paper, we show that such specialization (during unification) is unnecessary in the context of RGB-X video object tracking. Our single model tracker, termed XTrack, can remain blind to any modality X during inference time. Our tracker employs a mixture of modal experts comprising those dedicated to shared commonality and others capable of flexibly performing reasoning conditioned on input modality. Such a design ensures the unification of input modalities towards a common latent space, without weakening the modality-specific information representation. With this idea, our training process is extremely simple, integrating multi-label classification loss with a routing function, thereby effectively aligning and unifying all modalities together, even from only paired data. Thus, during inference, we can adopt any modality without relying on the inductive bias of the modal prior and achieve generalist performance. Without any bells and whistles, our generalist and blind tracker can achieve competitive performance compared to well-established modality-specific models on 5 benchmarks across 3 auxiliary modalities, covering commonly used depth, thermal, and event data.
Submitted 27 May, 2024;
originally announced May 2024.
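The XTrack abstract describes a mixture of modal experts: some experts shared across modalities, others whose contribution depends on the input through a routing function. A generic soft mixture-of-experts layer illustrates that pattern. It is a sketch under simplifying assumptions (linear experts, dense softmax routing, hypothetical names such as `moe_forward`), not XTrack's actual architecture.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, shared_w, expert_ws, router_w):
    """Soft mixture-of-experts layer with one always-on shared expert.

    x:         (n_tokens, d) input features (any modality after embedding)
    shared_w:  (d, d) weights of the shared expert
    expert_ws: (k, d, d) weights of k routed experts
    router_w:  (d, k) routing projection
    """
    shared = x @ shared_w                                # applied to every token
    gates = softmax(x @ router_w)                        # (n_tokens, k), rows sum to 1
    expert_out = np.einsum("nd,kde->kne", x, expert_ws)  # (k, n_tokens, d)
    routed = np.einsum("nk,kne->ne", gates, expert_out)  # gate-weighted sum over experts
    return shared + routed, gates

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, gates = moe_forward(x, rng.normal(size=(8, 8)),
                         rng.normal(size=(3, 8, 8)), rng.normal(size=(8, 3)))
print(out.shape, gates.shape)  # (4, 8) (4, 3)
```

Dense routing keeps every expert in the gradient path during training; sparse top-k routing is a common alternative when compute matters.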
-
Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent
Authors:
Yi Xu,
Yun Fu
Abstract:
Understanding multi-agent behavior is critical across various fields. The conventional approach involves analyzing agent movements through three primary tasks: trajectory prediction, imputation, and spatial-temporal recovery. Given the unique input formulations and constraints of these tasks, most existing methods are tailored to address only one specific task. However, in real-world applications, these scenarios frequently occur simultaneously. Consequently, methods designed for one task often fail to adapt to others, resulting in performance drops. To overcome this limitation, we propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs, adaptable to diverse scenarios. Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction. We further extend the recently successful State Space Models (SSMs), particularly the Mamba model, into a Bidirectional Temporal Mamba to effectively capture temporal dependencies. Additionally, we incorporate a Bidirectional Temporal Scaled (BTS) module to comprehensively scan trajectories while maintaining the temporal missing relationships within the sequence. We curate and benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation. Extensive experiments demonstrate the superior performance of our model. To the best of our knowledge, this is the first work that addresses this unified problem through a versatile generative framework, thereby enhancing our understanding of multi-agent movement. Our datasets, code, and model weights are available at https://github.com/colorfulfuture/UniTraj-pytorch.
Submitted 27 May, 2024;
originally announced May 2024.
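UniTraj treats the different tasks as masked inputs to one model. A minimal sketch of how two of the named tasks, trajectory prediction and imputation, can be expressed as boolean masks over an `(agents, time, xy)` array. The mask layouts, the zero fill value, and the function names are assumptions for illustration, not the paper's preprocessing.

```python
import numpy as np

def prediction_mask(n_agents, n_steps, n_future):
    """True = observed. Prediction hides the last n_future steps of every agent."""
    mask = np.ones((n_agents, n_steps), dtype=bool)
    mask[:, n_steps - n_future:] = False
    return mask

def imputation_mask(n_agents, n_steps, missing_rate, rng):
    """Imputation hides scattered (agent, step) entries at random."""
    return rng.random((n_agents, n_steps)) >= missing_rate

def apply_mask(traj, mask, fill=0.0):
    """traj: (n_agents, n_steps, 2) positions. Missing positions are set to `fill`."""
    out = traj.copy()
    out[~mask] = fill  # boolean mask over the first two axes selects whole xy pairs
    return out

traj = np.arange(2 * 5 * 2, dtype=float).reshape(2, 5, 2)
masked = apply_mask(traj, prediction_mask(2, 5, 2))
print(masked[0].tolist())  # [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [0.0, 0.0], [0.0, 0.0]]
```

In this sketch the model input is always "values plus a mask", so switching tasks means switching the mask pattern rather than the architecture.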
-
Training-free Editioning of Text-to-Image Models
Authors:
**qi Wang,
Yunfei Fu,
Zhangcan Ding,
Bailin Deng,
Yu-Kun Lai,
Yipeng Qin
Abstract:
Inspired by the software industry's practice of offering different editions or versions of a product tailored to specific user groups or use cases, we propose a novel task, namely, training-free editioning, for text-to-image models. Specifically, we aim to create variations of a base text-to-image model without retraining, enabling the model to cater to the diverse needs of different user groups or to offer distinct features and functionalities. To achieve this, we propose that different editions of a given text-to-image model can be formulated as concept subspaces in the latent space of its text encoder (e.g., CLIP). In such a concept subspace, all points satisfy a specific user need (e.g., generating images of a cat lying on the grass/ground/falling leaves). Technically, we apply Principal Component Analysis (PCA) to obtain the desired concept subspaces from representative text embeddings that correspond to a specific user need or requirement. Projecting the text embedding of a given prompt into these low-dimensional subspaces enables efficient model editioning without retraining. Intuitively, our proposed editioning paradigm enables a service provider to customize the base model into its "cat edition" (or other editions) that restricts image generation to cats, regardless of the user's prompt (e.g., dogs, people, etc.). This introduces a new dimension for product differentiation, targeted functionality, and pricing strategies, unlocking novel business models for text-to-image generators. Extensive experimental results demonstrate the validity of our approach and its potential to enable a wide range of customized text-to-image model editions across various domains and applications.
Submitted 27 May, 2024;
originally announced May 2024.
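The core operation in the editioning abstract is PCA on text embeddings followed by projection onto the resulting subspace. A minimal NumPy sketch of that operation, assuming each prompt is represented by a single embedding vector (text encoders used in diffusion models may emit per-token embeddings, which this sketch ignores). The toy 2-D data and the function names are hypothetical.

```python
import numpy as np

def concept_subspace(embeddings, n_components):
    """PCA basis for embeddings of prompts that all satisfy one user need.

    embeddings: (n_prompts, d). Returns the mean and an (n_components, d) basis.
    """
    mean = embeddings.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(embedding, mean, basis):
    """Map any prompt embedding to its closest point in the concept subspace."""
    coeffs = (embedding - mean) @ basis.T
    return mean + coeffs @ basis

# Toy 2-D "concept" whose prompt embeddings all lie on the line y = x.
concept = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
mean, basis = concept_subspace(concept, n_components=1)
print(project(np.array([2.0, 0.0]), mean, basis))  # approximately [1. 1.]
```

Any prompt, including an off-concept one, is mapped into the concept subspace, which mirrors how the abstract's "cat edition" restricts generation regardless of the user's prompt.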
-
Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning
Authors:
Wangyang Ying,
Dongjie Wang,
Xuanming Hu,
Yuanchun Zhou,
Charu C. Aggarwal,
Yanjie Fu
Abstract:
Feature transformation derives a new feature set from the original features to augment the AI power of data. In many science domains, such as material performance screening, feature transformation can model material formula interactions and compositions and discover performance drivers, but supervised labels must be collected from expensive and lengthy experiments. This issue motivates an Unsupervised Feature Transformation Learning (UFTL) problem. Prior literature, such as manual transformation, supervised-feedback-guided search, and PCA, either relies on domain knowledge or expensive supervised feedback, suffers from a large search space, or overlooks non-linear feature-feature interactions. UFTL poses a major challenge to existing methods: how can a new unsupervised paradigm capture complex feature interactions while avoiding a large search space? To fill this gap, we connect graph, contrastive, and generative learning to develop a measurement-pretrain-finetune paradigm for UFTL. For unsupervised feature set utility measurement, we propose a feature value consistency preservation perspective and develop an unsupervised metric akin to mean discounted cumulative gain to evaluate feature set utility. For unsupervised feature set representation pretraining, we regard a feature set as a feature-feature interaction graph and develop an unsupervised graph contrastive learning encoder to embed feature sets into vectors. For generative transformation finetuning, we regard a feature set as a feature cross sequence and feature transformation as sequential generation. We develop a deep generative feature transformation model that coordinates the pretrained feature set encoder and the gradient information extracted from a feature set utility evaluator to optimize a transformed feature generator.
Submitted 27 May, 2024;
originally announced May 2024.
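The proposed utility metric is described only as resembling mean discounted cumulative gain, and its exact construction is not given in the abstract. For reference, the standard (normalized) DCG that the name alludes to is sketched below; the relevance values are arbitrary inputs, and this is not the paper's metric.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: items near the top of the ranking count more."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG divided by the DCG of the best ordering; in [0, 1] for non-negative gains."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1]))  # 1.0 (already in the ideal order)
print(round(ndcg([1, 2, 3]), 4))
```

The logarithmic discount is what makes the score sensitive to ordering: the same set of gains scores lower when the strongest items are ranked late.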
-
Image-Text-Image Knowledge Transferring for Lifelong Person Re-Identification with Hybrid Clothing States
Authors:
Qizao Wang,
Xuelin Qian,
Bin Li,
Yanwei Fu,
Xiangyang Xue
Abstract:
With the continuous expansion of intelligent surveillance networks, lifelong person re-identification (LReID) has received widespread attention, driven by the need for self-evolution across different domains. However, existing LReID studies accumulate knowledge under the assumption that people do not change their clothes. In this paper, we propose a more practical task, namely lifelong person re-identification with hybrid clothing states (LReID-Hybrid), which takes a series of cloth-changing and cloth-consistent domains into account during lifelong learning. To tackle the challenges of knowledge granularity mismatch and knowledge presentation mismatch that arise in LReID-Hybrid, we take advantage of the consistency and generalization of the text space and propose a novel framework, dubbed $Teata$, to effectively align, transfer, and accumulate knowledge in an "image-text-image" closed loop. Concretely, to achieve effective knowledge transfer, we design a Structured Semantic Prompt (SSP) learning to decompose the text prompt into several structured pairs to distill knowledge from the image space with a unified granularity of text description. Then, we introduce a Knowledge Adaptation and Projection strategy (KAP), which tunes text knowledge via a slow-paced learner to adapt to different tasks without catastrophic forgetting. Extensive experiments demonstrate the superiority of our proposed $Teata$ over advanced methods for LReID-Hybrid as well as on conventional LReID benchmarks.
Submitted 26 May, 2024;
originally announced May 2024.
-
Content and Salient Semantics Collaboration for Cloth-Changing Person Re-Identification
Authors:
Qizao Wang,
Xuelin Qian,
Bin Li,
Lifeng Chen,
Yanwei Fu,
Xiangyang Xue
Abstract:
Cloth-changing person Re-IDentification (Re-ID) aims at recognizing the same person with clothing changes across non-overlapping cameras. Conventional person Re-ID methods usually bias the model's focus toward cloth-related appearance features rather than identity-sensitive features associated with biological traits. Recently, advanced cloth-changing person Re-ID methods either resort to identity-related auxiliary modalities (e.g., sketches, silhouettes, keypoints, and 3D shapes) or clothing labels to mitigate the impact of clothes. However, relying on impractical and inflexible auxiliary modalities or annotations limits their real-world applicability. In this paper, we promote cloth-changing person Re-ID by effectively leveraging the abundant semantics present within pedestrian images without the need for any auxiliaries. Specifically, we propose the Content and Salient Semantics Collaboration (CSSC) framework, facilitating cross-parallel semantics interaction and refinement. Our framework is simple yet effective, and its vital design is the Semantics Mining and Refinement (SMR) module. It extracts robust identity features about content and salient semantics while effectively mitigating interference from clothing appearances. By capitalizing on the mined semantic features, our proposed approach achieves state-of-the-art performance on three cloth-changing benchmarks as well as conventional benchmarks, demonstrating its superiority over advanced competitors.
Submitted 26 May, 2024;
originally announced May 2024.
-
Evolutionary Large Language Model for Automated Feature Transformation
Authors:
Nanxu Gong,
Chandan K. Reddy,
Wangyang Ying,
Yanjie Fu
Abstract:
Feature transformation aims to reconstruct the feature space of raw features to enhance the performance of downstream models. However, the exponential growth in the combinations of features and operations poses a challenge, making it difficult for existing methods to efficiently explore a wide space. Additionally, their optimization is driven solely by the accuracy of downstream models in specific domains, neglecting the acquisition of general feature knowledge. To fill this research gap, we propose an evolutionary LLM framework for automated feature transformation. This framework consists of two parts: 1) constructing a multi-population database through an RL data collector while using evolutionary algorithm strategies for database maintenance, and 2) leveraging the sequence-understanding ability of Large Language Models (LLMs) by employing few-shot prompts that guide the LLM to generate superior samples based on feature transformation sequence distinction. The multi-population database initially provides a wide search scope for discovering excellent populations. Through culling and evolution, high-quality populations are afforded greater opportunities, furthering the pursuit of optimal individuals. By integrating LLMs with evolutionary algorithms, we achieve efficient exploration within a vast space while harnessing feature knowledge to propel optimization, thus realizing a more adaptable search paradigm. Finally, we empirically demonstrate the effectiveness and generality of our proposed method.
Submitted 25 May, 2024;
originally announced May 2024.
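The evolutionary LLM abstract describes a multi-population database in which stronger populations get more opportunities and weaker ones are culled. A minimal sketch of that bookkeeping, assuming each population is a list of `(transformation_sequence, score)` pairs and using softmax-weighted sampling. The LLM generation step is omitted, and all names, the temperature parameter, and the toy scores are hypothetical.

```python
import math
import random

def population_score(pop):
    """Judge a population by its best individual's downstream score."""
    return max(score for _, score in pop)

def select_population(populations, temperature, rng):
    """Sample a population name; better populations are chosen more often."""
    names = list(populations)
    scores = [population_score(populations[n]) for n in names]
    best = max(scores)
    weights = [math.exp((s - best) / temperature) for s in scores]
    return rng.choices(names, weights=weights, k=1)[0]

def cull(populations, keep):
    """Keep only the `keep` highest-scoring populations."""
    ranked = sorted(populations, key=lambda n: population_score(populations[n]), reverse=True)
    return {n: populations[n] for n in ranked[:keep]}

db = {
    "pop_a": [("f1+f2", 0.71), ("f1*f3", 0.74)],
    "pop_b": [("log(f2)", 0.65)],
    "pop_c": [("f3-f1", 0.80)],
}
rng = random.Random(0)
parent = select_population(db, temperature=0.05, rng=rng)  # would seed a few-shot prompt
db = cull(db, keep=2)
print(sorted(db))  # ['pop_a', 'pop_c']
```

A low temperature concentrates sampling on the best populations; a higher one keeps the search wider, which is the exploration/exploitation trade-off the abstract alludes to.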
-
Towards Global Optimal Visual In-Context Learning Prompt Selection
Authors:
Chengming Xu,
Chen Liu,
Yikai Wang,
Yanwei Fu
Abstract:
Visual In-Context Learning (VICL) is a prevailing way to transfer visual foundation models to new tasks by leveraging contextual information contained in in-context examples to enhance learning and prediction of the query sample. The fundamental problem in VICL is how to select the best prompt to activate its power as much as possible, which is equivalent to a ranking problem: testing the in-context behavior of each candidate in the alternative set and selecting the best one. To utilize a more appropriate ranking metric and leverage more comprehensive information among the alternative set, we propose a novel in-context example selection framework to approximately identify the global optimal prompt, i.e., choosing the best-performing in-context examples from all alternatives for each query sample. Our method, dubbed Partial2Global, adopts a transformer-based list-wise ranker to provide a more comprehensive comparison within several alternatives, and a consistency-aware ranking aggregator to generate a globally consistent ranking. The effectiveness of Partial2Global is validated through experiments on foreground segmentation, single object detection, and image colorization, demonstrating that Partial2Global consistently selects better in-context examples than other methods and thus establishes a new state of the art.
Submitted 24 May, 2024;
originally announced May 2024.
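Partial2Global ranks candidates within small subsets and then aggregates those partial rankings into one global order. The paper's consistency-aware aggregator is not specified in the abstract, so the sketch below swaps in a simple Borda-style average of normalized positions purely to show what aggregating partial rankings means; it is not the paper's method, and the function name is hypothetical.

```python
from collections import defaultdict

def aggregate_partial_rankings(partial_rankings):
    """Merge best-first rankings over overlapping subsets into one global order.

    Each candidate gets 1.0 for first place and 0.0 for last place within every
    subset it appears in; candidates are then sorted by their mean score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ranking in partial_rankings:
        n = len(ranking)
        for pos, cand in enumerate(ranking):
            totals[cand] += 1.0 if n == 1 else (n - 1 - pos) / (n - 1)
            counts[cand] += 1
    return sorted(totals, key=lambda c: totals[c] / counts[c], reverse=True)

print(aggregate_partial_rankings([["a", "b", "c"], ["b", "c", "d"], ["a", "d"]]))
# ['a', 'b', 'c', 'd']
```

Normalizing positions by subset size keeps a win in a two-item comparison from counting the same as a win in a larger list-wise comparison only by accident of list length.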
-
Cross-Task Defense: Instruction-Tuning LLMs for Content Safety
Authors:
Yu Fu,
Wen Xiao,
Jia Chen,
Jiachen Li,
Evangelos Papalexakis,
Aichi Chien,
Yue Dong
Abstract:
Recent studies reveal that Large Language Models (LLMs) face challenges in balancing safety with utility, particularly when processing long texts for NLP tasks like summarization and translation. Despite defenses against malicious short questions, the ability of LLMs to safely handle dangerous long content, such as manuals teaching illicit activities, remains unclear. Our work aims to develop robust defenses for LLMs in processing malicious documents alongside benign NLP task queries. We introduce a defense dataset composed of safety-related examples and propose single-task and mixed-task losses for instruction tuning. Our empirical results demonstrate that LLMs can significantly enhance their capacity to safely manage dangerous content with appropriate instruction tuning. Additionally, strengthening the defenses of the tasks most susceptible to misuse is effective in protecting LLMs against processing harmful information. We also observe that trade-offs between utility and safety exist in defense strategies, where Llama2, using our proposed approach, displays a significantly better balance than Llama1.
Submitted 24 May, 2024;
originally announced May 2024.
-
A Solution-based LLM API-using Methodology for Academic Information Seeking
Authors:
Yuanchun Wang,
Jifan Yu,
Zijun Yao,
**g Zhang,
Yuyang Xie,
Shangqing Tu,
Yiyang Fu,
Youhe Feng,
**kai Zhang,
**gyao Zhang,
Bowen Huang,
Yuanyao Li,
Huihui Yuan,
Lei Hou,
Juanzi Li,
Jie Tang
Abstract:
Applying large language models (LLMs) for academic API usage shows promise in reducing researchers' academic information seeking efforts. However, current LLM API-using methods struggle with complex API coupling commonly encountered in academic queries. To address this, we introduce SoAy, a solution-based LLM API-using methodology for academic information seeking. It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence. The addition of the solution reduces the difficulty for the model to understand the complex relationships between APIs. Code improves the efficiency of reasoning.
To evaluate SoAy, we introduce SoAyBench, an evaluation benchmark accompanied by SoAyEval, built upon a cloned environment of APIs from AMiner. Experimental results demonstrate a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines. All datasets, code, tuned models, and deployed online services are publicly accessible at https://github.com/RUCKBReasoning/SoAy.
Submitted 23 May, 2024;
originally announced May 2024.
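In SoAy, a solution is a pre-constructed API calling sequence that the model follows when writing code. A toy sketch of that idea: the two API functions and their data are hypothetical stand-ins, not the real AMiner endpoints, and `run_solution` is an illustrative helper.

```python
# Hypothetical stand-ins for academic APIs; NOT the real AMiner endpoints or data.
def search_person(name):
    return {"Ada Example": "p1"}.get(name)

def get_publications(person_id):
    return {"p1": ["Paper A", "Paper B"]}.get(person_id, [])

# A "solution": a fixed API calling sequence in which each step consumes the
# previous step's output.
SOLUTION_PUBLICATIONS_BY_NAME = [search_person, get_publications]

def run_solution(solution, query):
    """Execute the calling sequence, stopping early if a step finds nothing."""
    value = query
    for api_call in solution:
        value = api_call(value)
        if value is None:
            return None
    return value

print(run_solution(SOLUTION_PUBLICATIONS_BY_NAME, "Ada Example"))  # ['Paper A', 'Paper B']
```

Fixing the order of calls in advance leaves the model to fill in arguments rather than discover how the APIs couple, which is the difficulty the abstract says the solution reduces.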
-
Ultra-sensitive solid-state organic molecular microwave quantum receiver
Authors:
Bo Zhang,
Yuchen Han,
Hong-Liang Wu,
Hao Wu,
Shuo Yang,
Mark Oxborrow,
Qing Zhao,
Yue Fu,
Weibin Li,
Yeliang Wang,
Dezhi Zheng,
Jun Zhang
Abstract:
High-accuracy microwave sensing is in wide demand across various fields, ranging from cosmology to microwave quantum technology. Quantum receivers based on inorganic solid-state spin systems are promising candidates for this purpose because of their stability and compatibility, but their best sensitivity is currently limited to a few pT/$\sqrt{\rm{Hz}}$. Here, by utilising an enhanced readout scheme with state-of-the-art solid-state maser technology, we develop a robust microwave quantum receiver based on organic molecular spins at ambient conditions. Owing to the maser amplification, the sensitivity of the receiver reaches 6.14 $\pm$ 0.17 fT/$\sqrt{\rm{Hz}}$, more than three orders of magnitude better than that of inorganic solid-state quantum receivers. Heterodyne detection without additional local oscillators improves the bandwidth of the receiver and allows frequency detection. The scheme can be extended to other solid-state spin systems without complicated control pulses and thus enables practical applications such as electron spin resonance spectroscopy, dark matter searches, and astronomical observations.
Submitted 23 May, 2024;
originally announced May 2024.
-
TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes
Authors:
Yan** Fu,
Wenbin Liao,
Xinyuan Liu,
Hang xu,
Yike Ma,
Feng Dai,
Yucheng Zhang
Abstract:
As an emerging task that integrates perception and reasoning, topology reasoning in autonomous driving scenes has recently garnered widespread attention. However, existing work often emphasizes "perception over reasoning": it typically boosts reasoning performance by enhancing the perception of lanes and directly adopts an MLP to learn lane topology from lane queries. This paradigm overlooks the geometric features intrinsic to the lanes themselves and is prone to being influenced by inherent endpoint shifts in lane detection.
To tackle this issue, we propose an interpretable method for lane topology reasoning based on lane geometric distance and lane query similarity, named TopoLogic.
This method mitigates the impact of endpoint shifts in geometric space and introduces explicit similarity calculation in semantic space as a complement. By integrating results from both spaces, our method provides more comprehensive information for lane topology.
Ultimately, our approach significantly outperforms the existing state-of-the-art methods on the mainstream benchmark OpenLane-V2 (23.9 vs. 10.9 in TOP$_{ll}$ and 44.1 vs. 39.8 in OLS on subset_A). Additionally, our proposed geometric distance topology reasoning method can be incorporated into well-trained models without re-training, significantly boosting the performance of lane topology reasoning. The code is released at https://github.com/Franpin/TopoLogic.
Submitted 23 May, 2024;
originally announced May 2024.
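TopoLogic combines a lane geometric distance with lane query similarity. A minimal sketch under stated assumptions: lanes are ordered centerline polylines, the end-to-start distance is turned into a score with a Gaussian kernel, query similarity is rescaled cosine similarity, and the two are blended with a fixed weight. The kernel, `sigma`, the blending rule, and the function names are hypothetical choices, not the paper's formulas.

```python
import numpy as np

def geometric_topology(lanes, sigma=1.0):
    """lanes: (n, n_points, 2) ordered centerline points.

    Score for i -> j is high when the end of lane i is close to the start of lane j.
    """
    ends = lanes[:, -1, :]    # (n, 2)
    starts = lanes[:, 0, :]   # (n, 2)
    dist = np.linalg.norm(ends[:, None, :] - starts[None, :, :], axis=-1)  # (n, n)
    return np.exp(-dist**2 / (2 * sigma**2))

def similarity_topology(queries):
    """Cosine similarity between lane query vectors, rescaled to [0, 1]."""
    q = queries / np.linalg.norm(queries, axis=-1, keepdims=True)
    return (q @ q.T + 1) / 2

def fuse(geo, sim, weight=0.5):
    """Blend the geometric and semantic scores with a fixed weight."""
    return weight * geo + (1 - weight) * sim

lanes = np.array([[[0.0, 0.0], [1.0, 0.0]],   # lane 0 ends where lane 1 starts
                  [[1.0, 0.0], [2.0, 0.0]]])
geo = geometric_topology(lanes)
print(geo[0, 1], round(float(geo[1, 0]), 4))  # 1.0 0.1353
```

In this sketch a small endpoint drift lowers the geometric score smoothly instead of flipping a hard connection, which is one way to read the abstract's goal of mitigating endpoint shifts.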
-
Algebraic Independence of Special Points on Shimura Varieties
Authors:
Yu Fu,
Roy Zhao
Abstract:
Given a correspondence $V$ between a connected Shimura variety $S$, a commutative connected algebraic group $G$, and $n \in \mathbb{N}$, we prove that the $V$-images of any $n$ special points on $S$ outside a proper Zariski closed subset are algebraically independent. Our result unifies previous unlikely intersection results on multiplicative independence and linear independence. We prove multiplicative independence of differences of singular moduli, generalizing previous results by Pila-Tsimerman and Aslanlyan-Eterović-Fowler. We also give an application to abelian varieties by proving that the special points of $S$ whose $V$-images lie in a finite-rank subgroup of $T$ are contained in a finite union of proper special subvarieties of $S$, depending only on the rank of the subgroup. In this way, our result is a generalization of the works of Pila-Tsimerman and Buium-Poonen.
Submitted 16 June, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Euclid. I. Overview of the Euclid mission
Authors:
Euclid Collaboration,
Y. Mellier,
Abdurro'uf,
J. A. Acevedo Barroso,
A. Achúcarro,
J. Adamek,
R. Adam,
G. E. Addison,
N. Aghanim,
M. Aguena,
V. Ajani,
Y. Akrami,
A. Al-Bahlawan,
A. Alavi,
I. S. Albuquerque,
G. Alestas,
G. Alguero,
A. Allaoui,
S. W. Allen,
V. Allevato,
A. V. Alonso-Tetilla,
B. Altieri,
A. Alvarez-Candal,
A. Amara,
L. Amendola
, et al. (1086 additional authors not shown)
Abstract:
The current standard model of cosmology successfully describes a variety of measurements, but the nature of its main ingredients, dark matter and dark energy, remains unknown. Euclid is a medium-class mission in the Cosmic Vision 2015-2025 programme of the European Space Agency (ESA) that will provide high-resolution optical imaging, as well as near-infrared imaging and spectroscopy, over about 14,000 deg^2 of extragalactic sky. In addition to accurate weak-lensing and clustering measurements that probe structure formation over half of the age of the Universe (its primary probes for cosmology), these exquisite data will enable a wide range of science. This paper provides a high-level overview of the mission, summarising the survey characteristics, the various data-processing steps, and data products. We also highlight the main science objectives and expected performance.
Submitted 22 May, 2024;
originally announced May 2024.
-
Study of the decays $χ_{cJ}\toΛ\barΛω$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$, $\mathcal{B}(χ_{c1}\toΛ\barΛω)=({1.01 \pm 0.10 \pm 0.11}) \times 10^{-4}$, and $\mathcal{B}(χ_{c2}\toΛ\barΛω)=({1.40 \pm 0.13 \pm 0.17}) \times 10^{-4}$, where the first uncertainties are statistical and the second are systematic. We observe no clear intermediate structures.
Submitted 21 May, 2024;
originally announced May 2024.
-
Precision measurement of the branching fraction of $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (604 additional authors not shown)
Abstract:
Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$.
The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with significantly improved precision.
Submitted 21 May, 2024;
originally announced May 2024.
-
Fast Estimation of Relative Transformation Based on Fusion of Odometry and UWB Ranging Data
Authors:
Yuan Fu,
Zheng Zhang,
Guangyang Zeng,
Chun Liu,
Junfeng Wu,
Xiaoqiang Ren
Abstract:
In this paper, we investigate the problem of estimating the 4-DOF (three-dimensional position and orientation) robot-robot relative frame transformation using odometers and distance measurements between robots. First, we apply a two-step estimation method based on maximum likelihood estimation. Specifically, a good initial value is obtained through unconstrained least squares and projection, followed by a more accurate estimate achieved through one-step Gauss-Newton iteration. Additionally, the optimal installation positions of the Ultra-Wideband (UWB) devices are provided, and the minimum operating time under different numbers of UWB devices is determined. Simulations demonstrate that the two-step approach offers faster computation with guaranteed accuracy while effectively addressing the relative transformation estimation problem within limited space constraints. Furthermore, this method can be applied to real-time relative transformation estimation when a specific number of UWB devices are installed.
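The two-step idea (unconstrained least squares, projection, then one Gauss-Newton step) can be illustrated with a minimal NumPy sketch. It assumes our own measurement model: robot A's odometry positions p_i, robot B's positions q_i, and ranges d_i = ||p_i - (R(θ) q_i + t)|| with R a yaw rotation. The lifting through u = Rᵀt, the arctan2 projection, the function names, and the random (not trajectory-like) synthetic data are all illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def yaw_matrix(theta):
    """Rotation about the z-axis: the rotational part of a 4-DOF transform."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def two_step_estimate(p, q, d):
    """p, q: (N, 3) odometry positions of robots A and B, each in its own frame.
    d: (N,) ranges modeled as ||p_i - (R(theta) q_i + t)||. Returns (theta, t)."""
    # Step 1: the squared-range equation is linear in the lifted unknowns
    # x = [|t|^2, cos(theta), sin(theta), t (3 entries), u = R^T t (3 entries)].
    A = np.column_stack([
        np.ones(len(d)),
        -2.0 * (p[:, 0] * q[:, 0] + p[:, 1] * q[:, 1]),
        -2.0 * (p[:, 1] * q[:, 0] - p[:, 0] * q[:, 1]),
        -2.0 * p,
        2.0 * q,
    ])
    b = d**2 - (p**2).sum(axis=1) - (q**2).sum(axis=1) + 2.0 * p[:, 2] * q[:, 2]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)   # unconstrained least squares
    theta = np.arctan2(x[2], x[1])              # project (cos, sin) onto the unit circle
    t = x[3:6]                                  # u = x[6:9] only serves to linearize

    # Step 2: one Gauss-Newton iteration on the range residuals.
    c, s = np.cos(theta), np.sin(theta)
    dR = np.array([[-s, -c, 0.0], [c, -s, 0.0], [0.0, 0.0, 0.0]])  # dR/dtheta
    e = p - q @ yaw_matrix(theta).T - t         # rows: p_i - R q_i - t
    pred = np.linalg.norm(e, axis=1)            # predicted ranges
    unit = e / pred[:, None]
    J = np.column_stack([-(unit * (q @ dR.T)).sum(axis=1), -unit])  # d(pred)/d(theta, t)
    step, *_ = np.linalg.lstsq(J, d - pred, rcond=None)
    return theta + step[0], t + step[1:]

# Synthetic, noise-free check with randomly placed odometry samples.
rng = np.random.default_rng(0)
theta_true, t_true = 0.7, np.array([1.0, -2.0, 0.5])
p = rng.uniform(-5.0, 5.0, size=(30, 3))
q = rng.uniform(-5.0, 5.0, size=(30, 3))
d = np.linalg.norm(p - q @ yaw_matrix(theta_true).T - t_true, axis=1)
theta_hat, t_hat = two_step_estimate(p, q, d)
```

With noise-free ranges the least-squares stage is already exact and the Gauss-Newton step is essentially zero; with noisy ranges, the single refinement step is where the accuracy gain would come from.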
Submitted 21 May, 2024;
originally announced May 2024.
-
Wrinkling of differentially growing bilayers with similar film and substrate moduli
Authors:
Jiajia Shen,
Yibin Fu,
Alberto Pirrera,
Rainer M. J. Groh
Abstract:
The study of growth-induced surface wrinkling in constrained bilayers comprising a thin film attached to a thick substrate is a canonical model for understanding pattern formation in many biological systems. While the bilayer model has received much prior attention, the nonlinear behaviour for arrangements with similar film and substrate properties, or substrate growth that outpaces film growth, remains poorly understood. This paper therefore focuses on these cases in which the substrate's elasticity dominates surface wrinkling. We study the critical states, and the initial and advanced post-critical behaviour of growing bilayers with film-to-substrate modulus ratios in the region of $2.5$--$50$, and cases where the substrate grows faster than the film. Based on nonlinear elasticity, we formulate analytical models for linear buckling analyses and asymptotic projections around the critical point, and use finite element (FE) models coupled to continuation and branch-switching algorithms to uncover the deep post-critical regime. It is shown that a rapidly growing substrate may change the critical mode from film-governed sinusoidal wrinkling to substrate-governed Biot wrinkling depending on the stiffness ratio and growth ratio. We present a phase change diagram of the post-critical modal landscape split into sinusoidal wrinkling, period doubling, period quadrupling, and creasing regimes in terms of the stiffness ratio and growth ratio. While the post-critical regime of film- and substrate-dominated bilayers (either in terms of dominant elasticity or growth rate) is governed by sinusoidal wrinkling and Biot creasing, respectively, the intermediate regions allow for period doubling and quadrupling bifurcations. Finally, we demonstrate the existence of multi-stability in the advanced post-buckling regimes for growing bilayers where growth in the substrate surpasses that of the film.
Submitted 20 May, 2024;
originally announced May 2024.
-
Braiding Topology of Non-Hermitian Open-Boundary Bands
Authors:
Yongxu Fu,
Yi Zhang
Abstract:
There has been much recent interest and progress on topological structures of the non-Hermitian Bloch bands. Here, we study the topological structures of non-Bloch bands of non-Hermitian multi-band quantum systems under open boundary conditions, which have received limited attention in prior studies. Using a continuity criterion and an efficient sub-GBZ algorithm, we establish a homotopic characterization, braiding topology (e.g., characterized by the band's total vorticity), for open-boundary bands and sub-GBZs. Such a topological identification remains robust in the absence of topological transitions and emergent degenerate points, such as exceptional points. We further analyze the transition's impact on bands and spectral flows, including interesting properties unique to open boundaries, and numerically demonstrate our conclusions with tight-binding model examples. We unveil a crucial insight that open-boundary bands interchange their portions after encountering certain exceptional points. Our results enrich our foundational understanding of topological characterizations for generic non-Hermitian quantum systems.
Submitted 11 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Improved measurement of the branching fraction of $h_{c}\rightarrowγη^\prime/η$ and search for $h_{c}\rightarrowγπ^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
The processes $h_c\rightarrowγP~(P = η^\prime,~η,~π^{0})$ are studied with a sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. The branching fractions of $h_c\rightarrowγη^\prime$ and $h_c\rightarrowγη$ are measured to be $(1.40\pm0.11\pm0.04\pm0.10)\times10^{-3}$ and $(3.77\pm0.55\pm0.13\pm0.26)\times10^{-4}$, respectively, where the first uncertainties are statistical, the second systematic, and the third from the branching fraction of $ψ(3686)\rightarrowπ^{0}h_c$. The ratio $R_{h_c}=\frac{\mathscr{B}(h_c\rightarrowγη)}{\mathscr{B}(h_c\rightarrowγη^\prime)}$ is calculated to be $(27.0\pm4.4\pm1.0)\%$. The measurements are consistent with previous results, with the precision improved by a factor of 2. The results are valuable for gaining a deeper understanding of $η-η^\prime$ mixing and its manifestation within quantum chromodynamics. No significant signal is found for the decay $h_c\rightarrowγπ^{0}$, and an upper limit of $\mathscr{B}(h_c\rightarrowγπ^{0})<5.0\times10^{-5}$ is placed on its branching fraction at the 90\% confidence level.
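A quick check of how the ratio relates to the two branching fractions: the third uncertainty, from $\mathscr{B}(ψ(3686)\rightarrowπ^{0}h_c)$, is common to both measurements and cancels in the ratio, while the statistical errors combine in quadrature. The sketch uses only the rounded values quoted above and assumes uncorrelated statistical errors, so it gives about 26.9% ± 4.5% (stat) rather than exactly the quoted 27.0 ± 4.4, which was presumably computed from unrounded inputs. It does not attempt the systematic term, where correlated systematics partially cancel.

```python
import math

def ratio_with_stat_error(num, num_stat, den, den_stat):
    """Ratio num/den with uncorrelated statistical errors combined in quadrature."""
    r = num / den
    rel = math.hypot(num_stat / num, den_stat / den)   # relative errors add in quadrature
    return r, r * rel

# B(h_c -> gamma eta) and B(h_c -> gamma eta') with their statistical errors only
r, r_stat = ratio_with_stat_error(3.77e-4, 0.55e-4, 1.40e-3, 0.11e-3)
print(f"R = {100 * r:.1f}% +/- {100 * r_stat:.1f}% (stat)")
```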
Submitted 19 May, 2024;
originally announced May 2024.
-
Analysis of singularly perturbed stochastic chemical reaction networks motivated by applications to epigenetic cell memory
Authors:
Simone Bruno,
Felipe A. Campos,
Yi Fu,
Domitilla Del Vecchio,
Ruth J. Williams
Abstract:
Epigenetic cell memory, the inheritance of gene expression patterns across subsequent cell divisions, is a critical property of multi-cellular organisms. In recent work [10], a subset of the authors observed in a simulation study how the stochastic dynamics and time-scale differences between establishment and erasure processes in chromatin modifications (such as histone modifications and DNA methylation) can have a critical effect on epigenetic cell memory. In this paper, we provide a mathematical framework to rigorously validate and extend beyond these computational findings. Viewing our stochastic model of a chromatin modification circuit as a singularly perturbed, finite state, continuous time Markov chain, we extend beyond existing theory in order to characterize the leading coefficients in the series expansions of stationary distributions and mean first passage times. In particular, we characterize the limiting stationary distribution in terms of a reduced Markov chain, provide an algorithm to determine the orders of the poles of mean first passage times, and determine how changing erasure rates affects system behavior. The theoretical tools developed in this paper not only allow us to set a rigorous mathematical basis for the computational findings of our prior work, highlighting the effect of chromatin modification dynamics on epigenetic cell memory, but they can also be applied to other singularly perturbed Markov chains beyond the applications in this paper, especially those associated with chemical reaction networks.
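The two quantities the abstract characterizes, stationary distributions and mean first passage times of a finite-state continuous-time Markov chain, can be computed numerically for a toy chain to watch the singular behavior as a rate parameter ε → 0. The two-state generator below is a hypothetical stand-in, not the authors' chromatin-modification model; it shows the stationary distribution converging to a limit and the mean first passage time growing like 1/ε (a pole of order 1).

```python
import numpy as np

def stationary(Q):
    """Stationary distribution of a CTMC with generator Q: solve pi Q = 0, sum(pi) = 1."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def mean_first_passage(Q, target):
    """Mean hitting times of `target`: solve Q_AA m = -1 over the non-target states A."""
    others = [i for i in range(Q.shape[0]) if i != target]
    m = np.linalg.solve(Q[np.ix_(others, others)], -np.ones(len(others)))
    return dict(zip(others, m))

def toy_generator(eps):
    """Hypothetical two-state circuit: state 0 (mark present) is erased at the slow
    rate eps; state 1 (mark absent) is re-established at rate 1."""
    return np.array([[-eps, eps], [1.0, -1.0]])

for eps in (1e-1, 1e-2, 1e-3):
    Q = toy_generator(eps)
    pi = stationary(Q)
    m0 = mean_first_passage(Q, target=1)[0]
    # pi -> [1, 0] as eps -> 0, while eps * m0 stays at 1: a pole of order 1 in eps.
    print(f"eps={eps:g}  pi={pi.round(4)}  eps*MFPT={eps * m0:.3f}")
```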
Submitted 16 May, 2024;
originally announced May 2024.
-
A Comprehensive Survey on Data Augmentation
Authors:
Zaitian Wang,
Pengfei Wang,
Kunpeng Liu,
Pengyang Wang,
Yanjie Fu,
Chang-Tien Lu,
Charu C. Aggarwal,
Jian Pei,
Yuanchun Zhou
Abstract:
Data augmentation is a series of techniques that generate high-quality artificial data by manipulating existing data samples. By leveraging data augmentation techniques, AI models can achieve significantly improved applicability in tasks involving scarce or imbalanced datasets, thereby substantially enhancing their generalization capabilities. Existing literature surveys focus only on a single data modality and categorize these methods from modality-specific and operation-centric perspectives; they lack a consistent summary of data augmentation methods across multiple modalities and limit the comprehension of how existing data samples serve the data augmentation process. To bridge this gap, we propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities. Specifically, from a data-centric perspective, this survey proposes a modality-independent taxonomy by investigating how to take advantage of the intrinsic relationship between data samples, including single-wise, pair-wise, and population-wise sample data augmentation methods. Additionally, we categorize data augmentation methods across five data modalities through a unified inductive approach.
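As an example of what a pair-wise sample augmentation looks like, the sketch below implements mixup, which forms a convex combination of two samples and of their one-hot labels. Using mixup as the representative pair-wise method is our choice of illustration, and the Beta(0.4, 0.4) mixing distribution is a common but assumed setting.

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, lam):
    """Pair-wise augmentation: a convex combination of two samples and of their labels."""
    return lam * x_i + (1.0 - lam) * x_j, lam * y_i + (1.0 - lam) * y_j

rng = np.random.default_rng(0)
lam = rng.beta(0.4, 0.4)                  # mixing weight drawn from Beta(alpha, alpha)
x_new, y_new = mixup(np.array([1.0, 0.0]), np.array([1.0, 0.0]),   # sample i, class 0
                     np.array([0.0, 2.0]), np.array([0.0, 1.0]),   # sample j, class 1
                     lam)
```

Single-wise methods (e.g., flipping one image) need only one sample, whereas the synthetic sample here only exists because two samples were combined.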
Submitted 17 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
Authors:
Yao Fu
Abstract:
Transformer-based long-context generative models power emerging AI applications like hour-long video understanding and project-level coding agents. Deploying long-context transformers (e.g., 100K to 10M tokens) is prohibitively expensive compared to short-context (e.g., 4K tokens) model variants. Reducing the cost of long-context transformers is becoming a pressing research and engineering challenge starting from 2024. This work describes a concurrent programming framework for quantitatively analyzing the efficiency challenges in serving multiple long-context requests under a limited GPU high-bandwidth memory (HBM) budget. We give a detailed analysis of how all additional computational costs, compared to 4K context, trace back to \textit{one single source: the large size of the KV cache}. We use a 34B GPT-3.5-level model with 50K context on A100 NVLink as a running example, and describe how its large KV cache causes four types of deployment challenges: (1) prefilling long inputs takes much longer compute time and more GPU memory than short inputs; (2) after prefilling, the large KV cache residing on the GPU HBM substantially restricts the number of concurrent users being served; (3) during decoding, repeatedly reading the KV cache from HBM to SM largely increases latency; (4) when KV cache memory overflows, swapping it from HBM to DDR causes significant context-switching latency. We use this framework to analyze existing works and identify possibilities of combining them to build end-to-end systems. Overall, this work offers a foundational framework for analyzing long-context transformer deployment and identifies directions towards reducing the inference cost of 1M context to be as cheap as 4K.
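Why the KV cache dominates can be seen from simple arithmetic: every token stores one key and one value vector per layer and per KV head. The sketch assumes a hypothetical 34B configuration with grouped-query attention (60 layers, 8 KV heads of dimension 128, 2-byte bf16 values); these numbers are our assumption, not taken from the paper. Under them, one token costs 240 KiB and a 50K-token context about 12.3 GB.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the key + value cache for one sequence (leading factor 2 = K and V)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical GQA configuration (assumed, not from the paper), bf16 storage.
per_token = kv_cache_bytes(60, 8, 128, seq_len=1)
ctx_50k = kv_cache_bytes(60, 8, 128, seq_len=50_000)
print(f"{per_token / 2**10:.0f} KiB per token, {ctx_50k / 1e9:.1f} GB at 50K tokens")
```

Dividing the HBM left over after loading the weights by `ctx_50k` gives a rough upper bound on the number of concurrent 50K-token users, which is challenge (2) above.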
Submitted 14 May, 2024;
originally announced May 2024.
-
Search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (635 additional authors not shown)
Abstract:
Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions $\mathcal{B}(χ_{c1}(3872)\toγψ_2(3823), ψ_2(3823)\toγχ_{c1})/\mathcal{B}(χ_{c1}(3872)\toπ^+π^- J/ψ)$ is set as 0.075 at the 90\% confidence level. Our result contradicts theoretical predictions under the assumption that the $χ_{c1}(3872)$ is the pure charmonium state $χ_{c1}(2P)$.
Submitted 13 May, 2024;
originally announced May 2024.
-
Multi-Object Tracking in the Dark
Authors:
Xinzhe Wang,
Kang Ma,
Qiankun Liu,
Yunhao Zou,
Ying Fu
Abstract:
Low-light scenes are prevalent in real-world applications (e.g., autonomous driving and surveillance at night). Recently, multi-object tracking in various practical use cases has received much attention, but multi-object tracking in dark scenes is rarely considered. In this paper, we focus on multi-object tracking in dark scenes. To address the lack of datasets, we first build a Low-light Multi-Object Tracking (LMOT) dataset. LMOT provides well-aligned low-light video pairs captured by our dual-camera system, and high-quality multi-object tracking annotations for all videos. Then, we propose a low-light multi-object tracking method, termed LTrack. We introduce the adaptive low-pass downsample module to enhance low-frequency components of images outside the sensor noises. The degradation suppression learning strategy enables the model to learn invariant information under noise disturbance and image quality degradation. These components improve the robustness of multi-object tracking in dark scenes. We conducted a comprehensive analysis of our LMOT dataset and the proposed LTrack. Experimental results demonstrate the superiority of the proposed method and its competitiveness in real night low-light scenes. Dataset and Code: https://github.com/ying-fu/LMOT
Submitted 10 May, 2024;
originally announced May 2024.
-
Measurement of the ${e}^{+}{e}^{-}\to p \bar{p}π^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the $p\bar{p}π^0$ energy threshold, we can probe the threshold behavior for this reaction. However, no anomalous threshold enhancement is found in the cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$.
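The "less than 90 MeV above threshold" statement can be checked from the final-state masses. The sketch uses rounded PDG masses for the proton (938.272 MeV) and the neutral pion (134.977 MeV); the threshold comes out at about 2011.5 MeV, so the 2.1000 GeV point lies about 88.5 MeV above it.

```python
# Rounded PDG masses in MeV
M_PROTON = 938.272
M_PI0 = 134.977

threshold = 2 * M_PROTON + M_PI0     # minimum sqrt(s) needed to produce p pbar pi0
margin = 2100.0 - threshold          # how far the lowest energy point sits above it
print(f"threshold = {threshold:.1f} MeV, margin = {margin:.1f} MeV")
```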
Submitted 10 May, 2024;
originally announced May 2024.
-
Enhancing Deep Knowledge Tracing via Diffusion Models for Personalized Adaptive Learning
Authors:
Ming Kuo,
Shouvon Sarker,
Lijun Qian,
Yujian Fu,
Xiangfang Li,
Xishuang Dong
Abstract:
In contrast to pedagogies like evidence-based teaching, personalized adaptive learning (PAL) distinguishes itself by closely monitoring the progress of individual students and tailoring the learning path to their unique knowledge and requirements. A crucial technique for effective PAL implementation is knowledge tracing, which models students' evolving knowledge to predict their future performance. Based on these predictions, personalized recommendations for resources and learning paths can be made to meet individual needs. Recent advancements in deep learning have successfully enhanced knowledge tracing through Deep Knowledge Tracing (DKT). This paper introduces generative AI models to further enhance DKT. Generative AI models, rooted in deep learning, are trained to generate synthetic data, addressing data scarcity challenges in various applications across fields such as natural language processing (NLP) and computer vision (CV). This study aims to tackle data shortage issues in student learning records to enhance DKT performance for PAL. Specifically, it employs TabDDPM, a diffusion model, to generate synthetic educational records to augment training data for enhancing DKT. The proposed method's effectiveness is validated through extensive experiments on ASSISTments datasets. The experimental results demonstrate that the AI-generated data by TabDDPM significantly improves DKT performance, particularly in scenarios with small data for training and large data for testing.
Submitted 24 April, 2024;
originally announced May 2024.
-
Long Context Alignment with Short Instructions and Synthesized Positions
Authors:
Wenhao Wu,
Yizhong Wang,
Yao Fu,
Xiang Yue,
Dawei Zhu,
Sujian Li
Abstract:
Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of LLMs in the alignment phase without additional effort beyond training with the original data length. SkipAlign is developed on the premise that long-range dependencies are fundamental to enhancing an LLM's capacity for long context. Rather than merely expanding the length of input samples, SkipAlign synthesizes long-range dependencies from the aspect of position indices. This is achieved by strategically inserting skipped positions within instruction-following samples, which utilizes the semantic structure of the data to effectively expand the context. Through extensive experiments on base models with a variety of context window sizes, SkipAlign demonstrates its effectiveness across a spectrum of long-context tasks. Particularly noteworthy is that, with a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves its best performance, comparable with strong baselines like GPT-3.5-Turbo-16K on LongBench.
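The idea of synthesizing long-range dependencies through position indices, rather than through longer inputs, can be sketched as follows: a short sample split into segments keeps its tokens but receives position ids with gaps between segments, so its largest position approaches a longer target context. The function name, the uniform random gap sizes, and skipping only at the given segment boundaries are hypothetical; SkipAlign's own strategy for placing skips based on the data's semantic structure is not reproduced here.

```python
import random

def skipped_position_ids(segment_lengths, target_len, seed=0):
    """Hypothetical illustration: assign position ids to concatenated segments,
    inserting random gaps between segments so the largest id stays < target_len."""
    total = sum(segment_lengths)
    budget = target_len - total                     # how many ids may be skipped overall
    if budget < 0:
        raise ValueError("target_len is shorter than the sample")
    rng = random.Random(seed)
    cuts = sorted(rng.randint(0, budget) for _ in range(len(segment_lengths) - 1))
    gaps = [b - a for a, b in zip([0] + cuts, cuts)]  # non-negative, sum <= budget
    ids, pos = [], 0
    for k, length in enumerate(segment_lengths):
        if k > 0:
            pos += gaps[k - 1]                      # jump ahead before each new segment
        ids.extend(range(pos, pos + length))        # contiguous ids inside a segment
        pos += length
    return ids

# e.g. instruction, context, and response segments of a 300-token sample
ids = skipped_position_ids([40, 200, 60], target_len=4096)
```

The resulting ids would replace the default 0 to n-1 positions, so relative distances between segments look like those of a sample up to 4096 tokens long while the model still only processes 300 tokens.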
Submitted 6 May, 2024;
originally announced May 2024.
-
A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose
Authors:
Kaiwen Jiang,
Yang Fu,
Mukund Varma T,
Yash Belhe,
Xiaolong Wang,
Hao Su,
Ravi Ramamoorthi
Abstract:
Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation. In this paper, we leverage the recent 3D Gaussian splatting method to develop a novel construct-and-optimize method for sparse view synthesis without camera poses. Specifically, we construct a solution progressively by using monocular depth and projecting pixels back into the 3D world. During construction, we optimize the solution by detecting 2D correspondences between training views and the corresponding rendered images. We develop a unified differentiable pipeline for camera registration and adjustment of both camera poses and depths, followed by back-projection. We also introduce a novel notion of an expected surface in Gaussian splatting, which is critical to our optimization. These steps enable a coarse solution, which can then be low-pass filtered and refined using standard optimization methods. We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views, showing significantly better quality than competing methods, including those with approximate camera pose information. Moreover, our results improve with more views and outperform previous InstantNGP and Gaussian Splatting algorithms even when using half the dataset. Project page: https://raymondjiangkw.github.io/cogs.github.io/
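The construction step, which lifts pixels into 3D using monocular depth, rests on the standard pinhole back-projection X = z K⁻¹[u, v, 1]ᵀ. The sketch below shows only that step, in camera coordinates, with an assumed intrinsic matrix K and a random depth map standing in for monocular depth; the paper's registration, pose and depth adjustment, and its expected-surface notion are not modeled.

```python
import numpy as np

def back_project(depth, K):
    """Lift each pixel (u, v) with depth z to the camera-frame point z * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # both (h, w); u = column, v = row
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T                  # one ray K^-1 [u, v, 1] per row
    return rays * depth.reshape(-1, 1)               # (h * w, 3) points

def project(points, K):
    """Pinhole projection of camera-frame points back to pixel coordinates."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Hypothetical intrinsics and depth map.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
depth = np.random.default_rng(0).uniform(1.0, 5.0, size=(48, 64))
points = back_project(depth, K)
uv = project(points, K)          # reprojecting recovers the original pixel grid
```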
Submitted 10 June, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
A Generalization Theory of Cross-Modality Distillation with Contrastive Learning
Authors:
Hangyu Lin,
Chen Liu,
Chengming Xu,
Zhengqi Gao,
Yanwei Fu,
Yuan Yao
Abstract:
Cross-modality distillation arises as an important topic for data modalities containing limited knowledge such as depth maps and high-quality sketches. Such techniques are of great importance, especially for memory and privacy-restricted scenarios where labeled training data is generally unavailable. To solve the problem, existing label-free methods leverage a few pairwise unlabeled data to distill the knowledge by aligning features or statistics between the source and target modalities. For instance, one typically aims to minimize the L2 distance or contrastive loss between the learned features of pairs of samples in the source (e.g. image) and the target (e.g. sketch) modalities. However, most algorithms in this domain only focus on the experimental results but lack theoretical insight. To bridge the gap between the theory and practical method of cross-modality distillation, we first formulate a general framework of cross-modality contrastive distillation (CMCD), built upon contrastive learning that leverages both positive and negative correspondence, towards a better distillation of generalizable features. Furthermore, we establish a thorough convergence analysis that reveals that the distance between source and target modalities significantly impacts the test error on downstream tasks within the target modality which is also validated by the empirical results. Extensive experimental results show that our algorithm outperforms existing algorithms consistently by a margin of 2-3\% across diverse modalities and tasks, covering modalities of image, sketch, depth map, and audio and tasks of recognition and segmentation.
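A contrastive objective that uses both positive and negative correspondences is commonly written as an InfoNCE loss over a batch of paired features. The sketch below is that generic, one-directional (source to target) loss in NumPy with an illustrative temperature τ = 0.1; it is not claimed to be the exact CMCD objective.

```python
import numpy as np

def cross_modal_info_nce(src, tgt, tau=0.1):
    """InfoNCE over a batch of paired features: row i of `src` and row i of `tgt`
    form a positive pair; every other row of `tgt` serves as a negative for row i."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)   # cosine similarity
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / tau                               # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)      # stabilize the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))                # positives on the diagonal

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 32))                    # e.g. source-modality (image) features
paired = feats + 0.05 * rng.normal(size=(8, 32))    # matching target-modality features
loss_aligned = cross_modal_info_nce(feats, paired)
loss_mismatched = cross_modal_info_nce(feats, np.roll(paired, 1, axis=0))
```

Replacing the negatives term with nothing reduces this to plain feature alignment (similar in spirit to the L2 objective mentioned above); keeping it is what lets the loss also push non-corresponding pairs apart.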
Submitted 28 May, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Distance between two manifolds, topological phase transitions and scaling laws
Authors:
ZhaoXiang Fang,
Ming Gong,
Guang-Can Guo,
Yongxu Fu,
Long Xiong
Abstract:
Topological phases are generally characterized by topological invariants that take integer values. However, different topological systems often require different invariants, such as geometric phases, topological orders, and winding numbers. Moreover, geometric phases and their associated definitions usually fail at critical points. It is therefore challenging to predict what happens during the transformation between two different topological phases. To address these issues, in this work we propose a general definition based on fidelity and trace distance from quantum information theory: the manifold distance. This definition does not rely on the Berry connection of the manifolds. It relies only on the information carried by the two manifolds, namely their ground-state wave functions. It can therefore measure different topological systems (including traditional band topology models, non-Hermitian systems, and topological order models) and exhibits some universal laws during the transformation between two topological phases. Our research demonstrates that when two manifolds have identical properties, the distance between them and its higher-order derivatives vary smoothly. For two different topological manifolds, however, the higher-order derivatives exhibit various divergent behaviors near the critical points. In future studies, we expect the method to be generalized to real-space or non-lattice models. This would facilitate the study of a wider range of physical platforms, such as open systems and many-body localization.
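The ingredients named above, fidelity and trace distance between ground-state wave functions, can be sketched on the Su-Schrieffer-Heeger (SSH) model. This is an illustration under assumptions, not the paper's definition. The SSH Hamiltonian, the helper names, and the choice to average the pointwise pure-state trace distance sqrt(1 - F^2) over a uniform k-grid are all our own. The paper's exact construction may differ.

```python
import cmath
import math

def ssh_lower_band(v, w, k):
    # Lower-band eigenvector of H(k) = (v + w cos k) sigma_x + (w sin k) sigma_y,
    # which is (1, -e^{i phi}) / sqrt(2) with phi = arg(v + w e^{ik}).
    phi = cmath.phase(complex(v + w * math.cos(k), w * math.sin(k)))
    s = 1 / math.sqrt(2)
    return (complex(s), -s * cmath.exp(1j * phi))

def fidelity(a, b):
    # |<a|b>| for normalized pure states.
    return abs(sum(x.conjugate() * y for x, y in zip(a, b)))

def manifold_distance(params1, params2, n_k=400):
    # Assumed illustration: mean pure-state trace distance sqrt(1 - F^2)
    # between the two ground-state manifolds over the Brillouin zone.
    total = 0.0
    for j in range(n_k):
        k = 2 * math.pi * (j + 0.5) / n_k
        f = fidelity(ssh_lower_band(*params1, k), ssh_lower_band(*params2, k))
        total += math.sqrt(max(0.0, 1.0 - f * f))
    return total / n_k

d_same = manifold_distance((1.0, 0.5), (1.0, 0.5))
d_same_phase = manifold_distance((1.0, 0.5), (1.0, 0.4))   # both trivial (v > w)
d_cross = manifold_distance((1.0, 0.5), (0.5, 1.0))        # trivial vs topological
```

In this toy setting, two points in the same phase are much closer than two points on opposite sides of the v = w transition.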
Submitted 6 May, 2024;
originally announced May 2024.
-
Multi-User Multi-Application Packet Scheduling for Application-Specific QoE Enhancement Based on Knowledge-Embedded DDPG in 6G RAN
Authors:
Yongqin Fu,
Xianbin Wang
Abstract:
The rapidly growing diversity of concurrent applications, both across different users and on the same device, calls for application-specific Quality of Experience (QoE) enhancement in future wireless communications. Achieving this goal relies on application-specific packet scheduling. Such scheduling is vital for tailored QoE enhancement, because it meets application-specific Quality of Service (QoS) requirements and yields optimal perceived QoE values. However, several factors inevitably complicate tailored packet scheduling: intertwined and diverse QoE perception mechanisms, fairness among concurrent applications, and network dynamics. To achieve concurrent application-specific QoE enhancement, this paper first formulates multi-user multi-application packet scheduling in the downlink 6G radio access network (RAN) as a Markov decision process (MDP). A deep deterministic policy gradient (DDPG)-based solution is proposed to solve this problem. However, because both the state and action spaces are high-dimensional, the trained DDPG agents may generate decisions that waste resources unnecessarily. Hence, a knowledge embedding method is proposed to adjust the decisions of the DDPG agents according to human insights. Extensive experiments demonstrate the superiority of DDPG-based packet schedulers over baseline algorithms and the effectiveness of the proposed knowledge embedding technique.
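One way to picture "adjusting agent decisions with human insight" is a rule-based post-processing step applied to the agent's continuous action. The sketch below is hypothetical and is not the paper's knowledge embedding method. It assumes the action is a vector of non-negative per-application shares, and that each application has a known demand in resource blocks (RBs). Two rules apply: no application gets more RBs than it can use, and any surplus is redistributed to applications that are still short, in proportion to the agent's shares.

```python
def adjust_allocation(raw_shares, demands, total_rbs, eps=1e-9):
    # raw_shares: hypothetical DDPG action, one non-negative value per application.
    # demands: RBs each application needs to drain its buffer (assumed known).
    n = len(raw_shares)
    weights = [max(x, 0.0) for x in raw_shares]
    if sum(weights) == 0:
        weights = [1.0] * n
    alloc = [0.0] * n
    remaining = float(total_rbs)
    active = [i for i in range(n) if demands[i] > 0]
    while remaining > eps and active:
        w = {i: weights[i] for i in active}
        w_sum = sum(w.values())
        if w_sum == 0:  # only zero-weight applications remain: split evenly
            w = {i: 1.0 for i in active}
            w_sum = float(len(active))
        given, still_short = 0.0, []
        for i in active:
            # Rule 1: cap each application's share at its remaining demand.
            take = min(remaining * w[i] / w_sum, demands[i] - alloc[i])
            alloc[i] += take
            given += take
            if demands[i] - alloc[i] > eps:
                still_short.append(i)
        # Rule 2: any surplus is redistributed among unsaturated applications next round.
        remaining -= given
        active = still_short
    return alloc

alloc = adjust_allocation([0.7, 0.2, 0.1], demands=[2, 10, 10], total_rbs=12)
```

Here the first application would receive 8.4 RBs from the raw action but can use only 2. The other 6.4 RBs go to the second and third applications.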
Submitted 2 May, 2024;
originally announced May 2024.
-
Stepwise ionization of Mo$^{14+}$ ions in EBIT: The importance of the metastable level
Authors:
Cunqiang Wu,
Xiaobin Ding,
Qi Guo,
Ke Yao,
Jialin Liu,
Yunqing Fu,
Chenzhong Dong
Abstract:
The visible spectrum of Mo$^{15+}$ ions was measured using a high-temperature superconducting electron-beam ion trap (EBIT) at the Shanghai EBIT Laboratory. The electron beam energy was $E_{e}$ = 400 eV, significantly lower than the ionization potential (IP = 544.0 eV) of ground-state Mo$^{14+}$ ions. To interpret the experiment, we calculated several properties for both the ground state and the low-lying excited state of Mo$^{14+}$ ions: the energy level structure, radiative transition properties, electron-impact excitation cross sections, and electron-impact ionization cross sections. The calculations used the Dirac-Fock-Slater method with a local central potential and the distorted wave approximation. The results show reasonable agreement with the available experimental and theoretical data. Based on an analysis of the relevant atomic processes of Mo$^{14+}$ ions, we propose a stepwise ionization scenario via the metastable state 3p$^{6}$3d$^{9}$4s. This scenario explains the presence of Mo$^{15+}$ ions at incident electron energies below the ground-state IP. Finally, the significance of metastable levels in ionizing Mo$^{14+}$ ions is highlighted.
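The logic of the stepwise scenario can be sketched with a toy three-level rate model: ground Mo14+, metastable Mo14+, and Mo15+. The beam energy and ground-state IP come from the abstract. Everything else is an assumption: all rates are hypothetical values in arbitrary units, and the model simply presumes that ionizing the metastable level is possible at 400 eV, as the proposed scenario requires. Because 400 eV < 544.0 eV, direct ionization from the ground state is switched off, yet Mo15+ still builds up through the metastable level.

```python
E_BEAM_EV = 400.0      # electron beam energy (from the abstract)
IP_GROUND_EV = 544.0   # ground-state IP of Mo14+ (from the abstract)

def direct_ionization_open(beam_ev, threshold_ev):
    # A single collision can ionize only if the electron energy reaches the threshold.
    return beam_ev >= threshold_ev

def three_level_populations(r_exc, r_decay, r_ion_meta, r_ion_ground, t_end, dt=1e-3):
    # Fractions of ground Mo14+, metastable Mo14+ and Mo15+, integrated with
    # forward Euler. All rates are hypothetical and in arbitrary units.
    ground, meta, ion = 1.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_ground = -(r_exc + r_ion_ground) * ground + r_decay * meta
        d_meta = r_exc * ground - (r_decay + r_ion_meta) * meta
        d_ion = r_ion_ground * ground + r_ion_meta * meta
        ground += d_ground * dt
        meta += d_meta * dt
        ion += d_ion * dt
    return ground, meta, ion

r_direct = 1.0 if direct_ionization_open(E_BEAM_EV, IP_GROUND_EV) else 0.0
g, m, q = three_level_populations(r_exc=0.5, r_decay=0.1, r_ion_meta=0.2,
                                  r_ion_ground=r_direct, t_end=20.0)
```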
Submitted 1 May, 2024;
originally announced May 2024.
-
DAM: A Universal Dual Attention Mechanism for Multimodal Timeseries Cryptocurrency Trend Forecasting
Authors:
Yihang Fu,
Mingyu Zhou,
Luyao Zhang
Abstract:
In the distributed systems landscape, blockchain has catalyzed the rise of cryptocurrencies, combining enhanced security and decentralization with significant investment opportunities. Despite this potential, current research on cryptocurrency trend forecasting often falls short: it merges sentiment data simplistically, without fully considering the nuanced interplay between financial market dynamics and external sentiment influences. This paper presents a novel Dual Attention Mechanism (DAM) for forecasting cryptocurrency trends using multimodal time-series data. Our approach integrates critical cryptocurrency metrics with sentiment data from news and social media, analyzed with CryptoBERT. In doing so, it addresses the inherent volatility and prediction challenges of cryptocurrency markets. By combining elements of distributed systems, natural language processing, and financial forecasting, our method outperforms conventional models such as LSTM and Transformer by up to 20% in prediction accuracy. This advancement deepens the understanding of distributed systems and has practical implications for financial markets, benefiting stakeholders in cryptocurrency and blockchain technologies. Moreover, our enhanced forecasting approach can significantly support decentralized science (DeSci) by facilitating strategic planning and the efficient adoption of blockchain technologies. It can also improve operational efficiency and financial risk management in the rapidly evolving digital asset domain, thus helping to ensure optimal resource allocation.
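A "dual" attention over multimodal time series can be illustrated with two stacked softmax attentions. The first weighs the market and sentiment modalities at each time step. The second weighs the fused time steps. This is a hypothetical sketch, not the DAM architecture from the paper. The two-stage design, the linear scoring vectors `w_modality` and `w_time`, and all names are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dual_attention(market_seq, sentiment_seq, w_modality, w_time):
    # market_seq[t], sentiment_seq[t]: feature vectors of equal length at time t.
    # Stage 1 (modality attention): weight the two modalities at each step.
    fused = []
    for market, sentiment in zip(market_seq, sentiment_seq):
        alpha = softmax([dot(w_modality, market), dot(w_modality, sentiment)])
        fused.append([alpha[0] * a + alpha[1] * b for a, b in zip(market, sentiment)])
    # Stage 2 (temporal attention): weight the fused time steps into one context vector.
    beta = softmax([dot(w_time, h) for h in fused])
    context = [sum(b * h[j] for b, h in zip(beta, fused)) for j in range(len(fused[0]))]
    return context, beta

market = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4]]
sentiment = [[0.9, 0.0], [0.2, 0.8], [0.4, 0.4]]
context, beta = dual_attention(market, sentiment, w_modality=[1.0, -1.0], w_time=[0.5, 0.5])
```

In a trained model the scoring vectors would be learned, and `context` would feed a prediction head.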
Submitted 1 May, 2024;
originally announced May 2024.
-
Particle production from gluon-nucleon interactions in relativistic heavy ion collisions
Authors:
Yong-** Fu,
Fei-Jie Huang,
Qi-Hui Chen
Abstract:
We propose a particle production mechanism arising from gluon-nucleon interactions in relativistic heavy ion collisions, analogous to particle photoproduction processes. We compare the effect of gluon-nucleon interactions on photon production in Au+Au collisions at $\sqrt{s_{NN}}=200$ GeV and Pb+Pb collisions at $\sqrt{s_{NN}}=2.76$ TeV. The numerical results indicate that the contribution of gluon-nucleon interactions becomes more prominent as the collision energy increases.
Submitted 22 June, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
New upper bounds on the number of non-zero weights of constacyclic codes
Authors:
Li Chen,
Yuqing Fu,
Hongwei Liu
Abstract:
Let $\mathcal{C}$ be any simple-root constacyclic code over a finite field $\mathbb{F}_q$. As far as we know, the group $\mathcal{G}$ generated by the multiplier, the constacyclic shift, and the scalar multiplications is the largest subgroup of the automorphism group ${\rm Aut}(\mathcal{C})$ of $\mathcal{C}$. In this paper, by counting the $\mathcal{G}$-orbits of $\mathcal{C}\backslash\{\mathbf{0}\}$, we give an explicit upper bound on the number of non-zero weights of $\mathcal{C}$. We also present a necessary and sufficient condition for $\mathcal{C}$ to meet this bound. Several examples show that our upper bound is tight and better than the upper bounds in [Zhang and Cao, FFA, 2024]. In particular, our main results provide a new method for constructing few-weight constacyclic codes. Furthermore, for constacyclic codes $\mathcal{C}$ of two special types, we obtain a smaller upper bound on the number of non-zero weights of $\mathcal{C}$ by replacing $\mathcal{G}$ with a larger subgroup of ${\rm Aut}(\mathcal{C})$. The results in this paper generalize the main results in [Chen, Fu and Liu, IEEE-TIT, 2024].
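The orbit-counting idea can be checked by brute force on a small example. Every codeword in an orbit of a weight-preserving group has the same weight, so the number of orbits on the non-zero codewords bounds the number of non-zero weights. This sketch uses the binary cyclic [7,4] Hamming code generated by g(x) = 1 + x + x^3, an example of our choosing rather than one from the paper. Over F_2 the scalar multiplications are trivial, so the group is generated by the cyclic shift and the multiplier mu_2 alone.

```python
from itertools import product

N = 7                        # code length
G = (1, 1, 0, 1, 0, 0, 0)    # coefficients of g(x) = 1 + x + x^3 over F_2

def shift(c):
    # Cyclic shift: multiply c(x) by x modulo x^N - 1.
    return c[-1:] + c[:-1]

def multiplier(c, q=2):
    # Multiplier mu_q: the coefficient of x^i moves to x^(q*i mod N).
    out = [0] * N
    for i, ci in enumerate(c):
        out[(q * i) % N] = ci
    return tuple(out)

def codewords():
    # All F_2-combinations of g, xg, x^2 g, x^3 g (dimension N - deg g = 4).
    rows, row = [], G
    for _ in range(N - 3):
        rows.append(row)
        row = shift(row)
    words = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        w = [0] * N
        for a, r in zip(coeffs, rows):
            if a:
                w = [x ^ y for x, y in zip(w, r)]
        words.add(tuple(w))
    return words

def count_orbits(words):
    # Orbits under the group generated by the shift and mu_2 (a finite group,
    # so closing under the generators gives the full orbit).
    seen, orbits = set(), 0
    for w in words:
        if w in seen:
            continue
        orbits += 1
        seen.add(w)
        stack = [w]
        while stack:
            u = stack.pop()
            for v in (shift(u), multiplier(u)):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return orbits

nonzero = codewords() - {(0,) * N}
num_orbits = count_orbits(nonzero)
num_weights = len({sum(w) for w in nonzero})
```

Here the 15 non-zero codewords fall into 3 orbits, and the code has exactly 3 non-zero weights (3, 4 and 7). The orbit bound is therefore met with equality in this example.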
Submitted 1 May, 2024;
originally announced May 2024.
-
Constrained Decoding for Secure Code Generation
Authors:
Yanjun Fu,
Ethan Baker,
Yu Ding,
Yizheng Chen
Abstract:
Code Large Language Models (Code LLMs) have been increasingly used by developers to boost productivity, but they often generate vulnerable code. Thus, there is an urgent need to ensure that code generated by Code LLMs is correct and secure. Previous research has primarily focused on generating secure code, overlooking the fact that secure code also needs to be correct. This oversight can lead to a false sense of security. Currently, the community lacks a method to measure actual progress in this area, and we need solutions that address both security and correctness of code generation.
This paper introduces a new benchmark, CodeGuard+, along with two new metrics, to measure Code LLMs' ability to generate both secure and correct code. Using our new evaluation methods, we show that the state-of-the-art defense technique, prefix tuning, may not be as strong as previously believed, since it generates secure code but sacrifices functional correctness. We also demonstrate that different decoding methods significantly affect the security of Code LLMs.
Furthermore, we explore a new defense direction: constrained decoding for secure code generation. We propose new constrained decoding techniques to generate secure code. Our results reveal that constrained decoding is more effective than prefix tuning at improving the security of Code LLMs, without requiring a specialized training dataset. Moreover, our evaluations over eight state-of-the-art Code LLMs show that constrained decoding substantially improves the security of Code LLMs, and that our technique outperforms GPT-4.
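A minimal picture of constrained decoding is greedy decoding that skips any candidate token violating a constraint. The sketch below shows only one kind of constraint, a banned pattern (a negative constraint). It uses a hypothetical toy scoring function in place of a Code LLM. The scores, the `<eos>` marker, and the helper names are assumptions, and this is not the decoding algorithm proposed in the paper.

```python
def toy_scores(prefix):
    # Hypothetical next-token scores standing in for a Code LLM.
    if not prefix:
        return {"strcpy(dst, src);": 0.9, "strncpy(dst, src, n);": 0.6}
    return {"<eos>": 1.0}

def greedy_decode(score_fn, max_steps=8, banned=()):
    out = []
    for _ in range(max_steps):
        ranked = sorted(score_fn(out).items(), key=lambda kv: kv[1], reverse=True)
        chosen = None
        for token, _score in ranked:
            text = "".join(out + [token])
            # Negative constraint: skip any token that would create a banned pattern.
            if not any(pattern in text for pattern in banned):
                chosen = token
                break
        if chosen is None or chosen == "<eos>":
            break
        out.append(chosen)
    return "".join(out)

unconstrained = greedy_decode(toy_scores)
constrained = greedy_decode(toy_scores, banned=("strcpy(",))
```

Without constraints the highest-scoring but unbounded `strcpy` call is emitted. With the banned pattern, decoding falls back to the bounded alternative. Real systems would also need positive constraints and checks on functional correctness.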
Submitted 7 June, 2024; v1 submitted 30 April, 2024;
originally announced May 2024.
-
Superresolution imaging of two incoherent optical sources with unequal brightnesses
Authors:
Jian-Dong Zhang,
Yiwen Fu,
Lili Hou,
Shuai Wang
Abstract:
Resolving the separation between two incoherent optical sources with high precision is of great significance for fluorescence imaging and astronomical observation. In this paper, we focus on the more general scenario in which the two sources have unequal brightnesses. Using the quantum Fisher information, we give the ultimate precision limit for estimating the separation. By calculating the classical Fisher information, we analyze and compare several specific measurement schemes, including direct measurement, Gaussian mode measurement, and zero-photon measurement. The results indicate that Gaussian mode measurement is nearly optimal for small separations. Our work complements existing studies on superresolution imaging of incoherent sources.
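The classical Fisher information (CFI) of direct measurement can be estimated numerically for a simple model. The integral is CFI(d) = ∫ (∂p/∂d)² / p dx, where p(x) is the normalized image intensity. The following assumptions are ours: a one-dimensional Gaussian point-spread function of width sigma, sources at -d/2 (relative brightness q) and +d/2 (brightness 1 - q), a central difference in d, and the trapezoidal rule in x. For equal brightness, the CFI vanishes as d goes to 0, which is the usual difficulty of direct imaging at small separations. The parameterization matters: with unequal brightness and these source positions, the brightness-weighted centroid moves with d.

```python
import math

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def image_intensity(x, d, q, sigma):
    # Assumed model: sources at -d/2 (weight q) and +d/2 (weight 1 - q), Gaussian PSF.
    return q * gaussian(x, -d / 2, sigma) + (1 - q) * gaussian(x, d / 2, sigma)

def direct_imaging_cfi(d, q, sigma=1.0, h=1e-5, n=4001, span=10.0):
    # Per-photon CFI for d: integral of (dp/dd)^2 / p over x (trapezoidal rule).
    lo = -span * sigma - d / 2
    hi = span * sigma + d / 2
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * dx
        p = image_intensity(x, d, q, sigma)
        dp = (image_intensity(x, d + h, q, sigma) - image_intensity(x, d - h, q, sigma)) / (2 * h)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * dp * dp / p * dx
    return total

cfi_small = direct_imaging_cfi(0.05, q=0.5)
cfi_large = direct_imaging_cfi(1.0, q=0.5)
```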
Submitted 26 April, 2024;
originally announced April 2024.
-
Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation
Authors:
Nanxu Gong,
Wangyang Ying,
Dongjie Wang,
Yanjie Fu
Abstract:
Feature selection aims to identify the optimal feature subset for enhancing downstream models. Effective feature selection can remove redundant features, save computational resources, accelerate model learning, and improve overall model performance. However, existing methods are often time-consuming when identifying an effective feature subset in high-dimensional feature spaces. Moreover, they mainly use performance on a single downstream task as the selection criterion, so the selected subsets are often redundant and generalize poorly. To bridge these gaps, we reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework for identifying short and effective feature subsets. More specifically, we find that the feature ID tokens of a selected subset can be treated as symbols that reflect the intricate correlations among features. In this framework, we therefore first build a data collector that automatically gathers numerous feature selection samples. Each sample consists of feature ID tokens, model performance, and a measure of feature subset redundancy. Building on the collected data, we develop an encoder-decoder-evaluator learning paradigm that distills feature selection knowledge into a continuous embedding space for efficient search. Within the learned embedding space, we use a multi-gradient search algorithm to find more robust and generalizable embeddings. The search aims both to improve model performance and to reduce feature subset redundancy. These embeddings are then decoded into feature ID tokens to perform the final feature selection. Finally, comprehensive experiments and case studies validate the effectiveness of the proposed framework.
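A common way to search with two objectives at once is the two-task min-norm (MGDA-style) direction. It combines the two loss gradients so that, when the minimizer lies strictly between them, a small step decreases both losses. We use it here only as an assumed stand-in for the paper's multi-gradient search. The 2-D "embedding" and the two quadratic losses are hypothetical. `perf_loss` stands in for negative performance and `redundancy` for subset redundancy.

```python
def min_norm_direction(g1, g2):
    # alpha minimizes ||alpha*g1 + (1 - alpha)*g2||^2 over [0, 1].
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        alpha = 0.5
    else:
        alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
        alpha = min(1.0, max(0.0, alpha))
    return [alpha * a + (1 - alpha) * b for a, b in zip(g1, g2)]

def perf_loss(z):        # hypothetical: lower is better performance
    return (z[0] - 1) ** 2 + z[1] ** 2

def redundancy(z):       # hypothetical: lower is less redundant
    return z[0] ** 2 + (z[1] - 1) ** 2

def grad_perf(z):
    return [2 * (z[0] - 1), 2 * z[1]]

def grad_red(z):
    return [2 * z[0], 2 * (z[1] - 1)]

def search_step(z, lr=0.1):
    d = min_norm_direction(grad_perf(z), grad_red(z))
    return [zi - lr * di for zi, di in zip(z, d)]

z0 = [2.0, 2.0]
z1 = search_step(z0)
```

From z0 = (2, 2), both losses fall from 5.0 to 3.38 after one step.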
Submitted 26 April, 2024;
originally announced April 2024.
-
Effective Unsupervised Constrained Text Generation based on Perturbed Masking
Authors:
Yingwen Fu,
Wenjie Ou,
Zhou Yu,
Yue Lin
Abstract:
Unsupervised constrained text generation aims to generate text under a given set of constraints without any supervised data. Current state-of-the-art methods stochastically sample edit positions and actions, which may cause unnecessary search steps. In this paper, we propose PMCTG, which improves effectiveness by searching for the best edit position and action at each step. Specifically, PMCTG extends the perturbed masking technique to effectively search for the most incongruent token to edit. It then introduces four multi-aspect scoring functions to select the edit action, further reducing search difficulty. Since PMCTG does not require supervised data, it can be applied to different generation tasks. We show that, in the unsupervised setting, PMCTG achieves new state-of-the-art results on two representative tasks, namely keywords-to-sentence generation and paraphrasing.
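The "most incongruent token" idea can be pictured with a much simpler proxy. Score each position by how much a language-model score improves when that token is removed, then edit the position with the largest gain. This is a hypothetical simplification. It uses token deletion and a toy bigram table instead of perturbed masking with a masked language model, and the table, the unseen-bigram penalty, and the names are assumptions.

```python
BIGRAM_LOGPROB = {
    ("<s>", "the"): -0.1, ("the", "cat"): -0.5,
    ("cat", "sat"): -0.5, ("sat", "</s>"): -0.2,
}
UNSEEN = -5.0  # hypothetical log-probability for unseen bigrams

def sentence_logprob(tokens):
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(padded, padded[1:]))

def most_incongruent_position(tokens):
    # Gain in sentence score when each token is removed; the largest gain marks
    # the token that fits its context worst.
    base = sentence_logprob(tokens)
    gains = [sentence_logprob(tokens[:i] + tokens[i + 1:]) - base for i in range(len(tokens))]
    return max(range(len(tokens)), key=lambda i: gains[i])

position = most_incongruent_position(["the", "cat", "banana", "sat"])
```

In this toy example the out-of-place token "banana" (index 2) is selected as the edit position.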
Submitted 24 April, 2024;
originally announced April 2024.
-
Parts-per-billion Trace Element Detection in Anhydrous Minerals by Micro-scale Quantitative NMR
Authors:
Yunhua Fu,
Renbiao Tao,
Lifei Zhang,
Shijie Li,
Ya-Nan Yang,
Dehan Shen,
Zilong Wang,
Thomas Meier
Abstract:
Nominally anhydrous minerals (NAMs) composing Earth's and planetary rocks incorporate microscopic amounts of volatiles. However, the distribution of volatiles in NAMs and their effect on the physical properties of rocks remain controversial. Constraining trace volatile concentrations in NAMs is therefore crucial to our understanding of the evolution of rocky planets and planetesimals. Here, we present a novel approach to trace-element quantification using micro-scale Nuclear Magnetic Resonance (NMR) spectroscopy. This approach exploits the enhanced mass sensitivity of NMR microcoils, previously used in \textit{in situ} high-pressure experiments. We demonstrate that this method is in excellent agreement with standard methods within their respective detection capabilities. We show that simultaneous detection of internal reference nuclei substantially increases the quantification sensitivity. This makes it possible to quantify trace volatile element amounts of about $50$ wt-ppb in a single micrometer-sized anorthitic mineral grain, greatly enhancing the detection capabilities for volatiles in geologically important systems.
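Quantification against a simultaneously detected reference is often written with the generic internal-standard qNMR relation. The analyte mass fraction equals (I_x / I_ref) · (N_ref / N_x) · (M_x / M_ref) · (m_ref / m_sample). Here I are integrated signal intensities, N the number of contributing nuclei per species, M molar masses, and m masses. We assume this standard relation for illustration. The paper's calibration procedure may differ, and the numbers below are hypothetical, not measured values.

```python
def qnmr_mass_fraction(i_analyte, i_ref, n_analyte, n_ref,
                       molar_mass_analyte, molar_mass_ref, mass_ref, mass_sample):
    # Internal-standard qNMR: the intensity ratio gives the molar ratio after
    # correcting for nuclei per species; molar masses convert moles to mass.
    moles_ratio = (i_analyte / i_ref) * (n_ref / n_analyte)
    return moles_ratio * (molar_mass_analyte / molar_mass_ref) * (mass_ref / mass_sample)

def to_wt_ppb(mass_fraction):
    # Mass fraction to parts per billion by weight.
    return mass_fraction * 1e9

# Hypothetical inputs: analyte signal twice the reference signal, same nucleus
# count and molar mass, reference mass one millionth of the sample mass.
w = qnmr_mass_fraction(2.0, 1.0, 1, 1, 1.008, 1.008, mass_ref=1e-6, mass_sample=1.0)
ppb = to_wt_ppb(w)
```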
Submitted 24 April, 2024;
originally announced April 2024.
-
A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
Authors:
Zhixiong Yang,
**gyuan Xia,
Shengxi Li,
Xinghua Huang,
Shuanghui Zhang,
Zhen Liu,
Yaowen Fu,
Yongxiang Liu
Abstract:
Deep learning-based methods have achieved significant success in solving the blind super-resolution (BSR) problem. However, most of them require supervised pre-training on labelled datasets. This paper proposes an unsupervised kernel estimation model, named dynamic kernel prior (DKP), which yields an unsupervised, pre-training-free, learning-based algorithm for solving the BSR problem. DKP adaptively learns dynamic kernel priors to realize real-time kernel estimation, thereby enabling superior HR image restoration performance. This is achieved by a Markov chain Monte Carlo sampling process on random kernel distributions. The learned kernel prior is then used to optimize a blur kernel estimation network through a network-based Langevin dynamics optimization strategy. Together, these two techniques ensure accurate kernel estimation. DKP can easily replace the kernel estimation models in existing methods, such as Double-DIP and FKP-DIP, or be added to off-the-shelf image restoration models, such as diffusion models. In this paper, we incorporate our DKP model into DIP and a diffusion model, yielding DIP-DKP and Diff-DKP, for validation. Extensive simulations on Gaussian and motion kernel scenarios demonstrate that the proposed DKP model significantly improves kernel estimation with comparable runtime and memory usage, leading to state-of-the-art BSR results. The code is available at https://github.com/XYLGroup/DKP.
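The Langevin dynamics mentioned above follow a generic update rule: x ← x + (η/2) ∇log p(x) + √η ξ with ξ ~ N(0, 1). Each step moves along the score of the target and adds Gaussian noise. The sketch below shows only this generic unadjusted Langevin algorithm on a toy one-dimensional Gaussian target N(3, 1). It is not DKP's network-based optimization. The target, step size, and iteration counts are assumptions.

```python
import math
import random

def langevin_samples(grad_log_p, x0, step, n_steps, burn_in, seed=0):
    # Unadjusted Langevin algorithm:
    # x <- x + (step / 2) * grad log p(x) + sqrt(step) * N(0, 1).
    rng = random.Random(seed)
    x = x0
    samples = []
    for t in range(n_steps):
        x = x + 0.5 * step * grad_log_p(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            samples.append(x)
    return samples

# Toy target N(3, 1): grad log p(x) = -(x - 3).
samples = langevin_samples(lambda x: -(x - 3.0), x0=0.0, step=0.1, n_steps=20000, burn_in=1000)
mean = sum(samples) / len(samples)
```

After burn-in, the chain's sample mean settles near the target mean of 3. In DKP the analogous update is applied to the parameters of a kernel estimation network rather than to a scalar.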
Submitted 25 April, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Retrieval Head Mechanistically Explains Long-Context Factuality
Authors:
Wenhao Wu,
Yizhong Wang,
Guangxuan Xiao,
Hao Peng,
Yao Fu
Abstract:
Despite recent progress in long-context language models, it remains unclear how transformer-based models retrieve relevant information from arbitrary locations within a long context. This paper addresses that question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention head, which we dub retrieval heads, is largely responsible for retrieving information. We identify intriguing properties of retrieval heads. (1) Universal: all explored models with long-context capability have a set of retrieval heads. (2) Sparse: only a small portion (less than 5%) of the attention heads are retrieval heads. (3) Intrinsic: retrieval heads already exist in models pretrained with short context, and when the context length is extended by continual pretraining, the same set of heads still performs information retrieval. (4) Dynamically activated: in Llama-2 7B, for example, 12 retrieval heads always attend to the required information no matter how the context changes, while the remaining retrieval heads are activated in different contexts. (5) Causal: completely pruning retrieval heads leads to failure in retrieving relevant information and results in hallucination, while pruning random non-retrieval heads does not affect the model's retrieval ability. We further show that retrieval heads strongly influence chain-of-thought (CoT) reasoning, where the model must frequently refer back to the question and previously generated context. Conversely, tasks where the model directly generates the answer from its intrinsic knowledge are less affected by masking out retrieval heads. Together, these observations explain which internal part of the model seeks information from the input tokens. We believe our insights will foster future research on reducing hallucination, improving reasoning, and compressing the KV cache.
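To make "copy-paste" retrieval concrete, the sketch computes a per-head score on a needle-in-a-haystack example. A decoding step counts for a head if the context position it attends to most lies inside the needle and holds the token the model emits at that step. The score is the fraction of needle positions copied this way. This loosely follows the idea in the abstract, but the exact definition, thresholds, and names here are assumptions.

```python
def retrieval_score(top_attended, generated, context, needle_span):
    # top_attended[t]: context index receiving this head's highest attention at step t.
    # generated[t]: token emitted at step t.
    start, end = needle_span
    copied = set()
    for pos, token in zip(top_attended, generated):
        # Copy-paste event: the head points into the needle at the token being emitted.
        if start <= pos < end and context[pos] == token:
            copied.add(pos)
    return len(copied) / (end - start)

context = ["the", "code", "is", "blue", "42", "."]
needle = (3, 5)                 # the needle is "blue 42"
generated = ["blue", "42"]
score_retrieval_like = retrieval_score([3, 4], generated, context, needle)
score_other = retrieval_score([0, 1], generated, context, needle)
```

A head that tracks the needle token by token scores 1.0. A head that attends elsewhere scores 0.0.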
Submitted 23 April, 2024;
originally announced April 2024.
-
Improved Algorithm for Reachability in $d$-VASS
Authors:
Yuxi Fu,
Qizhe Yang,
Yangluo Zheng
Abstract:
We give an $\mathsf{F}_{d}$ upper bound for the reachability problem in vector addition systems with states (VASS) of fixed dimension $d$, where $\mathsf{F}_d$ is the $d$-th level of the Grzegorczyk hierarchy of complexity classes. The new algorithm combines the linear path scheme characterization of reachability in $2$-dimensional VASSes with the general decomposition algorithm by Mayr, Kosaraju and Lambert. The result improves the $\mathsf{F}_{d + 4}$ upper bound due to Leroux and Schmitz (LICS 2019).
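The semantics behind the reachability question can be illustrated with a bounded breadth-first search over configurations (state, counter vector). A configuration is valid only if every counter stays non-negative. This is not the decomposition algorithm from the abstract. It only explores configurations whose counters stay below a chosen `bound`. A `True` answer is therefore a genuine witness, but a `False` answer only means "unreachable within the bound". The example 2-VASS and its transitions are hypothetical.

```python
from collections import deque

def bounded_reachable(transitions, source, target, bound):
    # transitions: list of (state, delta, next_state), delta a tuple in Z^d.
    # Configurations are (state, counters); counters must stay in [0, bound].
    seen = {source}
    queue = deque([source])
    while queue:
        state, counters = queue.popleft()
        if (state, counters) == target:
            return True
        for p, delta, q in transitions:
            if p != state:
                continue
            new = tuple(c + d for c, d in zip(counters, delta))
            if all(0 <= c <= bound for c in new):
                config = (q, new)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False

# Hypothetical 2-VASS: pump x in state p, then turn each x into two y's in state q.
TRANSITIONS = [("p", (1, 0), "p"), ("p", (0, 0), "q"), ("q", (-1, 2), "q")]
reach_six = bounded_reachable(TRANSITIONS, ("p", (0, 0)), ("q", (0, 6)), bound=10)
reach_five = bounded_reachable(TRANSITIONS, ("p", (0, 0)), ("q", (0, 5)), bound=10)
```

The target (q, (0, 6)) is reached by pumping x to 3 and then converting it. An odd value of y never appears in this system.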
Submitted 23 April, 2024;
originally announced April 2024.
-
1st Place Solution to the 1st SkatingVerse Challenge
Authors:
Tao Sun,
Yuanzi Fu,
Kaicheng Yang,
Jian Wu,
Ziyong Feng
Abstract:
This paper presents the winning solution for the 1st SkatingVerse Challenge. Our method involves several steps. First, we use the DINO framework to extract the Region of Interest (ROI) and precisely crop the raw video footage. Next, we employ three distinct models, namely Unmasked Teacher, UniformerV2, and InfoGCN, to capture different aspects of the data. By ensembling the prediction results at the logit level, our solution attains a leaderboard score of 95.73%.
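Logit-level ensembling, as described above, combines each model's class logits into one vector and takes the arg-max. The sketch assumes a weighted average with equal default weights. The actual weights used for Unmasked Teacher, UniformerV2 and InfoGCN are not given in the abstract, and the logits below are hypothetical.

```python
def ensemble_logits(per_model_logits, weights=None):
    # per_model_logits: one list of class logits per model, all of equal length.
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # assumption: equal weights
    n_classes = len(per_model_logits[0])
    fused = [sum(w * logits[c] for w, logits in zip(weights, per_model_logits))
             for c in range(n_classes)]
    prediction = max(range(n_classes), key=lambda c: fused[c])
    return prediction, fused

# Hypothetical logits from three models for one clip.
pred, fused = ensemble_logits([[2.0, 1.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]])
```

Averaging logits rather than hard votes lets a confident model outweigh weak disagreement from the others. Here class 1 wins even though the first model prefers class 0.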
Submitted 22 April, 2024;
originally announced April 2024.
-
Study of $e^+e^-\to\omega X(3872)$ and $\gamma X(3872)$ from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes $e^+e^-\to\omega X(3872)$ and $e^+e^-\to\gamma X(3872)$. Using the $e^+e^-\to\omega X(3872)$ process, we measure the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\to\gamma J/\psi)}{\mathcal{B}(X(3872)\to\pi^+\pi^- J/\psi)}$ to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\to\omega X(3872)$ to that of $e^+e^-\to\omega\chi_{c1}$ ($\omega\chi_{c2}$) to be $\sigma_{\omega X(3872)}/\sigma_{\omega\chi_{c1}}~(\sigma_{\omega X(3872)}/\sigma_{\omega\chi_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~(5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process $e^+e^-\to\gamma X(3872)$ and observe no significant signal. The upper limit on the ratio of the average cross section of $e^+e^-\to\gamma X(3872)$ to that of $e^+e^-\to\omega X(3872)$ is set at $\sigma_{\gamma X(3872)}/\sigma_{\omega X(3872)}<0.23$ at 90% confidence level.
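For a quick sense of overall precision, the separately quoted statistical and systematic uncertainties are often combined in quadrature. This assumes they are independent and approximately Gaussian. That convention is our assumption, not a result from the paper. The input values are taken directly from the abstract.

```python
import math

def combine_in_quadrature(stat, syst):
    # Total uncertainty assuming independent, Gaussian stat. and syst. components.
    return math.hypot(stat, syst)

measurements = {
    "R": (0.38, 0.20, 0.01),
    "sigma(omega X(3872)) / sigma(omega chi_c1)": (5.2, 1.0, 1.9),
    "sigma(omega X(3872)) / sigma(omega chi_c2)": (5.5, 1.1, 2.4),
}
totals = {name: combine_in_quadrature(stat, syst)
          for name, (value, stat, syst) in measurements.items()}
for name, (value, _, _) in measurements.items():
    print(f"{name}: {value} +/- {totals[name]:.2f}")
```

Under this assumption, the systematic term dominates both cross-section ratios, while the uncertainty on R is almost purely statistical.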
Submitted 21 April, 2024;
originally announced April 2024.
-
Wrinkling instability of 3D auxetic bilayers in tension
Authors:
Sairam Pamulaparthi Venkata,
Yuxin Fu,
Yibin Fu,
Valentina Balbi,
Michel Destrade
Abstract:
Bilayers (soft substrates coated with stiff films) are common in nature, for example in skin tissue, vesicles, and organ membranes. They exhibit various types of instabilities under compression, depending on the contrast in material properties between the two components. We present wrinkling instabilities of 3D hyperelastic bilayer systems, including auxetics (materials with a negative Poisson's ratio), under uniaxial tension. In tension, a soft bilayer can experience large lateral contraction. We find that, given an adequate contrast in the Poisson ratios, compressive stresses may develop and generate wrinkles aligned with the tensile direction. We model the phenomenon analytically and validate the model with advanced finite element simulations. These use a user-defined Python script to implement periodic boundary conditions and the constitutive relation. Our findings reveal that wrinkles appear when the Poisson ratio of the substrate is greater than that of the film. As the two Poisson ratios converge to a common value, the critical stretch for instability increases sharply and the wrinkling disappears. We also confirm these results by asymptotic analysis. This wrinkling analysis has significant potential for controlling the surface patterns of auxetic skin grafts and hydrogel organ patches under mechanical loads. Moreover, the asymptotic expressions in this work can be used at finite strain in buckling-based metrology applications.
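The role of the Poisson-ratio contrast can be seen in a small-strain estimate. This is a linear-elastic simplification of our own, not the paper's hyperelastic finite-strain analysis. A thin film in plane stress is bonded to a much thicker substrate stretched by a uniaxial strain ε. The substrate then imposes its own lateral strain -ν_s ε on the film. The film's lateral stress is E_f / (1 - ν_f²) · (ν_f - ν_s) · ε, which is compressive exactly when ν_s > ν_f. This matches the condition for wrinkling stated above.

```python
def film_lateral_stress(strain, e_film, nu_film, nu_substrate):
    # Small-strain, plane-stress film on a much thicker substrate under uniaxial
    # tension `strain`; the substrate imposes lateral strain -nu_substrate * strain.
    lateral_strain = -nu_substrate * strain
    return e_film / (1 - nu_film ** 2) * (lateral_strain + nu_film * strain)

# Auxetic film (nu_f = -0.5) on a conventional substrate (nu_s = 0.3): compressive.
sigma_auxetic = film_lateral_stress(0.05, e_film=1.0, nu_film=-0.5, nu_substrate=0.3)
# Equal Poisson ratios: no lateral mismatch stress.
sigma_equal = film_lateral_stress(0.05, e_film=1.0, nu_film=0.3, nu_substrate=0.3)
```

This linear estimate only indicates the sign of the driving stress. The critical stretch and wrinkle wavelength require the full bifurcation analysis.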
Submitted 19 April, 2024;
originally announced April 2024.
-
Transport scaling in porous media convection
Authors:
Xiaojue Zhu,
Yifeng Fu,
Marco De Paoli
Abstract:
We present a theory that describes the Nusselt number ($Nu$), which corresponds to the heat or mass flux, as a function of the Rayleigh--Darcy number ($Ra$) in convective porous media flows. $Ra$ is the ratio of the buoyant driving force to diffusive dissipation. First, we derive exact relationships for the kinetic energy and the thermal dissipation rate in the system. Second, we separate the thermal dissipation rate into boundary-layer and bulk contributions, an approach inspired by the Grossmann and Lohse theory (J. Fluid Mech., vol. 407, 2000; Phys. Rev. Lett., vol. 86, 2001). From this, we derive the scaling relation for $Nu$ as a function of $Ra$ and provide a robust theoretical explanation for the empirical relations proposed in previous studies. Specifically, by incorporating the length scale of the flow structure into the theory, we demonstrate why heat or mass transport differs between two-dimensional and three-dimensional porous media convection. Our model is in excellent agreement with data from numerical simulations, affirming its validity and predictive capability.
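Scaling relations of the form $Nu = a\,Ra^{\beta}$ are commonly compared with simulation data through a least-squares fit in log-log space. The sketch shows that generic fitting step on synthetic data generated with a = 0.01 and β = 1. These values are chosen for illustration only and are not results from the paper.

```python
import math

def fit_power_law(ra_values, nu_values):
    # Least-squares fit of log Nu = log a + beta * log Ra; returns (a, beta).
    xs = [math.log(r) for r in ra_values]
    ys = [math.log(n) for n in nu_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    a = math.exp(my - beta * mx)
    return a, beta

# Synthetic data following Nu = 0.01 * Ra exactly (illustrative only).
ra = [1e3, 3e3, 1e4, 3e4, 1e5]
nu = [0.01 * r for r in ra]
a, beta = fit_power_law(ra, nu)
```

With real simulation data, the fitted exponent and prefactor are what a scaling theory like the one above is tested against.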
Submitted 24 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.