-
Generalized all-optical complex exponential operator
Authors:
Baiqiao Chen,
Qi Jia,
Rui Feng,
Fangkui Sun,
Yongyin Cao,
Jian Wang,
Weiqiang Ding
Abstract:
Euler's formula, an extraordinary mathematical formula, establishes a vital link between complex-valued operations and trigonometric functions, finding widespread application in various fields. With the end of Moore's Law, electronic computing methods are encountering developmental bottlenecks. With its enviable potential, optical computing has successfully achieved high-speed operation of designed complex numbers. However, the challenge of processing and manipulating arbitrary complex numbers persists. This study introduces a generalized complex exponential operator (GCEO), utilizing a diffractive optical neural network (DONN) for the computation of the complex exponential through Euler's formula. Experiments validate a series of complex exponential calculations using the GCEO. The GCEO has demonstrated generalizability and can compute inputs of any precision within an appropriate error margin. The proposed operator highlights the immense potential of DONN in optical computation and is poised to significantly contribute to the development of computational methods for optoelectronic integration.
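For reference, the quantity the GCEO evaluates optically can be written with Euler's formula as $e^{x+iy} = e^{x}(\cos y + i\sin y)$. The minimal Python sketch below shows this digital reference computation. The function name `complex_exp_euler` is hypothetical, and nothing here models the diffractive network itself.

```python
import cmath
import math


def complex_exp_euler(z: complex) -> complex:
    """Evaluate exp(z) via Euler's formula: e^(x+iy) = e^x * (cos y + i sin y)."""
    magnitude = math.exp(z.real)  # the modulus comes only from the real part
    return complex(magnitude * math.cos(z.imag), magnitude * math.sin(z.imag))


if __name__ == "__main__":
    for z in (0j, 1j * math.pi, 0.5 + 2.0j):
        approx = complex_exp_euler(z)
        # Absolute deviation from the library implementation, as a sanity check.
        print(z, approx, abs(approx - cmath.exp(z)))
```

Comparing against `cmath.exp` in this way is one simple method for quantifying the error margin of an approximate (for example, optical) evaluation.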
Submitted 23 May, 2024;
originally announced May 2024.
-
A Declarative System for Optimizing AI Workloads
Authors:
Chunwei Liu,
Matthew Russo,
Michael Cafarella,
Lei Cao,
Peter Baille Chen,
Zui Chen,
Michael Franklin,
Tim Kraska,
Samuel Madden,
Gerardo Vitagliano
Abstract:
A long-standing goal of data management systems has been to build systems which can compute quantitative insights over large corpora of unstructured data in a cost-effective manner. Until recently, it was difficult and expensive to extract facts from company documents, data from scientific papers, or metrics from image and video corpora. Today's models can accomplish these tasks with high accuracy. However, a programmer who wants to answer a substantive AI-powered query must orchestrate large numbers of models, prompts, and data operations. For even a single query, the programmer has to make a vast number of decisions such as the choice of model, the right inference method, the most cost-effective inference hardware, the ideal prompt design, and so on. The optimal set of decisions can change as the query changes and as the rapidly-evolving technical landscape shifts. In this paper we present Palimpzest, a system that enables anyone to process AI-powered analytical queries simply by defining them in a declarative language. The system uses its cost optimization framework to implement the query plan with the best trade-offs between runtime, financial cost, and output data quality. We describe the workload of AI-powered analytics tasks, the optimization methods that Palimpzest uses, and the prototype system itself. We evaluate Palimpzest on tasks in Legal Discovery, Real Estate Search, and Medical Schema Matching. We show that even our simple prototype offers a range of appealing plans, including one that is 3.3x faster and 2.9x cheaper than the baseline method, while also offering better data quality. With parallelism enabled, Palimpzest can produce plans with up to a 90.3x speedup at 9.1x lower cost relative to a single-threaded GPT-4 baseline, while obtaining an F1-score within 83.5% of the baseline. These require no additional work by the user.
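The abstract describes choosing among query plans by trading off runtime, financial cost, and output quality. One common way to frame that choice is to keep only the Pareto-optimal plans and then pick the cheapest one that meets a quality floor. The minimal sketch below takes that approach. The `Plan` fields, plan names, and numbers are hypothetical illustrations, not Palimpzest's API or measured results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cost_usd: float    # estimated financial cost
    runtime_s: float   # estimated wall-clock runtime
    quality: float     # estimated output quality, e.g. F1 in [0, 1]


def dominates(a: Plan, b: Plan) -> bool:
    """a dominates b if it is no worse on every axis and strictly better on at least one."""
    no_worse = (a.cost_usd <= b.cost_usd and a.runtime_s <= b.runtime_s
                and a.quality >= b.quality)
    better = (a.cost_usd < b.cost_usd or a.runtime_s < b.runtime_s
              or a.quality > b.quality)
    return no_worse and better


def pareto_frontier(plans: list[Plan]) -> list[Plan]:
    """Plans not dominated by any other plan, in their original order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


def choose_plan(plans: list[Plan], min_quality: float) -> Plan | None:
    """Cheapest Pareto-optimal plan meeting the quality floor (ties broken by runtime)."""
    feasible = [p for p in pareto_frontier(plans) if p.quality >= min_quality]
    return min(feasible, key=lambda p: (p.cost_usd, p.runtime_s), default=None)


if __name__ == "__main__":
    candidates = [
        Plan("large-model-everywhere", 10.0, 900.0, 0.95),
        Plan("small-model-filter-then-large", 3.5, 300.0, 0.90),
        Plan("small-model-everywhere", 0.8, 120.0, 0.70),
        Plan("dominated-plan", 4.0, 400.0, 0.85),
    ]
    print(choose_plan(candidates, min_quality=0.85))
```

Raising or lowering `min_quality` moves the choice along the frontier. This mirrors the abstract's point that the best plan depends on the desired trade-off.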
Submitted 29 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
RaFe: Ranking Feedback Improves Query Rewriting for RAG
Authors:
Shengyu Mao,
Yong Jiang,
Boli Chen,
Xiao Li,
Peng Wang,
Xinyu Wang,
Pengjun Xie,
Fei Huang,
Huajun Chen,
Ningyu Zhang
Abstract:
As Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques have evolved, query rewriting has been widely incorporated into RAG systems for downstream tasks such as open-domain QA. Many works have attempted to use small models trained with reinforcement learning, rather than costly LLMs, to improve query rewriting. However, current methods require annotations (e.g., labeled relevant documents or downstream answers) or predesigned rewards for feedback; these lack generalization and fail to exploit signals tailored to query rewriting. In this paper, we propose RaFe, a framework for training query rewriting models free of annotations. By leveraging a publicly available reranker, RaFe provides feedback that is well aligned with the rewriting objectives. Experimental results demonstrate that RaFe obtains better performance than baselines.
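The core idea in the abstract is to use a reranker's scores, rather than annotations, as the feedback signal for query rewriting. The minimal sketch below shows how such scores could turn sampled rewrites into a preferred/dispreferred pair for preference training. `toy_rerank_score` is a hypothetical token-overlap stand-in for a real reranker, and the mean-of-top-k reward is an assumption, not RaFe's exact formulation.

```python
from __future__ import annotations


def toy_rerank_score(query: str, passage: str) -> float:
    """Hypothetical reranker stand-in: share of query tokens that also appear in the passage."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / max(len(q_tokens), 1)


def rewrite_reward(original_query: str, retrieved: list[str], k: int = 2) -> float:
    """Mean of the top-k reranker scores of the passages a rewrite retrieved.

    Passages are scored against the ORIGINAL query, so a rewrite is rewarded only
    if it retrieves documents relevant to what the user actually asked.
    """
    scores = sorted((toy_rerank_score(original_query, p) for p in retrieved), reverse=True)[:k]
    return sum(scores) / len(scores) if scores else 0.0


def preference_pair(original_query: str,
                    candidates: list[tuple[str, list[str]]], k: int = 2) -> tuple[str, str]:
    """(best, worst) rewrite by reward, usable as a chosen/rejected training pair."""
    ranked = sorted(candidates, key=lambda c: rewrite_reward(original_query, c[1], k),
                    reverse=True)
    return ranked[0][0], ranked[-1][0]


if __name__ == "__main__":
    query = "who wrote the novel dune"
    rewrites = [
        ("dune novel author", ["frank herbert wrote the novel dune", "dune is a novel"]),
        ("sand hills", ["sand dunes form in deserts", "hills and valleys"]),
    ]
    print(preference_pair(query, rewrites))
```

Because no labeled documents or answers appear anywhere in this loop, the feedback is annotation-free in the sense the abstract describes.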
Submitted 23 May, 2024;
originally announced May 2024.
-
One-shot Training for Video Object Segmentation
Authors:
Baiyu Chen,
Sixian Chan,
Xiaoqin Zhang
Abstract:
Video Object Segmentation (VOS) aims to track objects across frames in a video and segment them based on the initial annotated frame of the target objects. Previous VOS works typically rely on fully annotated videos for training. However, acquiring fully annotated training videos for VOS is labor-intensive and time-consuming. Meanwhile, self-supervised VOS methods have attempted to build VOS systems through correspondence learning and label propagation. Still, the absence of mask priors harms their robustness in complex scenarios, and the label propagation paradigm makes them inefficient in practice. To address these issues, we propose, for the first time, a general one-shot training framework for VOS that requires only a single labeled frame per training video and is applicable to a majority of state-of-the-art VOS networks. Specifically, our algorithm consists of: i) inferring object masks time-forward based on the initial labeled frame; ii) reconstructing the initial object mask time-backward using the masks from step i). Through this bi-directional training, a satisfactory VOS network can be obtained. Notably, our approach is extremely simple and can be employed end-to-end. Finally, using only a single labeled frame per video from the YouTube-VOS and DAVIS datasets, our approach achieves results comparable to those of models trained on fully labeled datasets. The code will be released.
Submitted 22 May, 2024;
originally announced May 2024.
-
GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval
Authors:
Yuting Wang,
**peng Wang,
Bin Chen,
Tao Dai,
Ruisheng Luo,
Shu-Tao Xia
Abstract:
Given a text query, partially relevant video retrieval (PRVR) aims to retrieve untrimmed videos containing relevant moments. Due to the lack of moment annotations, the uncertainty lying in clip modeling and text-clip correspondence leads to major challenges. Despite the great progress, existing solutions either sacrifice efficiency or efficacy to capture varying and uncertain video moments. What's worse, few methods have paid attention to the text-clip matching pattern under such uncertainty, exposing the risk of semantic collapse. To address these issues, we present GMMFormer v2, an uncertainty-aware framework for PRVR. For clip modeling, we improve a strong baseline GMMFormer with a novel temporal consolidation module upon multi-scale contextual features, which maintains efficiency and improves the perception for varying moments. To achieve uncertainty-aware text-clip matching, we upgrade the query diverse loss in GMMFormer to facilitate fine-grained uniformity and propose a novel optimal matching loss for fine-grained text-clip alignment. Their collaboration alleviates the semantic collapse phenomenon and neatly promotes accurate correspondence between texts and moments. We conduct extensive experiments and ablation studies on three PRVR benchmarks, demonstrating remarkable improvement of GMMFormer v2 compared to the past SOTA competitor and the versatility of uncertainty-aware text-clip matching for PRVR. Code is available at https://github.com/huangmozhi9527/GMMFormer_v2.
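The abstract proposes an "optimal matching loss" for fine-grained text-clip alignment. One generic way to realize optimal matching is to find the one-to-one assignment between text queries and clips that maximizes total similarity, then apply a loss to the matched pairs. The minimal brute-force sketch below follows this recipe for tiny inputs. The `1 - similarity` loss form, the function names, and the similarity values are assumptions, not the paper's loss.

```python
from __future__ import annotations

from itertools import permutations


def optimal_matching(sim: list[list[float]]) -> tuple[list[int], float]:
    """Assign each text (row) to a distinct clip (column), maximizing total similarity.

    Brute force over permutations: only suitable for small matrices with rows <= columns.
    """
    n_text, n_clip = len(sim), len(sim[0])
    best_cols: list[int] = []
    best_total = float("-inf")
    for cols in permutations(range(n_clip), n_text):
        total = sum(sim[t][c] for t, c in enumerate(cols))
        if total > best_total:
            best_cols, best_total = list(cols), total
    return best_cols, best_total


def matching_loss(sim: list[list[float]]) -> float:
    """Mean (1 - similarity) over the optimally matched pairs (assumed loss form)."""
    cols, total = optimal_matching(sim)
    return 1.0 - total / len(cols)


if __name__ == "__main__":
    # Greedy matching would pair text 0 with clip 0 (0.9) and text 1 with clip 2 (0.3),
    # total 1.2; the optimal one-to-one assignment pairs 0->1 and 1->0, total 1.65.
    sim = [[0.9, 0.8, 0.1],
           [0.85, 0.2, 0.3]]
    print(optimal_matching(sim), matching_loss(sim))
```

For realistic sizes, a Hungarian solver such as SciPy's `linear_sum_assignment` would replace the enumeration.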
Submitted 22 May, 2024;
originally announced May 2024.
-
Task-agnostic Decision Transformer for Multi-type Agent Control with Federated Split Training
Authors:
Zhiyuan Wang,
Bokui Chen,
Xiaoyang Qu,
Zhenhou Hong,
**g Xiao,
Jianzong Wang
Abstract:
With the rapid advancements in artificial intelligence, the development of knowledgeable and personalized agents has become increasingly prevalent. However, the inherent variability in state variables and action spaces among personalized agents poses significant aggregation challenges for traditional federated learning algorithms. To tackle these challenges, we introduce the Federated Split Decision Transformer (FSDT), an innovative framework designed explicitly for AI agent decision tasks. The FSDT framework excels at navigating the intricacies of personalized agents by harnessing distributed data for training while preserving data privacy. It employs a two-stage training process, with local embedding and prediction models on client agents and a global transformer decoder model on the server. Our comprehensive evaluation using the benchmark D4RL dataset highlights the superior performance of our algorithm in federated split learning for personalized agents, coupled with significant reductions in communication and computational overhead compared to traditional centralized training approaches. The FSDT framework demonstrates strong potential for enabling efficient and privacy-preserving collaborative learning in applications such as autonomous driving decision systems. Our findings underscore the efficacy of the FSDT framework in effectively leveraging distributed offline reinforcement learning data to enable powerful multi-type agent decision systems.
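The abstract splits each agent's model into client-side parts (local embedding and prediction models) and a server-side transformer decoder. The minimal sketch below shows why this split can handle agents with different state and action spaces. Each client maps its own state dimension into a shared embedding width and maps back out to its own action dimension, so the server model only ever sees fixed-width vectors. The random linear layers stand in for real networks, the server layer stands in for the transformer decoder, and all class and function names are hypothetical. Training, federated aggregation, and the two-stage schedule are not shown.

```python
from __future__ import annotations

import random


def dense(in_dim: int, out_dim: int, rng: random.Random) -> list[list[float]]:
    """Toy dense layer (weights only, no bias) stored as out_dim rows of in_dim weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def dense_forward(layer: list[list[float]], x: list[float]) -> list[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in layer]


class ClientAgent:
    """Agent-specific parts kept on the client: state embedding and action prediction head."""

    def __init__(self, state_dim: int, action_dim: int, d_model: int, rng: random.Random):
        self.embed = dense(state_dim, d_model, rng)
        self.head = dense(d_model, action_dim, rng)


class Server:
    """Shared model operating only on d_model-wide embeddings (decoder stand-in)."""

    def __init__(self, d_model: int, rng: random.Random):
        self.shared = dense(d_model, d_model, rng)


def split_forward(client: ClientAgent, server: Server, state: list[float]) -> list[float]:
    z = dense_forward(client.embed, state)   # client side: raw states stay local
    h = dense_forward(server.shared, z)      # server side: receives embeddings only
    return dense_forward(client.head, h)     # client side: agent-specific action output


if __name__ == "__main__":
    rng = random.Random(0)
    server = Server(d_model=8, rng=rng)
    agent_a = ClientAgent(state_dim=11, action_dim=3, d_model=8, rng=rng)
    agent_b = ClientAgent(state_dim=17, action_dim=6, d_model=8, rng=rng)
    print(len(split_forward(agent_a, server, [0.1] * 11)),
          len(split_forward(agent_b, server, [0.0] * 17)))
```

Two agents with different state and action sizes share one server model here. A single monolithic model, as in conventional federated averaging, could not accommodate both without padding or separate heads.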
Submitted 22 May, 2024;
originally announced May 2024.
-
AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs
Authors:
Alireza Ghaffari,
Sharareh Younesian,
Vahid Partovi Nia,
Boxing Chen,
Masoud Asgharian
Abstract:
The ever-growing computational complexity of Large Language Models (LLMs) necessitates efficient deployment strategies. The current state-of-the-art approaches for Post-training Quantization (PTQ) often require calibration to achieve the desired accuracy. This paper presents AdpQ, a novel zero-shot adaptive PTQ method for LLMs that achieves state-of-the-art performance in low-precision quantization (e.g., 3-bit) without requiring any calibration data. Inspired by the Adaptive LASSO regression model, our proposed approach tackles the challenge of outlier activations by separating salient weights using an adaptive soft-thresholding method. Guided by Adaptive LASSO, this method ensures that the quantized weight distribution closely follows the originally trained weights and eliminates the need for calibration data entirely, setting our method apart from popular approaches such as SpQR and AWQ. Furthermore, because it requires no calibration or training data, our method offers an additional benefit in terms of privacy preservation. We also examine the information-theoretic underpinnings of the proposed method, demonstrating that it leverages the Adaptive LASSO to minimize the Kullback-Leibler divergence between the quantized weights and the originally trained weights. This minimization ensures that the quantized model retains the Shannon information content of the original model to a great extent, guaranteeing efficient deployment without sacrificing accuracy or information. Our results achieve the same accuracy as existing methods on various LLM benchmarks while reducing quantization time by at least 10x, solidifying our contribution to efficient and privacy-preserving LLM deployment.
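The abstract separates salient (outlier) weights using an adaptive soft-thresholding rule inspired by Adaptive LASSO, then quantizes at low precision. The minimal sketch below illustrates that general idea on a plain list of weights. Under Adaptive LASSO the penalty on a weight is $\lambda/|w|^{\gamma}$, so larger weights are shrunk less. Weights that survive thresholding are kept in full precision, and the rest are quantized to 3 bits on a symmetric uniform grid. The values of $\lambda$ and $\gamma$, the quantizer, and all function names are assumptions; AdpQ's exact rule may differ.

```python
from __future__ import annotations

import math


def adaptive_soft_threshold(w: float, lam: float, gamma: float = 1.0) -> float:
    """Soft-threshold w with adaptive penalty lam / |w|**gamma (larger |w| -> smaller penalty)."""
    if w == 0.0:
        return 0.0
    penalty = lam / abs(w) ** gamma
    return math.copysign(max(abs(w) - penalty, 0.0), w)


def quantize_symmetric(w: float, scale: float, bits: int = 3) -> float:
    """Round w / scale to a signed b-bit integer in [-2^(b-1), 2^(b-1) - 1], then rescale."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(w / scale)))
    return q * scale


def split_and_quantize(weights: list[float], lam: float, bits: int = 3,
                       gamma: float = 1.0) -> tuple[list[float], list[int]]:
    """Keep weights that survive the threshold (salient) in full precision; quantize the rest."""
    salient = {i for i, w in enumerate(weights)
               if adaptive_soft_threshold(w, lam, gamma) != 0.0}
    rest = [abs(w) for i, w in enumerate(weights) if i not in salient]
    qmax = 2 ** (bits - 1) - 1
    # Scale fitted to the non-salient weights only, so outliers do not stretch the grid.
    scale = max(rest) / qmax if rest and max(rest) > 0.0 else 1.0
    out = [w if i in salient else quantize_symmetric(w, scale, bits)
           for i, w in enumerate(weights)]
    return out, sorted(salient)


if __name__ == "__main__":
    weights = [0.02, -0.05, 0.04, 1.5, -0.03, 0.01]
    quantized, salient_idx = split_and_quantize(weights, lam=0.01)
    print(salient_idx, [round(v, 4) for v in quantized])
```

With $\gamma = 1$ and $\lambda = 0.01$, a weight survives exactly when $|w| > \sqrt{\lambda} = 0.1$, so only the outlier 1.5 is kept. Quantizing all six weights with a single 3-bit scale set by 1.5 (a scale of 0.5) would instead round every small weight to zero.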
Submitted 22 May, 2024;
originally announced May 2024.
-
Study of the decays $χ_{cJ}\toΛ\barΛω$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$, $\mathcal{B}(χ_{c1}\toΛ\barΛω)=({1.01 \pm 0.10 \pm 0.11}) \times 10^{-4}$, and $\mathcal{B}(χ_{c2}\toΛ\barΛω)=({1.40 \pm 0.13 \pm 0.17}) \times 10^{-4}$, where the first uncertainties are statistical and the second are systematic. We observe no clear intermediate structures.
Submitted 21 May, 2024;
originally announced May 2024.
-
Precision measurement of the branching fraction of $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (604 additional authors not shown)
Abstract:
Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$.
The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with significantly improved precision.
Submitted 21 May, 2024;
originally announced May 2024.
-
Improved measurement of the branching fraction of $h_{c}\rightarrowγη^\prime/η$ and search for $h_{c}\rightarrowγπ^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
The processes $h_c\rightarrowγP$ ($P = η^\prime,~η,~π^{0}$) are studied with a sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. The branching fractions of $h_c\rightarrowγη^\prime$ and $h_c\rightarrowγη$ are measured to be $(1.40\pm0.11\pm0.04\pm0.10)\times10^{-3}$ and $(3.77\pm0.55\pm0.13\pm0.26)\times10^{-4}$, respectively, where the first uncertainties are statistical, the second systematic, and the third from the branching fraction of $ψ(3686)\rightarrowπ^{0}h_c$. The ratio $R_{h_c}=\frac{\mathscr{B}(h_c\rightarrowγη)}{\mathscr{B}(h_c\rightarrowγη^\prime)}$ is calculated to be $(27.0\pm4.4\pm1.0)\%$. The measurements are consistent with previous results, with precision improved by a factor of 2. The results are valuable for gaining a deeper understanding of $η-η^\prime$ mixing and its manifestation within quantum chromodynamics. No significant signal is found for the decay $h_c\rightarrowγπ^{0}$, and an upper limit of $\mathscr{B}(h_c\rightarrowγπ^{0})<5.0\times10^{-5}$ is placed on its branching fraction at the 90\% confidence level.
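The ratio $R_{h_c}$ can be cross-checked from the two branching fractions quoted above. Both share the normalization to $\mathcal{B}(ψ(3686)\rightarrowπ^{0}h_c)$, so the third uncertainty cancels in the ratio, which is why $R_{h_c}$ carries only two uncertainties. The minimal sketch below propagates relative uncertainties in quadrature. It assumes the uncertainties of the two channels are uncorrelated. That assumption is reasonable for the statistical part but not necessarily for the systematic part.

```python
import math


def ratio_with_uncertainty(num, num_err, den, den_err):
    """Return r = num/den and its uncertainty, adding relative errors in quadrature (uncorrelated)."""
    r = num / den
    rel = math.hypot(num_err / num, den_err / den)
    return r, r * rel


# Branching fractions from the abstract: value, statistical, systematic.
b_eta, b_eta_stat, b_eta_syst = 3.77e-4, 0.55e-4, 0.13e-4
b_etap, b_etap_stat, b_etap_syst = 1.40e-3, 0.11e-3, 0.04e-3

r, r_stat = ratio_with_uncertainty(b_eta, b_eta_stat, b_etap, b_etap_stat)
_, r_syst = ratio_with_uncertainty(b_eta, b_eta_syst, b_etap, b_etap_syst)
print(f"R = ({100 * r:.1f} +/- {100 * r_stat:.1f} +/- {100 * r_syst:.1f})%")
# prints: R = (26.9 +/- 4.5 +/- 1.2)%
```

This reproduces the quoted central value and statistical uncertainty to within rounding of the published inputs. The naive systematic estimate (1.2%) is larger than the quoted 1.0%, which suggests that some systematic effects are common to both channels and partly cancel in the ratio.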
Submitted 19 May, 2024;
originally announced May 2024.
-
A Multi-Perspective Analysis of Memorization in Large Language Models
Authors:
Bowen Chen,
Namgi Han,
Yusuke Miyao
Abstract:
Large Language Models (LLMs), trained on massive corpora with billions of parameters, show unprecedented performance in various fields. Though surprised by this excellent performance, researchers have also noticed some special behaviors of LLMs. One of these behaviors is memorization, in which LLMs generate the same content that was used to train them. Although previous research has discussed memorization, the memorization of LLMs still lacks explanation, especially regarding its causes and the dynamics of generating memorized content. In this research, we comprehensively discuss memorization from various perspectives and extend the scope of discussion beyond memorized content to less-memorized and unmemorized content. Through various studies, we found that: (1) Through experiments, we revealed the relationship between memorization and model size, continuation size, and context size. Further, we showed how unmemorized sentences transition to memorized sentences. (2) Through embedding analysis, we showed the distribution and decoding dynamics across model sizes in embedding space for sentences with different memorization scores. (3) An analysis of n-gram and entropy decoding dynamics discovered a boundary effect when the model starts to generate memorized or unmemorized sentences. (4) We trained a Transformer model to predict the memorization of different models, showing that it is possible to predict memorization from context.
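The abstract groups sentences by "memorization score". One simple way to define such a score is to prompt a model with a context prefix taken from a training sentence and measure the fraction of continuation positions where the generated token equals the original token. The minimal sketch below implements that metric. The metric definition, the whitespace tokenization, and the bucket thresholds are all assumptions for illustration and may differ from the paper's scoring.

```python
from __future__ import annotations


def memorization_score(generated: list[str], reference: list[str]) -> float:
    """Fraction of reference positions where the generated token equals the training token."""
    if not reference:
        return 0.0
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / len(reference)


def memorization_bucket(score: float, full: float = 1.0, partial: float = 0.5) -> str:
    """Hypothetical thresholds separating memorized, less-memorized, and unmemorized sentences."""
    if score >= full:
        return "memorized"
    if score >= partial:
        return "less memorized"
    return "unmemorized"


if __name__ == "__main__":
    training_continuation = "the quick brown fox jumps".split()
    model_continuation = "the quick red fox runs".split()
    s = memorization_score(model_continuation, training_continuation)
    print(s, memorization_bucket(s))
```

Varying the prefix length or the continuation length in such a setup is one way to study the context-size and continuation-size effects mentioned in finding (1).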
Submitted 4 June, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Energy Extraction from a Kerr Black Hole via Magnetic Reconnection within the Plunging Region
Authors:
Bin Chen,
Yehui Hou,
Junyi Li,
Ye Shen
Abstract:
Magnetic reconnection within a highly magnetized plasma has been seen as a viable mechanism to extract energy from a rotating black hole, as it can generate negative-energy plasmoids in the ergoregion. For a typical accreting black hole, the ergoregion is filled with bulk plasma plunging from the innermost stable circular orbit (ISCO). In this study, we present an analytical study of energy extraction via magnetic reconnection in the plunging region. In contrast to toroidal plasma, where the magnetic field cannot be derived from the MHD scheme, the magnetic field in the plunging plasma is determined by the ideal-MHD condition. We derive the global magnetic field structure in a fast reconnection model and obtain expressions for the energies of plasmoids ejected from the reconnection region for general stationary and axisymmetric spacetimes. Then, we demonstrate how the ejected energies vary with the reconnection location in the Kerr spacetime and identify the region where a negative-energy plasmoid can be produced. We find that, for a given magnetization, there exists a critical black hole spin beyond which energy extraction can occur, and that energy extraction is most efficient for near-extremal black holes. Moreover, we study the conditions necessary for a plasmoid with positive energy to escape to infinity, a crucial requirement for effective energy extraction. Considering these escape conditions, we provide the parameter space in the radius-spin plane in which the energy extraction mechanism is effective.
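For orientation, the equatorial plunging region lies between the outer horizon $r_+$ and the ISCO, and negative-energy plasmoids must be produced inside the ergoregion, whose outer boundary on the equator is at $r = 2M$. The minimal sketch below computes these radii for prograde equatorial orbits in Kerr spacetime, using the standard Bardeen-Press-Teukolsky ISCO formula in units $G = c = M = 1$. It only locates the regions and does not model reconnection, magnetization, or plasmoid energies.

```python
import math


def r_horizon(a: float) -> float:
    """Outer event horizon r_+ = 1 + sqrt(1 - a^2), for spin 0 <= a <= 1."""
    return 1.0 + math.sqrt(1.0 - a * a)


def r_ergo_equatorial() -> float:
    """Outer ergosurface r_E = 1 + sqrt(1 - a^2 cos^2 theta) equals 2 on the equator for any a."""
    return 2.0


def r_isco_prograde(a: float) -> float:
    """Bardeen-Press-Teukolsky ISCO radius for prograde equatorial orbits, 0 <= a <= 1."""
    z1 = 1.0 + (1.0 - a * a) ** (1.0 / 3.0) * ((1.0 + a) ** (1.0 / 3.0)
                                               + (1.0 - a) ** (1.0 / 3.0))
    z2 = math.sqrt(3.0 * a * a + z1 * z1)
    return 3.0 + z2 - math.sqrt((3.0 - z1) * (3.0 + z1 + 2.0 * z2))


if __name__ == "__main__":
    for a in (0.0, 0.5, 0.9, 0.99, 0.998):
        rh, risco = r_horizon(a), r_isco_prograde(a)
        # Radial width of the plunging region (rh, risco) that lies inside the ergoregion (rh, 2).
        overlap = max(0.0, min(risco, r_ergo_equatorial()) - rh)
        print(f"a={a:.3f}  r_+={rh:.3f}  r_isco={risco:.3f}  overlap={overlap:.3f}")
```

The overlap vanishes for a Schwarzschild hole ($a = 0$, where $r_+ = 2$) and grows with spin. This is consistent with the abstract's finding that extraction switches on above a critical spin and is most efficient near extremality, although that threshold also depends on magnetization, which this sketch ignores.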
Submitted 19 May, 2024;
originally announced May 2024.
-
Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation
Authors:
Bike Chen,
Chen Gong,
Juha Röning
Abstract:
Point cloud segmentation (PCS) plays an essential role in robot perception and navigation tasks. To efficiently understand large-scale outdoor point clouds, their range image representation is commonly adopted. This image-like representation is compact and structured, making range image-based PCS models practical. However, undesirable missing values in the range images damage the shapes and patterns of objects. This problem creates difficulty for the models in learning coherent and complete geometric information from the objects. Consequently, the PCS models only achieve inferior performance. Delving deeply into this issue, we find that the use of unreasonable projection approaches and deskewing scans mainly leads to unwanted missing values in the range images. Besides, almost all previous works fail to consider filling in the unexpected missing values in the PCS task. To alleviate this problem, we first propose a new projection method, namely scan unfolding++ (SU++), to avoid massive missing values in the generated range images. Then, we introduce a simple yet effective approach, namely range-dependent $K$-nearest neighbor interpolation ($K$NNI), to further fill in missing values. Finally, we introduce the Filling Missing Values Network (FMVNet) and Fast FMVNet. Extensive experimental results on SemanticKITTI, SemanticPOSS, and nuScenes datasets demonstrate that by employing the proposed SU++ and $K$NNI, existing range image-based PCS models consistently achieve better performance than the baseline models. Besides, both FMVNet and Fast FMVNet achieve state-of-the-art performance in terms of the speed-accuracy trade-off. The proposed methods can be applied to other range image-based tasks and practical applications.
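The abstract traces missing values to how points are projected into the range image and proposes filling them by $K$-nearest-neighbor interpolation. The minimal sketch below shows the general pipeline in plain Python. It uses a standard spherical projection, not the paper's SU++, followed by a simple fill that averages the $K$ nearest valid pixels in a small window. The field-of-view defaults, window radius, and $K$ are assumptions, and the paper's range-dependent weighting is not reproduced.

```python
import math


def spherical_projection(points, height, width, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points into a height x width range image; 0.0 marks a missing pixel.

    The vertical field-of-view defaults are assumptions (typical of a 64-beam spinning LiDAR).
    """
    fov_up, fov_down = math.radians(fov_up_deg), math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw, pitch = math.atan2(y, x), math.asin(z / r)
        u = int(0.5 * (1.0 - yaw / math.pi) * width)          # column from azimuth
        v = int((1.0 - (pitch - fov_down) / fov) * height)    # row from elevation
        u, v = min(max(u, 0), width - 1), min(max(v, 0), height - 1)
        if image[v][u] == 0.0 or r < image[v][u]:
            image[v][u] = r  # keep the closest point when several land on one pixel
    return image


def knn_fill(image, k=3, radius=2):
    """Fill missing (0.0) pixels with the mean range of the k nearest valid pixels in a window."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for v in range(h):
        for u in range(w):
            if image[v][u] != 0.0:
                continue
            neighbors = []
            for dv in range(-radius, radius + 1):
                for du in range(-radius, radius + 1):
                    vv, uu = v + dv, (u + du) % w  # azimuth wraps around horizontally
                    if 0 <= vv < h and image[vv][uu] != 0.0:
                        neighbors.append((dv * dv + du * du, image[vv][uu]))
            if neighbors:
                nearest = sorted(neighbors)[:k]
                out[v][u] = sum(r for _, r in nearest) / len(nearest)
    return out


if __name__ == "__main__":
    img = spherical_projection([(10.0, 0.0, 0.0), (0.0, 10.0, -1.0)], height=16, width=8)
    print(knn_fill([[1.0, 0.0, 3.0]], k=2, radius=1))
```

Filling only pixels that were empty after projection leaves measured ranges untouched. Only the gaps that break object shapes are interpolated.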
Submitted 25 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing
Authors:
Binghui Chen,
Chongyang Zhong,
Wangmeng Xiang,
Yifeng Geng,
Xuansong Xie
Abstract:
Owing to significant advances in large-scale text-to-image generation by diffusion models (DMs), controllable human image generation has been attracting much attention recently. Although existing works such as ControlNet [36], T2I-Adapter [20], and HumanSD [10] have demonstrated good abilities in generating human images based on pose conditions, they still fail to meet the requirements of real e-commerce scenarios. These requirements include: (1) the interaction between the displayed product and the human should be considered; (2) human parts such as the face, hands, arms, and feet, as well as the interaction between the human model and the product, should be hyper-realistic; and (3) the identity of the product shown in advertising should be exactly consistent with the product itself. To this end, in this paper, we first define a new human image generation task for e-commerce marketing, i.e., Object-ID-retentive Human-object Interaction image Generation (OHG), and then propose a VirtualModel framework to generate human images for product display, which supports products of any category and any type of human-object interaction. As shown in Figure 1, VirtualModel not only outperforms other methods in terms of accurate pose control and image quality but also allows for the display of user-specified product objects by maintaining product-ID consistency and enhancing the plausibility of human-object interaction. Code and data will be released.
Submitted 16 May, 2024;
originally announced May 2024.
-
Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis
Authors:
Zeyi Zhang,
Tenglong Ao,
Yuyao Zhang,
Qingzhe Gao,
Chuan Lin,
Baoquan Chen,
Libin Liu
Abstract:
In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics. To address this challenge, we develop a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech. To construct this motion library, we summarize a comprehensive list of commonly used semantic gestures based on findings in linguistics, and we collect a high-quality motion dataset encompassing both body and hand movements. We also design a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech. Furthermore, we propose a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation. Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit, as evidenced by a comprehensive collection of examples. User studies confirm the quality and human-likeness of our results, and show that our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.
Submitted 16 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
V. Batozskaya,
D. Becker,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
J. Bloms,
A. Bortone,
I. Boyko
, et al. (559 additional authors not shown)
Abstract:
We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90\% confidence level, respectively.
Submitted 14 May, 2024;
originally announced May 2024.
-
Jacobian Regularizer-based Neural Granger Causality
Authors:
Wanqi Zhou,
Shuanghao Bai,
Shujian Yu,
Qibin Zhao,
Badong Chen
Abstract:
With the advancement of neural networks, diverse methods for neural Granger causality have emerged that can handle complex data and nonlinear relationships. However, the existing framework of neural Granger causality has several limitations. It requires constructing a separate predictive model for each target variable, and the inferred relationships depend on the sparsity of the first-layer weights, which makes it difficult to model complex relationships between variables and leads to unsatisfactory Granger causality estimation accuracy. Moreover, most of these methods cannot capture full-time Granger causality. To address these drawbacks, we propose a Jacobian Regularizer-based Neural Granger Causality (JRNGC) approach, a straightforward yet highly effective method for learning multivariate summary Granger causality and full-time Granger causality by constructing a single model for all target variables. Specifically, our method eliminates sparsity constraints on the weights by leveraging an input-output Jacobian matrix regularizer, which can subsequently be represented as the weighted causal matrix in post-hoc analysis. Extensive experiments show that our proposed approach achieves performance competitive with state-of-the-art methods for learning summary Granger causality and full-time Granger causality while maintaining lower model complexity and high scalability.
Submitted 14 May, 2024;
originally announced May 2024.
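The JRNGC idea above, reading Granger causality off an input-output Jacobian rather than off first-layer weight sparsity, can be sketched in a few lines. This is a minimal NumPy illustration under assumptions that are not taken from the paper: one shared one-hidden-layer tanh MLP maps the lagged values of all variables to the next value of every variable, the regularizer is the mean L1 norm of the Jacobian, and the post-hoc causal score is the mean absolute Jacobian entry summed over lags. All function names are hypothetical.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: x (D_in,) -> y (D_out,). Also returns the pre-activation."""
    z = W1 @ x + b1
    return W2 @ np.tanh(z) + b2, z

def mlp_jacobian(x, W1, b1, W2, b2):
    """Analytic input-output Jacobian dy/dx = W2 diag(1 - tanh(z)^2) W1, shape (D_out, D_in)."""
    _, z = mlp_forward(x, W1, b1, W2, b2)
    return W2 @ ((1.0 - np.tanh(z) ** 2)[:, None] * W1)

def jacobian_l1_penalty(X, params):
    """Mean L1 norm of the Jacobian over a batch X (N, D_in); would be added to the prediction loss."""
    return float(np.mean([np.abs(mlp_jacobian(x, *params)).sum() for x in X]))

def causal_matrix(X, params, n_vars, n_lags):
    """Hypothetical post-hoc rule: score[j, i] = mean |dy_j / dx_i(lag)|, summed over lags.

    Inputs are assumed ordered lag-major: [x_1(t-1) .. x_P(t-1), x_1(t-2) .. x_P(t-2), ...].
    """
    J = np.mean([np.abs(mlp_jacobian(x, *params)) for x in X], axis=0)  # (P, L*P)
    return J.reshape(n_vars, n_lags, n_vars).sum(axis=1)                # (P, P)
```

A variable whose lagged inputs never reach the hidden layer gets a zero column in the score matrix, which is the behaviour a sparsity-inducing Jacobian penalty is meant to encourage during training.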
-
Certifying Robustness of Graph Convolutional Networks for Node Perturbation with Polyhedra Abstract Interpretation
Authors:
Boqi Chen,
Kristóf Marussy,
Oszkár Semeráth,
Gunter Mussbacher,
Dániel Varró
Abstract:
Graph convolutional neural networks (GCNs) are powerful tools for learning graph-based knowledge representations from training data. However, they are vulnerable to small perturbations in the input graph, which makes them susceptible to input faults or adversarial attacks. This poses a significant problem for GCNs intended to be used in critical applications, which need to provide certifiably robust services even in the presence of adversarial perturbations. We propose an improved GCN robustness certification technique for node classification in the presence of node feature perturbations. We introduce a novel polyhedra-based abstract interpretation approach to tackle specific challenges of graph data and provide tight upper and lower bounds for the robustness of the GCN. Experiments show that our approach simultaneously improves the tightness of robustness bounds as well as the runtime performance of certification. Moreover, our method can be used during training to further improve the robustness of GCNs.
Submitted 14 May, 2024;
originally announced May 2024.
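To make "upper and lower bounds" on a GCN's behaviour under node-feature perturbation concrete, the sketch below propagates bounds through one GCN layer. It deliberately uses the simpler interval (box) domain, not the polyhedra domain proposed in the paper: interval bounds are sound but generally looser, which is the gap relational abstractions such as polyhedra aim to close. The layer form ReLU(Â X W), the per-feature perturbation box on one node, and all names are assumptions for illustration.

```python
import numpy as np

def normalized_adjacency(A):
    """Â = D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix (non-negative entries)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    return A_hat / np.sqrt(np.outer(d, d))

def gcn_layer_interval(A_norm, X_lo, X_hi, W):
    """Sound elementwise bounds on ReLU(A_norm @ X @ W) for every X with X_lo <= X <= X_hi."""
    # A_norm >= 0, so neighbourhood aggregation preserves the ordering of the bounds.
    agg_lo, agg_hi = A_norm @ X_lo, A_norm @ X_hi
    # Affine map in centre/radius form: the radius is spread by |W|.
    center, radius = (agg_lo + agg_hi) / 2, (agg_hi - agg_lo) / 2
    out_c, out_r = center @ W, radius @ np.abs(W)
    # ReLU is monotone, so it can be applied to both bounds directly.
    return np.maximum(out_c - out_r, 0.0), np.maximum(out_c + out_r, 0.0)
```

Every concrete perturbation inside the box yields outputs between the two returned matrices; a node is certified when its bounds cannot flip the predicted class.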
-
Search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (635 additional authors not shown)
Abstract:
Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions $\mathcal{B}(χ_{c1}(3872)\toγψ_2(3823), ψ_2(3823)\toγχ_{c1})/\mathcal{B}(χ_{c1}(3872)\toπ^+π^- J/ψ)$ is set to 0.075 at the 90\% confidence level. Our result contradicts theoretical predictions under the assumption that the $χ_{c1}(3872)$ is the pure charmonium state $χ_{c1}(2P)$.
Submitted 13 May, 2024;
originally announced May 2024.
-
Memory Mosaics
Authors:
Jianyu Zhang,
Niklas Nolte,
Ranajoy Sadhukhan,
Beidi Chen,
Léon Bottou
Abstract:
Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in comparatively transparent ways. We demonstrate these capabilities on toy examples and we also show that memory mosaics perform as well as or better than transformers on medium-scale language modeling tasks.
Submitted 13 May, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
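An associative memory of the general kind Memory Mosaics are built from can be illustrated with kernel-smoothing retrieval: store key-value pairs and answer a query with a softmax-weighted average of the stored values, with weights decaying with key distance. This is a minimal sketch assuming a Gaussian kernel with sharpness `beta`; it is not the paper's architecture, and the class and parameter names are hypothetical.

```python
import numpy as np

class AssociativeMemory:
    """Stores (key, value) pairs; retrieves by Gaussian-kernel smoothing over stored keys."""

    def __init__(self, beta: float = 1.0):
        self.beta = beta  # larger beta -> sharper, more nearest-neighbour-like recall
        self.keys, self.values = [], []

    def store(self, key, value):
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(np.asarray(value, dtype=float))

    def retrieve(self, query):
        K = np.stack(self.keys)    # (n, d_k)
        V = np.stack(self.values)  # (n, d_v)
        sq_dist = ((K - np.asarray(query, dtype=float)) ** 2).sum(axis=1)
        logits = -self.beta * sq_dist
        w = np.exp(logits - logits.max())  # numerically stable softmax weights
        w /= w.sum()
        return w @ V
```

Because retrieval is only a weighted average, the weights `w` show directly which stored associations produced a prediction, one simple sense in which such memories can be inspected.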
-
Measurement of the ${e}^{+}{e}^{-}\to p \bar{p}π^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the $p\bar{p}π^0$ energy threshold, we can probe the threshold behavior for this reaction. However, no anomalous threshold enhancement is found in the cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$.
Submitted 10 May, 2024;
originally announced May 2024.
-
MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization
Authors:
Pengcheng Zhu,
Yaoming Zhuang,
Baoquan Chen,
Li Li,
Chengdong Wu,
Zhanlin Liu
Abstract:
This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping (VSLAM) based on Gaussian Splatting. Gaussian Splatting-based SLAM has recently yielded promising results, but existing systems rely on RGB-D input and are weak in tracking. To address these limitations, we integrate, for the first time, advanced sparse visual odometry with a dense Gaussian Splatting scene representation, thereby eliminating the dependency on depth maps typical of Gaussian Splatting-based SLAM systems and enhancing tracking robustness. Here, the sparse visual odometry tracks camera poses in the RGB stream, while Gaussian Splatting handles map reconstruction. These components are interconnected through a Multi-View Stereo (MVS) depth estimation network. We also propose a depth smooth loss to reduce the negative effect of the estimated depth maps. Furthermore, scale consistency between the sparse visual odometry and the dense Gaussian map is preserved by a Sparse-Dense Adjustment Ring (SDAR). We have evaluated our system across various synthetic and real-world datasets. The accuracy of our pose estimation surpasses that of existing methods and achieves state-of-the-art performance. Additionally, our system outperforms previous monocular methods in novel view synthesis fidelity, matching the results of neural SLAM systems that use RGB-D input.
Submitted 10 May, 2024;
originally announced May 2024.
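The abstract does not spell out its depth smooth loss. A widely used form of depth smoothness regularization is the edge-aware loss, which penalizes depth gradients except where the image itself has strong gradients (likely true depth boundaries). The sketch below assumes that form, on a single-channel image, purely for illustration; the loss MGS-SLAM actually uses may differ.

```python
import numpy as np

def edge_aware_smoothness(depth: np.ndarray, image: np.ndarray) -> float:
    """Mean of |dD/dx| exp(-|dI/dx|) plus mean of |dD/dy| exp(-|dI/dy|), via finite differences."""
    dD_x = np.abs(np.diff(depth, axis=1))
    dD_y = np.abs(np.diff(depth, axis=0))
    dI_x = np.abs(np.diff(image, axis=1))
    dI_y = np.abs(np.diff(image, axis=0))
    # Depth changes are cheap where the image has an edge and expensive in flat image regions.
    return float(np.mean(dD_x * np.exp(-dI_x)) + np.mean(dD_y * np.exp(-dI_y)))
```

A depth discontinuity aligned with an image edge costs almost nothing, while the same jump in a textureless region is penalized, which is how such a term suppresses spurious structure in noisy estimated depth.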
-
Emergent Conformal Symmetry at the Multicritical Point of (2+1)D SO(5) Model with Wess-Zumino-Witten Term on Sphere
Authors:
Bin-Bin Chen,
Xu Zhang,
Zi Yang Meng
Abstract:
Novel critical phenomena beyond the Landau-Ginzburg-Wilson paradigm have long been sought. Among the many candidate scenarios, the deconfined quantum critical point (DQCP) is the most fascinating, and its lattice model realization has been debated over the past two decades. Following the pioneering works with the fuzzy sphere method [W. Zhu, et al, Phys. Rev. X 13, 021009], we apply the spherical Landau level regularization to study the effective (2+1)D SO(5) non-linear sigma model with a topological term and the potential DQCP therein. Utilizing the state-of-the-art density matrix renormalization group method with explicit $\text{SU(2)}_\text{spin}\times\text{U(1)}_\text{charge}\times\text{U(1)}_\text{angular-momentum}$ symmetry as well as exact diagonalization simulations, we provide a comprehensive phase diagram of the model as the interaction length is tuned, featuring an SO(5) continuous transition line that extends the previously identified SO(5) multicritical point [arXiv:2307.05307]. The state-operator correspondence with the conformal tower structure is used to identify the emergent conformal symmetry, and the best estimates of the scaling dimensions of the relevant primary fields match well with the critical exponents obtained from the crossing-point analysis of the correlation ratio. Our results thus further support the rich structure of the phase diagram of the SO(5) model.
Submitted 15 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
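The state-operator correspondence invoked above can be written as a short worked relation. On a sphere of radius R, each eigenstate of a conformally invariant Hamiltonian corresponds to a scaling operator, and its excitation energy is proportional to that operator's scaling dimension; the unknown velocity factor v is removed by normalizing to an operator of known dimension. The choices of reference below (stress tensor with dimension 3, or a conserved current with dimension 2, in a (2+1)D CFT) are common conventions in fuzzy-sphere studies and are shown as a generic relation, not as the exact procedure of this paper. Descendants of a primary with dimension Δ appear at Δ + n for integer n, which is the conformal tower structure referred to in the abstract.

```latex
E_n - E_0 = \frac{v}{R}\,\Delta_n
\quad\Longrightarrow\quad
\Delta_n = \Delta_{\mathrm{ref}}\,\frac{E_n - E_0}{E_{\mathrm{ref}} - E_0},
\qquad
\Delta_T = 3 \ \text{(stress tensor)}, \quad \Delta_J = 2 \ \text{(conserved current)}
```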
-
Study of Particle Acceleration using Fine Structures and Oscillations in Microwaves from Electron Cyclotron Maser
Authors:
Rohit Sharma,
Marina Battaglia,
Sijie Yu,
Bin Chen,
Yingjie Luo,
Sam Krucker
Abstract:
Electrons accelerated during solar flares produce radio bursts and nonthermal X-ray signatures. The quasi-periodic pulsations (QPPs) and fine structures of radio bursts in spatial-spectral-temporal space depend on the emission mechanism and on local conditions, such as magnetic fields, electron density, and pitch angle distribution. Radio burst observations with high frequency-time resolution imaging provide excellent diagnostics. In converging magnetic field geometries, radio bursts can be produced via the electron-cyclotron maser (ECM). Recently, using observations made by the Karl G. Jansky Very Large Array (VLA) at 1--2 GHz, \cite{Yu2023} reported the discovery of long-lasting auroral-like radio bursts persisting over a sunspot and interpreted them as ECM-generated emission. Here, we investigate the detailed second and sub-second temporal variability of this continuous ECM source. We study the association of 5-second-period QPPs with a concurrent GOES C1.5-class flare, utilizing VLA's imaging spectroscopy capability with an extremely high temporal resolution (50 ms). We use the density and magnetic field extrapolation models to constrain the ECM emission to the second-harmonic o-mode. Using the delay of the QPPs relative to the X-ray emission times, combined with X-ray spectroscopy and magnetic extrapolation, we constrain the energies and pitch angles of the ECM-emitting electrons to $\approx$4-8 keV and $>26^{\circ}$, respectively. Our analysis shows that loss-cone diffusion continuously fuels the ECM via Coulomb collisions and magnetic turbulence on length scales between 5 Mm and 100 Mm. We conclude that the QPPs arise from a Lotka-Volterra system, in which electrons from the solar flare saturate the continuously operating ECM and cause temporary oscillations.
Submitted 7 May, 2024;
originally announced May 2024.
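The final conclusion of the abstract above invokes the Lotka-Volterra (predator-prey) system. The sketch below integrates the classic equations dx/dt = αx - βxy, dy/dt = δxy - γy with a fourth-order Runge-Kutta step to show the sustained periodic exchange such a model produces. Reading x as the population of maser-driving electrons and y as the maser wave energy, and the unit parameter values, are illustrative assumptions, not quantities from the paper.

```python
import math

def lotka_volterra(x, y, alpha, beta, delta, gamma):
    """Right-hand side: prey x grows and is consumed by y; predator y grows on x and decays."""
    return alpha * x - beta * x * y, delta * x * y - gamma * y

def simulate(x0, y0, alpha=1.0, beta=1.0, delta=1.0, gamma=1.0, dt=0.01, steps=2000):
    """Classic RK4 integration; returns the lists of x and y samples."""
    def f(a, b):
        return lotka_volterra(a, b, alpha, beta, delta, gamma)

    x, y = x0, y0
    xs, ys = [x], [y]
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = f(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        xs.append(x)
        ys.append(y)
    return xs, ys

def invariant(x, y, alpha=1.0, beta=1.0, delta=1.0, gamma=1.0):
    """Conserved quantity of the exact flow; numerically it drifts only by integration error."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)
```

Because the exact system conserves this invariant, the trajectory is a closed orbit: the two populations oscillate indefinitely instead of settling, which is the qualitative behaviour a Lotka-Volterra explanation of QPPs relies on.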
-
Observation of Mermin-Wagner behavior in LaFeO$_3$/SrTiO$_3$ superlattices
Authors:
Michal Kiaba,
Andreas Suter,
Zaher Salman,
Thomas Prokscha,
Binbin Chen,
Gertjan Koster,
Adam Dubroka
Abstract:
Two-dimensional magnetic materials attract considerable attention because they may exhibit new magnetic properties due to, e.g., strongly enhanced spin fluctuations. However, the suppression of long-range magnetic order in two dimensions by long-wavelength spin fluctuations, as implied by the Mermin-Wagner theorem, has been questioned for finite-size laboratory samples. Here we study the magnetic properties across a dimensional crossover in superlattices composed of antiferromagnetic LaFeO$_3$ and SrTiO$_3$ which, thanks to their large lateral size, could be examined with a sensitive magnetic probe: muon spin rotation spectroscopy. We show that the iron electronic moments in superlattices with 3 and 2 monolayers of LaFeO$_3$ exhibit static antiferromagnetic order. In contrast, in superlattices with a single LaFeO$_3$ monolayer, the moments do not order and keep fluctuating down to the lowest measured temperature, as expected from the Mermin-Wagner theorem. Our work shows how dimensionality can be used to tune the magnetic properties of ultrathin films.
Submitted 7 May, 2024;
originally announced May 2024.
-
On self-dual Carrollian conformal nonlinear electrodynamics
Authors:
Bin Chen,
Jue Hou,
Haowei Sun
Abstract:
In this work, we study the duality symmetry group of Carrollian (nonlinear) electrodynamics and propose a family of Carrollian ModMax theories, which are invariant under Carrollian $\text{SO}(2)$ electromagnetic (EM) duality transformations and conformal transformations. We define the Carrollian $\text{SO}(2)$ EM transformations with the help of Hodge duality in Carrollian geometry, and then rederive the Gaillard-Zumino consistency condition for EM duality of Carrollian nonlinear electrodynamics. Together with the traceless condition on the energy-momentum tensor, this allows us to determine the Lagrangian of the Carrollian ModMax theories among pure electrodynamics. We further study their behavior under the $\sqrt{T\bar{T}}$ deformation flow and show that these theories deform into each other and may reach two endpoints under the flow, one of which is the Carrollian Maxwell theory. As a byproduct, we construct a family of two-dimensional Carrollian ModMax-like multiple scalar theories, which are closed under the $\sqrt{T\bar{T}}$ flow and may flow to a BMS free multi-scalar model.
Submitted 7 May, 2024;
originally announced May 2024.
-
Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications
Authors:
Mingfei Lu,
Chenxu Li,
Shujian Yu,
Robert Jenssen,
Badong Chen
Abstract:
Divergence measures play a central role and are increasingly essential in deep learning, yet efficient measures for multiple (more than two) distributions are rarely explored. This gap is particularly important in areas where the simultaneous management of multiple distributions is both inevitable and essential, such as clustering, multi-source domain adaptation or generalization, and multi-view learning, among others. Although computing the mean of pairwise distances between any two distributions is a prevalent way to quantify the total divergence among multiple distributions, this approach is not straightforward and requires significant computational resources. In this study, we introduce a new divergence measure tailored for multiple distributions, named the generalized Cauchy-Schwarz divergence (GCSD). Additionally, we provide a kernel-based closed-form sample estimator, making it convenient and straightforward to use in various machine-learning applications. Finally, we explore its implications for deep learning by applying it to two carefully chosen machine-learning tasks: deep clustering and multi-source domain adaptation. Our extensive experimental investigations confirm the robustness and effectiveness of GCSD in both scenarios. The findings also underscore the potential of GCSD to significantly advance machine learning methodologies that require quantifying the divergence among multiple distributions.
Submitted 5 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
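For context on the pairwise baseline mentioned in the GCSD abstract, the sketch below implements the standard two-distribution Cauchy-Schwarz divergence, D_CS(p, q) = -log[(∫pq)² / (∫p² ∫q²)], with Gaussian-kernel (Parzen) plug-in estimates, and then averages it over all pairs of sample sets. It is not the generalized GCSD estimator from the paper, whose formula the abstract does not give; the kernel width `sigma` is an assumed hyperparameter and the function names are hypothetical.

```python
import itertools
import numpy as np

def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 sigma^2)) between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def cs_divergence(X, Y, sigma=1.0):
    """Plug-in Cauchy-Schwarz divergence between samples X (m, d) and Y (n, d).

    The Gaussian normalizing constants cancel in the ratio, so unnormalized kernels suffice.
    """
    pq = gaussian_gram(X, Y, sigma).mean()
    pp = gaussian_gram(X, X, sigma).mean()
    qq = gaussian_gram(Y, Y, sigma).mean()
    return float(np.log(pp) + np.log(qq) - 2.0 * np.log(pq))

def mean_pairwise_cs(samples, sigma=1.0):
    """Baseline total divergence: average D_CS over all unordered pairs of sample sets."""
    pairs = list(itertools.combinations(samples, 2))
    return sum(cs_divergence(X, Y, sigma) for X, Y in pairs) / len(pairs)
```

With K distributions this baseline evaluates K(K-1)/2 pairwise divergences, each requiring three Gram matrices, which is the computational burden the abstract attributes to the pairwise approach.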
-
Finite size effect on gluon dissociation of J/psi in relativistic heavy ion collisions
Authors:
**g**g Wang,
Baoyi Chen,
Yunpeng Liu
Abstract:
Thermal quantities of quark matter, including the entropy density and gluon spectrum, are calculated using a bag model within a box that is finite in the longitudinal direction. Under the assumption of entropy conservation, the corresponding gluon dissociation rate of J/psi is studied. It reaches a maximum at a certain longitudinal size L_m. Below L_m, the suppression is weak even if the temperature becomes higher than it would be without the finite size effect; above L_m, the dissociation rate gradually approaches the thermodynamic limit as the longitudinal size of the fireball increases.
Submitted 6 May, 2024;
originally announced May 2024.
-
Invertible Residual Rescaling Models
Authors:
**min Li,
Tao Dai,
Yaohua Zha,
Yilu Luo,
Longfei Lu,
Bin Chen,
Zhi Wang,
Shu-Tao Xia,
**gyun Zhang
Abstract:
Invertible Rescaling Networks (IRNs) and their variants have achieved remarkable results in various image processing tasks such as image rescaling. However, we observe that deeper IRNs are difficult to train, which limits their representational ability. To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling, which learn a bijection between a high-resolution image and its low-resolution counterpart with a specific distribution. Specifically, IRRM is a deep network containing several Residual Downscaling Modules (RDMs) with long skip connections. Each RDM consists of several Invertible Residual Blocks (IRBs) with short connections. In this way, the RDMs allow rich low-frequency information to bypass the blocks through skip connections and force the model to focus on extracting high-frequency information from the image. Extensive experiments show that IRRM performs significantly better than other state-of-the-art methods with far fewer parameters and lower complexity. In particular, IRRM achieves PSNR gains of at least 0.3 dB over HCFlow and IRN for ×4 rescaling while using only 60% of the parameters and 50% of the FLOPs. The code will be available at https://github.com/THU-Kingmin/IRRM.
Submitted 12 May, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
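Invertibility is what lets IRN-style models recover the high-resolution image from the downscaled output. The abstract does not give the internal design of its Invertible Residual Blocks, so the sketch below shows a generic additive coupling block, a common way to build exactly invertible residual-style layers: half of the channels pass through unchanged and the other half receive a residual computed from the first half. The residual function `F` (a fixed random tanh layer standing in for a learned sub-network) and all names are hypothetical.

```python
import numpy as np

class AdditiveCoupling:
    """Exactly invertible block: y1 = x1, y2 = x2 + F(x1)."""

    def __init__(self, channels: int, seed: int = 0):
        if channels % 2:
            raise ValueError("the channel split requires an even number of channels")
        rng = np.random.default_rng(seed)
        half = channels // 2
        self.W = rng.normal(scale=0.5, size=(half, half))  # stand-in for a learned sub-network

    def F(self, x1):
        # Any function works here: it is only ever evaluated, never inverted.
        return np.tanh(x1 @ self.W)

    def forward(self, x):
        x1, x2 = np.split(x, 2, axis=-1)
        return np.concatenate([x1, x2 + self.F(x1)], axis=-1)

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=-1)
        return np.concatenate([y1, y2 - self.F(y1)], axis=-1)
```

Stacking such blocks while alternating which half is updated keeps the whole network invertible regardless of depth, since each block's inverse only subtracts the same residual it added.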
-
Predicting Open-Hole Laminates Failure Using Support Vector Machines With Classical and Quantum Kernels
Authors:
Giorgio Tosti Balducci,
Boyang Chen,
Matthias Möller,
Marc Gerritsma,
Roeland De Breuker
Abstract:
Modeling open-hole failure of composites is a complex task, involving a highly nonlinear response with interacting failure modes. Numerical modeling of this phenomenon has traditionally been based on the finite element method, which requires a trade-off between high fidelity and computational cost. To mitigate this shortcoming, recent work has leveraged machine learning to predict the strength of open-hole composite specimens. Here, we also use data-based models, but we tackle open-hole composite failure from a classification point of view. More specifically, we show how to train surrogate models to learn the ultimate failure envelope of an open-hole composite plate under in-plane loading. To achieve this, we solve the classification problem via support vector machines (SVMs) and test different classifiers by changing the SVM kernel function. The flexibility of kernel-based SVMs also allows us to integrate the recently developed quantum kernels into our algorithm and compare them with the standard radial basis function (RBF) kernel. Finally, using kernel-target alignment optimization, we tune the free parameters of all kernels to best separate safe and failure-inducing loading states. The results show classification accuracies higher than 90% for the RBF kernel, especially after alignment, followed closely by the quantum kernel classifiers.
Submitted 9 June, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
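Kernel-target alignment, which the abstract uses to tune kernel parameters, measures how well a Gram matrix K matches the ideal label kernel y yᵀ for labels y in {-1, +1}: A(K, y yᵀ) = ⟨K, y yᵀ⟩_F / (‖K‖_F ‖y yᵀ‖_F). The sketch below computes it for an RBF kernel and picks the bandwidth from a grid. The toy data, the grid search (in place of gradient-based alignment optimization), and the function names are assumptions for illustration.

```python
import numpy as np

def rbf_gram(X, gamma):
    """RBF kernel K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kernel_target_alignment(K, y):
    """Cosine similarity between K and the ideal kernel y y^T under the Frobenius inner product."""
    Y = np.outer(y, y)
    return float((K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y)))

def best_gamma(X, y, grid):
    """Return the grid value whose RBF Gram matrix is best aligned with the labels."""
    return max(grid, key=lambda g: kernel_target_alignment(rbf_gram(X, g), y))
```

For balanced classes a nearly constant kernel (very small gamma) has alignment close to zero, so maximizing alignment pushes the bandwidth toward values that separate the two label groups.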
-
Hiding Sensitive Information Using PDF Steganography
Authors:
Ryan Klemm,
Bo Chen
Abstract:
The use of steganography to transmit secret data is becoming increasingly common in security products and malware today. Despite the popularity of PDF files, they are not often the focus of steganography research, as most applications use digital image, audio, and video files as cover data. However, the PDF file format is promising for medium-capacity steganography applications. In this paper, we present a novel PDF steganography algorithm based on least-significant bit insertion into the real-valued operands of PDF stream operators. Whereas prior research has considered only a small subset of these operators, we examine all the operators defined in the Adobe PDF standard to evaluate their usability in our steganography algorithm. We also provide a case study that embeds malware into a given cover PDF document.
Submitted 1 May, 2024;
originally announced May 2024.
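The idea of hiding bits in the real-valued operands of content-stream operators can be sketched as follows. This is a hypothetical, simplified scheme, not the paper's algorithm: it is a decimal analogue of least-significant bit insertion that forces the parity of each real operand's last printed digit to match the next message bit, and it finds operands with a naive regular expression instead of a proper PDF tokenizer (so it ignores strings, names, and compressed streams).

```python
from __future__ import annotations

import re

# Simplified: real numbers with a fractional part (e.g. 0.25, -3.5, .5).
# A real implementation would tokenize the content stream and skip strings and names.
REAL = re.compile(r"-?\d*\.\d+")

def capacity(stream: str) -> int:
    """Number of bits the stream can carry: one per real-valued operand."""
    return len(REAL.findall(stream))

def embed(stream: str, bits: list[int]) -> str:
    """Force the parity of each real operand's last digit to match the next message bit."""
    if len(bits) > capacity(stream):
        raise ValueError("message longer than stream capacity")
    it = iter(bits)

    def repl(m: re.Match) -> str:
        tok = m.group(0)
        bit = next(it, None)
        if bit is None:  # message exhausted: leave the remaining operands untouched
            return tok
        d = int(tok[-1])
        if d % 2 != bit:
            d ^= 1  # 0<->1, 2<->3, ...: changes the value by one unit in the last place
        return tok[:-1] + str(d)

    return REAL.sub(repl, stream)

def extract(stream: str, n_bits: int) -> list[int]:
    """Read back the last-digit parity of the first n_bits real operands."""
    return [int(tok[-1]) % 2 for tok in REAL.findall(stream)][:n_bits]
```

For example, embedding the bits 1, 0, 1, 1, 0 into `0.5 0.25 0.75 rg 10 20 m 100.5 200.25 l S` changes only `0.25` to `0.24` and `200.25` to `200.24`; integer operands such as `10 20` carry no payload in this sketch.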
-
Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective
Authors:
Wanqi Zhou,
Shuanghao Bai,
Qibin Zhao,
Badong Chen
Abstract:
Pretrained vision-language models (VLMs) like CLIP have shown impressive generalization performance across various downstream tasks, yet they remain vulnerable to adversarial attacks. While prior research has primarily concentrated on improving the adversarial robustness of image encoders against attacks on images, text-based and multimodal attacks have largely been overlooked. In this work, we present the first known comprehensive study of adapting vision-language models for adversarial robustness under multimodal attacks. First, we introduce a multimodal attack strategy and investigate the impact of different attacks. We then propose a multimodal contrastive adversarial training loss, which aligns the clean and adversarial text embeddings with the adversarial and clean visual features, to enhance the adversarial robustness of both the image and text encoders of CLIP. Extensive experiments on 15 datasets across two tasks demonstrate that our method significantly improves the adversarial robustness of CLIP. Interestingly, we find that the model fine-tuned against multimodal adversarial attacks exhibits greater robustness than its counterpart fine-tuned solely against image-based attacks, even against image attacks, which may open up new possibilities for enhancing the security of VLMs.
Submitted 30 April, 2024;
originally announced April 2024.
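The multimodal contrastive adversarial training loss described above pairs clean text with adversarial images and adversarial text with clean images. A minimal NumPy sketch of that pairing follows, assuming CLIP-style L2-normalized embeddings, a symmetric InfoNCE objective, and a temperature `tau`; the paper's exact loss may weight or combine the terms differently, and all function names are hypothetical.

```python
import numpy as np

def _normalize(E):
    return E / np.linalg.norm(E, axis=1, keepdims=True)

def _log_softmax(Z, axis):
    Z = Z - Z.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    return Z - np.log(np.exp(Z).sum(axis=axis, keepdims=True))

def clip_infonce(img, txt, tau=0.07):
    """Symmetric InfoNCE: the i-th image should match the i-th caption and vice versa."""
    logits = _normalize(img) @ _normalize(txt).T / tau
    idx = np.arange(len(img))
    i2t = -_log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text direction
    t2i = -_log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image direction
    return float((i2t + t2i) / 2)

def multimodal_adv_loss(img_clean, img_adv, txt_clean, txt_adv, tau=0.07):
    """Align adversarial images with clean text, and clean images with adversarial text."""
    return clip_infonce(img_adv, txt_clean, tau) + clip_infonce(img_clean, txt_adv, tau)
```

In training, `img_adv` and `txt_adv` would be embeddings of attacked inputs generated on the fly; minimizing the loss pulls each perturbed modality back toward its clean counterpart in the other modality.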
-
Soft Prompt Generation for Domain Generalization
Authors:
Shuanghao Bai,
Yuedi Zhang,
Wanqi Zhou,
Zhirong Luan,
Badong Chen
Abstract:
Large pre-trained vision language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompts, which are not optimal for specific domains. To further adapt VLMs to downstream tasks, soft prompts have been proposed to replace manually designed prompts; a soft prompt acts as a learnable vector that is fine-tuned on domain-specific data. Prior prompt learning methods primarily learn a fixed prompt and a residual prompt from training samples. However, the learned prompts lack diversity and ignore information about unseen domains, potentially compromising their transferability. In this paper, we reframe the prompt learning framework from a generative perspective and propose a simple yet efficient method for the Domain Generalization (DG) task, namely \textbf{S}oft \textbf{P}rompt \textbf{G}eneration (SPG). To the best of our knowledge, we are the first to introduce a generative model into prompt learning in VLMs and to explore its potential for producing soft prompts relying solely on the generative model, ensuring the diversity of prompts. Specifically, SPG consists of a two-stage training phase and an inference phase. During the training phase, we introduce soft prompt labels for each domain, aiming to incorporate domain knowledge into the generative model. During the inference phase, the generator of the generative model is employed to obtain instance-specific soft prompts for the unseen target domain. Extensive experiments on five domain generalization benchmarks of three DG tasks demonstrate that our proposed SPG achieves state-of-the-art performance. The code will be available soon.
Submitted 30 April, 2024;
originally announced April 2024.
-
GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting
Authors:
Bo Chen,
Shoukang Hu,
Qi Chen,
Chenpeng Du,
Ran Yi,
Yanmin Qian,
Xie Chen
Abstract:
We present GSTalker, a 3D audio-driven talking face generation model based on Gaussian Splatting that supports both fast training (40 minutes) and real-time rendering (125 FPS) using a 3$\sim$5 minute video as training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks, which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Gaussian deformation field that translates and transforms 3D Gaussians in synchrony with the audio, incorporating a multi-resolution hash-grid-based tri-plane and a temporal smoothing module to learn accurate deformation of fine-grained facial details. In addition, a pose-conditioned deformation field is designed to model the stabilized torso. To enable efficient optimization of the conditional Gaussian deformation field, we initialize the 3D Gaussians by learning a coarse static Gaussian representation. Extensive experiments on person-specific videos with audio tracks validate that GSTalker can generate high-fidelity, lip-synchronized results with fast training and real-time rendering speed.
Submitted 29 April, 2024;
originally announced April 2024.
-
RSCaMa: Remote Sensing Image Change Captioning with State Space Model
Authors:
Chenyang Liu,
Keyan Chen,
Bowen Chen,
Haotian Zhang,
Zhengxia Zou,
Zhenwei Shi
Abstract:
Remote Sensing Image Change Captioning (RSICC) aims to describe surface changes between multi-temporal remote sensing images in language, including the categories and locations of changed objects and the dynamics of the changes (e.g., added or disappeared). This poses challenges to the spatial and temporal modeling of bi-temporal features. Although previous methods have made progress in spatial change perception, joint spatial-temporal modeling remains weak. To address this, we propose a novel RSCaMa model, which achieves efficient joint spatial-temporal modeling through multiple CaMa layers, enabling iterative refinement of bi-temporal features. For efficient spatial modeling, we introduce Mamba, a recently popular state space model (SSM) with a global receptive field and linear complexity, into the RSICC task and propose the Spatial Difference-aware SSM (SD-SSM), overcoming the receptive-field and computational-complexity limitations of previous CNN- and Transformer-based methods. SD-SSM sharpens the model's ability to capture spatial changes. For efficient temporal modeling, considering the potential correlation between the temporal scanning characteristics of Mamba and the temporality of RSICC, we propose the Temporal-Traversing SSM (TT-SSM), which scans bi-temporal features in a temporal cross-wise manner, enhancing the model's temporal understanding and information interaction. Experiments validate the effectiveness of the efficient joint spatial-temporal modeling and demonstrate the outstanding performance of RSCaMa and the potential of Mamba in the RSICC task. Additionally, we systematically compare three different language decoders (Mamba, a GPT-style decoder, and a Transformer decoder), providing insights for future RSICC research. The code will be available at \emph{\url{https://github.com/Chen-Yang-Liu/RSCaMa}}.
Submitted 21 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Tunable coupling of a quantum phononic resonator to a transmon qubit with flip-chip architecture
Authors:
Xinhui Ruan,
Li Li,
Guihan Liang,
Silu Zhao,
Jia-heng Wang,
Yizhou Bu,
Bingjie Chen,
Xiaohui Song,
Xiang Li,
He Zhang,
**zhe Wang,
Qianchuan Zhao,
Kai Xu,
Heng Fan,
Yu-xi Liu,
**g Zhang,
Zhihui Peng,
Zhongcheng Xiang,
Dongning Zheng
Abstract:
A hybrid system with tunable coupling between phonons and qubits shows great potential for advancing quantum information processing. In this work, we demonstrate strong and tunable coupling between a surface acoustic wave (SAW) resonator and a transmon qubit using a galvanic-contact flip-chip technique. The coupling strength, extracted from the corresponding vacuum Rabi oscillation frequencies, varies from $2\pi\times 7.0$ MHz to $-2\pi\times 20.6$ MHz. We also show the phonon-induced ac Stark shift of the qubit at different coupling strengths. Our approach offers a good experimental platform for exploring quantum acoustics and hybrid systems.
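The coupling strengths above are read off vacuum Rabi oscillations. As a rough illustration, assuming the textbook resonant Jaynes-Cummings model (qubit on resonance with a single phonon mode, one excitation, no decay), the qubit excited-state population follows $P_e(t)=\cos^2(gt)$, which oscillates at angular frequency $2g$, so $|g|/2\pi$ is half the measured oscillation frequency. The sketch below simulates an ideal trace and recovers $|g|$ with an FFT; the sampling settings and function names are illustrative assumptions, and real data would also need decay and detuning terms. A frequency alone fixes only the magnitude of $g$, not its sign.

```python
import numpy as np

def excited_population(g_over_2pi_hz: float, t: np.ndarray) -> np.ndarray:
    """Ideal resonant Jaynes-Cummings dynamics starting from |e, 0>: P_e(t) = cos^2(g t)."""
    g = 2 * np.pi * g_over_2pi_hz  # angular coupling rate in rad/s
    return np.cos(g * t) ** 2

def dominant_frequency_hz(signal: np.ndarray, dt: float) -> float:
    """Frequency (Hz) of the strongest non-DC FFT component of a real signal."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))  # mean removal kills the DC bin
    freqs = np.fft.rfftfreq(signal.size, dt)
    return float(freqs[np.argmax(spectrum)])

def coupling_from_vacuum_rabi(f_vacuum_rabi_hz: float) -> float:
    """P_e oscillates at 2g, so |g|/2pi is half the oscillation frequency."""
    return f_vacuum_rabi_hz / 2.0

dt = 0.1e-9                     # 0.1 ns sampling step (assumed)
t = np.arange(20_000) * dt      # 2 us trace, giving 0.5 MHz FFT resolution
for g_true in (7.0e6, 20.6e6):  # magnitudes quoted in the abstract
    trace = excited_population(g_true, t)
    g_est = coupling_from_vacuum_rabi(dominant_frequency_hz(trace, dt))
    print(f"|g|/2pi true {g_true / 1e6:.1f} MHz, estimated {g_est / 1e6:.2f} MHz")
```

The estimate is limited by the FFT bin width (here 0.5 MHz in oscillation frequency); fitting a damped cosine to measured data would be the more typical choice.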
Submitted 29 April, 2024;
originally announced April 2024.
-
Retrieval-Oriented Knowledge for Click-Through Rate Prediction
Authors:
Huanshuo Liu,
Bo Chen,
Menghui Zhu,
Jianghao Lin,
Jiarui Qin,
Yang Yang,
Hao Zhang,
Ruiming Tang
Abstract:
Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-and-play Retrieval-Oriented Knowledge (ROK) framework. Specifically, a knowledge base, consisting of a retrieval-oriented embedding layer and a knowledge encoder, is designed to preserve and imitate the retrieved and aggregated representations in a decomposition-reconstruction paradigm. Knowledge distillation and contrastive learning are used to optimize the knowledge base, and the learned retrieval-enhanced representations can be integrated with arbitrary CTR models in both instance-wise and feature-wise manners. Extensive experiments on three large-scale datasets show that ROK achieves performance competitive with retrieval-based CTR models while preserving superior inference efficiency and model compatibility.
Submitted 28 April, 2024;
originally announced April 2024.
-
Exploring Transport Properties of Quark-Gluon Plasma with a Machine-Learning assisted Holographic Approach
Authors:
Bing Chen,
Xun Chen,
Xiaohua Li,
Zhou-Run Zhu,
Kai Zhou
Abstract:
Based on a holographic model that incorporates the equation of state (EoS) and baryon number susceptibility for different flavors, we calculate the drag force, jet quenching parameter, and diffusion coefficient of heavy quarks at finite temperature and chemical potential. The holographic results for the diffusion coefficient agree with lattice data for $N_f = 0$ and $N_f = 2+1$ within their error margins. The holographic diffusion coefficient of heavy quarks in systems with different numbers of flavors agrees well with fits to ALICE data. The jet quenching parameter in our model is strongly consistent with results from both RHIC and LHC for different flavors. Comparisons with experimental data and with other models confirm the validity of our holographic calculation. This work reinforces the potential of bottom-up holographic models for advancing our understanding of the transport properties of hot and dense quark-gluon plasma.
Submitted 22 May, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images
Authors:
Weiqi Li,
Shijie Zhao,
Bin Chen,
Xinhua Cheng,
Junlin Li,
Li Zhang,
Jian Zhang
Abstract:
With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly used to reduce transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, overlooking the fact that the content viewed on head-mounted displays (HMDs) is actually a rendered viewport rather than an ERP image. In this work, we emphasize that focusing solely on ERP quality results in inferior viewport visual experiences for users. We therefore propose ResVR, the first comprehensive framework for the joint Rescaling and Viewport Rendering of ODIs. ResVR produces low-resolution (LR) ERP images for transmission while rendering high-quality viewports for users to watch on HMDs. In ResVR, a novel discrete pixel sampling strategy is developed to tackle the complex mapping between the viewport and ERP, enabling end-to-end training of the ResVR pipeline. Furthermore, a spherical pixel shape representation technique, derived from spherical differentiation, significantly improves the visual quality of rendered viewports. Extensive experiments demonstrate that ResVR outperforms existing methods in viewport rendering tasks across different fields of view, resolutions, and view directions while keeping a low transmission overhead.
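The viewport-to-ERP mapping that ResVR's sampling strategy has to handle can be illustrated with the standard rectilinear (gnomonic) viewport projection. The sketch below assumes a pinhole viewport with horizontal field of view `fov_deg`, yaw and pitch rotations in radians, and an ERP layout where columns span longitude and rows span latitude; it computes where each viewport pixel lands in the ERP image and fetches it by nearest-neighbour lookup. It is a generic illustration, not ResVR's discrete pixel sampling or spherical pixel shape representation, and all function names are hypothetical.

```python
import numpy as np

def viewport_to_erp_coords(vp_w, vp_h, fov_deg, yaw, pitch, erp_w, erp_h):
    """Map each viewport pixel centre to continuous (u, v) coordinates in an ERP image.

    Assumes camera axes x-right, y-up, z-forward; yaw rotates about y, pitch about x.
    """
    f = (vp_w / 2) / np.tan(np.radians(fov_deg) / 2)          # focal length in pixels
    cols, rows = np.meshgrid(np.arange(vp_w), np.arange(vp_h))
    x = cols + 0.5 - vp_w / 2
    y = -(rows + 0.5 - vp_h / 2)                               # image rows grow downward
    dirs = np.stack([x, y, np.full(x.shape, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)       # unit ray directions
    cp, sp, cy, sy = np.cos(pitch), np.sin(pitch), np.cos(yaw), np.sin(yaw)
    rot_x = np.array([[1, 0, 0], [0, cp, sp], [0, -sp, cp]])   # tilt the view up by `pitch`
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # turn the view by `yaw`
    dirs = dirs @ (rot_y @ rot_x).T                            # pitch first, then yaw
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])               # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))          # latitude in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * erp_w
    v = (0.5 - lat / np.pi) * erp_h
    return u, v

def render_viewport_nearest(erp, vp_w, vp_h, fov_deg, yaw, pitch):
    """Render a viewport by nearest-neighbour lookup into an (H, W[, C]) ERP array."""
    erp_h, erp_w = erp.shape[:2]
    u, v = viewport_to_erp_coords(vp_w, vp_h, fov_deg, yaw, pitch, erp_w, erp_h)
    cols = np.floor(u).astype(int) % erp_w                     # longitude wraps around
    rows = np.clip(np.floor(v).astype(int), 0, erp_h - 1)
    return erp[rows, cols]
```

Because one viewport pixel generally falls between ERP pixels and the ERP pixel footprint shrinks toward the poles, this per-pixel lookup is exactly the kind of non-uniform, non-differentiable step that a learned, end-to-end sampling scheme needs to replace.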
Submitted 25 April, 2024;
originally announced April 2024.
-
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Authors:
Mostafa Elhoushi,
Akshat Shrivastava,
Diana Liskovich,
Basil Hosmer,
Bram Wasti,
Liangzhen Lai,
Anas Mahmoud,
Bilge Acun,
Saurabh Agarwal,
Ahmed Roman,
Ahmed A Aly,
Beidi Chen,
Carole-Jean Wu
Abstract:
We present LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss in which all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution in which we exit at early layers and verify and correct with the remaining layers of the model. Our proposed self-speculative decoding approach has a smaller memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes with different types of training: pretraining from scratch, continual pretraining, finetuning on a specific data domain, and finetuning on a specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on the TOPv2 semantic parsing task.
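The draft-then-verify loop behind self-speculative decoding can be sketched with toy greedy "models" standing in for the early-exit sub-network (draft) and the full network (verifier). This is a minimal sketch under stated assumptions: decoding is greedy, `full_model` and `draft_model` are hypothetical placeholders for an LLM, and verification is written as a sequential loop for clarity, whereas a real implementation verifies all drafted tokens in one forward pass and, in LayerSkip's case, reuses the early layers' compute and activations. Under greedy decoding the output matches decoding with the full model alone.

```python
from typing import Callable, List

# A "model" here is any greedy next-token function: context -> next token id.
NextToken = Callable[[List[int]], int]

def self_speculative_decode(draft: NextToken, full: NextToken,
                            prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Draft k tokens with the early-exit model, keep the prefix the full model agrees with."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # Draft stage: the cheap early-exit model proposes k tokens autoregressively.
        ctx = list(tokens)
        drafted = []
        for _ in range(k):
            nxt = draft(ctx)
            drafted.append(nxt)
            ctx.append(nxt)
        # Verify stage: accept drafted tokens up to the first disagreement, where the
        # full model's own prediction replaces the rejected draft token.
        accepted: List[int] = []
        for tok in drafted:
            expected = full(tokens + accepted)
            accepted.append(expected)
            if expected != tok:
                break
        tokens.extend(accepted)
    return tokens[len(prompt):len(prompt) + max_new]

def greedy_decode(full: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Reference: plain autoregressive greedy decoding with the full model."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(full(tokens))
    return tokens[len(prompt):]

def full_model(ctx: List[int]) -> int:
    """Hypothetical stand-in for the full network's greedy prediction."""
    return (sum(ctx) * 7 + len(ctx)) % 50

def draft_model(ctx: List[int]) -> int:
    """Hypothetical early-exit stand-in that disagrees with the full model now and then."""
    return full_model(ctx) if len(ctx) % 5 else 0
```

Each round emits at least one token (the correction), so the loop always terminates; the speedup comes from rounds where most of the `k` cheap draft tokens are accepted.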
Submitted 29 April, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
A self-improving property of Riesz potentials in BMO
Authors:
You-Wei Benson Chen
Abstract:
In this paper we prove that for non-negative measurable functions $f$,
\begin{align*} I_\alpha f \in BMO(\mathbb{R}^n) \text{ if and only if } I_\alpha f \in BMO^\beta(\mathbb{R}^n) \text{ for } \beta\in (n-\alpha,n].
\end{align*} Here $I_\alpha$ denotes the Riesz potential of order $\alpha$ and $BMO^\beta$ represents the space of functions of bounded $\beta$-dimensional mean oscillation.
Submitted 25 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Authors:
Xiaohong Liu,
Xiongkuo Min,
Guangtao Zhai,
Chunyi Li,
Tengchuan Kou,
Wei Sun,
Haoning Wu,
Yixuan Gao,
Yuqin Cao,
Zicheng Zhang,
Xiele Wu,
Radu Timofte,
Fei Peng,
Huiyuan Fu,
Anlong Ming,
Chuanming Wang,
Huadong Ma,
Shuai He,
Zifei Dou,
Shu Chen,
Huacong Zhang,
Haiyi Xie,
Chengwei Wang,
Baoying Chen,
Jishen Zeng
, et al. (89 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. The challenge addresses a major problem in image and video processing, namely Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into an image track and a video track. The image track uses AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track had a total of 318 registered participants. A total of 1,646 submissions were received in the development phase and 221 in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants registered for the video track. A total of 991 submissions were received in the development phase and 185 in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods achieved better results than the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on AIGC.
Submitted 7 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Quantum Imaginarity-Mixedness Trade-off: Characterizing Maximally Imaginary Mixed States
Authors:
Bin Chen,
Shao-Ming Fei
Abstract:
We investigate the trade-off relations between imaginarity and mixedness in arbitrary $d$-dimensional quantum systems. For a given mixedness, a quantum state with maximal imaginarity is defined to be a "maximally imaginary mixed state" (MIMS). Using the $l_{1}$ norm of imaginarity and the normalized linear entropy, we conclusively identify the MIMSs for both qubit and qutrit systems. For high-dimensional quantum systems, we present a comprehensive class of MIMSs, which also gives rise to complementarity relations between the $l_{1}$ norm of imaginarity and the $l_{1}$ norm of mixedness, as well as between the relative entropy of imaginarity and the von Neumann entropy. Furthermore, we examine the evolution of the trade-off relation for single-qubit states under four specific Markovian channels: the bit flip channel, the phase damping channel, the depolarizing channel, and the amplitude damping channel.
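For a qubit, the trade-off can be checked numerically from the standard definitions, which we assume here: the $l_1$ norm of imaginarity $I_{l_1}(\rho)=\sum_{i,j}|\mathrm{Im}\,\rho_{ij}|$ and the normalized linear entropy $S_L(\rho)=\frac{d}{d-1}\left(1-\mathrm{Tr}\,\rho^2\right)$. Writing a qubit as $\rho=\tfrac12(I+\vec r\cdot\vec\sigma)$ gives $I_{l_1}=|r_y|$ and $S_L=1-|\vec r|^2$, so $I_{l_1}^2+S_L = 1-r_x^2-r_z^2\le 1$, with equality when the Bloch vector lies along the $y$ axis. This is a short derivation for illustration, not a restatement of the paper's general $d$-dimensional results; the helper names are hypothetical.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def qubit_state(rx: float, ry: float, rz: float) -> np.ndarray:
    """Density matrix (I + r . sigma) / 2 for a Bloch vector with |r| <= 1."""
    return 0.5 * (np.eye(2, dtype=complex) + rx * SX + ry * SY + rz * SZ)

def l1_imaginarity(rho: np.ndarray) -> float:
    """Sum of absolute imaginary parts of all matrix elements."""
    return float(np.abs(rho.imag).sum())

def normalized_linear_entropy(rho: np.ndarray) -> float:
    """d/(d-1) * (1 - Tr rho^2): 0 for pure states, 1 for the maximally mixed state."""
    d = rho.shape[0]
    purity = float(np.real(np.trace(rho @ rho)))
    return d / (d - 1) * (1.0 - purity)
```

For example, the state with Bloch vector $(0, 0.6, 0)$ has $I_{l_1}=0.6$ and $S_L=0.64$, saturating the qubit bound, while $(0.3, 0.4, 0)$ falls strictly inside it.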
Submitted 24 April, 2024;
originally announced April 2024.
-
Fine-grained Spatial-temporal MLP Architecture for Metro Origin-Destination Prediction
Authors:
Yang Liu,
Binglin Chen,
Yongsen Zheng,
Guanbin Li,
Liang Lin
Abstract:
Accurate prediction of metro traffic is crucial for optimizing metro scheduling and enhancing overall transport efficiency. Effectively analyzing fine-grained and comprehensive relations among stations is imperative for metro Origin-Destination (OD) prediction. However, existing metro OD models either mix information from multiple OD pairs from the station's perspective or focus exclusively on a subset of OD pairs. These approaches may overlook fine-grained relations among OD pairs, making it difficult to predict potential anomalous conditions. To address these challenges, we analyze traffic variations from the perspective of all OD pairs and propose a fine-grained spatial-temporal MLP architecture for metro OD prediction, namely ODMixer. Specifically, ODMixer has a double-branch structure comprising the Channel Mixer, the Multi-view Mixer, and the Bidirectional Trend Learner. The Channel Mixer captures short-term temporal relations among OD pairs, while the Multi-view Mixer captures relations from both the origin and destination perspectives. To model long-term temporal relations, we introduce the Bidirectional Trend Learner. Extensive experiments on two large-scale metro OD prediction datasets, HZMOD and SHMO, demonstrate the advantages of ODMixer. The code will be available.
Submitted 24 April, 2024;
originally announced April 2024.
-
Sparse Bayesian Correntropy Learning for Robust Muscle Activity Reconstruction from Noisy Brain Recordings
Authors:
Yuanhao Li,
Badong Chen,
Natsue Yoshimura,
Yasuharu Koike,
Okito Yamashita
Abstract:
Sparse Bayesian learning has given rise to many effective frameworks for brain activity decoding, especially for the reconstruction of muscle activity. However, existing sparse Bayesian learning methods mainly assume Gaussian-distributed errors in the reconstruction task, which does not necessarily hold in real-world applications. Moreover, brain recordings are known to be highly noisy and to contain many non-Gaussian noises, which can significantly degrade the performance of sparse Bayesian learning methods. The goal of this paper is to propose a new robust implementation of sparse Bayesian learning, so that robustness and sparseness can be realized simultaneously. Motivated by the strong robustness of the maximum correntropy criterion (MCC), we propose integrating MCC into the sparse Bayesian learning regime. Specifically, we derive the explicit error assumption inherent in the MCC and then leverage it for the likelihood function. Meanwhile, we use the automatic relevance determination (ARD) technique for the sparse prior distribution. To fully evaluate the proposed method, we employ a synthetic dataset and a real-world muscle activity reconstruction task with two different brain modalities. Experimental results show that the proposed sparse Bayesian correntropy learning framework significantly improves robustness in a noisy regression task. The proposed method achieves a higher correlation coefficient and a lower root mean squared error in the real-world muscle activity reconstruction tasks. Sparse Bayesian correntropy learning provides a powerful tool for neural decoding, which can promote the development of brain-computer interfaces.
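The robustness that motivates MCC can be seen in a small, non-Bayesian example. Correntropy with a Gaussian kernel of width $\sigma$ scores each residual $e_i$ through $\exp(-e_i^2/2\sigma^2)$, so large outliers contribute almost nothing, and it can be maximized for a linear model with the common fixed-point (iteratively reweighted least squares) scheme sketched below. This is a generic MCC regression on made-up data with an assumed kernel width and iteration count, not the paper's sparse Bayesian correntropy learning, which additionally places an ARD prior on the weights; `mcc_linear_fit` is a hypothetical name.

```python
import numpy as np

def mcc_linear_fit(X: np.ndarray, y: np.ndarray, sigma: float = 0.5, n_iter: int = 30) -> np.ndarray:
    """Maximize sum_i exp(-e_i^2 / (2 sigma^2)) for y ~ X @ w via fixed-point reweighted least squares."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]       # start from ordinary least squares
    for _ in range(n_iter):
        e = y - X @ w
        weights = np.exp(-e**2 / (2 * sigma**2))   # Gaussian-kernel weights: outliers get ~0
        Xw = X * weights[:, None]
        w = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # weighted normal equations X^T W X w = X^T W y
    return w

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([x, np.ones_like(x)])          # slope and intercept columns
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.size)
y[40:45] += 10.0                                   # a few large, non-Gaussian outliers

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]       # pulled far off by the outliers
w_mcc = mcc_linear_fit(X, y)                       # close to slope 2, intercept 1
```

The kernel width trades robustness against efficiency: a smaller `sigma` rejects outliers more aggressively but also down-weights legitimate large residuals.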
Submitted 1 April, 2024;
originally announced April 2024.
-
A Joint Microwave and Hard X-Ray Study Towards Understanding the Transport of Accelerated Electrons during an Eruptive Solar Flare
Authors:
Surajit Mondal,
Andrea F. Battaglia,
Bin Chen,
Sijie Yu
Abstract:
The standard flare model, despite its success, is limited in comprehensively explaining the various processes involving nonthermal particles. One missing ingredient is a detailed understanding of the processes involved in the transport of accelerated electrons from their acceleration site to different parts of the flare region. Here we use simultaneous radio and X-ray observations from the Expanded Owens Valley Solar Array (EOVSA) and the Spectrometer/Telescope for Imaging X-rays (STIX) onboard Solar Orbiter (SolO), respectively, taken from two distinct viewing perspectives, to study the electron transport processes. Through detailed spectral modeling of the coronal source using radio data and of the footpoint sources using X-ray spectra, we compare the nonthermal electron distributions at the coronal and footpoint sources. We find that the flux of nonthermal electrons precipitated at the footpoints is an order of magnitude greater than that trapped in the looptop, consistent with earlier works that primarily used X-ray data. In addition, we find that the electron spectral indices obtained from the X-ray footpoints are significantly softer than that of the nonthermal electron distribution in the corona. We interpret these differences in terms of transport effects and the different sensitivities of microwave and X-ray observations to different electron energy regimes. Such an understanding is crucial for simultaneously leveraging different diagnostics of nonthermal electrons to achieve a more comprehensive understanding of electron acceleration and transport in solar flares.
Submitted 22 April, 2024;
originally announced April 2024.
-
Study of $e^+e^-\to\omega X(3872)$ and $\gamma X(3872)$ from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes $e^+e^-\to\omega X(3872)$ and $e^+e^-\to\gamma X(3872)$. With the $e^+e^-\to\omega X(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\to\gamma J/\psi)}{\mathcal{B}(X(3872)\to\pi^+\pi^- J/\psi)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\to\omega X(3872)$ to that of $e^+e^-\to\omega\chi_{c1}$ ($\omega\chi_{c2}$) to be $\sigma_{\omega X(3872)}/\sigma_{\omega\chi_{c1}}~(\sigma_{\omega X(3872)}/\sigma_{\omega\chi_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~(5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process $e^+e^-\to\gamma X(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\to\gamma X(3872)$ to that of $e^+e^-\to\omega X(3872)$ is set as $\sigma_{\gamma X(3872)}/\sigma_{\omega X(3872)}<0.23$ at 90\% confidence level.
Submitted 21 April, 2024;
originally announced April 2024.
-
Photometric Re-calibration of VPHAS+ $u$-band Photometry with the Stellar Colour Regression Method and Gaia DR3
Authors:
Bing-Qiu Chen,
Hai-Bo Yuan,
Bo-Wen Huang
Abstract:
The $u$-band magnitude is vital for determining stellar parameters and investigating specific astronomical objects. However, flux calibration in the $u$ band for stars in the Galactic disk presents significant challenges. In this study, we present a comprehensive re-calibration of the $u$-band photometric magnitudes of VPHAS+ Data Release 4 (DR4), employing the Stellar Colour Regression (SCR) technique. By leveraging the extensive set of XP spectra and $G_{\rm BP}$ photometry from Gaia Data Release 3 (DR3), together with individual stellar extinction values from the literature, we obtain precise model magnitudes for nearly 3 million stars. Our analysis identifies systematic magnitude offsets with standard deviations of 0.063 mag across different observational visits, 0.022 mag between CCDs, and 0.009 mag within pixel bins. We apply precise corrections for these visit-, CCD-, and pixel-bin-dependent magnitude offsets. These corrections reduce the standard deviation between the observed and model magnitudes from 0.088 mag to 0.065 mag, ensuring that the calibrated magnitudes are independent of stellar magnitude, colour, and extinction. The improved precision of these magnitudes substantially benefits astrophysical research and offers considerable potential for furthering our understanding of stellar astrophysics.
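The correction step amounts to estimating a magnitude offset per group (observational visit, CCD chip, or pixel bin) from the residuals between observed and model magnitudes, then subtracting it. A minimal sketch, assuming the model magnitudes are already available from the SCR step and using the median as a robust per-group estimator (our assumption; the paper may fit the offsets differently). The function names and the toy numbers are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, List, Sequence

def group_offsets(observed: Sequence[float], model: Sequence[float],
                  groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Median of (observed - model) within each group, e.g. per visit, CCD, or pixel bin."""
    residuals: Dict[Hashable, List[float]] = defaultdict(list)
    for obs, mod, key in zip(observed, model, groups):
        residuals[key].append(obs - mod)
    return {key: median(vals) for key, vals in residuals.items()}

def apply_offsets(observed: Sequence[float], groups: Sequence[Hashable],
                  offsets: Dict[Hashable, float]) -> List[float]:
    """Subtract each star's group offset from its observed magnitude."""
    return [obs - offsets[key] for obs, key in zip(observed, groups)]

# Toy example: two CCDs with different zero-point errors (made-up values).
model = [15.0, 16.2, 17.1, 15.5, 16.8, 17.4]
ccd = ["ccd1", "ccd1", "ccd1", "ccd2", "ccd2", "ccd2"]
true_offset = {"ccd1": 0.05, "ccd2": -0.03}
observed = [m + true_offset[c] for m, c in zip(model, ccd)]
corrected = apply_offsets(observed, ccd, group_offsets(observed, model, ccd))
```

Visit-, CCD-, and pixel-bin-level offsets could be handled in turn by calling the same two functions with a different grouping key on the already-corrected magnitudes.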
Submitted 21 April, 2024;
originally announced April 2024.
-
New Evidence of Binarity in Young α-Rich Turn-off and Subgiant Stars: Fast Rotation and Strong Magnetic Activity
Authors:
Jie Yu,
Luca Casagrande,
Ioana Ciucă,
Yuan-Sen Ting,
Simon J. Murphy,
Boquan Chen
Abstract:
Young α-rich (YAR) stars within the old Galactic thick disk exhibit a dual characteristic: relative youth, as determined with asteroseismology, and enhanced α-element abundances, as measured from high-resolution spectroscopy. Binary evolution via mass transfer or stellar mergers has been proposed to explain the apparent youth of YAR stars. If that is the case, YAR stars should spin rapidly and thus be magnetically active, because they have gained mass and angular momentum. In this study, to seek this binary footprint, we select YAR stars on the main-sequence turn-off or the subgiant branch (MSTO-SGB) from APOGEE DR17, whose ages and projected rotation velocities ($v\sin i$) can be precisely measured. With APOGEE $v\sin i$ values and LAMOST spectra, we find that YAR stars are indeed fast rotators and magnetically active. In addition, we observe low [C/N] ratios and high Gaia RUWE values in some YAR stars, suggesting that these MSTO-SGB stars have probably experienced mass transfer from red-giant companions. Our findings underscore that magnetic activity can serve as a valuable tool for probing binary evolution in other chemically peculiar stars, such as red giants with lithium anomalies and carbon-enhanced metal-poor stars.
Submitted 21 April, 2024;
originally announced April 2024.
-
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Authors:
Hanshi Sun,
Zhuoming Chen,
Xinyu Yang,
Yuandong Tian,
Beidi Chen
Abstract:
With large language models (LLMs) recently being widely deployed for long content generation, there is an increasing demand for efficient long-sequence inference support. However, the key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck because its size grows linearly with the sequence length. Due to the auto-regressive nature of LLMs, the entire KV cache is loaded for every generated token, resulting in low utilization of computational cores and high latency. While various KV cache compression methods have been proposed to alleviate this issue, they suffer from degraded generation quality. We introduce TriForce, a hierarchical speculative decoding system that scales to long sequence generation. This approach uses the original model weights with a dynamic sparse KV cache obtained via retrieval as a draft model, which serves as an intermediate layer in the hierarchy and is itself speculated by a smaller model to reduce its drafting latency. TriForce not only achieves impressive speedups for Llama2-7B-128K, up to 2.31$\times$ on an A100 GPU, but also scales to even longer contexts. In the offloading setting on two RTX 4090 GPUs, TriForce achieves 0.108 s/token, only half as slow as the auto-regressive baseline on an A100, which corresponds to a 7.78$\times$ speedup on our optimized offloading system. Additionally, TriForce runs 4.86$\times$ faster than DeepSpeed-Zero-Inference on a single RTX 4090 GPU. TriForce's robustness is highlighted by its consistently outstanding performance across various temperatures. The code is available at https://github.com/Infini-AI-Lab/TriForce.
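The linear growth of the KV cache is easy to quantify: every token stores one key and one value vector per KV head in every layer. The sketch below assumes a standard multi-head-attention configuration matching Llama2-7B (32 layers, 32 KV heads, head dimension 128) and 16-bit storage; these parameters are our assumptions for illustration, not figures from the abstract. Under them each token costs 512 KiB, so a 128K-token context needs 64 GiB of KV cache, far more than the 24 GB of memory on a single RTX 4090.

```python
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 32,
                   head_dim: int = 128, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Bytes to cache keys and values (the factor 2) for every layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch

GIB = 2 ** 30
for seq_len in (4_096, 32_768, 131_072):
    print(f"{seq_len:>7} tokens -> {kv_cache_bytes(seq_len) / GIB:.1f} GiB")
```

Because this whole cache must be read for every generated token, decoding at long context is dominated by memory traffic, which is why drafting against a small retrieved subset of the cache can pay off.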
Submitted 22 April, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.